Feature: sync VHS tape Sleep durations from timing.json (Whisper) #11

@jmjava

Problem

VHS demos need Sleep values that match the length of the TTS audio. Today these are tuned by manual trial and error, while docgen compose enforces a maximum freeze ratio when the video is shorter than the audio.

Proposal

Add a command (e.g. docgen sync-vhs or docgen tape-sync) that:

  1. Reads animations/timing.json produced by docgen timestamps (Whisper segments on each audio/*.mp3).
  2. Parses each terminal/*.tape for Type / Enter / Sleep blocks after the first Show (preamble unchanged).
  3. Partitions the narrated time span into equal wall-clock windows (one per block) and sets each Sleep to window_duration - estimated_typing_time (typing estimate from Type payload length × configurable ms/char, capped).
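The arithmetic in step 3 can be sketched as follows. This is a minimal illustration, not the Course Builder script: the function name `plan_sleeps`, the default of 50 ms per character, the 3-second typing cap, and clamping negative results to zero are all assumptions made for the example. Reading timing.json and parsing the tape are left out because their exact layout is not shown here.

```python
def plan_sleeps(span_start, span_end, payloads, ms_per_char=50.0, typing_cap_s=3.0):
    """Return one Sleep duration (seconds) per Type block.

    span_start / span_end: narrated time span in seconds (e.g. first and last
    Whisper segment boundaries for one audio file).
    payloads: the Type strings, one per block, in tape order.
    ms_per_char and typing_cap_s are hypothetical defaults for this sketch.
    """
    if not payloads:
        return []
    # Equal wall-clock window per block.
    window = (span_end - span_start) / len(payloads)
    sleeps = []
    for text in payloads:
        # Typing estimate: payload length x ms/char, capped.
        typing = min(len(text) * ms_per_char / 1000.0, typing_cap_s)
        # Assumption: never emit a negative Sleep if typing exceeds the window.
        sleeps.append(round(max(window - typing, 0.0), 2))
    return sleeps


# 12 s of narration split across three blocks gives 4 s windows.
print(plan_sleeps(0.0, 12.0, ["ls -la", "git status", "x" * 200]))
# → [3.7, 3.5, 1.0]
```

The third block shows the cap at work: 200 characters would estimate 10 s of typing, but the cap limits it to 3 s, so the block still gets a 1 s Sleep.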

Prior art

Course Builder implements this as a standalone script you can lift or vendor:

  • https://github.com/jmjava/course-builder/blob/main/docs/demos/scripts/sync_vhs_sleep_from_timing.py

Acceptance criteria

  • New subcommand documented in README
  • Works with existing timing.json + .tape layout; --dry-run and --segment (stem) filters
  • Optional hook in generate-all / rebuild-after-audio that runs after timestamps when a config flag is set
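One possible shape for the two flags, sketched with the standard-library argparse. How docgen actually wires its subcommands is not shown in this issue, so the parser layout, the `build_parser` and `select_tapes` names, and the choice to make --segment repeatable are assumptions for illustration only.

```python
import argparse


def build_parser():
    # Hypothetical standalone parser; docgen's real CLI wiring may differ.
    parser = argparse.ArgumentParser(prog="docgen sync-vhs")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="print the planned Sleep values without rewriting any .tape file",
    )
    parser.add_argument(
        "--segment",
        action="append",
        metavar="STEM",
        help="only process the tape with this file stem (assumed repeatable)",
    )
    return parser


def select_tapes(tape_stems, segments):
    """Keep tape order; with no --segment given, process every tape."""
    if not segments:
        return list(tape_stems)
    wanted = set(segments)
    return [stem for stem in tape_stems if stem in wanted]


args = build_parser().parse_args(["--dry-run", "--segment", "01-intro"])
print(args.dry_run, select_tapes(["01-intro", "02-setup"], args.segment))
```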

Notes

This does per-block alignment to the audio timeline, not word-level karaoke. Manim remains the path for frame-accurate visuals.
