Skip to content

Storyboard & reel timing

A storyboard is the creative seed of a reel: an ordered list of cards, each a short on-screen line with its narration, palette roles, and (for overlay cards) a media panel. It is the main human input — hand-authored or AI-drafted, then human-edited — so keryx validates it loudly and renders deterministically from it. The model and maths are a faithful port of the blog gen-reel.py (spec §3.1).

Card schema

storyboard.json is a JSON array of cards:

Field Meaning
text the on-screen line (a tight distillation; \n forces a break). *word* marks accent words.
vo the narration — distinct from text: may be fuller and carry provider control tags (SSML <break>, phonetic spellings).
bg / fg / accent palette roles. Mode-dependent: block uses all three; overlay ignores bg.
dur fallback on-screen seconds, used only when no VO drives timing.
cover / mono bookend cover art / monospace URL closer.
mode block (default) or overlay (full-bleed media + scrim + line).
scene overlay illustration prompt (required for a generated overlay card).
media resolved panel {kind: image\|video, source: generated\|uploaded, path}.
voice per-card voice override (e.g. steadying a line with stability).

The on-screen text and the vo narration are kept as separate fields — the batch proved they want different phrasing.

Validation

Loaded storyboards are validated; the same findings back the CLI exit 2, the API 422, and the studio inline checks:

  • MUST non-empty text (R-WS-9); overlay cards have a scene or supplied media (R-WS-10); balanced * markers (R-WS-11); palette roles reference defined keys, mode-aware (R-WS-12).
  • SHOULD the last card is the mono URL closer (R-WS-13); bg on an overlay card warns (inert). Cover-card art availability is checked at render time (R-WS-14).

VO-driven timing

Narration drives pacing. Each card's on-screen duration is its VO clip length + a lead (≈0.5s) + tail (≈0.7s); cards crossfade (xfade ≈0.4s). For card i:

start_i        = Σ dur[:i] − i·xfade
vo_delay_i     = (start_i + lead)            # the VO is placed here
total          = Σ dur − (n−1)·xfade

The music bed is later requested at this computed total, not a hand-set value. The maths is pure and unit-tested (internal/reel), with the renderer supplying real durations (from ffprobe) and font metrics.

Text rendering

  • Orphan control: wrapping never leaves a lone trailing word — it pulls one word down from the line above (a flagged defect in the first batch).
  • Accents: *word* tokens render in the accent colour; markers are stripped before measurement. (Parity quirk: in a multi-word *no one safer* only the first/last tokens are accented.)

Status: the deterministic core (schema, validation, timing, wrapping, accents) lands in Phase 1d. Card PNG rendering and the ffmpeg xfade/audio-mux assembly (the Renderer adapter + keryx reel build) follow in the next phase.