Storyboard & reel timing
A storyboard is the creative seed of a reel: an ordered list of cards,
each a short on-screen line with its narration, palette roles, and (for overlay
cards) a media panel. It is the main human input — hand-authored or AI-drafted,
then human-edited — so keryx validates it loudly and renders deterministically
from it. The model and maths are a faithful port of the blog gen-reel.py
(spec §3.1).
Card schema
storyboard.json is a JSON array of cards:
| Field | Meaning |
|---|---|
| `text` | the on-screen line (a tight distillation; `\n` forces a break). `*word*` marks accent words. |
| `vo` | the narration, distinct from `text`: may be fuller and carry provider control tags (SSML `<break>`, phonetic spellings). |
| `bg` / `fg` / `accent` | palette roles. Mode-dependent: `block` uses all three; `overlay` ignores `bg`. |
| `dur` | fallback on-screen seconds, used only when no VO drives timing. |
| `cover` / `mono` | bookend cover art / monospace URL closer. |
| `mode` | `block` (default) or `overlay` (full-bleed media + scrim + line). |
| `scene` | overlay illustration prompt (required for a generated overlay card). |
| `media` | resolved panel `{kind: image\|video, source: generated\|uploaded, path}`. |
| `voice` | per-card voice override (e.g. steadying a line with stability). |
The on-screen text and the vo narration are kept as separate fields — the
batch proved they want different phrasing.
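As a rough illustration of the schema, the sketch below builds a two-card storyboard in Python and serialises it to the JSON array shape described above. Every value is invented: the palette key names (`ink`, `paper`, `gold`), the narration, and the scene prompt are assumptions, and the exact value types keryx expects for each field are not confirmed by this page.

```python
import json

# Hypothetical two-card storyboard. All values are invented for illustration;
# the palette keys "ink", "paper" and "gold" are assumed names.
cards = [
    {
        # Block card (mode omitted, so the default applies).
        "text": "Ship it *safely*",
        "vo": "Ship it safely, every single time.",
        "bg": "ink",
        "fg": "paper",
        "accent": "gold",
        "dur": 3.0,  # fallback seconds, only used when no VO drives timing
    },
    {
        # Overlay card: no bg (overlay ignores it), scene prompt supplied.
        "mode": "overlay",
        "text": "One storyboard,\nmany reels",  # "\n" forces a line break
        "vo": "One storyboard can drive many reels.",
        "fg": "paper",
        "accent": "gold",
        "scene": "a desk at dusk, warm lamp light",
    },
]

storyboard_json = json.dumps(cards, indent=2)
print(storyboard_json)
```

Note how `text` and `vo` differ on both cards: the on-screen line stays short while the narration reads as a full sentence.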
Validation
Loaded storyboards are validated; the same findings back the CLI exit 2, the
API 422, and the studio inline checks:
- MUST: non-empty `text` (R-WS-9); overlay cards have a `scene` or supplied media (R-WS-10); balanced `*` markers (R-WS-11); palette roles reference defined keys, mode-aware (R-WS-12).
- SHOULD: the last card is the mono URL closer (R-WS-13); `bg` on an overlay card warns (inert). Cover-card art availability is checked at render time (R-WS-14).
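The load-time rules above could be sketched roughly as follows. This is not keryx's validator: the `Finding` type, the `validate` signature, and the `palette` argument are hypothetical, `mono` is assumed to be a truthy flag on the closer card, and "balanced" is approximated as an even count of `*`. The render-time check (R-WS-14) is left out.

```python
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    level: str            # "MUST" or "SHOULD"
    rule: Optional[str]   # requirement ID, or None where the page gives none
    card: int             # zero-based card index
    message: str


def validate(cards, palette):
    """Hypothetical validator; `palette` is an assumed set of defined keys."""
    findings = []
    for i, card in enumerate(cards):
        mode = card.get("mode", "block")
        text = card.get("text", "")
        if not text.strip():
            findings.append(Finding("MUST", "R-WS-9", i, "text is empty"))
        if mode == "overlay" and not (card.get("scene") or card.get("media")):
            findings.append(Finding("MUST", "R-WS-10", i, "overlay needs scene or media"))
        # Assumption: "balanced" means an even number of * markers.
        if text.count("*") % 2 != 0:
            findings.append(Finding("MUST", "R-WS-11", i, "unbalanced * markers"))
        # Mode-aware: overlay ignores bg, so bg is not resolved for overlay cards.
        roles = ("fg", "accent") if mode == "overlay" else ("bg", "fg", "accent")
        for role in roles:
            if role in card and card[role] not in palette:
                findings.append(Finding("MUST", "R-WS-12", i, f"{role} key is undefined"))
        if mode == "overlay" and "bg" in card:
            findings.append(Finding("SHOULD", None, i, "bg is inert on an overlay card"))
    # Assumption: the closer is marked by a truthy `mono` field.
    if cards and not cards[-1].get("mono"):
        findings.append(Finding("SHOULD", "R-WS-13", len(cards) - 1, "last card is not the mono closer"))
    return findings


cards = [
    {"text": "Half *open", "bg": "ink", "fg": "paper", "accent": "gold"},
    {"mode": "overlay", "text": "No panel", "bg": "ink", "fg": "paper"},
]
findings = validate(cards, palette={"ink", "paper", "gold"})
for f in findings:
    print(f.level, f.rule, f.card, f.message)
```

Returning a flat list of findings, rather than raising on the first error, is what would let one result back several surfaces (CLI, API, inline checks) as the paragraph above describes.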
VO-driven timing
Narration drives pacing. Each card's on-screen duration is its VO clip length +
a lead (≈0.5s) + tail (≈0.7s); cards crossfade (xfade ≈0.4s). For card `i`,
with `dur` the list of per-card on-screen durations and `n` the number of cards:
start_i = Σ dur[:i] − i·xfade
vo_delay_i = (start_i + lead) # the VO is placed here
total = Σ dur − (n−1)·xfade
The music bed is later requested at this computed total, not a hand-set
value. The maths is pure and unit-tested (internal/reel), with the renderer
supplying real durations (from ffprobe) and font metrics.
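The formulas above can be worked through with a small Python sketch. It is an illustration only, not the `internal/reel` code: the function names are hypothetical, the defaults are the approximate values quoted above, and the VO clip lengths in the example are made up.

```python
def card_durations(vo_lengths, lead=0.5, tail=0.7):
    """On-screen duration per card: lead + VO clip length + tail."""
    return [lead + vo + tail for vo in vo_lengths]


def timeline(durs, lead=0.5, xfade=0.4):
    """Return (starts, vo_delays, total) for crossfaded cards."""
    starts, vo_delays = [], []
    elapsed = 0.0  # running sum of dur[:i]
    for i, d in enumerate(durs):
        start = elapsed - i * xfade      # each earlier crossfade overlaps by xfade
        starts.append(start)
        vo_delays.append(start + lead)   # the VO is placed after the lead
        elapsed += d
    # n cards share n-1 crossfades.
    total = elapsed - (len(durs) - 1) * xfade if durs else 0.0
    return starts, vo_delays, total


# Made-up VO clip lengths in seconds.
durs = card_durations([2.0, 3.0, 1.5])
starts, vo_delays, total = timeline(durs)
print([round(s, 2) for s in starts])     # [0.0, 2.8, 6.6]
print([round(v, 2) for v in vo_delays])  # [0.5, 3.3, 7.1]
print(round(total, 2))                   # 9.3
```

The last card starts at 6.6s and lasts 2.7s, which ends at 9.3s and agrees with `total`, so the start and total formulas are consistent with each other.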
Text rendering
- Orphan control: wrapping never leaves a lone trailing word; it pulls one word down from the line above. (Orphaned words were a flagged defect in the first batch.)
- Accents: `*word*` tokens render in the accent colour; markers are stripped before measurement. (Parity quirk: in a multi-word `*no one safer*` only the first/last tokens are accented.)
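A minimal sketch of both behaviours follows, under stated assumptions. The real renderer measures with font metrics; this version measures by character count (`max_chars` is a stand-in), applies orphan control to each forced-break segment separately, and skips the pull-down if the merged last line would not fit. The per-token accent check is one plausible way the parity quirk could arise; the page does not state the actual mechanism. All function names are hypothetical.

```python
def strip_markers(text):
    """Remove * accent markers so they do not count towards line width."""
    return text.replace("*", "")


def wrap(text, max_chars):
    """Greedy wrap by character count with orphan control (assumed metric)."""
    lines = []
    for segment in text.split("\n"):  # "\n" forces a break
        seg_lines, current = [], []
        for word in segment.split():
            candidate = " ".join(current + [word])
            if current and len(strip_markers(candidate)) > max_chars:
                seg_lines.append(current)
                current = [word]
            else:
                current.append(word)
        if current:
            seg_lines.append(current)
        # Orphan control: a lone trailing word pulls one word down from above.
        if len(seg_lines) >= 2 and len(seg_lines[-1]) == 1 and len(seg_lines[-2]) > 1:
            pulled = seg_lines[-2][-1]
            if len(strip_markers(pulled + " " + seg_lines[-1][0])) <= max_chars:
                seg_lines[-2].pop()
                seg_lines[-1].insert(0, pulled)
        lines.extend(" ".join(words) for words in seg_lines)
    return lines


def accent_tokens(line):
    """Per-token check: a token is accented only if it carries a * itself."""
    return [
        (tok.replace("*", ""), tok.startswith("*") or tok.endswith("*"))
        for tok in line.split()
    ]


wrapped = wrap("Keep all your data *safer*", 18)
print(wrapped)                          # ['Keep all your', 'data *safer*']
print(accent_tokens("*no one safer*"))  # [('no', True), ('one', False), ('safer', True)]
```

Without the orphan step, the greedy pass would leave `*safer*` alone on the last line; with it, `data` moves down to join it. In the second call, the middle token `one` carries no marker of its own, which reproduces the first/last-only accenting noted above.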
Status: the deterministic core (schema, validation, timing, wrapping, accents) lands in Phase 1d. Card PNG rendering and the ffmpeg xfade/audio-mux assembly (the
`Renderer` adapter + `keryx reel build`) follow in the next phase.