Storyboard & reel timing¶

A storyboard is the creative seed of a reel: an ordered list of cards, each a short on-screen line with its narration, palette roles, and (for overlay cards) a media panel. It is the main human input — hand-authored or AI-drafted, then human-edited — so keryx validates it loudly and renders deterministically from it. The model and maths are a faithful port of the blog gen-reel.py (spec §3.1).

Card schema¶

storyboard.json is a JSON array of cards:

Field	Meaning
`text`	the on-screen line (a tight distillation; `\n` forces a break). `word` marks accent words.
`texts`	positioned text blocks — several placed runs of text on one scene, superseding `text` when present. See Positioned text below.
`vo`	the narration — distinct from `text`: may be fuller and carry provider control tags (SSML `<break>`, phonetic spellings).
`bg` / `fg` / `accent`	palette roles. Mode-dependent: `block` uses all three; `overlay` ignores `bg`.
`dur`	on-screen seconds, used when no VO drives timing.
`lead` / `tail`	seconds of silence this scene holds before and after its narration, overriding the theme's. An override replaces rather than adds — `lead: 1.2` means 1.2s, not 1.2s on top. `0` is a real value meaning no air at all.
`mono`	render this card in the monospace face — the convention for the closing URL / call-to-action card.
~~`cover`~~	retired. It scaled the reel's cover art into a box above the text, losing most of it. Use an overlay card whose `media` is the cover instead — `keryx storyboard migrate` converts old boards.
`mode`	`block` or `overlay` (full-bleed media + scrim + line). Omit it to take the theme's `card.mode` — see How a card's mode is decided below.
`scene`	overlay illustration prompt (required for a generated overlay card).
`media`	resolved panel `{kind: image\\|video, source: generated\\|uploaded, path}`.
`voice`	per-card voice override — any of `speaker` (a registered voice from the catalog, for multi-author reels), `stability`, `similarity`, `style`, `speed`, `model` (a TTS model for this line), `pronounce` (a respelling used as this line's narration, any model), `ipa` (a phonetic rendering used on a phoneme-capable model, ignored on `multilingual_v2`). Steady a wobbly line (`stability`), slow a fast one (`speed`), switch speaker, or fix a mangled word (`pronounce`/`ipa`), without touching the rest of the reel. Defaults come from the reel theme's `voice`.

How a card's mode is decided¶

A card renders as block (caption on the palette's base colour) or overlay (the illustration full-bleed, scrim, caption over it). The mode is resolved at render time, in this order:

the card's own mode, if set;
otherwise the reel theme's card.mode;
otherwise block.

Then one safety rule: an overlay card with no media degrades to block rather than failing the build, and says so:

card 3: no illustration selected — rendering as a text block

That keeps reel build usable on a partly illustrated board, which is the normal state mid-authoring. The inverse is also reported — a block card carrying an illustration renders without it, and the build tells you rather than dropping it silently:

card 2: illustration ignored — card mode is block

Picking an illustration (cards pick / cards set) sets mode: overlay on a card that has no explicit mode, since wiring media onto a block card would mean the image is never drawn. An explicit block is respected.

Note the field is media, not img. An unrecognised key is ignored silently, so img: cards/01.png does nothing.

The on-screen text and the vo narration are kept as separate fields — the batch proved they want different phrasing.

A per-line --stability/--similarity/--style/--speed passed to voice gen --line N is persisted into that card's voice block, so a later reel build reproduces the tuned read.

Validation¶

Loaded storyboards are validated; the same findings back the CLI exit 2, the API 422, and the studio inline checks:

MUST have a length: a card with no text needs narration or an explicit dur (R-WS-9, revised by spec 0050). A scene may be all picture and voiceover — text has never been what decides how long a card stays on screen, so the rule is about time rather than words. What is rejected is a scene with nothing to say and no length, because that is decided by fallbacks rather than by its author.
overlay cards have a scene or supplied media (R-WS-10); balanced * markers (R-WS-11); palette roles reference defined keys, mode-aware (R-WS-12).
SHOULD the last card is the mono URL closer (R-WS-13); bg on an overlay card warns (inert). Cover-card art availability is checked at render time (R-WS-14).

VO-driven timing¶

Narration drives pacing. Each card's on-screen duration is its VO clip length + a lead (≈0.5s) + tail (≈0.7s); cards crossfade (xfade ≈0.4s). For card i:

start_i        = Σ dur[:i] − i·xfade
vo_delay_i     = (start_i + lead)            # the VO is placed here
total          = Σ dur − (n−1)·xfade

The music bed is requested at this computed total by default, so it covers the whole reel — both music gen --takes and reel make size the bed to the VO-driven total once the VO clips are promoted (music gen --length <d> overrides it explicitly; with no VO promoted yet the bed falls back to a default length). The maths is pure and unit-tested (internal/reel); the VO-driven total is computed once in reelcmd.ReelTimingFromVO and shared by reel build, music gen --takes, and reel make, with the renderer supplying real durations (from ffprobe) and font metrics.

Positioned text¶

By default a card has one line of text, placed by its mode's convention: centred for a block card, a lower-third caption for an overlay. texts replaces that with several blocks, each placed where you put it:

{
  "texts": [
    { "text": "Headline", "pos": { "x": 0.5, "y": 0.22 }, "width": 0.8, "size": 1.3, "role": "amber" },
    { "text": "a *placed* footnote", "pos": { "x": 0.3, "y": 0.8 }, "width": 0.45, "align": "left" }
  ]
}

Field	Meaning
`text`	the block's words. `word` accents work per block.
`pos`	`{x, y}` as fractions of the frame — the block box's centre on both axes. `{0.5, 0.5}` is dead centre. Omit for the mode's default placement.
`width`	the box's width as a fraction of the frame. Omit for the mode's default.
`align`	`centre` (default), `left` or `right` — how text ranges inside the box. It does not move the box.
`accent`	palette role for this block's `marked` words, overriding the card's.
`size`	scales the theme's text size. `1.3` is thirty percent larger; omit for the theme's own.
`role`	a palette role (`amber`, `cream`), overriding the card's `fg`. Never a hex — see below.

texts supersedes text. A card with neither positions nor blocks behaves exactly as it always has, so every storyboard written before this feature renders byte for byte identically. That is enforced by a test, not a promise.

Four rules worth knowing¶

Coordinates are fractions, not pixels. The same board renders at 1080×1920 today and could render larger tomorrow; a pixel anchor would drift.
An anchor is the box's centre, not its top-left. Add a line and the block grows both ways, staying where you put it, instead of creeping downward.
A block's width is its own property, not a consequence of where it sits. Move a block and it keeps its shape and its line breaks. Resize it deliberately, by dragging its edges.
role and accent name palette entries, never colours. A hex written into a board would not follow a theme change, which is the thing themes exist to make possible. role is the block's text colour, accent the colour its *marked* words take; both fall back to the card's fg / accent, and those to the theme.

Alignment ranges text; it does not move the block¶

align decides how each line sits inside the box — left, centred or right. The box itself does not move, so switching alignment never relocates a block. Give a block a narrow width and range it left, and you get a left-hand column exactly where you placed it.

Placing text in the studio¶

The Scenes tab's preview is the layout surface: drag a block to place it, with guides snapping to the thirds and the centre. Keyboard works too — focus a block and use the arrow keys (hold Shift for a coarser step). The scene inspector lists every block with its alignment, size, colour role and exact percentages, plus + add text and a reset that returns a block to the default placement without deleting the words.

The preview is faithful, not decorative. Position, box width, alignment, colour role, text size and the typeface itself come from the same model as the renderer — the studio serves the theme's actual font files to the browser, so lines break where the render breaks them. The two implementations are pinned to a shared fixture so they cannot drift apart.

That matters more than it sounds: because a block is anchored by its centre, a line breaking differently is a block sitting differently. If the preview wrapped text its own way, everything below the break would be in the wrong place.

Measured agreement: block boxes to 0.2px and glyph positions to under 2px, at the render's 1080×1920. What you align in the preview is what the render produces.

Text rendering¶

Orphan control: wrapping never leaves a lone trailing word — it pulls one word down from the line above (a flagged defect in the first batch).
Accents: *word* tokens render in the accent colour; markers are stripped before measurement. (Parity quirk: in a multi-word *no one safer* only the first/last tokens are accented.)

Validation catches a block whose anchor falls outside the frame, whose size is negative, or whose role names no palette entry — a storyboard is a file people hand-edit, and those are better caught on save than discovered as a caption rendered off the edge.