keryx: video panels (deferred feature spec)
Status: deferred, Phase 5 (design only; not built until stills-based reels + posting are solid, 0001 §9)
Owner: Matt Cockayne
Last updated: 2026-06-15
0. How to read this
This is the detailed feature spec for video panels — using a short video
clip as a card's background instead of a still image. It is deferred: the
core specs (0001-keryx.md, 0002-interface-contracts.md) already leave a
"video-shaped hole" — the VideoProvider seam (0001 §3.4), the card
media {kind: image|video} schema (0001 §3.1), the renderer's fit-to-duration
path, and the editor's generate/upload-video affordance (R-UI-29, R-GEN-21).
This document fills in that hole so the design is captured now and can be built
later without reshaping anything. Where it overlaps the core specs, 0001/0002
win; this elaborates.
Requirement IDs here use R-VID-n.
1. Purpose & value
A run of static cards — even with generated stills — is still static. A short moving panel (a few seconds of motion behind the scrim + text) adds life and stops a reel reading as a slideshow, which matters on platforms that reward watch-time. Two sources, mirroring stills:
- Uploaded video: a pre-rendered clip the user already has (b-roll, a screen recording, a motion graphic). The cheapest win: no provider, no spend.
- Generated video: a short clip generated from the card's scene prompt via a video model (default Gemini Omni, swappable like every provider).
Both are per-card background panels, composited identically to image panels: full-bleed, bottom gradient scrim, the line over the top.
2. Scope & non-goals
In scope
- A card's overlay media may be a video (media.kind: video), uploaded or
generated, fit to the card's VO-driven duration.
- A pluggable VideoProvider (0001 §3.4) for generated clips.
- Renderer support for compositing video panels (scrim + text over video) and
normalising clips to 9:16 / 1080×1920 / the reel codec.
Out of scope (still keryx non-goals, 0001 §2)
- Not a video editor / timeline — no trimming UI, keyframes, transitions beyond
the existing card xfade, or multi-track editing. A panel is one clip behind one
card.
- The cover/bookend cards stay stills (the post's cover art); video is for
body (overlay) cards.
- No per-clip audio mixing — see §4.3 (clip audio is dropped; the reel's audio is
VO + music as today).
3. Phasing (within Phase 5)
- Uploaded video first. keryx cards set <card> <clip> + the editor upload picker, plus the renderer's fit-to-duration + compositing path. No provider needed: this is the cheap, high-value slice and de-risks the render work.
- Generated video second. The VideoProvider adapter (Gemini Omni), wired into keryx cards --video, take management, and caching.
R-VID-1 (MUST) the two slices are independent: uploaded video ships and is
useful with no video provider configured.
4. Design
4.1 Card media model (recap, 0001 §3.1)
An overlay card resolves to either an image or a video; everything downstream
(scrim, text, timing) is media-kind-agnostic except the compositing primitive.
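The media model above could be sketched in Go roughly as follows. Only the image|video discriminator comes from the spec; the Media struct, its Path field, and the CompositePrimitive helper are hypothetical names used for illustration, not the schema defined in 0001 §3.1.

```go
package main

import (
	"errors"
	"fmt"
)

// MediaKind mirrors the spec's media {kind: image|video} discriminator.
type MediaKind string

const (
	KindImage MediaKind = "image"
	KindVideo MediaKind = "video"
)

// Media is a hypothetical shape for a card's resolved background.
// Only Kind comes from the spec; Path is an illustrative assumption.
type Media struct {
	Kind MediaKind
	Path string
}

// Validate rejects unknown kinds and empty paths.
func (m Media) Validate() error {
	switch m.Kind {
	case KindImage, KindVideo:
	default:
		return fmt.Errorf("unknown media kind %q", m.Kind)
	}
	if m.Path == "" {
		return errors.New("media path is empty")
	}
	return nil
}

// CompositePrimitive names the only kind-specific step.
// Scrim, text, and timing are shared by both kinds.
func CompositePrimitive(m Media) string {
	if m.Kind == KindVideo {
		return "video-background"
	}
	return "still-background"
}

func main() {
	m := Media{Kind: KindVideo, Path: "cards/03/clip.mp4"}
	if err := m.Validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(CompositePrimitive(m))
}
```

Keeping the branch on Kind inside one small function is one way to keep the rest of the pipeline media-kind-agnostic.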
4.2 VideoProvider seam (recap, 0001 §3.4)
VideoProvider.Generate(ctx, VideoRequest) ([]Clip, error) — config key
providers.video, default Gemini Omni, off until this feature ships.
- R-VID-2 (MUST) generated clips use the reel theme's illustration style +
the hardened wordless instruction (text-leak applies to video too, and is
worse — moving mangled text); candidates are screened like image takes.
- R-VID-3 (MUST) the request is provider-neutral (scene prompt, target aspect,
target seconds); the adapter maps to its own API.
- R-VID-4 (SHOULD) default 1 take (video generation is materially more
expensive than stills) — --takes N opt-in; content-addressed cache applies
(R-GLOBAL-9).
4.3 Rendering (the real work)
- R-VID-5 (MUST) a video panel is fit to the card's VO-driven duration (0001 §3.1): loop if the clip is shorter, trim if longer. No speed-ramping in v1.
- R-VID-6 (MUST) the scrim + text overlay composites over the video exactly as over a still; the xfade between cards still applies at clip boundaries.
- R-VID-7 (MUST) the clip's own audio is dropped; reel audio stays VO + ducked music (0001 §3.1), so a video panel never introduces stray sound.
- R-VID-8 (MUST) clips (uploaded or generated) are normalised to 1080×1920, the reel fps, and the reel's H.264/yuv420p profile before compositing; a non-9:16 upload is cover-fit (center-crop) with a warning.
- R-VID-9 (SHOULD) the ffmpeg Renderer gains this path behind the existing interface, with no API change to callers; a custom/remote renderer could implement it differently.
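The loop-or-trim rule in R-VID-5 reduces to a small calculation, sketched below. FitPlan, PlanFit, and ExtraRepeats are hypothetical names; the comment about ffmpeg's -stream_loop and -t options describes one way a renderer might apply the plan, not how the keryx renderer is specified to work.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// FitPlan describes how a clip is fitted to a card's VO-driven duration.
type FitPlan struct {
	Loops  int     // total passes through the clip (>= 1); the last may be cut short
	OutSec float64 // output length in seconds, always the card duration
}

// PlanFit loops a clip that is shorter than the card and trims one that is
// longer. It never changes playback speed (no speed-ramping in v1).
func PlanFit(clipSec, cardSec float64) (FitPlan, error) {
	if clipSec <= 0 || cardSec <= 0 {
		return FitPlan{}, errors.New("clip and card durations must be positive")
	}
	loops := int(math.Ceil(cardSec / clipSec))
	return FitPlan{Loops: loops, OutSec: cardSec}, nil
}

// ExtraRepeats is the number of repeats after the first pass. Assumption:
// a renderer could pass this to ffmpeg's -stream_loop input option and
// OutSec to -t to cut the final pass.
func (p FitPlan) ExtraRepeats() int { return p.Loops - 1 }

func main() {
	cases := []struct{ clip, card float64 }{{2, 5}, {8, 5}}
	for _, c := range cases {
		plan, err := PlanFit(c.clip, c.card)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("clip %.1fs, card %.1fs: play %d time(s), cut at %.1fs\n",
			c.clip, c.card, plan.Loops, plan.OutSec)
	}
}
```

For a 2 s clip behind a 5 s card this plans three passes cut at 5 s; for an 8 s clip it plans a single pass trimmed to 5 s.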
4.4 Cost, caching, performance
- R-VID-10 (MUST) generated-video spend is surfaced (it dwarfs stills); a --dry-run / preview path reports intent without spend (R-GLOBAL-5).
- R-VID-11 (SHOULD) video render is slower than stills; long renders show progress and run as a background task in CI/studio.
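Because generation is the expensive step, the content-addressed cache from R-VID-4 is what stops an unchanged request from spending twice. A minimal sketch of such a key follows, assuming the key is a SHA-256 over the fields that affect the output. CacheInput, its fields, and CacheKey are hypothetical; the actual scheme is whatever R-GLOBAL-9 defines.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// CacheInput lists the inputs assumed to determine a generated clip.
// The field set is an illustration, not the R-GLOBAL-9 definition.
type CacheInput struct {
	Provider    string  `json:"provider"`
	Model       string  `json:"model"`
	ScenePrompt string  `json:"scene_prompt"`
	Aspect      string  `json:"aspect"`
	Seconds     float64 `json:"seconds"`
}

// CacheKey hashes a stable JSON encoding of the input. encoding/json writes
// struct fields in declaration order, so equal inputs give equal keys.
func CacheKey(in CacheInput) (string, error) {
	b, err := json.Marshal(in)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	in := CacheInput{
		Provider:    "fake",
		Model:       "model-a",
		ScenePrompt: "harbour at dawn",
		Aspect:      "9:16",
		Seconds:     4,
	}
	key, err := CacheKey(in)
	if err != nil {
		fmt.Println("key failed:", err)
		return
	}
	fmt.Println(key)
}
```

A dry-run path could compute the same key, report a hit or a miss, and stop before any provider call.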
4.5 Persistence: the size question (resolved)
Video assets are large, which is in tension with git-first persistence. Resolved:
this is handled by the pluggable large-file persistence seam
(persistence.media.store, 0001 §3.5) — not a video-specific decision. A
video-heavy project flips its store to external (object store + committed
pointer; host-agnostic, no LFS server) or git-lfs, globally or per project,
with no code change; light projects stay on git.
R-VID-12 (MUST) video panels persist via the configured media store (§3.5);
keryx does not hardwire LFS. The rendered reel.mp4 persists the same way.
(Xet is a forward-looking adapter once usable off the HF Hub.)
5. Interfaces (recap + video-specific)
- CLI: keryx cards --video [--card N] [--takes N] (generate), keryx cards set <card> <clip> (upload), keryx cards select <card> <take>; all already in 0001 §5 / 0002 §3.2 (R-GEN-21, R-GEN-22).
- Studio: the per-card media source (R-UI-29) already offers Generate → short video and Upload (image or video); the v2 silent/full preview (R-UI-8) renders the video panel.
- MCP: cards --video is an authoring/generate tool (exposed by default, R-MCP-2); it spends, so it honours the dry-run/confirm posture (R-MCP-3).
6. Testing
- R-VID-13 (MUST) the fit-to-duration maths (loop count / trim point vs VO duration) and the normalisation decision (aspect/codec) are pure and unit-tested without a real clip.
- R-VID-14 (MUST) the VideoProvider is faked in unit/e2e tests (no real video gen, no spend); a tiny fixture clip exercises the real ffmpeg compositing path in integration only (env-gated, 0001 §8).
- R-VID-15 (SHOULD) probe checks on a rendered video-panel reel: dimensions, duration, single audio stream (VO+music only; no clip audio leaked).
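To make the "pure and unit-tested without a real clip" requirement concrete, here is a sketch of the aspect half of the normalisation decision as a pure function over source dimensions. Crop, CoverFit, and Filter are hypothetical names. The filter string uses ffmpeg's crop and scale filters as one plausible way to express the result, which is an assumption about the renderer rather than its specified behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// Target frame size from R-VID-8.
const (
	targetW = 1080
	targetH = 1920
)

// Crop is the centred region of the source to keep before scaling.
// Warn is set when the source was not already 9:16.
type Crop struct {
	W, H, X, Y int
	Warn       bool
}

// CoverFit decides the center-crop for a srcW x srcH source. It compares
// aspects by cross-multiplication to avoid floating-point rounding.
func CoverFit(srcW, srcH int) (Crop, error) {
	if srcW <= 0 || srcH <= 0 {
		return Crop{}, errors.New("source dimensions must be positive")
	}
	lhs, rhs := srcW*targetH, srcH*targetW
	switch {
	case lhs == rhs: // already 9:16, keep the whole frame
		return Crop{W: srcW, H: srcH}, nil
	case lhs > rhs: // too wide: keep full height, crop the sides
		w := srcH * targetW / targetH
		if w < 1 {
			w = 1
		}
		return Crop{W: w, H: srcH, X: (srcW - w) / 2, Warn: true}, nil
	default: // too tall: keep full width, crop top and bottom
		h := srcW * targetH / targetW
		return Crop{W: srcW, H: h, Y: (srcH - h) / 2, Warn: true}, nil
	}
}

// Filter renders the decision as an ffmpeg filter chain (assumed usage).
func (c Crop) Filter() string {
	return fmt.Sprintf("crop=%d:%d:%d:%d,scale=%d:%d",
		c.W, c.H, c.X, c.Y, targetW, targetH)
}

func main() {
	c, err := CoverFit(1080, 2400)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.Filter(), "warn:", c.Warn)
}
```

Since the function takes only two integers, a table-driven unit test can cover landscape, square, and already-9:16 sources without touching ffmpeg or a fixture clip.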
7. Open questions
- Persistence policy for large video: resolved by the pluggable media store (persistence.media.store, 0001 §3.5). When a project adds video, switch to external: backend gitlab-packages if hosted on GitLab (built-in, no infra), else s3 (Cloudflare R2 recommended). No remaining open question here.
- Generated clip length default (e.g. 3–5s, then loop) and whether to match it to the card's VO duration at generation time (costlier) or always loop a short clip.
- Provider/model — confirm the Gemini Omni (or successor) video model name, limits, and cost when this is picked up; it is config-selected so non-blocking.
- Mixed reels — a reel with some video and some still panels is expected and fine; confirm xfade between a video panel and a still reads cleanly.
8. References
- Core design + the video-shaped hole: 0001-keryx.md §3.1 (media), §3.4 (VideoProvider), §9 (Phase 5).
- Contracts: 0002-interface-contracts.md §3.2 (R-GEN-21/22), §4.1 (R-UI-29).