keryx — video panels (deferred feature spec)

Status: deferred, Phase 5 (design only; not built until stills-based reels + posting are solid, 0001 §9)
Owner: Matt Cockayne
Last updated: 2026-06-15

0. How to read this

This is the detailed feature spec for video panels — using a short video clip as a card's background instead of a still image. It is deferred: the core specs (0001-keryx.md, 0002-interface-contracts.md) already leave a "video-shaped hole" — the VideoProvider seam (0001 §3.4), the card media {kind: image|video} schema (0001 §3.1), the renderer's fit-to-duration path, and the editor's generate/upload-video affordance (R-UI-29, R-GEN-21). This document fills in that hole so the design is captured now and can be built later without reshaping anything. Where it overlaps the core specs, 0001/0002 win; this elaborates.

Requirement IDs here use R-VID-n.

1. Purpose & value

A run of static cards — even with generated stills — is still static. A short moving panel (a few seconds of motion behind the scrim + text) adds life and stops a reel reading as a slideshow, which matters on platforms that reward watch-time. Two sources, mirroring stills:

  • Uploaded video — a pre-rendered clip the user already has (b-roll, a screen recording, a motion graphic). The cheapest win: no provider, no spend.
  • Generated video — a short clip generated from the card's scene prompt via a video model (default Gemini Omni, swappable like every provider).

Both are per-card background panels, composited identically to image panels: full-bleed, bottom gradient scrim, the line over the top.

2. Scope & non-goals

In scope

  • A card's overlay media may be a video (media.kind: video), uploaded or generated, fit to the card's VO-driven duration.
  • A pluggable VideoProvider (0001 §3.4) for generated clips.
  • Renderer support for compositing video panels (scrim + text over video) and normalising clips to 9:16 / 1080×1920 / the reel codec.

Out of scope (still keryx non-goals, 0001 §2)

  • Not a video editor / timeline: no trimming UI, keyframes, transitions beyond the existing card xfade, or multi-track editing. A panel is one clip behind one card.
  • The cover/bookend cards stay stills (the post's cover art); video is for body (overlay) cards.
  • No per-clip audio mixing; see §4.3 (clip audio is dropped; the reel's audio is VO + music as today).

3. Phasing (within Phase 5)

  1. Uploaded video first. keryx cards set <card> <clip> + the editor upload picker, plus the renderer's fit-to-duration + compositing path. No provider needed — this is the cheap, high-value slice and de-risks the render work.
  2. Generated video second. The VideoProvider adapter (Gemini Omni), wired into keryx cards --video, take management, and caching.

R-VID-1 (MUST) the two slices are independent: uploaded video ships and is useful with no video provider configured.

4. Design

4.1 Card media model (recap, 0001 §3.1)

"media": { "kind": "video", "source": "uploaded|generated", "path": "cards/03.mp4" }
An overlay card resolves to either an image or a video; everything downstream (scrim, text, timing) is media-kind-agnostic except the compositing primitive.
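As a minimal sketch of how this media object might be decoded and checked in Go: the JSON keys and enumerated values come from the recap above, while the type name `Media` and the helpers `Validate`, `IsVideo` and `ParseMedia` are hypothetical and not taken from 0001.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Media mirrors the card media object recapped above. The JSON keys follow the
// spec; the Go type and method names are illustrative assumptions.
type Media struct {
	Kind   string `json:"kind"`   // "image" or "video"
	Source string `json:"source"` // "uploaded" or "generated"
	Path   string `json:"path"`   // project-relative path, e.g. "cards/03.mp4"
}

// Validate rejects values outside the enumerations given in the spec.
func (m Media) Validate() error {
	switch m.Kind {
	case "image", "video":
	default:
		return fmt.Errorf("media.kind: want image|video, got %q", m.Kind)
	}
	switch m.Source {
	case "uploaded", "generated":
	default:
		return fmt.Errorf("media.source: want uploaded|generated, got %q", m.Source)
	}
	if m.Path == "" {
		return fmt.Errorf("media.path: must not be empty")
	}
	return nil
}

// IsVideo is the one branch downstream code should need: it selects the
// compositing primitive, while scrim, text and timing stay kind-agnostic.
func (m Media) IsVideo() bool { return m.Kind == "video" }

// ParseMedia decodes one media object and validates it.
func ParseMedia(raw []byte) (Media, error) {
	var m Media
	if err := json.Unmarshal(raw, &m); err != nil {
		return Media{}, err
	}
	return m, m.Validate()
}

func main() {
	m, err := ParseMedia([]byte(`{"kind":"video","source":"uploaded","path":"cards/03.mp4"}`))
	fmt.Println(m.IsVideo(), m.Path, err)
}
```

Keeping the kind check in a single predicate is one way to honour the "media-kind-agnostic except the compositing primitive" rule: callers ask `IsVideo` once and share everything else.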

4.2 VideoProvider seam (recap, 0001 §3.4)

VideoProvider.Generate(ctx, VideoRequest) ([]Clip, error): config key providers.video, default Gemini Omni, off until this feature ships.

  • R-VID-2 (MUST) generated clips use the reel theme's illustration style + the hardened wordless instruction (text leak applies to video too, and is worse there: moving, mangled text); candidates are screened like image takes.
  • R-VID-3 (MUST) the request is provider-neutral (scene prompt, target aspect, target seconds); the adapter maps to its own API.
  • R-VID-4 (SHOULD) default 1 take (video generation is materially more expensive than stills); --takes N is opt-in; content-addressed cache applies (R-GLOBAL-9).

4.3 Rendering (the real work)

  • R-VID-5 (MUST) a video panel is fit to the card's VO-driven duration (0001 §3.1): loop if the clip is shorter, trim if longer. No speed-ramping in v1.
  • R-VID-6 (MUST) the scrim + text overlay composites over the video exactly as over a still; the xfade between cards still applies at clip boundaries.
  • R-VID-7 (MUST) the clip's own audio is dropped: reel audio stays VO + ducked music (0001 §3.1); a video panel never introduces stray sound.
  • R-VID-8 (MUST) clips (uploaded or generated) are normalised to 1080×1920, the reel fps, and the reel's H.264/yuv420p profile before compositing; a non-9:16 upload is cover-fit (center-crop) with a warning.
  • R-VID-9 (SHOULD) the ffmpeg Renderer gains this path behind the existing interface — no API change to callers; a custom/remote renderer could implement it differently.
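The rendering requirements above can be sketched as a pure planning step plus an ffmpeg argument builder. This is an illustration under stated assumptions, not the renderer's actual code: the names `FitPlan`, `PlanFit`, `NeedsCoverFit` and `NormaliseArgs` are hypothetical, the fps value is passed in because the reel fps is not stated here, and the scrim/text composite is left out. The ffmpeg options used (`-stream_loop`, `-t`, `-an`, and the `scale`, `crop` and `fps` filters) are standard ones.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// FitPlan says how to fit a clip to a card's VO-driven duration (R-VID-5).
type FitPlan struct {
	ExtraLoops int     // additional plays after the first; 0 if the clip is long enough
	TrimTo     float64 // output duration in seconds, always the card duration
}

// PlanFit is pure: loop when the clip is shorter, trim when it is longer.
func PlanFit(clipSec, cardSec float64) (FitPlan, error) {
	if clipSec <= 0 || cardSec <= 0 {
		return FitPlan{}, errors.New("durations must be positive")
	}
	plays := int(math.Ceil(cardSec / clipSec)) // total plays needed to cover the card
	return FitPlan{ExtraLoops: plays - 1, TrimTo: cardSec}, nil
}

// NeedsCoverFit reports whether a source frame is not 9:16 and so would be
// center-cropped with a warning (R-VID-8).
func NeedsCoverFit(width, height int) bool {
	return width*16 != height*9
}

// NormaliseArgs builds ffmpeg arguments for one normalisation pass:
// loop or trim, drop audio, cover-fit to 1080x1920, set fps, encode H.264/yuv420p.
func NormaliseArgs(in, out string, plan FitPlan, fps int) []string {
	vf := fmt.Sprintf(
		"scale=1080:1920:force_original_aspect_ratio=increase,crop=1080:1920,fps=%d", fps)
	return []string{
		"-stream_loop", fmt.Sprint(plan.ExtraLoops), // input option, so it precedes -i
		"-i", in,
		"-t", fmt.Sprintf("%.3f", plan.TrimTo), // trim output to the card duration
		"-an", // drop the clip's own audio (R-VID-7)
		"-vf", vf,
		"-c:v", "libx264", "-pix_fmt", "yuv420p",
		out,
	}
}

func main() {
	plan, _ := PlanFit(4, 10) // a 4 s clip behind a 10 s card
	fmt.Println(plan.ExtraLoops, plan.TrimTo)
	fmt.Println(NormaliseArgs("cards/03.mp4", "tmp/03.norm.mp4", plan, 30))
}
```

Looping slightly past the card duration and then trimming with `-t` keeps the maths simple: the loop count only has to cover the duration, and the trim point does the exact fit.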

4.4 Cost, caching, performance

  • R-VID-10 (MUST) generated-video spend is surfaced (it dwarfs stills); a --dry-run/preview path reports intent without spend (R-GLOBAL-5).
  • R-VID-11 (SHOULD) video render is slower than stills; long renders show progress and run as a background task in CI/studio.
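One way a dry run could report intent without spend is to consult the content-addressed cache (R-GLOBAL-9) and list which cards would trigger a paid generation. The sketch below assumes a SHA-256 key over the request inputs. The key layout and the names `GenRequest`, `CacheKey`, `DryRunReport` and `PlanDryRun` are hypothetical; the actual cache scheme is defined elsewhere.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// GenRequest holds the inputs that determine a generated clip. The field set
// is an assumption; Card is bookkeeping only and is not part of the key.
type GenRequest struct {
	Card          int
	Provider      string
	ScenePrompt   string
	TargetAspect  string
	TargetSeconds float64
}

// CacheKey derives a content address from the request inputs, so an unchanged
// request maps to the same key and needs no new spend.
func CacheKey(r GenRequest) string {
	h := sha256.New()
	// NUL separators keep adjacent fields from running together.
	fmt.Fprintf(h, "%s\x00%s\x00%s\x00%.3f",
		r.Provider, r.ScenePrompt, r.TargetAspect, r.TargetSeconds)
	return hex.EncodeToString(h.Sum(nil))
}

// DryRunReport lists what a real run would do, without doing it.
type DryRunReport struct {
	WouldGenerate []int // cards that would trigger a paid provider call
	CacheHits     []int // cards already satisfied by the cache
}

// PlanDryRun never calls a provider; it only consults the cache index.
func PlanDryRun(reqs []GenRequest, cached map[string]bool) DryRunReport {
	var rep DryRunReport
	for _, r := range reqs {
		if cached[CacheKey(r)] {
			rep.CacheHits = append(rep.CacheHits, r.Card)
		} else {
			rep.WouldGenerate = append(rep.WouldGenerate, r.Card)
		}
	}
	return rep
}

func main() {
	a := GenRequest{Card: 2, Provider: "fake", ScenePrompt: "waves at dusk", TargetAspect: "9:16", TargetSeconds: 4}
	b := GenRequest{Card: 3, Provider: "fake", ScenePrompt: "city at night", TargetAspect: "9:16", TargetSeconds: 4}
	cached := map[string]bool{CacheKey(a): true}
	rep := PlanDryRun([]GenRequest{a, b}, cached)
	fmt.Printf("would generate: %v, cache hits: %v\n", rep.WouldGenerate, rep.CacheHits)
}
```

Reporting counts of calls, not currency amounts, keeps the sketch free of pricing assumptions; a real implementation would attach the provider's cost estimate to each entry in `WouldGenerate`.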

4.5 Persistence: the size question (resolved)

Video assets are large, which is in tension with git-first persistence. Resolved: this is handled by the pluggable large-file persistence seam (persistence.media.store, 0001 §3.5), not by a video-specific decision. A video-heavy project flips its store to external (object store + committed pointer; host-agnostic, no LFS server) or git-lfs, globally or per project, with no code change; light projects stay on git.

R-VID-12 (MUST) video panels persist via the configured media store (0001 §3.5); keryx does not hardwire LFS. The rendered reel.mp4 persists the same way. (Xet is a forward-looking adapter once usable off the HF Hub.)

5. Interfaces (recap + video-specific)

  • CLI: keryx cards --video [--card N] [--takes N] (generate), keryx cards set <card> <clip> (upload), keryx cards select <card> <take> — all already in 0001 §5 / 0002 §3.2 (R-GEN-21, R-GEN-22).
  • Studio: the per-card media source (R-UI-29) already offers Generate → short video and Upload (image or video); the v2 silent/full preview (R-UI-8) renders the video panel.
  • MCP: cards --video is an authoring/generate tool (exposed by default, R-MCP-2); it spends, so it honours the dry-run/confirm posture (R-MCP-3).

6. Testing

  • R-VID-13 (MUST) the fit-to-duration maths (loop count / trim point vs VO duration) and the normalisation decision (aspect/codec) are pure and unit-tested without a real clip.
  • R-VID-14 (MUST) the VideoProvider is faked in unit/e2e tests (no real video gen, no spend); a tiny fixture clip exercises the real ffmpeg compositing path in integration only (env-gated, 0001 §8).
  • R-VID-15 (SHOULD) probe checks on a rendered video-panel reel: dimensions, duration, single audio stream (VO+music only — no clip audio leaked).
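The probe check in R-VID-15 might be written as a pure function over ffprobe's JSON output, for example from `ffprobe -v error -show_streams -show_format -of json reel.mp4`. The sketch below decodes only the fields it needs. The function name `CheckReel` and the expected-duration and tolerance parameters are assumptions. Note that a stream count alone cannot prove the mix contains no clip audio; it only catches a leaked extra stream.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"strconv"
)

// probeOutput models the subset of ffprobe's JSON output used by the check.
type probeOutput struct {
	Streams []struct {
		CodecType string `json:"codec_type"`
		Width     int    `json:"width"`
		Height    int    `json:"height"`
	} `json:"streams"`
	Format struct {
		Duration string `json:"duration"` // ffprobe reports this as a string
	} `json:"format"`
}

// CheckReel validates dimensions, duration and audio stream count for a
// rendered reel. wantSec and tolSec are caller-supplied assumptions.
func CheckReel(probeJSON []byte, wantSec, tolSec float64) error {
	var p probeOutput
	if err := json.Unmarshal(probeJSON, &p); err != nil {
		return fmt.Errorf("parse probe output: %w", err)
	}
	var video, audio int
	for _, s := range p.Streams {
		switch s.CodecType {
		case "video":
			video++
			if s.Width != 1080 || s.Height != 1920 {
				return fmt.Errorf("video is %dx%d, want 1080x1920", s.Width, s.Height)
			}
		case "audio":
			audio++
		}
	}
	if video != 1 {
		return fmt.Errorf("found %d video streams, want exactly 1", video)
	}
	if audio != 1 {
		return fmt.Errorf("found %d audio streams, want exactly 1", audio)
	}
	got, err := strconv.ParseFloat(p.Format.Duration, 64)
	if err != nil {
		return fmt.Errorf("parse duration %q: %w", p.Format.Duration, err)
	}
	if math.Abs(got-wantSec) > tolSec {
		return fmt.Errorf("duration %.3fs, want %.3fs within %.3fs", got, wantSec, tolSec)
	}
	return nil
}

func main() {
	sample := []byte(`{
	  "streams": [
	    {"codec_type": "video", "width": 1080, "height": 1920},
	    {"codec_type": "audio"}
	  ],
	  "format": {"duration": "12.480000"}
	}`)
	fmt.Println(CheckReel(sample, 12.5, 0.1))
}
```

Because the function takes bytes, not a file path, it can be unit-tested with canned JSON, and only the env-gated integration test needs to run ffprobe itself.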

7. Open questions

  1. Persistence policy for large video (resolved): the pluggable media store (persistence.media.store, 0001 §3.5). When a project adds video, switch to external, with backend gitlab-packages if hosted on GitLab (built-in, no infra), else s3 (Cloudflare R2 recommended). No remaining open question here.
  2. Generated clip length default (e.g. 3–5s, then loop) and whether to match it to the card's VO duration at generation time (costlier) or always loop a short clip.
  3. Provider/model — confirm the Gemini Omni (or successor) video model name, limits, and cost when this is picked up; it is config-selected so non-blocking.
  4. Mixed reels — a reel with some video and some still panels is expected and fine; confirm xfade between a video panel and a still reads cleanly.

8. References

  • Core design + the video-shaped hole: 0001-keryx.md §3.1 (media), §3.4 (VideoProvider), §9 (Phase 5).
  • Contracts: 0002-interface-contracts.md §3.2 (R-GEN-21/22), §4.1 (R-UI-29).