keryx — avatars (the author as an actor in the reel)

Status: IMPLEMENTED (registry + composed prompt, first-use analysis, multi-avatar, named voice registry; follow-ups in §8)
Owner: Matt Cockayne
Last updated: 2026-06-17

0. How to read this

This specifies avatars — a recurring stylised character (the author, later co-authors) composited into reel panels as a first-hand observer, reacting to each beat and directed per panel like an actor. It realises the avatar-in-scene note in 0003-video-panels.md §8 and builds on the theme catalog (0001-keryx.md §6), cards gen (0002 §3.2), and the ImageProvider seam. Requirement IDs use R-AV-n.

Proven in the blog reel pipeline ("the avatar one is superb and my absolute favourite … having my avatar in it brings a whole different dimension").

Correction that shapes this design

The blog experiment hardcoded a risograph style into the avatar prompt — it forced one look rather than using the avatar's own style. That was a mistake, and it is what motivates the central rule here: likeness and style are separate. An avatar renders in its own captured style by default, and any theme can restyle it. The avatar is therefore not "a theme" — it is a character you can opt into any theme.

1. Purpose & value

House/style themes illustrate the topic with people-free scenes. An avatar is the deliberate exception: the same recognisable character appears in every panel reacting to the beat (recoiling at the serpent, slamming the padlock shut, throwing open the gate to a sunrise). It makes a run of reels unmistakably the author's and adds a narrative dimension a static illustration can't.

2. Model — likeness (who) × style (how) × direction (what)

An avatar panel prompt is composed by keryx from three independent parts:

  • Likeness (the who) — from the avatar's profile; theme-independent ("a man with short brown hair and a full reddish-brown beard…"). The reference image conditions it (image-to-image); the likeness clause reinforces it.
  • Style (the how) — resolved, not assumed:
      • the avatar's own captured style by default (so --avatar matt alone renders Matt in his native look, which is the risograph fix), or
      • the active theme's card.style when opted into a theme (--avatar matt --theme neon restyles Matt into neon synthwave).
  • Direction (the what) — the card's scene: the per-panel acting (§3).

A standard postamble follows (expression-lock + no-other-people + no-text + composition + 9:16). It is style-independent, a keryx default, and overridable.

Using the person in the reference image, keeping their likeness (<LIKENESS>),
rendered in <STYLE>: draw them as a full figure in this scene: <SCENE>.
<POSTAMBLE>

Generation is image-to-image, with the reference attached on ImageRequest.Refs. There is no new command: cards gen switches to this path when an avatar is opted in.

Key risk: likeness-vs-restyle tension. Restyling hard into a distant theme can erode the likeness; the reference holds the likeness while the prompt overrides the style, and the balance needs a per-theme taste pass.
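The composition above can be sketched as follows. This is a minimal illustration in Python, not keryx's implementation: the Avatar class, resolve_style, compose_prompt, and the DEFAULT_POSTAMBLE wording are all hypothetical names and text introduced here, since this spec does not show keryx's own types or the real postamble.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for keryx's default postamble; the real wording
# is not given in this spec.
DEFAULT_POSTAMBLE = (
    "Match the described expression precisely. No other people. No text. "
    "Full figure in frame. 9:16 portrait."
)


@dataclass
class Avatar:
    ref: str                         # project-relative reference image
    likeness: str                    # the who: theme-independent
    style: str                       # the avatar's own captured style
    postamble: Optional[str] = None  # optional per-avatar override


def resolve_style(avatar: Avatar, theme_card_style: Optional[str]) -> str:
    """Use the theme's card.style when a theme is selected, else the avatar's own style."""
    return theme_card_style if theme_card_style else avatar.style


def compose_prompt(avatar: Avatar, scene: str,
                   theme_card_style: Optional[str] = None) -> str:
    """Fill the template: likeness + resolved style + scene direction + postamble."""
    style = resolve_style(avatar, theme_card_style)
    postamble = avatar.postamble or DEFAULT_POSTAMBLE
    return (
        f"Using the person in the reference image, keeping their likeness "
        f"({avatar.likeness}),\n"
        f"rendered in {style}: draw them as a full figure in this scene: {scene}.\n"
        f"{postamble}"
    )
```

Because style is resolved separately and never stored in the likeness clause, the same Avatar yields a native-style prompt when theme_card_style is omitted and a restyled prompt when one is passed.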

3. Directing the avatar like an actor

The avatar defaults to a friendly expression (that is how the source reads), so every panel's scene must direct it explicitly, like blocking an actor:

  • bodily placement / pose
  • the prop/scene element reacted to
  • gaze
  • an explicit named emotion

Verbatim examples (the "rung" reel):

  • standing high on a tall ladder, looking down dismayed and alarmed at the bottom rung which has been sawn clean away
  • holding up a large price tag, his expression cynical and wry, not smiling
  • confidently throwing open a tall gate toward a warm amber sunrise, a warm confident smile on his face ← smile only because the beat is positive

Rules: name the emotion every time; reserve the smile for positive beats; use negatives (not smiling) to override the default; the postamble's "match the expression precisely" reinforces it.

A second failure mode is style drift on metaphor scenes (a sapling pulling toward woodcut/landscape). Counter it with same-style anchoring, anti-drift negatives, and more takes.
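The first two rules lend themselves to a simple pre-generation lint. The sketch below is an assumption, not something this spec says keryx ships: lint_scene, the EMOTIONS vocabulary, and the positive_beat flag are all hypothetical, and a keyword check like this only catches the obvious omissions.

```python
import re

# Illustrative vocabulary only; a real list would be tuned by taste.
EMOTIONS = {
    "dismayed", "alarmed", "cynical", "wry", "confident",
    "wary", "delighted", "furious",
}

SMILE = re.compile(r"\bsmil(e|es|ing)\b")
# "not smiling", "no smile", "without a smile" are deliberate overrides.
NEGATED_SMILE = re.compile(r"\b(not|no|without)\s+(a\s+)?smil(e|es|ing)\b")


def lint_scene(scene: str, positive_beat: bool) -> list[str]:
    """Return a list of direction problems found in one card's scene string."""
    text = scene.lower()
    words = set(re.findall(r"[a-z]+", text))
    problems = []
    if not words & EMOTIONS:
        problems.append("no named emotion")
    # Drop negated smiles first so only affirmative ones remain.
    affirmative_smile = SMILE.search(NEGATED_SMILE.sub("", text)) is not None
    if affirmative_smile and not positive_beat:
        problems.append("smile on a non-positive beat")
    return problems
```

Run against the three verbatim examples above, this reports no problems; a scene with no emotion word, or an affirmative smile on a negative beat, is flagged before any take is spent on it.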

4. First-use avatar analysis (auto-profile)

When an avatar is registered, keryx runs a one-off vision analysis (the image provider's describe capability) on the reference to auto-capture its profile: the likeness clause, the avatar's own base style, and its palette. That profile feeds §2's likeness and native-style clauses, so the user never hand-writes a preamble. The captured style is what makes "render in its own style" correct, in contrast to the hardcoded risograph. Analysis is a single call at registration and can be re-run on demand. It needs a provider describe/analyze capability (a new ImageProvider.Describe or a small VisionProvider).

keryx avatar add <name> <image> registers + analyses; keryx avatar list/show inspects; the profile is persisted to config and is user-editable (analysis is a starting point, not gospel).
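The register-then-analyse flow might look roughly like this. It is a sketch under assumptions: Describer, AvatarProfile, register_avatar, and the reanalyse flag are hypothetical names, the dict returned by describe is an invented shape, and FakeDescriber stands in for a real vision provider so the example runs without one.

```python
from dataclasses import asdict, dataclass
from typing import Protocol


@dataclass
class AvatarProfile:
    ref: str
    likeness: str
    style: str
    palette: str


class Describer(Protocol):
    """Hypothetical shape of the provider's describe capability."""
    def describe(self, image_path: str) -> dict: ...


def register_avatar(registry: dict, name: str, image_path: str,
                    provider: Describer, reanalyse: bool = False) -> AvatarProfile:
    """Analyse once at registration; keep the stored (possibly hand-edited) profile after that."""
    if name in registry and not reanalyse:
        return AvatarProfile(**registry[name])
    found = provider.describe(image_path)  # the single vision call
    profile = AvatarProfile(
        ref=image_path,
        likeness=found["likeness"],
        style=found["style"],
        palette=found["palette"],
    )
    registry[name] = asdict(profile)  # persisted form; user-editable
    return profile


class FakeDescriber:
    """Stand-in provider that counts calls and returns a canned analysis."""
    def __init__(self) -> None:
        self.calls = 0

    def describe(self, image_path: str) -> dict:
        self.calls += 1
        return {
            "likeness": "a man with short brown hair and a full reddish-brown beard, fair skin",
            "style": "flat screen-print / risograph illustration",
            "palette": "deep petrol-teal, warm amber-orange, soft cream, dark charcoal",
        }
```

Registering the same name twice makes one provider call, and a hand edit to the stored likeness survives until the analysis is explicitly re-run.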

5. Configuration — a named avatar registry

Avatars live in their own top-level registry (not inside one theme), so they can be opted into any theme and so multiple authors can coexist:

avatars:
  matt:
    ref: assets/images/avatar-mc.webp        # project-relative (keryx holds no asset)
    likeness: "a man with short brown hair and a full reddish-brown beard, fair skin"
    style: "flat screen-print / risograph illustration, bold simplified shapes, clean linework"
    palette: "deep petrol-teal, warm amber-orange, soft cream, dark charcoal"
  # postamble defaults are keryx-provided; override per avatar if needed.

Opt in per run: cards gen --avatar matt [--theme <kw>] (and the same on reel build). With no --theme, the avatar's own style is used; with a theme, the theme's card.style restyles it. A configured default avatar MAY apply without the flag.
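The opt-in precedence described here (explicit flag, then a configured default, then no avatar at all) can be sketched with the standard argparse module. The select_avatar function and the default_avatar config key are hypothetical; the spec says only that a configured default MAY apply, not how it is spelled.

```python
import argparse
from typing import Optional


def select_avatar(argv: list[str], config: dict) -> tuple[Optional[str], Optional[str]]:
    """Return (avatar name or None, theme keyword or None) for one run."""
    parser = argparse.ArgumentParser(prog="cards gen", add_help=False)
    parser.add_argument("--avatar")
    parser.add_argument("--theme")
    args = parser.parse_args(argv)

    # The flag wins; otherwise fall back to a configured default, if any.
    # "default_avatar" is an assumed key name.
    name = args.avatar or config.get("default_avatar")
    if name is not None and name not in config.get("avatars", {}):
        raise KeyError(f"unknown avatar: {name}")
    return name, args.theme
```

A result of (name, None) means the avatar renders in its own captured style; (name, theme) means the theme's card.style restyles it; (None, theme) is the ordinary people-free path.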

6. Multiple avatars & voices (multi-author)

As co-authors appear:

  • Multiple avatars per panel. A card may direct 2+ registered avatars (--avatar matt,jane or a per-card list), each with its own per-figure direction ("matt recoils while jane points"). ImageRequest.Refs already carries multiple references; the risk is likeness-bleed between two refs.
  • Multiple voice clones. The reel voice becomes a named voice registry (parallel to avatars); a storyboard line selects its speaker (extends the per-card voice override with a speaker/voice name). This is independent of avatars but is the natural companion to multi-author reels.
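Both lookups are small. The sketch below assumes a comma-separated --avatar value and a voice registry keyed by speaker name; collect_refs, voice_for, the speaker field, and the voice_id field are hypothetical, and the per-figure direction syntax is still open (§9).

```python
def collect_refs(names: str, avatars: dict) -> list[str]:
    """Split an --avatar value such as "matt,jane" into one reference per figure, in order."""
    refs = []
    for name in (part.strip() for part in names.split(",")):
        if name not in avatars:
            raise KeyError(f"unknown avatar: {name}")
        refs.append(avatars[name]["ref"])
    return refs


def voice_for(line: dict, voices: dict, default_voice: str) -> str:
    """Pick the voice for one storyboard line; lines without a speaker use the default."""
    speaker = line.get("speaker", default_voice)
    if speaker not in voices:
        raise KeyError(f"unknown voice: {speaker}")
    return voices[speaker]["voice_id"]  # assumed field name
```

The reference order matches the order the avatars were named, which matters if per-figure direction ends up referring to figures by position rather than by name.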

7. Requirements (R-AV-*)

  • R-AV-1 (MUST) an avatar opted into a run generates each panel image-to-image, attaching the avatar reference; a missing reference is a clear error before any provider call. keryx holds no asset (ref is project-relative).
  • R-AV-2 (MUST) the panel prompt is composed: likeness (avatar) + style (resolved per R-AV-3) + the card scene (direction) + the standard postamble. Style is never hardcoded into the likeness.
  • R-AV-3 (MUST) style resolution: the avatar's own captured style when no theme is selected; the active theme's card.style when a theme is selected.
  • R-AV-4 (MUST) the card scene is the per-panel actor direction; docs + seeded profiles state the explicit-emotion / default-smile discipline (§3).
  • R-AV-5 (MUST) avatars are a registry opted into any reel theme, not a theme type; the feature composes additively with the catalog, renderer, takes, and ImageProvider.
  • R-AV-6 (MUST) first-use analysis auto-captures likeness + native style + palette into the registry (re-runnable; user-editable). Needs a provider describe capability.
  • R-AV-7 (SHOULD) multiple avatars per panel with per-figure direction (Refs already multi); likeness-bleed is the known risk.
  • R-AV-8 (SHOULD) named voice registry with per-line speaker selection (multi-author narration).
  • R-AV-9 (MAY) outfit/angle locking (extra reference angles) for stronger consistency.
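R-AV-1's "clear error before any provider call" amounts to a pre-flight check on the reference paths. A minimal sketch, assuming references resolve against a project root; MissingReferenceError, preflight_refs, and generate_panel are hypothetical names, and provider_call stands in for whatever actually invokes the image provider.

```python
from pathlib import Path
from typing import Callable


class MissingReferenceError(Exception):
    """Raised when an avatar's reference image cannot be found."""


def preflight_refs(project_root: str, avatars: dict) -> list[Path]:
    """Resolve every project-relative ref and fail fast if one is missing."""
    resolved = []
    for name, entry in avatars.items():
        path = Path(project_root) / entry["ref"]
        if not path.is_file():
            raise MissingReferenceError(
                f"avatar {name!r}: reference image not found at {path}"
            )
        resolved.append(path)
    return resolved


def generate_panel(project_root: str, avatars: dict,
                   provider_call: Callable[[list[Path]], object]) -> object:
    """Check references first, so a bad path never reaches the provider."""
    refs = preflight_refs(project_root, avatars)
    return provider_call(refs)
```

Putting the check in front of the provider call means a typo in ref costs nothing: no request is sent and the error names both the avatar and the resolved path.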

8. Follow-ups (separate work)

  1. Text-leak OCR gate: OCR each take, then auto-reject and re-roll any take that contains glyphs (all image themes).
  2. Per-theme caption styling in the renderer (neon glow, marker lettering …).

9. Open questions

  1. With --avatar and no --theme, confirm the default is the avatar's own style (this spec's position) vs the configured default reel theme's style.
  2. Is a dedicated per-card emotion field worth it over folding emotion into scene? (Current call: fold into scene.)
  3. Multi-avatar per-figure direction syntax — a structured per-card avatar list vs naming figures inside the scene string.

10. References

  • 0001-keryx.md §3.1 (cards/overlay), §3.4 (ImageProvider), §6 (themes).
  • 0002-interface-contracts.md §3.2 (R-GEN-12..16).
  • 0003-video-panels.md §8 (the avatar-in-scene note this realises).