keryx: avatars (the author as an actor in the reel)
Status: IMPLEMENTED (registry + composed prompt, first-use analysis, multi-avatar, named voice registry; follow-ups in §8)
Owner: Matt Cockayne
Last updated: 2026-06-17
0. How to read this
This specifies avatars — a recurring stylised character (the author, later
co-authors) composited into reel panels as a first-hand observer, reacting to
each beat and directed per panel like an actor. It realises the avatar-in-scene
note in 0003-video-panels.md §8 and builds on the theme catalog
(0001-keryx.md §6), cards gen (0002 §3.2), and the ImageProvider seam.
Requirement IDs use R-AV-n.
Proven in the blog reel pipeline ("the avatar one is superb and my absolute favourite … having my avatar in it brings a whole different dimension").
Correction that shapes this design
The blog experiment hardcoded a risograph style into the avatar prompt — it forced one look rather than using the avatar's own style. That was a mistake, and it is what motivates the central rule here: likeness and style are separate. An avatar renders in its own captured style by default, and any theme can restyle it. The avatar is therefore not "a theme" — it is a character you can opt into any theme.
1. Purpose & value
House/style themes illustrate the topic with people-free scenes. An avatar is the deliberate exception: the same recognisable character appears in every panel reacting to the beat (recoiling at the serpent, slamming the padlock shut, throwing open the gate to a sunrise). It makes a run of reels unmistakably the author's and adds a narrative dimension a static illustration can't.
2. Model: likeness (who) × style (how) × direction (what)
An avatar panel prompt is composed by keryx from three independent parts:
- Likeness (the who): from the avatar's profile; theme-independent ("a man with short brown hair and a full reddish-brown beard…"). The reference image conditions it (image-to-image); the likeness clause reinforces it.
- Style (the how): resolved, not assumed:
  - the avatar's own captured style by default (so `--avatar matt` alone renders Matt in his native look, which is the risograph fix), or
  - the active theme's `card.style` when opted into a theme (`--avatar matt --theme neon` restyles Matt into neon synthwave).
- Direction (the what): the card's `scene`, the per-panel acting (§3).
Plus a standard postamble (expression-lock + no-other-people + no-text + composition + 9:16) — style-independent, a keryx default, overridable.
Using the person in the reference image, keeping their likeness (<LIKENESS>),
rendered in <STYLE>: draw them as a full figure in this scene: <SCENE>.
<POSTAMBLE>
Generation is image-to-image (the reference on ImageRequest.Refs); no new
command — cards gen switches to this path when an avatar is opted in. Key
risk: likeness-vs-restyle tension — restyling hard into a distant theme can
erode the likeness; the reference holds likeness while the prompt overrides
style, and the balance needs a per-theme taste pass.
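
The composition above can be sketched in a few lines of Python. This is an illustration only, not keryx's implementation: the function name `compose_avatar_prompt` and the wording of `DEFAULT_POSTAMBLE` are assumptions; only the template shape and the order of the parts come from this spec.

```python
# Hypothetical sketch of the section 2 template. The postamble wording is
# illustrative; the spec only lists what it covers (expression-lock,
# no-other-people, no-text, composition, 9:16).
DEFAULT_POSTAMBLE = (
    "Match the expression precisely. No other people. No text. "
    "Full-figure composition, vertical 9:16."
)


def compose_avatar_prompt(likeness: str, style: str, scene: str,
                          postamble: str = DEFAULT_POSTAMBLE) -> str:
    """Fill the likeness x style x direction template."""
    for name, value in (("likeness", likeness), ("style", style), ("scene", scene)):
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    return (
        f"Using the person in the reference image, keeping their likeness ({likeness}),\n"
        f"rendered in {style}: draw them as a full figure in this scene: {scene}.\n"
        f"{postamble}"
    )


prompt = compose_avatar_prompt(
    likeness="a man with short brown hair and a full reddish-brown beard",
    style="flat screen-print / risograph illustration",
    scene="holding up a large price tag, his expression cynical and wry, not smiling",
)
print(prompt)
```

Keeping the three parts as separate arguments is what lets a theme swap the style without touching the likeness or the direction.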
3. Directing the avatar like an actor
The avatar defaults to a friendly expression (that is how the source reads),
so every panel's scene must direct it explicitly, like blocking an actor:
bodily placement / pose → the prop/scene element reacted to → gaze → an explicit named emotion
Verbatim examples (the "rung" reel):
- `standing high on a tall ladder, looking down dismayed and alarmed at the bottom rung which has been sawn clean away`
- `holding up a large price tag, his expression cynical and wry, not smiling`
- `confidently throwing open a tall gate toward a warm amber sunrise, a warm confident smile on his face` ← smile only because the beat is positive
Rules: name the emotion every time; reserve the smile for positive beats; use
negatives (not smiling) to override the default; the postamble's "match the
expression precisely" reinforces it. Beyond the default smile, a second failure
mode is style drift on metaphor scenes (for example, a sapling pulling toward
woodcut/landscape); counter it with same-style anchoring, anti-drift negatives,
and more takes.
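
One way to make the blocking order and the smile rule hard to forget is to build the `scene` string from named parts. The `Direction` class below is a hypothetical helper, not part of keryx (the spec's current call is to write emotion directly into `scene`); it only illustrates the discipline: the emotion is mandatory, and "not smiling" is appended unless the beat is marked positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Direction:
    """One panel's blocking, in the order given above (hypothetical structure)."""
    pose: str                    # bodily placement / pose
    target: str                  # the prop or scene element reacted to
    gaze: str                    # where the avatar looks
    emotion: str                 # explicit named emotion, required every time
    positive_beat: bool = False  # only positive beats are allowed to smile

    def to_scene(self) -> str:
        if not self.emotion.strip():
            raise ValueError("every panel must name an explicit emotion")
        parts = [self.pose, self.target, self.gaze, self.emotion]
        scene = ", ".join(p.strip() for p in parts if p.strip())
        # Override the friendly default unless the beat is positive.
        if not self.positive_beat and "not smiling" not in scene:
            scene += ", not smiling"
        return scene


ladder = Direction(
    pose="standing high on a tall ladder",
    target="the bottom rung has been sawn clean away",
    gaze="looking down at it",
    emotion="dismayed and alarmed",
)
gate = Direction(
    pose="confidently throwing open a tall gate",
    target="a warm amber sunrise beyond it",
    gaze="looking toward the light",
    emotion="a warm confident smile on his face",
    positive_beat=True,
)
print(ladder.to_scene())
print(gate.to_scene())
```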
4. First-use avatar analysis (auto-profile)
When an avatar is registered, keryx runs a one-off vision analysis (the image
provider's describe capability) on the reference to auto-capture its profile:
the likeness clause, the avatar's own base style, and its palette.
That profile feeds §2's likeness and native-style clauses, so the user never
hand-writes a preamble. The captured style is what makes "render in its own
style" correct (vs the hardcoded risograph). Analysis is one call at registration;
re-runnable on demand. Needs a provider describe/analyze capability (a new
ImageProvider.Describe or small VisionProvider).
keryx avatar add <name> <image> registers + analyses; keryx avatar list/show
inspects; the profile is persisted to config and is user-editable (analysis is a
starting point, not gospel).
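
The registration flow described above might look like the following sketch. Everything here is an assumption made for illustration: the `VisionProvider` protocol and its `describe` signature, the `register_avatar` helper, the `StubProvider` stand-in, and the use of a JSON file in place of keryx's real config. It shows the intended shape only: one analysis call at registration, with the result persisted as an editable profile.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Protocol


@dataclass
class AvatarProfile:
    ref: str       # project-relative path; the registry never stores the image itself
    likeness: str
    style: str
    palette: str


class VisionProvider(Protocol):
    """Assumed describe capability: reference image in, profile fields out."""
    def describe(self, image: Path) -> dict: ...


class StubProvider:
    """Stand-in for a real vision call; returns fixed text and counts calls."""
    def __init__(self) -> None:
        self.calls = 0

    def describe(self, image: Path) -> dict:
        self.calls += 1
        return {
            "likeness": "a man with short brown hair and a full reddish-brown beard",
            "style": "flat screen-print / risograph illustration",
            "palette": "deep petrol-teal, warm amber-orange",
        }


def register_avatar(name: str, project_root: Path, ref: str,
                    provider: VisionProvider, registry_file: Path) -> AvatarProfile:
    image = project_root / ref
    if not image.is_file():
        raise FileNotFoundError(f"avatar reference not found: {image}")
    described = provider.describe(image)  # the single analysis call
    profile = AvatarProfile(ref=ref, likeness=described["likeness"],
                            style=described["style"], palette=described["palette"])
    registry = (json.loads(registry_file.read_text())
                if registry_file.exists() else {"avatars": {}})
    registry.setdefault("avatars", {})[name] = asdict(profile)
    registry_file.write_text(json.dumps(registry, indent=2))  # user-editable afterwards
    return profile


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "avatar.webp").write_bytes(b"placeholder image bytes")
    provider = StubProvider()
    profile = register_avatar("matt", root, "avatar.webp", provider, root / "avatars.json")
    saved = json.loads((root / "avatars.json").read_text())
```

Re-running the analysis is then just another call to the same helper, which overwrites the stored entry for that name.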
5. Configuration: a named avatar registry
Avatars live in their own top-level registry (not inside one theme), so they can be opted into any theme and so multiple authors can coexist:
    avatars:
      matt:
        ref: assets/images/avatar-mc.webp   # project-relative (keryx holds no asset)
        likeness: "a man with short brown hair and a full reddish-brown beard, fair skin"
        style: "flat screen-print / risograph illustration, bold simplified shapes, clean linework"
        palette: "deep petrol-teal, warm amber-orange, soft cream, dark charcoal"
        # postamble defaults are keryx-provided; override per avatar if needed.
Opt in per run: cards gen --avatar matt [--theme <kw>] (and the same on
reel build). With no --theme, the avatar's own style is used; with a theme,
the theme's card.style restyles it. A configured default avatar MAY apply
without the flag.
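
The opt-in and style-resolution rules can be expressed as two small functions. This is a sketch under stated assumptions: the registry dict mirrors the `avatars:` config above, but the `THEMES` shape, the `neon` style text, and the function names `select_avatar` and `resolve_style` are hypothetical.

```python
from typing import Optional

# Registry shape follows the avatars config in this section.
AVATARS = {
    "matt": {
        "ref": "assets/images/avatar-mc.webp",
        "likeness": "a man with short brown hair and a full reddish-brown beard, fair skin",
        "style": "flat screen-print / risograph illustration, bold simplified shapes, clean linework",
    }
}
# Hypothetical theme entry; only the card.style key is taken from the spec.
THEMES = {"neon": {"card": {"style": "neon synthwave"}}}


def select_avatar(flag: Optional[str], default: Optional[str]) -> Optional[dict]:
    """Pick the avatar from --avatar, falling back to a configured default if any."""
    name = flag or default
    if name is None:
        return None  # no avatar opted in: the ordinary people-free path applies
    if name not in AVATARS:
        raise KeyError(f"unknown avatar: {name!r}")
    return AVATARS[name]


def resolve_style(avatar: dict, theme: Optional[str]) -> str:
    """Avatar's own captured style without a theme; the theme's card.style with one."""
    if theme is None:
        return avatar["style"]
    if theme not in THEMES:
        raise KeyError(f"unknown theme: {theme!r}")
    return THEMES[theme]["card"]["style"]


native = resolve_style(select_avatar("matt", None), None)
restyled = resolve_style(select_avatar(None, "matt"), "neon")
print(native)
print(restyled)
```

Note that the likeness is never consulted during style resolution, which keeps the two concerns separate.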
6. Multiple avatars & voices (multi-author)
As co-authors appear:
- Multiple avatars per panel. A card may direct 2+ registered avatars (`--avatar matt,jane` or a per-card list), each with its own per-figure direction ("matt recoils while jane points"). `ImageRequest.Refs` already carries multiple references; the risk is likeness-bleed between two refs.
- Multiple voice clones. The reel `voice` becomes a named voice registry (parallel to avatars); a storyboard line selects its speaker (extends the per-card `voice` override with a `speaker`/voice name). Independent of avatars but the natural companion to multi-author reels.
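
A minimal sketch of how a comma-separated `--avatar` value could be turned into an ordered list of references. The helper names `parse_avatar_flag` and `collect_refs` and the `jane` registry entry are hypothetical; the real flag parsing and the type of `ImageRequest.Refs` may differ.

```python
def parse_avatar_flag(value: str) -> list[str]:
    """Split a comma-separated --avatar value, dropping blanks and duplicates."""
    names: list[str] = []
    for raw in value.split(","):
        name = raw.strip()
        if name and name not in names:
            names.append(name)
    if not names:
        raise ValueError("--avatar needs at least one name")
    return names


def collect_refs(names: list[str], avatars: dict[str, dict]) -> list[str]:
    """Return one reference per avatar, in the order the avatars were named."""
    unknown = [n for n in names if n not in avatars]
    if unknown:
        raise KeyError(f"unregistered avatar(s): {', '.join(unknown)}")
    return [avatars[n]["ref"] for n in names]


avatars = {
    "matt": {"ref": "assets/images/avatar-mc.webp"},
    "jane": {"ref": "assets/images/avatar-jane.webp"},  # hypothetical second author
}
names = parse_avatar_flag("matt, jane")
refs = collect_refs(names, avatars)
print(names, refs)
```

Preserving the order of names matters if per-figure direction refers to figures by position; whether it does is one of the open questions in §9.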
7. Requirements (R-AV-*)
- R-AV-1 (MUST) an avatar opted into a run generates each panel image-to-image, attaching the avatar reference; a missing reference is a clear error before any provider call. keryx holds no asset (`ref` is project-relative).
- R-AV-2 (MUST) the panel prompt is composed: likeness (avatar) + style (resolved per R-AV-3) + the card `scene` (direction) + the standard postamble. Style is never hardcoded into the likeness.
- R-AV-3 (MUST) style resolution: the avatar's own captured `style` when no theme is selected; the active theme's `card.style` when a theme is selected.
- R-AV-4 (MUST) the card `scene` is the per-panel actor direction; docs + seeded profiles state the explicit-emotion / default-smile discipline (§3).
- R-AV-5 (MUST) avatars are a registry opted into any reel theme, not a theme type; composes the catalog, renderer, takes, and `ImageProvider` additively.
- R-AV-6 (MUST) first-use analysis auto-captures likeness + native style + palette into the registry (re-runnable; user-editable). Needs a provider describe capability.
- R-AV-7 (SHOULD) multiple avatars per panel with per-figure direction (`Refs` already multi); likeness-bleed is the known risk.
- R-AV-8 (SHOULD) named voice registry with per-line speaker selection (multi-author narration).
- R-AV-9 (MAY) outfit/angle locking (extra reference angles) for stronger consistency.
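
R-AV-1's "clear error before any provider call" amounts to a preflight check on the reference path. The sketch below is illustrative only: `generate_avatar_panel`, `MissingReferenceError`, and the `generate` callable are hypothetical stand-ins for keryx's real generation path and `ImageProvider`.

```python
import tempfile
from pathlib import Path
from typing import Callable


class MissingReferenceError(Exception):
    """Raised when an avatar's reference image cannot be found."""


def generate_avatar_panel(project_root: Path, ref: str, prompt: str,
                          generate: Callable[[str, list], bytes]) -> bytes:
    image = project_root / ref  # ref is project-relative
    if not image.is_file():
        # Fail here, before spending a provider call.
        raise MissingReferenceError(f"avatar reference not found: {image}")
    return generate(prompt, [image])


provider_calls = []


def fake_generate(prompt, refs):
    """Stand-in provider that records each call it receives."""
    provider_calls.append(prompt)
    return b"image-bytes"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    try:
        generate_avatar_panel(root, "assets/images/avatar-mc.webp", "a prompt", fake_generate)
        failed_early = False
    except MissingReferenceError:
        failed_early = True
    calls_after_missing = len(provider_calls)

    (root / "avatar.webp").write_bytes(b"placeholder image bytes")
    result = generate_avatar_panel(root, "avatar.webp", "a prompt", fake_generate)
```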
8. Follow-ups (separate work)
- Text-leak OCR gate — OCR each take, auto-reject + re-roll glyphs (all image themes).
- Per-theme caption styling in the renderer (neon glow, marker lettering …).
9. Open questions
- With `--avatar` and no `--theme`, confirm the default is the avatar's own style (this spec's position) vs the configured default reel theme's style.
- Is a dedicated per-card `emotion` field worth it over folding emotion into `scene`? (Current call: fold into `scene`.)
- Multi-avatar per-figure direction syntax: a structured per-card avatar list vs naming figures inside the `scene` string.
10. References
- `0001-keryx.md` §3.1 (cards/overlay), §3.4 (ImageProvider), §6 (themes).
- `0002-interface-contracts.md` §3.2 (R-GEN-12..16).
- `0003-video-panels.md` §8 (the avatar-in-scene note this realises).