0013 — studio generation: per-card media + cover + portrait (R-UI-29/3)
Status: IMPLEMENTED (task #56 — the last Phase-1 studio MUST. Card, cover, and
portrait media generate/upload wired into the studio over the shared
gencmd.GenerateInto core + async job model. §7 decisions resolved 2026-06-23 — see
"Resolved decisions".)
Date: 2026-06-23
Depends on: the editor (0011 §4), the assets/upload handler (uploadAsset), the
storyboard Card.Media model (internal/reel), the generation cores
(internal/gencmd, internal/gen/gemini, pkg/provider), the config-driven
provider seam (0001 §3.4). Couples to: the deferred R-UI-22 theme re-roll flag.
1. Goal & scope
Give the studio editor the per-card media workflow (R-UI-29, MUST) and image
upload (R-UI-3, MUST): for an overlay card, Generate (AI) or Upload a
pre-rendered image/video; browse generated candidate takes and pick one;
generate/upload the cover; upload portrait reference photos (and generate a
portrait). Each card shows its media source/kind (still vs video · generated vs
uploaded).
This is glue, not new generation. The CLI already generates: gencmd.CardTakes,
the cover core, the portrait/avatar core, all over the pkg/provider image seam
(Gemini default, config-driven). The studio reuses those — no new provider code.
In scope: the studio API + editor UI for generate/upload/takes/pick over the existing
cores; the provider-unconfigured / missing-key UX; cost-gating. Out of scope: new
generators, the video provider beyond accepting uploads (R-GEN-21 stays CLI-led),
the render/build step.
2. What already exists (reuse map)
| Need | Existing seam | Notes |
|---|---|---|
| Generate card illustration takes | gencmd.CardTakes(ctx, p, slug, card, n, theme, avatar) | N candidates → cards/takes/; sync; one card (--card M) or all overlay |
| Image provider | provider.ImageRequest{Prompt, Aspect, Count, Refs, Model} → []Image; gemini.Image.Generate | config providers.image; Refs = image-to-image (portrait) |
| Cover art | the cover core (pkg/cmd/cover → gencmd) | article theme; aspect from theme |
| Portrait avatar | the portrait/avatar core (pkg/cmd/portrait) | --ref photos → image-to-image; separate avatar registry |
| Card media model | reel.Card.Media{Kind: image\|video, Source: generated\|uploaded} + Scene | overlay needs scene unless Media set (validate.go) |
| Takes storage | cards/takes/ (gitignored, disposable) + sheet/pick | the "pick" sets the slot assembly consumes (R-GEN-15/16) |
| Upload | uploadAsset (multipart kind=…) | today cover → cover.png; extend to per-card + portrait |
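
The "overlay needs scene unless Media set" rule from the table can be sketched as follows. The type and field names are illustrative assumptions modelled on the table; the real definitions live in internal/reel and validate.go and may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Media mirrors the Kind/Source pairs named in the table.
// Hypothetical shape; the real type lives in internal/reel.
type Media struct {
	Kind   string // "image" or "video"
	Source string // "generated" or "uploaded"
}

// Card is a trimmed, hypothetical stand-in for reel.Card.
type Card struct {
	Overlay bool
	Scene   string // scene description used when generating
	Media   *Media // nil until a take is picked or a file is uploaded
}

var errNeedsScene = errors.New("overlay card needs a scene or media")

// validateCard applies the rule: an overlay card must carry a scene unless
// media is already attached, and attached media must use a known kind/source.
func validateCard(c Card) error {
	if !c.Overlay {
		return nil
	}
	if c.Media == nil {
		if c.Scene == "" {
			return errNeedsScene
		}
		return nil
	}
	if c.Media.Kind != "image" && c.Media.Kind != "video" {
		return fmt.Errorf("unknown media kind %q", c.Media.Kind)
	}
	if c.Media.Source != "generated" && c.Media.Source != "uploaded" {
		return fmt.Errorf("unknown media source %q", c.Media.Source)
	}
	return nil
}

func main() {
	uploaded := Card{Overlay: true, Media: &Media{Kind: "video", Source: "uploaded"}}
	fmt.Println(validateCard(uploaded))            // <nil>
	fmt.Println(validateCard(Card{Overlay: true})) // overlay card needs a scene or media
}
```

An uploaded video with no scene passes, which is what lets the upload path skip scene authoring entirely.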
3. The studio surface (proposed)
All generate endpoints are async (§6): they return 202 {job_id, cost_estimate}
and the caller polls GET …/jobs/{id}.
Per-card media (overlay cards):
- POST /api/v1/workspace/{slug}/cards/{n}/generate {takes?, theme?} — start a job
to generate N candidate takes for card n (reuses CardTakes; N defaults to the
project's configured take count, §7-D3). → 202 {job_id, cost_estimate}.
- GET /api/v1/workspace/{slug}/jobs/{id} — poll job
{state, cost_estimate, cost_actual, eta, takes[]} (§6).
- GET /api/v1/workspace/{slug}/cards/{n}/takes — list candidate takes (filenames +
a thumb/preview URL) for the pick UI (reads tier-1 files, §6.1).
- POST /api/v1/workspace/{slug}/cards/{n}/pick {take} — select a take → sets the
card's Media{kind:image, source:generated, theme}; persists to storyboard.json.
- POST /api/v1/workspace/{slug}/cards/{n}/media (multipart) — upload an image/
video for card n; used as-is → Media{kind, source:uploaded}, no scene needed.
- GET /api/v1/workspace/{slug}/takes/{file} — serve the image bytes of a take or asset
(for editor previews; same-origin, localhost).
Cover + portrait (R-UI-3):
- POST /api/v1/workspace/{slug}/cover/generate — start a cover job (cover core);
upload already works via uploadAsset kind=cover. → 202 {job_id, cost_estimate}.
- POST /api/v1/workspace/{slug}/portrait (multipart, repeatable refs) — upload
portrait reference photo(s); POST …/portrait/generate starts the portrait job.
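
As a rough illustration of what the pick and media-upload endpoints do to a card, the sketch below records a chosen take (with its originating theme) or an uploaded file. All names here are hypothetical, including the File field; persisting storyboard.json is left to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// Media follows the Media{kind, source, theme} shape from the pick endpoint.
// File is a hypothetical field holding the chosen filename.
type Media struct {
	Kind, Source, Theme, File string
}

// Card is a hypothetical stand-in for the storyboard card.
type Card struct {
	Scene string
	Media *Media
}

var errUnknownTake = errors.New("take is not one of the card's candidates")

// pickTake records the chosen candidate on the card. It refuses a filename
// that is not in the candidate list, so a stale UI cannot pick a pruned take.
func pickTake(card *Card, candidates []string, take, theme string) error {
	for _, c := range candidates {
		if c == take {
			card.Media = &Media{Kind: "image", Source: "generated", Theme: theme, File: take}
			return nil
		}
	}
	return errUnknownTake
}

// attachUpload records an uploaded file that is used as-is; no scene is needed.
func attachUpload(card *Card, kind, file string) error {
	if kind != "image" && kind != "video" {
		return fmt.Errorf("unsupported media kind %q", kind)
	}
	card.Media = &Media{Kind: kind, Source: "uploaded", File: file}
	return nil
}

func main() {
	card := &Card{Scene: "a lighthouse at dusk"}
	err := pickTake(card, []string{"take-1.png", "take-2.png"}, "take-2.png", "noir")
	fmt.Println(err, card.Media.Source, card.Media.Theme) // <nil> generated noir

	err = attachUpload(card, "video", "intro.mp4")
	fmt.Println(err, card.Media.Source, card.Media.Kind) // <nil> uploaded video
}
```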
Editor UI: an overlay card's panel gains a media block with Generate (with a take count) / Upload actions, a takes gallery (thumbnails, pick), and a source/kind badge (still·video / generated·uploaded). The associated panel gains cover generate + portrait refs. Generation shows a spinner + cost cue and a clean "provider not configured / set GEMINI_API_KEY" state (mirrors chat's 503).
4. The generation seam in the studio
A narrow injected Generator seam (like Chatter/GitOps) so the studio tests
fake it (no provider, no spend, no ffmpeg). Because generation is async (§6),
the seam starts work and the job store carries progress/cost/result — the
seam does not block returning takes:

    // Generator starts a unit of generation; the job store (§6) holds its progress,
    // cost, and resulting takes. Implementations must NOT block on the provider.
    type Generator interface {
        // EstimateCost returns the up-front cost cue for a request without spending
        // (provider + model + count → money). Surfaced before the user commits.
        EstimateCost(req GenRequest) (Cost, error)

        // Start kicks off generation in the background and writes progress/result into
        // the job identified by jobID. Returns once the job is registered (not done).
        Start(ctx context.Context, jobID string, req GenRequest) error
    }

    // GenRequest is provider-neutral: card takes, cover, or portrait, carrying intent.
    type GenRequest struct {
        Slug  string
        Kind  GenKind  // cardTakes | cover | portrait
        Card  int      // cardTakes only
        Takes int      // cardTakes count (from project config, §7-D3)
        Theme string   // recorded onto generated Media (§7-D5)
        Refs  []string // portrait reference images (image-to-image)
    }

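
A test double for this seam, together with the handler branch that answers 503 when no Generator is injected, might look like the sketch below. It is a minimal illustration under assumptions: the Cost fields, the `accepted` response struct, the `generateHandler` signature, and the per-take price are all invented for the example, and the handler receives an already-parsed request instead of decoding one.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Cost is an assumed shape; the design only names the type.
type Cost struct {
	Amount   float64 `json:"amount"`
	Currency string  `json:"currency"`
}

// GenRequest is trimmed to the fields this sketch needs.
type GenRequest struct {
	Slug  string
	Card  int
	Takes int
}

type Generator interface {
	EstimateCost(req GenRequest) (Cost, error)
	Start(ctx context.Context, jobID string, req GenRequest) error
}

// fakeGenerator never calls a provider; it only records which jobs started.
type fakeGenerator struct{ started []string }

func (f *fakeGenerator) EstimateCost(req GenRequest) (Cost, error) {
	// 0.04 per take is an invented price for the example.
	return Cost{Amount: 0.04 * float64(req.Takes), Currency: "GBP"}, nil
}

func (f *fakeGenerator) Start(_ context.Context, jobID string, _ GenRequest) error {
	f.started = append(f.started, jobID)
	return nil
}

// accepted is the assumed body of the 202 response.
type accepted struct {
	JobID        string `json:"job_id"`
	CostEstimate Cost   `json:"cost_estimate"`
}

// generateHandler answers 503 when no Generator is injected, otherwise it
// estimates, starts the job, and returns 202 with the job id and estimate.
func generateHandler(g Generator, req GenRequest, newID func() string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if g == nil {
			http.Error(w, "generation unavailable (no image provider configured)",
				http.StatusServiceUnavailable)
			return
		}
		est, err := g.EstimateCost(req)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		id := newID()
		// Detached context: the job must outlive this HTTP response.
		if err := g.Start(context.Background(), id, req); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusAccepted)
		_ = json.NewEncoder(w).Encode(accepted{JobID: id, CostEstimate: est})
	}
}

// statusFor runs one request through the handler in memory (no socket).
func statusFor(g Generator) int {
	rec := httptest.NewRecorder()
	r := httptest.NewRequest(http.MethodPost, "/api/v1/workspace/demo/cards/2/generate", nil)
	h := generateHandler(g, GenRequest{Slug: "demo", Card: 2, Takes: 4}, func() string { return "job-1" })
	h(rec, r)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(nil))              // 503
	fmt.Println(statusFor(&fakeGenerator{})) // 202
}
```

Passing the request context to Start would cancel the job as soon as the response is written, so the sketch deliberately uses a detached context.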
Live impl wraps gencmd (holds props for the provider/config) and runs the sync
core on a goroutine, streaming progress into the job. Nil Generator → a 503
"generation unavailable (no image provider configured)". Side-effecting + paid.
5. Secrets, cost, MCP
- Provider config + key: the studio builds the Gemini client from props
(GEMINI_API_KEY via env/keychain), exactly as the CLI does. Unconfigured → 503
with an actionable reason; never logs the key.
- Cost gate: generation is an explicit user action (a button, with the take
count and an estimated cost shown) — never auto-fires. R-GLOBAL-9 cache:
unchanged prompt+settings → cache hit, no re-spend (the cores already cache).
- Cost is surfaced everywhere, not gated (§7-D2). Generation stays on the MCP
surface — driving image generation from an assistant is a core keryx use case
(the blog workflow is Claude→Imagen). The control for the paid-but-reversible class
is disclosure, not exclusion: every generating command/tool surfaces, in its
description and its response, that it spends money and the estimated/actual
cost. A shared cost-estimator (provider+model+count → Cost) gives one cost model
read by CLI, studio UI, and MCP — so all three agree.
- Only post/approve/auth stay MCP-gated — those are outward-facing and
irreversible (publish to a public platform / mint credentials). Generation is
paid but private and reversible (disposable takes, human still picks) — a
different risk class, controlled by cost-visibility.
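
One way the shared cost-estimator could look is sketched below. The price table, the model name, and the use of integer minor units are assumptions made for the example; the design fixes only the mapping provider+model+count → Cost.

```go
package main

import "fmt"

// Cost is held in minor units (pence) to avoid float rounding. Assumed shape.
type Cost struct {
	Minor    int64
	Currency string
}

func (c Cost) String() string {
	return fmt.Sprintf("%s %d.%02d", c.Currency, c.Minor/100, c.Minor%100)
}

type priceKey struct{ Provider, Model string }

// perImage is a PLACEHOLDER price table: the model name and the number are
// invented for this example and are not real provider prices.
var perImage = map[priceKey]Cost{
	{"gemini", "example-image-model"}: {Minor: 4, Currency: "GBP"},
}

// Estimate maps provider + model + count to a cost without spending anything.
func Estimate(provider, model string, count int) (Cost, error) {
	if count < 1 {
		return Cost{}, fmt.Errorf("count must be at least 1, got %d", count)
	}
	unit, ok := perImage[priceKey{provider, model}]
	if !ok {
		return Cost{}, fmt.Errorf("no price known for %s/%s", provider, model)
	}
	return Cost{Minor: unit.Minor * int64(count), Currency: unit.Currency}, nil
}

// Disclosure is the sentence a paid command or tool could attach to its
// description and response.
func Disclosure(c Cost) string {
	return "This action spends money. Estimated cost: " + c.String()
}

func main() {
	c, err := Estimate("gemini", "example-image-model", 4)
	fmt.Println(c, err) // GBP 0.16 <nil>
	fmt.Println(Disclosure(c))
}
```

Because CLI, studio UI, and MCP would all call the same function, a price change lands in one table and the three surfaces cannot drift apart.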
6. Async model — job + poll, cost on the job
Generation is slow (seconds) and paid, unlike every studio call so far. It runs as an async job, polled (not SSE — gen yields files + a cost, not a token stream; not synchronous — batches benefit from non-blocking, and a closed tab must not lose progress):
- POST …/generate validates + estimates cost, registers a job, returns 202 {job_id, cost_estimate}, and starts the Generator in the background.
- GET …/jobs/{id} polls {state, cost_estimate, cost_actual, eta, takes[]} (state: queued → running → done | failed). The cost and timing live on the job resource — that is what surfaces "3 of 8 done, ~£0.12 spent, ~20s left" to the CLI, the editor, and the MCP caller, from one place.
- A batch (generate across cards) is one job with per-card sub-items, so the poll naturally reports aggregate progress + accruing cost.
- This survives a refresh / closed tab: the job runs server-side; the UI re-attaches by id. (A WebSocket buys nothing here — one local user, coarse events; the job resource is the durable thing. If push is ever wanted, it layers over the same job state without changing it.)
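
The job resource and its batch sub-items could be sketched as a small mutex-guarded store. This is an illustration under assumptions, not the studio's implementation: the Store, Job, Item, and Progress names are hypothetical, cost is counted in minor units, and eta and the failed transition are omitted for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

type State string

const (
	Queued  State = "queued"
	Running State = "running"
	Done    State = "done"
	Failed  State = "failed"
)

// Item is one card inside a batch job (hypothetical shape).
type Item struct {
	Card      int
	Done      bool
	CostMinor int64 // actual spend for this card, in minor units
}

type Job struct {
	State State
	Items []Item
}

// Progress is what a poll reports: aggregate counts plus accrued cost.
type Progress struct {
	State       State
	Done, Total int
	CostMinor   int64
}

// Store is the in-memory job store: a mutex-guarded map keyed by job id.
type Store struct {
	mu   sync.Mutex
	jobs map[string]*Job
}

func NewStore() *Store { return &Store{jobs: make(map[string]*Job)} }

// Register creates a queued job with one sub-item per card.
func (s *Store) Register(id string, cards []int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	job := &Job{State: Queued}
	for _, c := range cards {
		job.Items = append(job.Items, Item{Card: c})
	}
	s.jobs[id] = job
}

// FinishItem marks one card done and moves the job to running or done.
func (s *Store) FinishItem(id string, card int, costMinor int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	job, ok := s.jobs[id]
	if !ok {
		return
	}
	job.State = Running
	remaining := 0
	for i := range job.Items {
		if job.Items[i].Card == card {
			job.Items[i].Done = true
			job.Items[i].CostMinor = costMinor
		}
		if !job.Items[i].Done {
			remaining++
		}
	}
	if remaining == 0 {
		job.State = Done
	}
}

// Poll returns aggregate progress; ok=false corresponds to the 404 case.
func (s *Store) Poll(id string) (Progress, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	job, ok := s.jobs[id]
	if !ok {
		return Progress{}, false
	}
	p := Progress{State: job.State, Total: len(job.Items)}
	for _, it := range job.Items {
		if it.Done {
			p.Done++
		}
		p.CostMinor += it.CostMinor
	}
	return p, true
}

func main() {
	s := NewStore()
	s.Register("j1", []int{1, 2, 3})
	s.FinishItem("j1", 1, 4)
	s.FinishItem("j1", 2, 4)
	p, _ := s.Poll("j1")
	fmt.Printf("%d of %d done, %d spent, state=%s\n", p.Done, p.Total, p.CostMinor, p.State)
	// 2 of 3 done, 8 spent, state=running
	_, ok := s.Poll("gone")
	fmt.Println(ok) // false
}
```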
6.1 Three-tier persistence
Each tier owns a different durability guarantee; the lower tiers are caches over the workspace files, never authoritative:
- Workspace files — the truth. cards/takes/*.png on disk + the pick recorded in storyboard.json (git-tracked, commit-on-save). Survives refresh, server restart, reboot. Neither store below holds authoritative data.
- Backend in-memory job store — live state. A mutex-guarded map[jobID]*Job holding {state, cost_estimate, cost_actual, eta, takes[]}. Makes poll/re-attach work; survives refresh, not server restart — fine, because any completed gen already wrote tier 1, so a restart only loses the live progress bar. A poll for a forgotten id → 404 → "ask the files" (not an error).
- Frontend localStorage — what to watch. Per project: the in-flight job ids + light UI prefs (active card, take-count input). On load the UI re-polls those ids and reconciles against tier 2 (404 → drop the stale tracker, surface "that generation ended while you were away — check the takes"). Never stores takes/picks.
Reconciliation rule: localStorage says what to look for, the backend says what's live, the workspace says what's real.
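
The decision the UI makes on load can be written down as a small pure function. The real logic would run in the browser; it is sketched in Go here only to keep one language across the examples, and the Action values and PollResult shape are hypothetical.

```go
package main

import "fmt"

// Action is what the UI does with one tracked job id after a reload.
type Action string

const (
	Watch  Action = "keep polling"
	Settle Action = "job ended: stop tracking, re-read takes from the workspace"
	Drop   Action = "job unknown: drop tracker, tell the user to check the takes"
)

// PollResult is a hypothetical summary of GET …/jobs/{id}.
type PollResult struct {
	Found bool   // false models a 404 from the in-memory job store
	State string // queued | running | done | failed
}

// reconcile applies the rule: localStorage says what to look for (tracked),
// the backend says what is live (poll), and anything else defers to the files.
func reconcile(tracked []string, poll func(id string) PollResult) map[string]Action {
	out := make(map[string]Action, len(tracked))
	for _, id := range tracked {
		res := poll(id)
		switch {
		case !res.Found:
			out[id] = Drop
		case res.State == "done" || res.State == "failed":
			out[id] = Settle
		default:
			out[id] = Watch
		}
	}
	return out
}

func main() {
	backend := map[string]PollResult{
		"j1": {Found: true, State: "running"},
		"j2": {Found: true, State: "done"},
	}
	// A missing id yields the zero PollResult, i.e. Found=false (the 404 case).
	poll := func(id string) PollResult { return backend[id] }

	actions := reconcile([]string{"j1", "j2", "j3"}, poll)
	for _, id := range []string{"j1", "j2", "j3"} {
		fmt.Println(id, "->", actions[id])
	}
}
```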
7. Resolved decisions (2026-06-23)
- D1 — Async model. Job + poll; cost/eta/state on the job resource; three-tier persistence (§6/§6.1). Not SSE, not synchronous.
- D2 — MCP. Generation stays on MCP (core AI use case). Control is mandatory cost+risk disclosure on every paid command/tool via a shared cost-estimator; not exclusion. Only post/approve/auth stay gated (§5).
- D3 — Takes UX. Generate-N → gallery → pick. N is configured per project (workspace/config setting, default 4), not asked per call — N=1 gives single-re-roll behaviour.
- D4 — Phasing. All three (per-card, cover, portrait) ship in one MR, built in internal order per-card → cover → portrait (they share the job/cost/Generator infra), with coherent per-area commits.
- D5 — Theme re-roll flag. Record the originating theme on generated Media and wire the R-UI-22 re-roll flag in this MR (workspace theme ≠ card media theme → editor shows "made under the old theme — re-roll?"). Closes R-UI-22's deferred piece.
- D6 — Testing. Fake Generator (no spend) for the studio API + a godog scenario (generate-N → pick via the fake; upload used-as-is; cost surfaced on the job). The real provider stays env-gated integration (INT_TEST=1); godog never hits a paid API.
- D7 — In-memory projects. Allow generation (it's the AI loop someone with no local disk most wants), surface a RAM-usage note, and prune unpicked takes by a grace TTL — not on pick (default 3 min, configurable). A periodic sweep frees only candidates older than the window that aren't the picked slot, so "generate round 2, go back and pick from round 1" works within the window while RAM stays bounded. Local-disk projects don't prune (takes are cheap files; workspace cleanup owns them).
8. Definition of done (per phase)
Failing test → code → green just ci; a godog scenario for the workflow (faked
generator); the editor media block + takes gallery; docs page updated
(docs/components/studio); /simplify + /code-review. The provider integration
path stays env-gated (INT_TEST=1), not run in CI.