0013 — studio generation: per-card media + cover + portrait (R-UI-29/3)

Status: IMPLEMENTED (task #56, the last Phase-1 studio MUST. Card, cover, and portrait media generate/upload are wired into the studio over the shared gencmd.GenerateInto core + async job model. §7 decisions resolved 2026-06-23; see "Resolved decisions".)
Date: 2026-06-23
Depends on: the editor (0011 §4), the assets/upload handler (uploadAsset), the storyboard Card.Media model (internal/reel), the generation cores (internal/gencmd, internal/gen/gemini, pkg/provider), the config-driven provider seam (0001 §3.4).
Couples to: the deferred R-UI-22 theme re-roll flag.

1. Goal & scope

Give the studio editor the per-card media workflow (R-UI-29, MUST) and image upload (R-UI-3, MUST): for an overlay card, Generate (AI) or Upload a pre-rendered image/video; browse generated candidate takes and pick one; generate/upload the cover; upload portrait reference photos (and generate a portrait). Each card shows its media source/kind (still vs video · generated vs uploaded).

This is glue, not new generation. The CLI already generates: gencmd.CardTakes, the cover core, the portrait/avatar core, all over the pkg/provider image seam (Gemini default, config-driven). The studio reuses those — no new provider code.

In scope: the studio API + editor UI for generate/upload/takes/pick over the existing cores; the provider-unconfigured / missing-key UX; cost-gating. Out of scope: new generators, the video provider beyond accepting uploads (R-GEN-21 stays CLI-led), the render/build step.

2. What already exists (reuse map)

| Need | Existing seam | Notes |
|---|---|---|
| Generate card illustration takes | gencmd.CardTakes(ctx, p, slug, card, n, theme, avatar) | N candidates → cards/takes/; sync; one card (--card M) or all overlay |
| Image provider | provider.ImageRequest{Prompt, Aspect, Count, Refs, Model} → []Image; gemini.Image.Generate | config providers.image; Refs = image-to-image (portrait) |
| Cover art | the cover core (pkg/cmd/cover → gencmd) | article theme; aspect from theme |
| Portrait avatar | the portrait/avatar core (pkg/cmd/portrait) | --ref photos → image-to-image; separate avatar registry |
| Card media model | reel.Card.Media{Kind: image\|video, Source: generated\|uploaded} + Scene | overlay needs scene unless Media set (validate.go) |
| Takes storage | cards/takes/ (gitignored, disposable) + sheet/pick | the "pick" sets the slot assembly consumes (R-GEN-15/16) |
| Upload | uploadAsset (multipart kind=…) | today cover → cover.png; extend to per-card + portrait |
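
The card media rule in the table (an overlay card needs a scene unless Media is set) can be sketched as below. This is a minimal illustration, not the contents of validate.go: the Media and Card types are trimmed, hypothetical mirrors of the internal/reel model, and the string values for kind and source are assumed from the table.

```go
package main

import (
	"errors"
	"fmt"
)

// Media is a hypothetical, trimmed mirror of reel.Card.Media.
type Media struct {
	Kind   string // "image" or "video"
	Source string // "generated" or "uploaded"
}

// Card is an illustrative storyboard card, not the real internal/reel type.
type Card struct {
	Overlay bool
	Scene   string // prompt material used when the card is generated
	Media   *Media // nil until a take is picked or a file is uploaded
}

var errNeedsScene = errors.New("overlay card needs a scene unless media is set")

// validateCard applies the table's rule: an overlay card must carry a scene
// unless media is already attached, in which case the media is used as-is.
func validateCard(c Card) error {
	if !c.Overlay {
		return nil
	}
	if c.Media == nil {
		if c.Scene == "" {
			return errNeedsScene
		}
		return nil
	}
	if c.Media.Kind != "image" && c.Media.Kind != "video" {
		return fmt.Errorf("unknown media kind %q", c.Media.Kind)
	}
	if c.Media.Source != "generated" && c.Media.Source != "uploaded" {
		return fmt.Errorf("unknown media source %q", c.Media.Source)
	}
	return nil
}

func main() {
	uploaded := Card{Overlay: true, Media: &Media{Kind: "video", Source: "uploaded"}}
	fmt.Println(validateCard(uploaded))            // <nil>
	fmt.Println(validateCard(Card{Overlay: true})) // overlay card needs a scene unless media is set
}
```

An uploaded video therefore validates with an empty scene, which is what lets the upload endpoint attach media "as-is".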

3. The studio surface (proposed)

All generate endpoints are async (§6): they return 202 {job_id, cost_estimate} and the caller polls GET …/jobs/{id}.

Per-card media (overlay cards):

  • POST /api/v1/workspace/{slug}/cards/{n}/generate {takes?, theme?}: start a job to generate N candidate takes for card n (reuses CardTakes; N defaults to the project's configured take count, §7-D3). → 202 {job_id, cost_estimate}.
  • GET /api/v1/workspace/{slug}/jobs/{id}: poll job {state, cost_estimate, cost_actual, eta, takes[]} (§6).
  • GET /api/v1/workspace/{slug}/cards/{n}/takes: list candidate takes (filenames + a thumb/preview URL) for the pick UI (reads tier-1 files, §6.1).
  • POST /api/v1/workspace/{slug}/cards/{n}/pick {take}: select a take → sets the card's Media{kind:image, source:generated, theme}; persists to storyboard.json.
  • POST /api/v1/workspace/{slug}/cards/{n}/media (multipart): upload an image/video for card n; used as-is → Media{kind, source:uploaded}, no scene needed.
  • GET /api/v1/workspace/{slug}/takes/{file}: serve a take/asset image bytes (the editor previews; same-origin, localhost).
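
What the pick endpoint records can be sketched as follows. This is an illustration under stated assumptions: the Media struct, its File field, and the take filenames are hypothetical, and needsReroll is one plausible reading of the theme comparison described in §7-D5, not the shipped check.

```go
package main

import "fmt"

// Media is a hypothetical mirror of the card media record; File is an
// illustrative field holding the chosen take's filename.
type Media struct {
	Kind, Source, Theme, File string
}

// pickTake records a chosen candidate as generated image media, stamping the
// theme it was generated under. It rejects files that are not candidates.
func pickTake(candidates []string, take, theme string) (Media, error) {
	for _, c := range candidates {
		if c == take {
			return Media{Kind: "image", Source: "generated", Theme: theme, File: take}, nil
		}
	}
	return Media{}, fmt.Errorf("take %q is not a candidate", take)
}

// needsReroll reports whether generated media was made under a theme other
// than the workspace's current one. Uploaded media is never flagged.
func needsReroll(m Media, workspaceTheme string) bool {
	return m.Source == "generated" && m.Theme != workspaceTheme
}

func main() {
	m, err := pickTake([]string{"card-03-a.png", "card-03-b.png"}, "card-03-b.png", "noir")
	fmt.Println(m.File, m.Theme, err)                             // card-03-b.png noir <nil>
	fmt.Println(needsReroll(m, "noir"), needsReroll(m, "pastel")) // false true
}
```

Stamping the theme at pick time is what makes the later comparison possible without re-reading the takes directory.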

Cover + portrait (R-UI-3):

  • POST /api/v1/workspace/{slug}/cover/generate: start a cover job (cover core); upload already works via uploadAsset kind=cover. → 202 {job_id, cost_estimate}.
  • POST /api/v1/workspace/{slug}/portrait (multipart, repeatable refs): upload portrait reference photo(s); POST …/portrait/generate starts the portrait job.

Editor UI: an overlay card's panel gains a media block with Generate (with a take count) / Upload actions, a takes gallery (thumbnails, pick), and a source/kind badge (still·video / generated·uploaded). The associated panel gains cover generate + portrait refs. Generation shows a spinner + cost cue and a clean "provider not configured / set GEMINI_API_KEY" state (mirrors chat's 503).

4. The generation seam in the studio

A narrow injected Generator seam (like Chatter/GitOps) so the studio tests fake it (no provider, no spend, no ffmpeg). Because generation is async (§6), the seam starts work and the job store carries progress/cost/result — the seam does not block returning takes:

```go
// Generator starts a unit of generation; the job store (§6) holds its progress,
// cost, and resulting takes. Implementations must NOT block on the provider.
type Generator interface {
    // EstimateCost returns the up-front cost cue for a request without spending
    // (provider + model + count → money). Surfaced before the user commits.
    EstimateCost(req GenRequest) (Cost, error)
    // Start kicks off generation in the background and writes progress/result into
    // the job identified by jobID. Returns once the job is registered (not done).
    Start(ctx context.Context, jobID string, req GenRequest) error
}

// GenRequest is provider-neutral: card takes, cover, or portrait, carrying intent.
type GenRequest struct {
    Slug  string
    Kind  GenKind  // cardTakes | cover | portrait
    Card  int      // cardTakes only
    Takes int      // cardTakes count (from project config, §7-D3)
    Theme string   // recorded onto generated Media (§7-D5)
    Refs  []string // portrait reference images (image-to-image)
}
```

Live impl wraps gencmd (holds props for the provider/config) and runs the sync core on a goroutine, streaming progress into the job. Nil Generator → a 503 "generation unavailable (no image provider configured)". Side-effecting + paid.
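
A fake behind this seam, plus the handler's decision logic with the HTTP plumbing removed, might look like the sketch below. The Cost shape, fakeGenerator, startGeneration, and the 400/500 mappings are assumptions for illustration; only the 503 for a nil Generator and the 202 on success come from this document.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// Cost and GenRequest are trimmed stand-ins for the types sketched above.
type Cost struct{ Minor int64 } // minor currency units; hypothetical shape

type GenRequest struct {
	Slug  string
	Card  int
	Takes int
}

type Generator interface {
	EstimateCost(req GenRequest) (Cost, error)
	Start(ctx context.Context, jobID string, req GenRequest) error
}

// fakeGenerator never calls a provider; it only records which jobs it started.
type fakeGenerator struct {
	perTake int64
	started []string
}

func (f *fakeGenerator) EstimateCost(req GenRequest) (Cost, error) {
	if req.Takes <= 0 {
		return Cost{}, errors.New("takes must be positive")
	}
	return Cost{Minor: f.perTake * int64(req.Takes)}, nil
}

func (f *fakeGenerator) Start(_ context.Context, jobID string, _ GenRequest) error {
	f.started = append(f.started, jobID)
	return nil
}

// startGeneration returns the HTTP status a handler would send and the cost
// estimate to echo. The nil check is on the interface value, so callers must
// pass a literal nil (not a typed nil pointer) when no provider is configured.
func startGeneration(gen Generator, jobID string, req GenRequest) (int, Cost) {
	if gen == nil {
		return http.StatusServiceUnavailable, Cost{} // generation unavailable
	}
	est, err := gen.EstimateCost(req)
	if err != nil {
		return http.StatusBadRequest, Cost{} // assumed mapping
	}
	if err := gen.Start(context.Background(), jobID, req); err != nil {
		return http.StatusInternalServerError, Cost{} // assumed mapping
	}
	return http.StatusAccepted, est
}

func main() {
	fake := &fakeGenerator{perTake: 4}
	status, est := startGeneration(fake, "job-1", GenRequest{Slug: "demo", Card: 3, Takes: 4})
	fmt.Println(status, est.Minor, fake.started) // 202 16 [job-1]
	status, _ = startGeneration(nil, "job-2", GenRequest{Takes: 1})
	fmt.Println(status) // 503
}
```

Because the fake satisfies the same interface, the studio API tests can assert on status codes and on which jobs were started without any provider, spend, or ffmpeg.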

5. Secrets, cost, MCP

  • Provider config + key: the studio builds the Gemini client from props (GEMINI_API_KEY via env/keychain), exactly as the CLI does. Unconfigured → 503 with an actionable reason; never logs the key.
  • Cost gate: generation is an explicit user action (a button, with the take count and an estimated cost shown) — never auto-fires. R-GLOBAL-9 cache: unchanged prompt+settings → cache hit, no re-spend (the cores already cache).
  • Cost is surfaced everywhere, and generation is not excluded from MCP (§7-D2). Generation stays on the MCP surface: driving image generation from an assistant is a core keryx use case (the blog workflow is Claude→Imagen). The control for the paid-but-reversible class is disclosure, not exclusion: every generating command/tool surfaces, in its description and its response, that it spends money and the estimated/actual cost. A shared cost-estimator (provider+model+count → Cost) gives one cost model read by CLI, studio UI, and MCP, so all three agree.
  • Only post/approve/auth stay MCP-gated — those are outward-facing and irreversible (publish to a public platform / mint credentials). Generation is paid but private and reversible (disposable takes, human still picks) — a different risk class, controlled by cost-visibility.
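
The shared cost-estimator (provider+model+count → Cost) could be as small as the sketch below. Everything here is illustrative: the Estimator and Cost shapes, the disclose helper, the model name, and the per-image price are placeholders, not real provider pricing or the project's actual types.

```go
package main

import "fmt"

// Cost is an amount in minor currency units (for example pence).
type Cost struct {
	Currency string
	Minor    int64
}

func (c Cost) String() string {
	return fmt.Sprintf("~%d.%02d %s", c.Minor/100, c.Minor%100, c.Currency)
}

type priceKey struct{ Provider, Model string }

// Estimator maps provider+model to a per-image price in minor units.
type Estimator struct {
	Currency string
	PerImage map[priceKey]int64
}

// Estimate multiplies the per-image price by count without calling a provider.
func (e Estimator) Estimate(provider, model string, count int) (Cost, error) {
	if count <= 0 {
		return Cost{}, fmt.Errorf("count must be positive, got %d", count)
	}
	unit, ok := e.PerImage[priceKey{provider, model}]
	if !ok {
		return Cost{}, fmt.Errorf("no price for %s/%s", provider, model)
	}
	return Cost{Currency: e.Currency, Minor: unit * int64(count)}, nil
}

// disclose builds the note that a CLI command, a studio response, and an MCP
// tool description could all share.
func disclose(action string, c Cost) string {
	return fmt.Sprintf("%s spends money (estimated %s)", action, c)
}

func main() {
	// Placeholder price: 4 minor units per image for a made-up model name.
	est := Estimator{Currency: "GBP", PerImage: map[priceKey]int64{{"gemini", "example-model"}: 4}}
	c, err := est.Estimate("gemini", "example-model", 4)
	fmt.Println(c, err)                           // ~0.16 GBP <nil>
	fmt.Println(disclose("generate card takes", c)) // generate card takes spends money (estimated ~0.16 GBP)
}
```

Keeping amounts in integer minor units avoids floating-point drift when costs accrue across a batch.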

6. Async model — job + poll, cost on the job

Generation is slow (seconds) and paid, unlike every studio call so far. It runs as an async job, polled (not SSE — gen yields files + a cost, not a token stream; not synchronous — batches benefit from non-blocking, and a closed tab must not lose progress):

  • POST …/generate validates + estimates cost, registers a job, returns 202 {job_id, cost_estimate}, and starts the Generator in the background.
  • GET …/jobs/{id} polls {state, cost_estimate, cost_actual, eta, takes[]} (state: queued → running → done | failed). The cost and timing live on the job resource — that is what surfaces "3 of 8 done, ~£0.12 spent, ~20s left" to the CLI, the editor, and the MCP caller, from one place.
  • A batch (generate across cards) is one job with per-card sub-items, so the poll naturally reports aggregate progress + accruing cost.
  • This survives a refresh / closed tab: the job runs server-side; the UI re-attaches by id. (A WebSocket buys nothing here — one local user, coarse events; the job resource is the durable thing. If push is ever wanted, it layers over the same job state without changing it.)
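
How one batch job with per-card sub-items yields the "3 of 8 done" style of report can be sketched as below. The Item and Job shapes and the rule for deriving the overall state from the sub-items are assumptions for illustration; the document fixes only the state names and the fact that cost lives on the job.

```go
package main

import "fmt"

type State string

const (
	Queued  State = "queued"
	Running State = "running"
	Done    State = "done"
	Failed  State = "failed"
)

// Item is one card within a batch job; CostMinor is the spend so far in
// minor currency units.
type Item struct {
	Card      int
	State     State
	CostMinor int64
}

type Job struct {
	ID    string
	Items []Item
}

// Progress aggregates the sub-items into counts and accrued cost.
func (j Job) Progress() (done, total int, spentMinor int64) {
	for _, it := range j.Items {
		if it.State == Done {
			done++
		}
		spentMinor += it.CostMinor
	}
	return done, len(j.Items), spentMinor
}

// Overall derives the job state (assumed rule): all queued → queued; any
// unfinished item → running; otherwise failed if any item failed, else done.
func (j Job) Overall() State {
	var queued, active, failed int
	for _, it := range j.Items {
		switch it.State {
		case Queued:
			queued++
		case Running:
			active++
		case Failed:
			failed++
		}
	}
	switch {
	case queued == len(j.Items):
		return Queued
	case queued+active > 0:
		return Running
	case failed > 0:
		return Failed
	default:
		return Done
	}
}

// Summary is the single line any caller (CLI, editor, MCP) could display.
func (j Job) Summary() string {
	done, total, spent := j.Progress()
	return fmt.Sprintf("%d of %d done, ~%d.%02d spent (%s)", done, total, spent/100, spent%100, j.Overall())
}

func main() {
	job := Job{ID: "job-9", Items: []Item{
		{Card: 1, State: Done, CostMinor: 4},
		{Card: 2, State: Running, CostMinor: 2},
		{Card: 3, State: Queued},
	}}
	fmt.Println(job.Summary()) // 1 of 3 done, ~0.06 spent (running)
}
```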

6.1 Three-tier persistence

Each tier owns a different durability guarantee; the lower tiers are caches over the workspace files, never authoritative:

  1. Workspace files: the truth. cards/takes/*.png on disk + the pick recorded in storyboard.json (git-tracked, commit-on-save). Survives refresh, server restart, reboot. Neither store below holds authoritative data.
  2. Backend in-memory job store: live state. A mutex-guarded map[jobID]*Job holding {state, cost_estimate, cost_actual, eta, takes[]}. Makes poll/re-attach work; survives refresh, not server restart. That is fine, because any completed gen already wrote tier 1, so a restart only loses the live progress bar. A poll for a forgotten id → 404 → "ask the files" (not an error).
  3. Frontend localStorage: what to watch. Per project: the in-flight job ids + light UI prefs (active card, take-count input). On load the UI re-polls those ids and reconciles against tier 2 (404 → drop the stale tracker, surface "that generation ended while you were away, check the takes"). Never stores takes/picks.

Reconciliation rule: localStorage says what to look for, the backend says what's live, the workspace says what's real.
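
Tier 2 and the reconciliation step can be sketched as a mutex-guarded map plus a function that splits tracked ids into live and stale. The Store, Job, and reconcile names are hypothetical; in the real design the tracked ids live in the browser's localStorage and the lookup happens over the poll endpoint, so this Go version only models the logic.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a trimmed stand-in for the job resource.
type Job struct {
	State string
	Takes []string
}

// Store is the in-memory tier: it is lost on restart by design.
type Store struct {
	mu   sync.Mutex
	jobs map[string]*Job
}

func NewStore() *Store { return &Store{jobs: make(map[string]*Job)} }

// Put registers or replaces a job under id.
func (s *Store) Put(id string, j Job) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.jobs[id] = &j
}

// Get returns a snapshot of the job. ok is false for an unknown id, which a
// handler would map to 404 ("ask the files"), not to an error page.
func (s *Store) Get(id string) (Job, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	j, ok := s.jobs[id]
	if !ok {
		return Job{}, false
	}
	snap := *j
	snap.Takes = append([]string(nil), j.Takes...) // copy so callers cannot race on the slice
	return snap, true
}

// reconcile splits the ids a client is tracking into those the store still
// knows (keep polling) and those it has forgotten (drop and check the takes).
func reconcile(s *Store, tracked []string) (live, stale []string) {
	for _, id := range tracked {
		if _, ok := s.Get(id); ok {
			live = append(live, id)
		} else {
			stale = append(stale, id)
		}
	}
	return live, stale
}

func main() {
	s := NewStore()
	s.Put("job-7", Job{State: "running"})
	live, stale := reconcile(s, []string{"job-7", "job-3"})
	fmt.Println(live, stale) // [job-7] [job-3]
}
```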

7. Resolved decisions (2026-06-23)

  • D1 — Async model. Job + poll; cost/eta/state on the job resource; three-tier persistence (§6/§6.1). Not SSE, not synchronous.
  • D2 — MCP. Generation stays on MCP (core AI use case). Control is mandatory cost+risk disclosure on every paid command/tool via a shared cost-estimator; not exclusion. Only post/approve/auth stay gated (§5).
  • D3 — Takes UX. Generate-N → gallery → pick. N is configured per project (workspace/config setting, default 4), not asked per call — N=1 gives single-re-roll behaviour.
  • D4 — Phasing. All three (per-card, cover, portrait) ship in one MR, built in internal order per-card → cover → portrait (they share the job/cost/Generator infra), with coherent per-area commits.
  • D5 — Theme re-roll flag. Record the originating theme on generated Media and wire the R-UI-22 re-roll flag in this MR (workspace theme ≠ card media theme → editor shows "made under the old theme — re-roll?"). Closes R-UI-22's deferred piece.
  • D6 — Testing. Fake Generator (no spend) for the studio API + a godog scenario (generate-N → pick via the fake; upload used-as-is; cost surfaced on the job). The real provider stays env-gated integration (INT_TEST=1); godog never hits a paid API.
  • D7 — In-memory projects. Allow generation (it's the AI loop someone with no local disk most wants), surface a RAM-usage note, and prune unpicked takes by a grace TTL — not on pick (default 3 min, configurable). A periodic sweep frees only candidates older than the window that aren't the picked slot, so "generate round 2, go back and pick from round 1" works within the window while RAM stays bounded. Local-disk projects don't prune (takes are cheap files; workspace cleanup owns them).
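
The grace-TTL sweep from D7 can be sketched as follows. The Take shape, the sweep and exampleSweep names, and the fixture filenames are hypothetical; the 3 minute window is the default stated in D7, and "older than the window" is read here as strictly greater than the TTL.

```go
package main

import (
	"fmt"
	"time"
)

// Take is an in-memory candidate; Picked marks the take occupying the slot.
type Take struct {
	File    string
	Created time.Time
	Picked  bool
}

// sweep frees only unpicked takes older than ttl. Picking never triggers
// pruning, so an earlier round stays pickable inside the window.
func sweep(takes []Take, now time.Time, ttl time.Duration) (kept []Take, freed []string) {
	for _, t := range takes {
		if !t.Picked && now.Sub(t.Created) > ttl {
			freed = append(freed, t.File)
			continue
		}
		kept = append(kept, t)
	}
	return kept, freed
}

// exampleSweep runs one sweep over a fixed fixture with the default window.
func exampleSweep() (keptFiles, freed []string) {
	now := time.Date(2026, 6, 23, 12, 0, 0, 0, time.UTC)
	takes := []Take{
		{File: "r1-a.png", Created: now.Add(-5 * time.Minute)},               // old, unpicked: freed
		{File: "r1-b.png", Created: now.Add(-5 * time.Minute), Picked: true}, // old but picked: kept
		{File: "r2-a.png", Created: now.Add(-1 * time.Minute)},               // inside the window: kept
	}
	kept, freed := sweep(takes, now, 3*time.Minute)
	for _, t := range kept {
		keptFiles = append(keptFiles, t.File)
	}
	return keptFiles, freed
}

func main() {
	kept, freed := exampleSweep()
	fmt.Println(kept, freed) // [r1-b.png r2-a.png] [r1-a.png]
}
```

Passing now in as a parameter keeps the sweep deterministic under test; a periodic caller would supply time.Now().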

8. Definition of done (per phase)

Failing test → code → green just ci; a godog scenario for the workflow (faked generator); the editor media block + takes gallery; docs page updated (docs/components/studio); /simplify + /code-review. The provider integration path stays env-gated (INT_TEST=1), not run in CI.