
0015 — studio Phase 2: preview & produce + publish (scope & plan)

Status: SCOPED (the Phase-2 map; all §7 decisions RESOLVED 2026-06-26. Build order set: 2A → 2B → 2D → 2C. Each sub-phase gets its own implementation spec (0016+), drafted from this map + TDD/BDD/docs, reviewed before code. This spec surfaced the cross-component contention before implementation, per Matt's instruction.)

Canonical phase definition (D8): studio Phase 2 = the preview/produce/publish UI surface over the already-built CLI cores. It is distinct from the CLI roadmap's phase numbering (CLAUDE.md, 0001 §1), where the render/posting/auth cores landed earlier; Phase 2 here means surfacing them in the studio.

Date: 2026-06-26
Depends on: the Phase-1 studio (specs 0011/0012/0013/0014, all IMPLEMENTED).
Surfaces already-built CLI cores: internal/render/*, internal/gencmd (audio), pkg/publish + internal/publish/*, internal/social + internal/socialcmd, pkg/oauth + internal/refreshcmd.
Per-feature specs (0016+) are drafted from this map.

1. Goal & scope

Phase 1 gave the studio "author & adjust". Phase 2 is "preview & produce" + "publish": audition/select audio takes, preview the reel, trigger a full render, compose + approve + post the per-platform social set, and gate the server when exposed. The cores are all built and tested (CLI + unit); the studio surfaces none of them today (its mux is generation/edit-only). So Phase 2 is largely new HTTP surface over existing cores — but several cores carry assumptions (OS-filesystem, per-image cost, per-card jobs, secrets-never-in-UI) that the studio's model (afero worktrees, in-memory projects, async jobs, no-secrets API) contends with. This spec maps that contention and proposes a contention-free build order.

In scope (the Phase-2 R-UI surface): R-UI-8 (silent-draft preview), R-UI-9 (VO+music takes audition/select), R-UI-10 remainder (illustration re-roll contact-sheet; the gen/pick loop itself shipped in 0013), R-UI-11 (full render + preview), R-UI-19 (view/edit social set), R-UI-25 (Publish/social composition, SHOULD), R-UI-27 (approve/schedule/post, SHOULD), R-UI-12 (posting status). Cross-cutting: R-API-3 networked-exposure auth (the studio-server gate, deferred from Phase 1).

Out of scope: per-platform OAuth capture in the studio (auth <platform> stays CLI — §4/§7-D6); the unattended schedule (owning-project CI, not keryx code, §6); new platform adapters (all four exist).

2. What already exists (reuse map)

| Need | Existing core | Studio gap |
|---|---|---|
| Full render | internal/render/ffmpeg (Renderer.Render, exec seam), reel build (--silent) | no /render; render is OS-fs-bound + minutes-long |
| Silent draft (R-UI-8) | reel build --silent (storyboard durations, no VO) | no /preview; same render path |
| VO takes | gencmd.VOTakes → vo/takes/NN-T.mp3; takes.SelectVO(fs,ws,line,take) | not FS-explicit; per-line not per-card; no ListVOTakes |
| Music takes | gencmd.MusicTakes → music/takes/T.mp3; takes.SelectMusic(fs,ws,take) | not FS-explicit; no ListMusicTakes (fits pickSingleSlot) |
| Publisher fan-out | pkg/publish (4 adapters), internal/postcmd (PostAll/PostDue, approved-gate, idempotent) | no /post; irreversible over HTTP |
| Social set + ledger | internal/social (social.json: status/scheduled_at/posted_at/post_url), internal/socialcmd (Set/Show/Gen via the shared chat seam) | no /social; no edit-drops-to-draft; shared-state race |
| Per-platform constraints | internal/social/constraints.go (caps, link, hashtag) | hardcoded (R-SOC-5 wants config) |
| Auth + refresh | pkg/oauth (capture, Store), internal/refreshcmd (Refresher, gitlab write-back, notifiers) | built; studio only shows health (read-only) |
| Async job harness | studio startGeneration/runJob/getJob/pickSingleSlot/serveFile (0013) | image-shaped (cost, ETA, Takes, per-card); needs widening |

Bottom line: the heavy lifting (ffmpeg graph, OAuth, posting state machine, AI social copy) is done. Phase 2 is integration + a handful of core refactors, not new generation/posting logic.

3. Requirements in scope (priorities)

  • Preview/produce: R-UI-8/9/11 (MAY), R-UI-10 contact-sheet (MAY).
  • Publish: R-UI-25 (SHOULD), R-UI-27 (SHOULD), R-UI-19 (MAY), R-UI-12 (MAY); R-SOC-1/2/6/7 (MUST), R-SOC-3/4/5/8 (SHOULD); R-CAP-1/2/4 (MUST), R-CAP-3 (SHOULD); the post gate R-POST-2/9/10 (MUST) is reused (the studio calls the same path).
  • Cross-cutting: R-API-3 networked-exposure auth (MUST — the studio-server gate); R-AUTH-* is read-only in the studio (show token health; capture stays CLI).

Note: the Publish half carries MUSTs (R-SOC/R-CAP/R-POST) even though the R-UI panel rows are SHOULD — because the contracts those rows implement are MUST. So Publish is not "optional polish"; only the specific UI affordances are SHOULD.

4. The contention map (the cross-component hazards)

The point of this spec: name the intersections that would bite if we built blind.

C1 — Render is OS-filesystem-bound; the studio is afero (in-memory projects). reel build uses os.MkdirTemp, internal/render/cards uses gg.LoadImage/SavePNG (OS paths, system fonts), and ffmpeg/ffprobe shell out needing real files. An in-memory project's inputs live in a.workFS (RAM). There is no bridge to materialise RAM inputs → real temp dir → render → read the mp4 back. This is the single biggest contention. (Image gen avoided it by being pure-afero; render can't.)

C2 — Render is minutes-long with no progress; the job model is seconds-long with per-image ETA. renderTimeout = 10m vs the job store's 9s/image ETA; ffmpeg's exec uses CombinedOutput() (blocks, no stderr stream), so the poll can only report "running" until done. And a render produces one mp4, not N takes — Job.Takes doesn't fit. Render is also free (local ffmpeg) — the cost estimator is image-priced.

C3 — Video serving buffers the whole file into RAM. serveFile does afero.ReadFile (whole file) then ServeContent. Range/seek work, but every <video> scrub re-buffers a tens-of-MB mp4 (doubled for in-memory), with Cache-Control: no-store.

C4 — Audio gen is per-line/per-bed and not FS-explicit. VOTakes(slug, line, …) resolves wsDir(p.FS, slug) and writes via p.FS — it can't run on the active (in-memory) worktree, and VO is per storyboard line while the job/pick model is per-card. Music fits pickSingleSlot; VO needs a (line, take) pick. Audio is paid (ElevenLabs, char/length-priced) — a third cost axis.

C5 — Posting is irreversible, over an HTTP API with no auth. "Post now" (R-UI-27) is public + irreversible; the only safeguard is the approved-status gate in postcmd. The studio API has no auth today, and --host 0.0.0.0 exposes the full API on the LAN unauthenticated (the live R-API-3 gap). So Publish's safety is coupled to the server-exposure gate.

C6 — social.json is shared mutable state with no guard. Both the CLI and (in Phase 2) the studio write it via internal/social — no locking, no revision-guard, whole-Set last-write-wins. R-SOC-8 (editing approved/posted text drops to draft) is not implemented. A studio edit racing a CLI post corrupts the ledger.

C7 — Studio-server auth vs platform OAuth are different subsystems. R-API-3 wants a gate in front of the studio mux (callers → studio). The OAuth subsystem authenticates keryx → platforms. They share nothing. Bundling them (as "Phase 2 auth") is a category error; the spec keeps them separate. And the studio must never handle platform secrets (R-CFG-2) — so auth <platform> stays CLI; the studio only shows token health (read-only, from auth refresh --dry-run).

C8 — Core gaps that are MUSTs: R-SOC-5 (config-tunable constraints — currently hardcoded), R-CAP-4 (reel-caption.md written into the bundle — not built). These land with the Publish work, not as studio glue.

5. Proposed sub-phase structure & build order

Sequenced to retire contention early and keep each MR coherent. Each sub-phase gets its own implementation spec (0016+) drafted from this map, reviewed before code.

  • 2A — Audio takes (R-UI-9). Refactor VOTakes/MusicTakes to an FS-explicit core (the GenerateInto(ctx, p, fs, req) shape) so they run on the active worktree; add ListVOTakes/ListMusicTakes; a (line, take) VO pick + the music single-slot pick; an audio cost axis (per-clip/per-bed). Reuses the job/poll/gallery harness. Retires C4. Smallest, highest-reuse — first.
  • 2B — Render + preview (R-UI-8/11). The render bridge: decide local-only vs the materialise↔readback round-trip (C1, §7-D1); a render-shaped job (one output, no takes, elapsed-only progress for v1 — C2); a video serving fast-path (C3). The silent draft (R-UI-8) is the same render with --silent. Depends on 2A (a faithful render wants selected VO/music).
  • 2C — Publish (R-UI-19/25/27, R-UI-12). Studio /social read/compose/edit (reusing socialcmd + the chat seam), the per-platform steering UI (live char count, link/hashtag hints — R-SOC-1/2/3), approve (R-POST-2 gate, refuse on constraint violations — R-POST-10), schedule (scheduled_at), Post now (same postcmd path). Implements R-SOC-8 (edit→draft) + the ledger guard (C6), R-SOC-5 (config constraints) + R-CAP-4 (caption file) — the MUST gaps (C8). Posting is gated on the server-exposure auth (2D) when not localhost (C5).
  • 2D — Networked-exposure auth (R-API-3). A gate in front of the whole studio mux, required when bound non-localhost; localhost stays open (single-user dev). The mechanism is the key open decision (§7-D5). Lands before/with 2C since Publish over the LAN is unsafe without it. Retires C5/C7.

Small extras folded in where they fit: R-UI-10 contact-sheet text-leak screen (into 2A/2B — it's a takes-gallery view of card illustrations); R-AUTH read-only token-health view (into 2C — surfaces auth refresh --dry-run).

Dependency graph: 2A → 2B; 2D → 2C (or concurrent, 2D gating 2C's post path); 2A/2B/2C/2D otherwise independent. Recommended order: 2A, 2B, 2D, 2C.

6. Scheduling — explicitly NOT a studio concern

The unattended schedule lives in the owning project's CI (the blog repo), not keryx. The studio sets a per-outlet scheduled_at; CI runs keryx post due + keryx auth refresh all on a cron. keryx ships no scheduler. (0001 §3.2/§4.3; the auth-refresh CI wiring is the owning project's, tracked in 0010 — out of this scope.)

7. Decisions to resolve before building (the contention calls)

  1. Render filesystem model (C1). — RESOLVED 2026-06-26. Root cause: keryx shells out to the ffmpeg binary (no CGO-free Go libav binding), which needs real files on disk — so render is inherently OS-fs-bound. Interim: render requires a LOCAL (on-disk) project; in-memory render is LOCKED OUT with a clear "switch to a local checkout to render" message. This unblocks 2B without resolving the binding question and eases C3 (the mp4 lands on the OS fs, so no in-memory copy is needed to serve it; the serving path itself is settled in D3). SPIKE COMPLETE (task #68, 2026-06-26 — spikes/ffmpeg-render-binding.md): the lock-out is CONFIRMED for Phase 2. No CGO-free in-memory render binding is a sound bet today — the only mature in-memory libav binding (go-astiav) requires CGO (breaks keryx's static cross-compile posture for an edge case); the CGO-free options are immature (ffgo, in-memory output unproven) or can't do our render without a custom build + single-threaded perf hit (go-ffmpreg/wazero-WASM). And keryx keeps the ffmpeg-binary dependency regardless, so the win was only "in-memory + CGO-free", not "drop ffmpeg". If in-memory render is ever needed, the escape hatch is a materialise↔readback bridge (copy worktree inputs → temp dir → native ffmpeg → read the mp4 back) — zero new dependency, native speed, posture intact. go-ffmpreg/wazero is watch-listed for the long term (re-evaluate when it gains an xfade+AAC build + wasm-threads). Phase 2: local-only, no new dependency.
  2. Render progress (C2). — RESOLVED 2026-06-26: elapsed-only v1. The render job reports queued → running (elapsed + a rough eta from total duration) → done/failed; no real %. Exec seam stays CombinedOutput. A %-progress bar (stream + parse ffmpeg -progress) is a fast-follow if it earns its keep.
  3. Video serving (C3). — RESOLVED 2026-06-26: stream via the afero.File ReadSeeker. Generalize the existing serveFile: replace afero.ReadFile + bytes.Reader with activeFS.Open(path) and pass the returned afero.File (an io.ReadSeeker) to http.ServeContent (defer Close; pass a real modtime so range caching works). One serving path for images and video, OS and in-memory: for OsFs the file wraps a real *os.File (native fd seek, only the requested range read — as efficient as http.ServeFile), and for in-memory it reads straight from the buffer with no extra copy. Removes today's full-RAM-copy for all file serving, not just video. Chosen over a dedicated http.ServeFile fast-path because that would force a second code path (it can't serve the in-memory fs that image takes use) for no efficiency gain on the OS-fs case.
  4. Audio gen FS + cost (C4). — RESOLVED 2026-06-26: FS-explicit refactor + audio cost axis. Refactor VOTakes/MusicTakes to the FS-explicit GenerateInto(ctx, p, fs, req) shape so audio takes run on the active worktree (in-memory-capable, consistent with image gen); add ListVOTakes/ListMusicTakes, a (line, take) VO pick (music fits pickSingleSlot), and an audio cost axis (per-clip / per-bed) to the estimator so cost disclosure (0013 D2) holds for audio too.
  5. Networked-exposure auth (C5/C7, R-API-3). — RESOLVED 2026-06-26: startup bearer token. On a non-localhost bind, generate a random token at startup, print it in the listen URL (http://host:port/?token=…, Jupyter/Vite style); a middleware in front of the mux requires it (header / query → cookie) for the /api surface. localhost bind stays open (single-user dev). No secret in config, rotates per start. The SPA captures the token from the URL. Implementation: GTB's server authn middleware (go-tool-base v0.23.0, released 2026-06-26). At 2D: gtb update to ≥ v0.23.0 and wire its authn middleware over the stdlib mux (the studio deliberately doesn't use GTB's full NewServer, but its middleware composes).
  6. Platform auth in the studio (C7). — RESOLVED 2026-06-26: CLI captures; studio shows read-only health. auth <platform> capture stays CLI (interactive, MCP-gated, secret-handling). The studio never captures or stores a token (R-CFG-2); it surfaces a read-only Connections view (per platform: connected or not, plus expiry, read from the non-secret expiry metadata platforms.<p>.*_expires_at) so a dead or expiring token is flagged in the cockpit before a publish fails. v1 reads stored metadata (no subprocess/network); a live auth refresh --dry-run "check now" is a fast-follow. Studio-driven OAuth capture (option C) is rejected: it would put raw-token capture on a network-exposable API, contradicting R-CFG-2 and the reason auth is interactive + MCP-gated.
  7. Ledger concurrency + R-SOC-8 (C6). — RESOLVED 2026-06-26: R-SOC-8 + optimistic guard. Implement R-SOC-8 (editing an approved/posted platform's text drops it to draft) in the shared internal/social path, so CLI and studio both get it. Add an optimistic concurrency guard: the studio reads social.json with a revision token (mtime or content hash) and refuses to write if it changed since read (409 → "reload, it changed"), preventing a studio edit clobbering a concurrent CLI post.
  8. Phase label reconciliation. — RESOLVED 2026-06-26: canonicalize "studio Phase 2". Define it: studio Phase 2 = the preview/produce/publish UI surface over the already-built CLI cores. Fix the stray labels (CLAUDE.md's "Phase 3" for auth/refresh, 0001 §1's "posting adapters") with a one-line pointer to this definition, so the docs stop contradicting. (Not a full roadmap renumber — just align the labels.)
  9. MUST gaps to close in 2C (C8). — RESOLVED 2026-06-26: both land in 2C. Move the per-platform constraints into config (R-SOC-5; the current hardcoded values become the defaults) and write reel-caption.md into the bundle from keryx social (R-CAP-4). No MUST left open behind the shipped Publish panel.
  10. Posting safety (C5). — RESOLVED 2026-06-26: layered posture. "Post now" is human-initiated only (never auto-fired); reuses the same postcmd path + approved-gate (refuses unapproved + on constraint violations, R-POST-2/10); allowed only when the server is localhost-bound or authed (D5 bearer); never on the MCP surface; unattended posting stays CI-only (post due). Defence-in-depth — the gate, the bind, and human intent all required. (A typed per-post confirm is an optional later UX refinement, not required.)

8. Testing / DoD per sub-phase

Each sub-phase: failing test → code → green just ci; godog for the user-facing workflow (faked provider/publisher seams — no real posting or paid gen in CI; the existing env-gated integration tests stay env-gated); docs page per component; /simplify + /code-review. The Publisher/render/voice/music seams are faked behind their interfaces exactly as the image Generator is. Posting godog asserts the approved-gate + idempotency against a fake publisher; never a live platform.
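The posting assertions named here (approved-gate plus idempotency against a fake publisher) can be sketched at unit level in Go; godog scenarios would drive the same seam through steps. The Publisher interface shape, the Entry fields, the status strings, the placeholder platform names and postAll are all assumptions standing in for pkg/publish and internal/postcmd. The real path also refuses on constraint violations, which is omitted here.

```go
package main

// Publisher stands in for the pkg/publish adapter seam (shape assumed).
type Publisher interface {
	Post(platform, text string) (url string, err error)
}

// fakePublisher records calls instead of touching a live platform.
type fakePublisher struct{ calls []string }

func (f *fakePublisher) Post(platform, text string) (string, error) {
	f.calls = append(f.calls, platform)
	return "https://example.invalid/" + platform, nil
}

// Entry is a hypothetical per-platform ledger row.
type Entry struct {
	Platform, Text, Status, PostURL string
}

// postAll mimics the gate under test: only approved entries are posted,
// and a posted entry is no longer "approved", so a re-run is a no-op.
func postAll(p Publisher, entries []*Entry) error {
	for _, e := range entries {
		if e.Status != "approved" {
			continue
		}
		url, err := p.Post(e.Platform, e.Text)
		if err != nil {
			return err
		}
		e.Status, e.PostURL = "posted", url
	}
	return nil
}
```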

9. Open questions parked for the per-feature specs

  • Audio cost pricing model (per-character vs per-second; ElevenLabs unit).
  • Whether the contact-sheet (R-UI-10) is a distinct view or a mode of the existing takes gallery.
  • Render queue policy (one render at a time per project? cancel-in-flight on a new request?).
  • Token-health view detail (which expiries to surface; refresh is CLI/CI-run).