0015 - studio Phase 2: preview & produce + publish (scope & plan)
Status: SCOPED (the Phase-2 map; all §7 decisions RESOLVED 2026-06-26. Build order set: 2A → 2B → 2D → 2C. Each sub-phase gets its own implementation spec (0016+), drafted from this map + TDD/BDD/docs, reviewed before code. This spec surfaced the cross-component contention before implementation, per Matt's instruction.)
Canonical phase definition (D8): studio Phase 2 = the preview/produce/publish
UI surface over the already-built CLI cores. It is distinct from the CLI roadmap's
phase numbering (CLAUDE.md, 0001 §1), where the render/posting/auth cores
landed earlier — Phase 2 here is surfacing them in the studio.
Date: 2026-06-26
Depends on: the Phase-1 studio (specs 0011/0012/0013/0014 — IMPLEMENTED). Surfaces
already-built CLI cores: internal/render/*, internal/gencmd (audio), pkg/publish
+ internal/publish/*, internal/social + internal/socialcmd, pkg/oauth +
internal/refreshcmd. Per-feature specs (0016+) are drafted from this map.
1. Goal & scope
Phase 1 gave the studio "author & adjust". Phase 2 is "preview & produce" + "publish": audition/select audio takes, preview the reel, trigger a full render, compose + approve + post the per-platform social set, and gate the server when exposed. The cores are all built and tested (CLI + unit); the studio surfaces none of them today (its mux is generation/edit-only). So Phase 2 is largely new HTTP surface over existing cores — but several cores carry assumptions (OS-filesystem, per-image cost, per-card jobs, secrets-never-in-UI) that the studio's model (afero worktrees, in-memory projects, async jobs, no-secrets API) contends with. This spec maps that contention and proposes a contention-free build order.
In scope (the Phase-2 R-UI surface): R-UI-8 (silent-draft preview), R-UI-9 (VO+music takes audition/select), R-UI-10 remainder (illustration re-roll contact-sheet; the gen/pick loop itself shipped in 0013), R-UI-11 (full render + preview), R-UI-19 (view/edit social set), R-UI-25 (Publish/social composition, SHOULD), R-UI-27 (approve/schedule/post, SHOULD), R-UI-12 (posting status). Cross-cutting: R-API-3 networked-exposure auth (the studio-server gate, deferred from Phase 1).
Out of scope: per-platform OAuth capture in the studio (auth <platform> stays
CLI — §4/§7-D6); the unattended schedule (owning-project CI, not keryx code, §6);
new platform adapters (all four exist).
2. What already exists (reuse map)
| Need | Existing core | Studio gap |
|---|---|---|
| Full render | internal/render/ffmpeg (Renderer.Render, exec seam), reel build (--silent) | no /render; render is OS-fs-bound + minutes-long |
| Silent draft (R-UI-8) | reel build --silent (storyboard dur, no VO) | no /preview; same render path |
| VO takes | gencmd.VOTakes → vo/takes/NN-T.mp3; takes.SelectVO(fs,ws,line,take) | not FS-explicit; per-line not per-card; no ListVOTakes |
| Music takes | gencmd.MusicTakes → music/takes/T.mp3; takes.SelectMusic(fs,ws,take) | not FS-explicit; no ListMusicTakes (fits pickSingleSlot) |
| Publisher fan-out | pkg/publish (4 adapters), internal/postcmd (PostAll/PostDue, approved-gate, idempotent) | no /post; irreversible over HTTP |
| Social set + ledger | internal/social (social.json: status/scheduled_at/posted_at/post_url), internal/socialcmd (Set/Show/Gen via the shared chat seam) | no /social; no edit-drops-to-draft; shared-state race |
| Per-platform constraints | internal/social/constraints.go (caps, link, hashtag) | hardcoded (R-SOC-5 wants config) |
| Auth + refresh | pkg/oauth (capture, Store), internal/refreshcmd (Refresher, gitlab write-back, notifiers) | built; studio only shows health (read-only) |
| Async job harness | studio startGeneration/runJob/getJob/pickSingleSlot/serveFile (0013) | image-shaped (cost, ETA, Takes, per-card); needs widening |
Bottom line: the heavy lifting (ffmpeg graph, OAuth, posting state machine, AI social copy) is done. Phase 2 is integration + a handful of core refactors, not new generation/posting logic.
3. Requirements in scope (priorities)
- Preview/produce: R-UI-8/9/11 (MAY), R-UI-10 contact-sheet (MAY).
- Publish: R-UI-25 (SHOULD), R-UI-27 (SHOULD), R-UI-19 (MAY), R-UI-12 (MAY); R-SOC-1/2/6/7 (MUST), R-SOC-3/4/5/8 (SHOULD); R-CAP-1/2/4 (MUST), R-CAP-3 (SHOULD); the postgate R-POST-2/9/10 (MUST) is reused (the studio calls the same path).
- Cross-cutting: R-API-3 networked-exposure auth (MUST; the studio-server gate); R-AUTH-* is read-only in the studio (show token health; capture stays CLI).
Note: the Publish half carries MUSTs (R-SOC/R-CAP/R-POST) even though the R-UI panel rows are SHOULD — because the contracts those rows implement are MUST. So Publish is not "optional polish"; only the specific UI affordances are SHOULD.
4. The contention map (the cross-component hazards)
The point of this spec: name the intersections that would bite if we built blind.
C1 — Render is OS-filesystem-bound; the studio is afero (in-memory projects).
reel build uses os.MkdirTemp, internal/render/cards uses gg.LoadImage/SavePNG
(OS paths, system fonts), and ffmpeg/ffprobe shell out needing real files. An
in-memory project's inputs live in a.workFS (RAM). There is no bridge to
materialise RAM inputs → real temp dir → render → read the mp4 back. This is the
single biggest contention. (Image gen avoided it by being pure-afero; render can't.)
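For orientation, the missing bridge could look roughly like the sketch below. It is stdlib-only and not keryx code: the in-memory worktree is stood in for by a plain map, `renderFunc` is a hypothetical stand-in for the ffmpeg exec seam, and `fakeRender` only concatenates files so the round trip can run without ffmpeg.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// renderFunc stands in for the ffmpeg exec seam (hypothetical signature): it
// reads its inputs from dir and returns the path of the file it produced.
type renderFunc func(dir string) (outPath string, err error)

// renderViaTempDir materialises in-memory inputs into a real temp dir, runs
// the renderer there, and reads the output back into memory.
func renderViaTempDir(inputs map[string][]byte, render renderFunc) ([]byte, error) {
	dir, err := os.MkdirTemp("", "render-bridge-*")
	if err != nil {
		return nil, fmt.Errorf("create temp dir: %w", err)
	}
	defer os.RemoveAll(dir) // the temp dir never outlives the render

	for name, data := range inputs {
		rel := filepath.FromSlash(name)
		if !filepath.IsLocal(rel) {
			return nil, fmt.Errorf("input %q escapes the temp dir", name)
		}
		dst := filepath.Join(dir, rel)
		if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
			return nil, fmt.Errorf("mkdir for %s: %w", name, err)
		}
		if err := os.WriteFile(dst, data, 0o644); err != nil {
			return nil, fmt.Errorf("materialise %s: %w", name, err)
		}
	}

	out, err := render(dir)
	if err != nil {
		return nil, fmt.Errorf("render: %w", err)
	}
	return os.ReadFile(out) // evaluated before the deferred cleanup runs
}

// fakeRender concatenates the top-level files in dir (os.ReadDir returns them
// sorted by name) into out.bin, so the round trip runs without ffmpeg.
func fakeRender(dir string) (string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", err
	}
	var joined []byte
	for _, e := range entries {
		if e.IsDir() {
			continue
		}
		b, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return "", err
		}
		joined = append(joined, b...)
	}
	out := filepath.Join(dir, "out.bin")
	return out, os.WriteFile(out, joined, 0o644)
}

func main() {
	inputs := map[string][]byte{"a.txt": []byte("one "), "b.txt": []byte("two")}
	out, err := renderViaTempDir(inputs, fakeRender)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n", out)
}
```

The cost of this shape is a full copy of every input and of the output per render, which is part of why it is a contention and not a free win.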
C2 — Render is minutes-long with no progress; the job model is seconds-long with
per-image ETA. renderTimeout = 10m vs the job store's 9s/image ETA; ffmpeg's
exec uses CombinedOutput() (blocks, no stderr stream), so the poll can only report
"running" until done. And a render produces one mp4, not N takes — Job.Takes
doesn't fit. Render is also free (local ffmpeg) — the cost estimator is
image-priced.
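To make the mismatch concrete, a render-shaped job might carry one output path and an elapsed clock instead of takes, cost, and a per-image ETA. The sketch below is illustrative only: `RenderJob` and its methods are hypothetical names, not the studio's existing job type.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type RenderState string

const (
	Queued  RenderState = "queued"
	Running RenderState = "running"
	Done    RenderState = "done"
	Failed  RenderState = "failed"
)

// RenderJob is a hypothetical render-shaped job: one output file, no takes,
// no cost estimate.
type RenderJob struct {
	State      RenderState
	StartedAt  time.Time
	FinishedAt time.Time
	Output     string // path of the single mp4; set only when State is Done
	Err        string // failure message; set only when State is Failed
}

func NewRenderJob() *RenderJob { return &RenderJob{State: Queued} }

// Start marks the job running and records when the renderer was launched.
func (j *RenderJob) Start(now time.Time) {
	j.State, j.StartedAt = Running, now
}

// Finish records the terminal state: Failed if err is non-nil, else Done.
func (j *RenderJob) Finish(now time.Time, output string, err error) {
	j.FinishedAt = now
	if err != nil {
		j.State, j.Err = Failed, err.Error()
		return
	}
	j.State, j.Output = Done, output
}

// Elapsed is the only progress signal available while the renderer blocks.
// It is zero while queued, grows while running, and freezes once finished.
func (j *RenderJob) Elapsed(now time.Time) time.Duration {
	switch j.State {
	case Queued:
		return 0
	case Running:
		return now.Sub(j.StartedAt)
	default:
		return j.FinishedAt.Sub(j.StartedAt)
	}
}

func main() {
	start := time.Now()
	job := NewRenderJob()
	job.Start(start)
	fmt.Println(job.State, job.Elapsed(start.Add(90*time.Second)))
	job.Finish(start.Add(4*time.Minute), "", errors.New("ffmpeg exited 1"))
	fmt.Println(job.State, job.Elapsed(start.Add(time.Hour)), job.Err)
}
```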
C3 — Video serving buffers the whole file into RAM. serveFile does
afero.ReadFile (whole file) then ServeContent. Range/seek work, but every
<video> scrub re-buffers a tens-of-MB mp4 (doubled for in-memory), with
Cache-Control: no-store.
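One way to avoid the whole-file buffer is to hand `http.ServeContent` a seekable handle and let it read only the requested range. The sketch below uses only the standard library: `openFunc` is a hypothetical seam standing in for "open on the active filesystem", and `memFile` plays the role of an in-memory file. Any type that implements `io.ReadSeeker` would slot in the same way.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// openFunc abstracts "open a file on the active filesystem" (hypothetical seam).
type openFunc func(name string) (io.ReadSeekCloser, time.Time, error)

// serveSeekable streams name through http.ServeContent, which seeks to the
// requested byte range instead of requiring the whole file in memory.
func serveSeekable(open openFunc, name string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		f, modTime, err := open(name)
		if err != nil {
			http.NotFound(w, r)
			return
		}
		defer f.Close()
		http.ServeContent(w, r, name, modTime, f)
	})
}

// memFile adapts a bytes.Reader to io.ReadSeekCloser for the demo.
type memFile struct{ *bytes.Reader }

func (memFile) Close() error { return nil }

// memOpen returns an openFunc that serves data from memory.
func memOpen(data []byte) openFunc {
	modTime := time.Now()
	return func(string) (io.ReadSeekCloser, time.Time, error) {
		return memFile{bytes.NewReader(data)}, modTime, nil
	}
}

// rangeGet issues a GET with the given Range header and returns status + body.
func rangeGet(h http.Handler, rng string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/clip.mp4", nil)
	req.Header.Set("Range", rng)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	h := serveSeekable(memOpen([]byte("abcdefgh")), "clip.mp4")
	code, body := rangeGet(h, "bytes=2-4")
	fmt.Println(code, body)
}
```

With an OS-backed file the same handler would seek on the file descriptor, so a scrub touches only the bytes the player asked for.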
C4 — Audio gen is per-line/per-bed and not FS-explicit. VOTakes(slug, line, …)
resolves wsDir(p.FS, slug) and writes via p.FS — it can't run on the active
(in-memory) worktree, and VO is per storyboard line while the job/pick model is
per-card. Music fits pickSingleSlot; VO needs a (line, take) pick. Audio is
paid (ElevenLabs, char/length-priced) — a third cost axis.
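As an illustration of what "FS-explicit" buys, the sketch below passes the target filesystem in as a parameter, so the same generator can write to an in-memory worktree or to disk. Everything here is hypothetical: `WriteFS`, `Synth`, `VORequest`, `VOPick` and `GenerateVOInto` are invented for the example, and `takePath` assumes that NN in vo/takes/NN-T.mp3 is the zero-padded line number and T the take index.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// WriteFS is the minimal write surface the generator needs. The active
// worktree's filesystem would play this role (an assumption of this sketch).
type WriteFS interface {
	WriteFile(name string, data []byte) error
}

// MemFS is a map-backed WriteFS standing in for an in-memory worktree.
type MemFS map[string][]byte

func (m MemFS) WriteFile(name string, data []byte) error {
	m[name] = append([]byte(nil), data...)
	return nil
}

// Synth is the paid provider seam; tests would inject a fake.
type Synth func(ctx context.Context, text string, take int) ([]byte, error)

// VORequest asks for a number of takes of one storyboard line.
type VORequest struct {
	Line  int
	Text  string
	Takes int
}

// VOPick identifies a voice-over take per storyboard line, not per card.
type VOPick struct{ Line, Take int }

// takePath assumes NN is the zero-padded line number and T the take index.
func takePath(p VOPick) string {
	return fmt.Sprintf("vo/takes/%02d-%d.mp3", p.Line, p.Take)
}

// GenerateVOInto writes every requested take into fsys and returns the picks.
func GenerateVOInto(ctx context.Context, fsys WriteFS, req VORequest, synth Synth) ([]VOPick, error) {
	if req.Takes < 1 {
		return nil, errors.New("at least one take is required")
	}
	picks := make([]VOPick, 0, req.Takes)
	for take := 1; take <= req.Takes; take++ {
		audio, err := synth(ctx, req.Text, take)
		if err != nil {
			return picks, fmt.Errorf("line %d take %d: %w", req.Line, take, err)
		}
		pick := VOPick{Line: req.Line, Take: take}
		if err := fsys.WriteFile(takePath(pick), audio); err != nil {
			return picks, fmt.Errorf("write %s: %w", takePath(pick), err)
		}
		picks = append(picks, pick)
	}
	return picks, nil
}

// fakeSynth returns deterministic bytes so no paid provider is called.
func fakeSynth(_ context.Context, text string, take int) ([]byte, error) {
	return []byte(fmt.Sprintf("%s#%d", text, take)), nil
}

// demo generates two takes of line 3 into a fresh in-memory filesystem.
func demo() (MemFS, []VOPick, error) {
	fsys := MemFS{}
	req := VORequest{Line: 3, Text: "hello", Takes: 2}
	picks, err := GenerateVOInto(context.Background(), fsys, req, fakeSynth)
	return fsys, picks, err
}

func main() {
	fsys, picks, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range picks {
		fmt.Println(takePath(p), len(fsys[takePath(p)]), "bytes")
	}
}
```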
C5 — Posting is irreversible, over an HTTP API with no auth. "Post now" (R-UI-27)
is public + irreversible; the only safeguard is the approved-status gate in
postcmd. The studio API has no auth today, and --host 0.0.0.0 exposes the
full API on the LAN unauthenticated (the live R-API-3 gap). So Publish's safety
is coupled to the server-exposure gate.
C6 — social.json is shared mutable state with no guard. Both the CLI and (in
Phase 2) the studio write it via internal/social — no locking, no revision-guard,
whole-Set last-write-wins. R-SOC-8 (editing approved/posted text drops to draft) is
not implemented. A studio edit racing a CLI post corrupts the ledger.
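Both halves of this hazard can be illustrated with a small sketch: a revision check that refuses a write when the file changed since it was read, and an edit rule that drops approved or posted text back to draft. `Store`, `Platform` and `EditText` are hypothetical, the store is in-memory, and a content hash is used as the revision token. A real implementation would also need the check and the write to be atomic across processes, which this sketch does not attempt. Treating an unchanged text as a no-op is an assumption as well.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// ErrStale is returned when the content changed after the caller read it.
var ErrStale = errors.New("social.json changed since it was read")

// Store is an in-memory stand-in for the file holding social.json.
type Store struct{ data []byte }

func NewStore(initial []byte) *Store {
	return &Store{data: append([]byte(nil), initial...)}
}

// revision derives the revision token from the content itself.
func revision(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// Read returns a copy of the content plus the revision it was read at.
func (s *Store) Read() ([]byte, string) {
	return append([]byte(nil), s.data...), revision(s.data)
}

// WriteIfUnchanged replaces the content only if rev still matches.
// Not atomic across processes; a real store needs a lock or atomic rename.
func (s *Store) WriteIfUnchanged(rev string, next []byte) error {
	if revision(s.data) != rev {
		return ErrStale
	}
	s.data = append([]byte(nil), next...)
	return nil
}

// Platform is a hypothetical per-platform entry in the social set.
type Platform struct {
	Status string // "draft", "approved" or "posted"
	Text   string
}

// EditText changes the text and drops approved/posted entries back to draft.
// An edit that leaves the text unchanged keeps the status (assumption).
func (p *Platform) EditText(text string) {
	if text == p.Text {
		return
	}
	p.Text = text
	if p.Status == "approved" || p.Status == "posted" {
		p.Status = "draft"
	}
}

func main() {
	s := NewStore([]byte(`{"rev":"demo"}`))
	_, rev := s.Read()
	fmt.Println(s.WriteIfUnchanged(rev, []byte(`{"rev":"cli"}`)))    // first writer wins
	fmt.Println(s.WriteIfUnchanged(rev, []byte(`{"rev":"studio"}`))) // second writer holds a stale rev
}
```

Over HTTP the stale case would map naturally to a 409 so the client can reload and retry.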
C7 — Studio-server auth vs platform OAuth are different subsystems. R-API-3 wants
a gate in front of the studio mux (callers → studio). The OAuth subsystem authenticates
keryx → platforms. They share nothing. Bundling them (as "Phase 2 auth") is a
category error; the spec keeps them separate. And the studio must never handle
platform secrets (R-CFG-2) — so auth <platform> stays CLI; the studio only shows
token health (read-only, from auth refresh --dry-run).
C8 — Core gaps that are MUSTs: R-SOC-5 (config-tunable constraints — currently
hardcoded), R-CAP-4 (reel-caption.md written into the bundle — not built). These
land with the Publish work, not as studio glue.
5. Proposed sub-phase structure & build order
Sequenced to retire contention early and keep each MR coherent. Each sub-phase gets its own implementation spec (0016+) drafted from this map, reviewed before code.
- 2A: Audio takes (R-UI-9). Refactor VOTakes/MusicTakes to an FS-explicit core (the GenerateInto(ctx, p, fs, req) shape) so they run on the active worktree; add ListVOTakes/ListMusicTakes; a (line, take) VO pick + the music single-slot pick; an audio cost axis (per-clip/per-bed). Reuses the job/poll/gallery harness. Retires C4. Smallest and highest-reuse, so it goes first.
- 2B: Render + preview (R-UI-8/11). The render bridge: decide local-only vs the materialise↔readback round-trip (C1, §7-D1); a render-shaped job (one output, no takes, elapsed-only progress for v1; C2); a video serving fast-path (C3). The silent draft (R-UI-8) is the same render with --silent. Depends on 2A (a faithful render wants selected VO/music).
- 2C: Publish (R-UI-19/25/27, R-UI-12). Studio /social read/compose/edit (reusing socialcmd + the chat seam), the per-platform steering UI (live char count, link/hashtag hints; R-SOC-1/2/3), approve (R-POST-2 gate, refuse on constraint violations; R-POST-10), schedule (scheduled_at), Post now (same postcmd path). Implements R-SOC-8 (edit→draft) + the ledger guard (C6), and R-SOC-5 (config constraints) + R-CAP-4 (caption file), which are the MUST gaps (C8). Posting is gated on the server-exposure auth (2D) when not localhost (C5).
- 2D: Networked-exposure auth (R-API-3). A gate in front of the whole studio mux, required when bound non-localhost; localhost stays open (single-user dev). The mechanism is the key open decision (§7-D5). Lands before/with 2C since Publish over the LAN is unsafe without it. Retires C5/C7.
Small extras folded in where they fit: R-UI-10 contact-sheet text-leak screen
(into 2A/2B — it's a takes-gallery view of card illustrations); R-AUTH read-only
token-health view (into 2C — surfaces auth refresh --dry-run).
Dependency graph: 2A → 2B; 2D → 2C (or concurrent, 2D gating 2C's post path); 2A/2B/2C/2D otherwise independent. Recommended order: 2A, 2B, 2D, 2C.
6. Scheduling: explicitly NOT a studio concern
The unattended schedule lives in the owning project's CI (the blog repo), not
keryx. The studio sets a per-outlet scheduled_at; CI runs keryx post due +
keryx auth refresh all on a cron. keryx ships no scheduler. (0001 §3.2/§4.3; the
auth-refresh CI wiring is the owning project's, tracked in 0010 — out of this scope.)
7. Decisions to resolve before building (the contention calls)
- D1. Render filesystem model (C1). RESOLVED 2026-06-26. Root cause: keryx shells out to the ffmpeg binary (no CGO-free Go libav binding), which needs real files on disk, so render is inherently OS-fs-bound. Interim: render requires a LOCAL (on-disk) project; in-memory render is LOCKED OUT with a clear "switch to a local checkout to render" message. This unblocks 2B without resolving the binding question and resolves C3 cleanly (the mp4 lands on the OS fs → http.ServeFile fast-path). SPIKE COMPLETE (task #68, 2026-06-26; spikes/ffmpeg-render-binding.md): the lock-out is CONFIRMED for Phase 2. No CGO-free in-memory render binding is a sound bet today: the only mature in-memory libav binding (go-astiav) requires CGO (breaks keryx's static cross-compile posture for an edge case); the CGO-free options are immature (ffgo, in-memory output unproven) or can't do our render without a custom build + single-threaded perf hit (go-ffmpreg/wazero-WASM). And keryx keeps the ffmpeg-binary dependency regardless, so the win was only "in-memory + CGO-free", not "drop ffmpeg". If in-memory render is ever needed, the escape hatch is a materialise↔readback bridge (copy worktree inputs → temp dir → native ffmpeg → read the mp4 back): zero new dependency, native speed, posture intact. go-ffmpreg/wazero is watch-listed for the long term (re-evaluate when it gains an xfade+AAC build + wasm-threads). Phase 2: local-only, no new dependency.
- D2. Render progress (C2). RESOLVED 2026-06-26: elapsed-only v1. The render job reports queued → running (elapsed + a rough eta from total duration) → done/failed; no real %. Exec seam stays CombinedOutput. A %-progress bar (stream + parse ffmpeg -progress) is a fast-follow if it earns its keep.
- D3. Video serving (C3). RESOLVED 2026-06-26: stream via the afero.File ReadSeeker. Generalize the existing serveFile: replace afero.ReadFile + bytes.Reader with activeFS.Open(path) and pass the returned afero.File (an io.ReadSeeker) to http.ServeContent (defer Close; pass a real modtime so range caching works). One serving path for images and video, OS and in-memory: for OsFs the file wraps a real *os.File (native fd seek, only the requested range read, as efficient as http.ServeFile), and for in-memory it reads straight from the buffer with no extra copy. Removes today's full-RAM-copy for all file serving, not just video. Chosen over a dedicated http.ServeFile fast-path because that would force a second code path (it can't serve the in-memory fs that image takes use) for no efficiency gain on the OS-fs case.
- D4. Audio gen FS + cost (C4). RESOLVED 2026-06-26: FS-explicit refactor + audio cost axis. Refactor VOTakes/MusicTakes to the FS-explicit GenerateInto(ctx, p, fs, req) shape so audio takes run on the active worktree (in-memory-capable, consistent with image gen); add ListVOTakes/ListMusicTakes, a (line, take) VO pick (music fits pickSingleSlot), and an audio cost axis (per-clip / per-bed) to the estimator so cost disclosure (0013 D2) holds for audio too.
- D5. Networked-exposure auth (C5/C7, R-API-3). RESOLVED 2026-06-26: startup bearer token. On a non-localhost bind, generate a random token at startup, print it in the listen URL (http://host:port/?token=…, Jupyter/Vite style); a middleware in front of the mux requires it (header / query → cookie) for the /api surface. localhost bind stays open (single-user dev). No secret in config, rotates per start. The SPA captures the token from the URL. Implementation: GTB's server authn middleware (go-tool-base v0.23.0, released 2026-06-26). At 2D: gtb update to ≥ v0.23.0 and wire its authn middleware over the stdlib mux (the studio deliberately doesn't use GTB's full NewServer, but its middleware composes).
- D6. Platform auth in the studio (C7). RESOLVED 2026-06-26: CLI captures; studio shows read-only health. auth <platform> capture stays CLI (interactive, MCP-gated, secret-handling). The studio never captures or stores a token (R-CFG-2); it surfaces a read-only Connections view (per platform: connected/not + expiry, read from the non-secret expiry metadata platforms.<p>.*_expires_at) so a dead/expiring token is flagged in the cockpit before a publish fails. v1 reads stored metadata (no subprocess/network); a live auth refresh --dry-run "check now" is a fast-follow. Studio-driven OAuth capture (option C) is rejected: it would put raw-token capture on a network-exposable API, contradicting R-CFG-2 and the reason auth is interactive + MCP-gated.
- D7. Ledger concurrency + R-SOC-8 (C6). RESOLVED 2026-06-26: R-SOC-8 + optimistic guard. Implement R-SOC-8 (editing an approved/posted platform's text drops it to draft) in the shared internal/social path, so CLI and studio both get it. Add an optimistic concurrency guard: the studio reads social.json with a revision token (mtime or content hash) and refuses to write if it changed since read (409 → "reload, it changed"), preventing a studio edit clobbering a concurrent CLI post.
- D8. Phase label reconciliation. RESOLVED 2026-06-26: canonicalize "studio Phase 2". Define it: studio Phase 2 = the preview/produce/publish UI surface over the already-built CLI cores. Fix the stray labels (CLAUDE.md's "Phase 3" for auth/refresh, 0001 §1's "posting adapters") with a one-line pointer to this definition, so the docs stop contradicting. (Not a full roadmap renumber; just align the labels.)
- D9. MUST gaps to close in 2C (C8). RESOLVED 2026-06-26: both land in 2C. Move the per-platform constraints into config (R-SOC-5; the current hardcoded values become the defaults) and write reel-caption.md into the bundle from keryx social (R-CAP-4). No MUST left open behind the shipped Publish panel.
- D10. Posting safety (C5). RESOLVED 2026-06-26: layered posture. "Post now" is human-initiated only (never auto-fired); reuses the same postcmd path + approved-gate (refuses unapproved + on constraint violations, R-POST-2/10); allowed only when the server is localhost-bound or authed (D5 bearer); never on the MCP surface; unattended posting stays CI-only (post due). Defence-in-depth: the gate, the bind, and human intent are all required. (A typed per-post confirm is an optional later UX refinement, not required.)
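The bearer gate that the posting posture leans on can be sketched with the standard library alone. This is not GTB's middleware, whose API is not shown here. The names `requireToken`, `needsGate`, the cookie name, and the `/api/projects` route are all hypothetical, and the cookie exchange is simplified to happen on `/api` requests only.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

const cookieName = "studio_token" // hypothetical cookie name

// newToken returns 32 random bytes, hex-encoded; regenerated on every start.
func newToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// needsGate reports whether a bind host is reachable beyond loopback.
func needsGate(bindHost string) bool {
	if bindHost == "localhost" {
		return false
	}
	ip := net.ParseIP(bindHost)
	return ip == nil || !ip.IsLoopback() // unparseable hosts are gated by default
}

// requireToken guards the /api surface: header first, then ?token=, then cookie.
func requireToken(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api" && !strings.HasPrefix(r.URL.Path, "/api/") {
			next.ServeHTTP(w, r)
			return
		}
		got := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		fromQuery := false
		if got == "" {
			if q := r.URL.Query().Get("token"); q != "" {
				got, fromQuery = q, true
			} else if c, err := r.Cookie(cookieName); err == nil {
				got = c.Value
			}
		}
		// The empty check stops an empty configured token from matching.
		if got == "" || subtle.ConstantTimeCompare([]byte(got), []byte(token)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		if fromQuery {
			// Exchange the query token for a cookie so later calls can omit it.
			http.SetCookie(w, &http.Cookie{Name: cookieName, Value: token, Path: "/",
				HttpOnly: true, SameSite: http.SameSiteStrictMode})
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for the studio mux.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusNoContent)
})

// statusFor sends a GET to target, optionally with a bearer header.
func statusFor(h http.Handler, target, bearer string) int {
	req := httptest.NewRequest(http.MethodGet, target, nil)
	if bearer != "" {
		req.Header.Set("Authorization", "Bearer "+bearer)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	token, err := newToken()
	if err != nil {
		panic(err)
	}
	h := requireToken(token, okHandler)
	fmt.Println("gate needed on 0.0.0.0:", needsGate("0.0.0.0"))
	fmt.Println("no token:", statusFor(h, "/api/projects", ""))
	fmt.Println("bearer:", statusFor(h, "/api/projects", token))
}
```

A server would wrap its mux with `requireToken` only when `needsGate` is true for the bind host, which keeps the localhost path unchanged.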
8. Testing / DoD per sub-phase
Each sub-phase: failing test → code → green just ci; godog for the user-facing
workflow (faked provider/publisher seams — no real posting or paid gen in CI;
the existing env-gated integration tests stay env-gated); docs page per component;
/simplify + /code-review. The Publisher/render/voice/music seams are faked behind
their interfaces exactly as the image Generator is. Posting godog asserts the
approved-gate + idempotency against a fake publisher; never a live platform.
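The posting assertion can be pictured with a fake publisher that only records calls. The sketch below is a stand-in, not the real postcmd or Publisher interface: `Outlet`, `Publisher`, `postAll` and the platform names are hypothetical, chosen to show the two properties under test (unapproved entries are never published, and a second run publishes nothing new).

```go
package main

import "fmt"

// Outlet is a hypothetical per-platform ledger entry.
type Outlet struct {
	Platform string
	Status   string // "draft", "approved" or "posted"
	Text     string
	PostURL  string
}

// Publisher is the seam a fake sits behind (hypothetical shape).
type Publisher interface {
	Publish(platform, text string) (postURL string, err error)
}

// fakePublisher records which platforms were published; nothing leaves the process.
type fakePublisher struct{ calls []string }

func (f *fakePublisher) Publish(platform, _ string) (string, error) {
	f.calls = append(f.calls, platform)
	return "https://example.invalid/" + platform, nil
}

// postAll publishes approved outlets only and marks them posted, so drafts are
// gated and a repeat run finds nothing left to publish.
func postAll(outlets []Outlet, pub Publisher) error {
	for i := range outlets {
		o := &outlets[i]
		if o.Status != "approved" {
			continue
		}
		url, err := pub.Publish(o.Platform, o.Text)
		if err != nil {
			return fmt.Errorf("post %s: %w", o.Platform, err)
		}
		o.Status, o.PostURL = "posted", url
	}
	return nil
}

func main() {
	outlets := []Outlet{
		{Platform: "alpha", Status: "draft"},
		{Platform: "beta", Status: "approved"},
		{Platform: "gamma", Status: "posted", PostURL: "https://example.invalid/earlier"},
	}
	pub := &fakePublisher{}
	for run := 1; run <= 2; run++ {
		if err := postAll(outlets, pub); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println(pub.calls)
}
```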
9. Open questions parked for the per-feature specs
- Audio cost pricing model (per-character vs per-second; ElevenLabs unit).
- Whether the contact-sheet (R-UI-10) is a distinct view or a mode of the existing takes gallery.
- Render queue policy (one render at a time per project? cancel-in-flight on a new request?).
- Token-health view detail (which expiries to surface; refresh is CLI/CI-run).