
keryx — design spec

Status: draft / intent (scaffold generated, build not started)
Owner: Matt Cockayne
Last updated: 2026-06-14

1. Purpose

keryx generates short vertical promo reels for blog posts and publishes them to social platforms — on demand and on a schedule.

It is the home for the PHP Boy Scout content-marketing automation. The blog (phpboyscout.uk) produces a lot of strong writing but leans on organic traffic; the policy now is a reel per post (see the blog's CLAUDE.md, "Promo reels"), pushed to the platforms where a cold account can still get algorithmic reach. keryx turns that into one self-contained tool.

It supersedes the ad-hoc Python scripts in the blog repo (scripts/gen-*.py): those are the working reference implementation; keryx ports them to Go and adds the posting layer.

2. Goals / non-goals

Goals

  • Generate the post's media assets: cover art and avatar (Gemini), and a reel from a storyboard + cover (text cards + voice-clone narration + a music bed, 9:16, on the publication palette).
  • Publish a video to Instagram, YouTube, TikTok and LinkedIn.
  • Run on demand from the CLI, and unattended from GitLab scheduled pipelines.
  • Ship as a single Go binary (the go-tool-base / GTB ecosystem): a stateless tool installed and driven by the owning project; content, config, and schedule live in that project, not in keryx (§3.2).

Non-goals

  • Not a video editor / timeline UI. keryx composes AI-generated and uploaded/captured clips into a narrative; it does not offer a timeline, keyframes, or clip-level editing. The one permitted exception is AI-driven coarse edits by natural language (e.g. "trim the first 5s of clip 1 to drop the awkward pause" → a single ffmpeg trim), and that is the hard ceiling. For real editing, use a dedicated tool (e.g. kiru.app, a modern Rust video editor); keryx will not reinvent one. (See 0004 for the long-form direction.)
  • Not multi-tenant: it posts to Matt's own single set of accounts.
  • Not a blog/site generator: it produces media assets (cover, avatar, reel) and posts them; it does not author posts or build the site. It consumes a post's storyboard and writes its cover/reel back for the blog to use.
  • Not an analytics / performance tool. keryx posts and records the returned post URL; measuring views, engagement, or reach is out of scope, since far better dedicated tooling exists for that.

3. Reference implementation (port from these)

The Python scripts in the blog repo (~/workspace/phpboyscout/blog/scripts/) are the source of truth for the generation behaviour. Port faithfully, then delete the dependence on them.

| Python (blog) | keryx (Go) | Notes |
|---|---|---|
| gen-vo.py | internal/gen/voice | ElevenLabs TTS, voice clone MhaH9hcD2Ulcr80j28Z1, settings stability 0.6 / similarity 0.92 |
| gen-music.py | internal/gen/music | ElevenLabs Music (/v1/music), prompt + music_length_ms |
| gen-reel.py | internal/gen/reel | the meat: storyboard → cards (PIL→Go image/font) → ffmpeg xfade + audio mux |
| gen-cover.py | internal/gen/cover | Gemini Imagen, the three house styles (clay/editorial/blueprint) |
| gen-portrait.py | internal/gen/portrait | Gemini image, image-to-image avatar |
| scripts/README.md | | documents the pipeline + env vars |

Keep the storyboard schema identical to the Python version. The thematic constants the scripts hold inline — the publication palette (deep petrol-teal, warm amber-orange, soft cream, dark charcoal), the cover-style prompt prefixes, the portrait prompt, the music tone, and the voice settings — move into config-driven themes (§6) rather than staying hardcoded. The seeded values stay identical to the Python version; only their home changes.

3.1 Reel composition (the important detail)

Mirror gen-reel.py (as evolved on the second reel; see the card modes below):

  • 1080×1920, ~30–45s, 30fps, H.264 + AAC.
  • The post's cover art bookends (first/last cards). Accent words rendered in the accent colour.
  • Two card modes (per-card, chosen in the storyboard):
      • block: one short line over a solid palette background. The simplest mode; good for punch lines and the closer.
      • overlay: a full-bleed, theme-matched media panel (still image or a short video clip) with a bottom gradient scrim (transparent → charcoal over the lower ~half) and the line set over it (bold cream, accent word in amber). This is the evolved default: a run of block cards reads as one monotonous wall, so most body cards carry per-card media. The media is either:
          • generated from the card's scene prompt in the reel theme's illustration style (§6) at 9:16, as a still (image provider, Imagen/Gemini) or a short video (video provider, e.g. Gemini Omni, §3.4); or
          • user-supplied: a pre-rendered image or video the user uploads via the editor file picker (R-UI-29) or assigns on the CLI, used as-is (no AI call, and it skips the text-leak screen since the user owns it).
        A video panel is fit to the card's VO-driven duration (loop if shorter, trim if longer); the scrim + text overlay is composited identically over an image or a video.
  • Dedicated URL closer: the final card is phpboyscout.uk on its own (mono), not crowded with other copy.
  • Orphan-controlled wrapping: no lone trailing words / single conjunctions on a line; balance the breaks (awkward orphans were a flagged defect).
  • VO drives timing: each card's on-screen duration = its narration clip length + a lead (≈0.5s) + tail (≈0.7s). Cards crossfade (xfade ≈0.4s); card i starts at sum(dur[:i]) - i*xfade, and its VO is delayed to start_i + lead. The on-screen card text and the narration are distinct inputs (the batch proved this): the card text is a tight distillation (short, punchy, fits the panel), while the vo narration is fuller, may use a different phrasing, and may carry provider control tags, e.g. ElevenLabs SSML <break time="0.8s"/> for a beat, or a phonetic spelling to fix a mispronunciation. keryx keeps them as separate per-card fields, not one shared string.
  • Audio = the music bed at ≈0.16 gain (with an end fade) mixed with the VO clips at full gain, limited to avoid clipping. The bed is requested at the reel's computed total length (derived from the VO-driven timeline), not a hand-set value as in the scripts.
  • Go specifics: render cards with a Go text/image lib (e.g. fogleman/gg or golang.org/x/image/font), drive ffmpeg via os/exec for the xfade chain and audio mux.
  • Match the music tone to the piece (sober for op-eds).
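The timing rule above (VO drives card duration, card i starts at sum(dur[:i]) - i*xfade) can be sketched as a small pure function. This is a minimal sketch, not the port of gen-reel.py: the names CardTiming and Timeline are hypothetical, and the constants are the approximate values quoted in this section.

```go
package main

import "fmt"

// Approximate timing values quoted in the spec (§3.1), in seconds.
const (
	lead  = 0.5 // card is on screen this long before its VO starts
	tail  = 0.7 // card is held this long after its VO ends
	xfade = 0.4 // overlap between adjacent cards
)

// CardTiming is a hypothetical helper type: where a card and its VO sit on the timeline.
type CardTiming struct {
	Start, Dur, VOStart float64
}

// Timeline derives per-card timings and the total reel length from the
// narration clip lengths (seconds), one entry per card.
func Timeline(voLens []float64) ([]CardTiming, float64) {
	out := make([]CardTiming, len(voLens))
	sum := 0.0 // running sum(dur[:i])
	for i, vo := range voLens {
		dur := lead + vo + tail
		start := sum - float64(i)*xfade
		out[i] = CardTiming{Start: start, Dur: dur, VOStart: start + lead}
		sum += dur
	}
	total := sum
	if n := len(voLens); n > 1 {
		total -= float64(n-1) * xfade // each crossfade overlaps two cards
	}
	return out, total
}

func main() {
	cards, total := Timeline([]float64{2.0, 3.0, 2.5})
	for i, c := range cards {
		fmt.Printf("card %d: start=%.1fs dur=%.1fs vo@%.1fs\n", i, c.Start, c.Dur, c.VOStart)
	}
	fmt.Printf("total=%.1fs\n", total)
}
```

The returned total is the value the music bed would be requested at, and each VOStart is the delay to apply to that card's narration clip when mixing.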

Storyboard schema (per card): text (the on-screen line), vo (the narration — distinct from text, may include SSML <break>/phonetic spelling), palette roles (bg/fg/accent), *accent words*, dur (fallback when no VO), cover/mono flags, an optional per-card voice override (e.g. stability for a line that needs steadying — the batch bumped individual lines to 0.74–0.76), and — for overlay mode — mode: overlay, a scene (the generation prompt, only when generating), and a resolved media object once present: { kind: image|video, source: generated|uploaded, path }. Palette-role applicability is mode-dependent: block uses bg/fg/accent; overlay ignores bg (the illustration fills it) and uses fg/accent for the scrim-overlaid text. keryx keeps the schema stable across the Python parity port and these additions, publishes it as a versioned JSON Schema, and validates storyboard.json on load with actionable errors (a hand-authored or AI-drafted board is the main user input — it must fail loudly, not render garbage). The concrete validation rules are in 0002-interface-contracts.md §3.1 (R-WS-9..14).
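As an illustration of "validate on load with actionable errors", here is a minimal sketch of a card type and a validator. Only the keys text, vo, dur, mode, scene and media {kind, source, path} come from the schema description above; the rules checked, the treatment of an empty mode as block, and the top-level array layout are assumptions of this sketch. The real rules live in 0002-interface-contracts.md §3.1.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Media is the resolved per-card media object described in the schema.
type Media struct {
	Kind   string `json:"kind"`   // image | video
	Source string `json:"source"` // generated | uploaded
	Path   string `json:"path"`
}

// Card holds a subset of the per-card fields. Treating an empty mode as
// "block" is an assumption of this sketch.
type Card struct {
	Text  string  `json:"text"`
	VO    string  `json:"vo"`
	Dur   float64 `json:"dur"`
	Mode  string  `json:"mode"`
	Scene string  `json:"scene"`
	Media *Media  `json:"media"`
}

// Validate returns one actionable message per problem; an empty result means valid.
func Validate(cards []Card) []string {
	var errs []string
	add := func(i int, msg string) {
		errs = append(errs, fmt.Sprintf("card %d: %s", i+1, msg))
	}
	for i, c := range cards {
		if c.VO == "" && c.Dur <= 0 {
			add(i, "no vo and no positive dur fallback; set one of them")
		}
		switch c.Mode {
		case "", "block":
		case "overlay":
			if c.Scene == "" && c.Media == nil {
				add(i, "overlay mode needs a scene to generate from or a media object")
			}
		default:
			add(i, fmt.Sprintf("unknown mode %q (want block or overlay)", c.Mode))
		}
		if m := c.Media; m != nil {
			if m.Kind != "image" && m.Kind != "video" {
				add(i, fmt.Sprintf("media.kind %q must be image or video", m.Kind))
			}
			if m.Source != "generated" && m.Source != "uploaded" {
				add(i, fmt.Sprintf("media.source %q must be generated or uploaded", m.Source))
			}
		}
	}
	return errs
}

func main() {
	// Assumption: the board is a JSON array of cards; the real file layout may differ.
	raw := `[
	  {"text": "Ship the *boring* fix", "vo": "Ship the boring fix.", "mode": "block"},
	  {"text": "Then measure", "mode": "overlay"}
	]`
	var cards []Card
	if err := json.Unmarshal([]byte(raw), &cards); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, e := range Validate(cards) {
		fmt.Println(e)
	}
}
```

Collecting every problem, rather than stopping at the first, suits a hand-authored or AI-drafted board: the author can fix the whole file in one pass.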

Generated card imagery: the text-leak problem. Image models leak mangled lettering ("MIGERTING TO FIRMER GROUND") even when told not to, which is fatal when keryx overlays its own text. The four-reel batch sharpened the mitigation:

  • a hardened wordless instruction appended to every scene;
  • the scene must be pure visual description. Directive / metaphor clauses leak: the model rendered "One central metaphor: …" verbatim as a caption, so keryx keeps the scene description and the wordless instruction strictly separate and never feeds the model editorial directions;
  • N candidates per card, keep a clean one; re-roll any that leak (some styles are worse: blueprint wants annotation labels and leaked most, even returning an off-style stock photo once);
  • a screening pass before assembly. The batch did this by human contact-sheet review; keryx SHOULD automate it with OCR: run text detection over each candidate and auto-flag/reject any with detected glyphs, falling back to the contact sheet for the human.

This reuses the take model (§5.2): card media are takes like VO/music.
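The automated screening pass could look roughly like the sketch below. TextDetector is a hypothetical seam over whichever OCR or text-detection backend is eventually chosen (the spec does not name one), and the file names are made up for the example.

```go
package main

import "fmt"

// TextDetector is a hypothetical interface over an OCR/text-detection backend.
type TextDetector interface {
	HasText(imagePath string) (bool, error)
}

// Screen splits candidate takes into clean ones and flagged ones.
// A detector error flags the take for human review instead of passing it.
func Screen(d TextDetector, candidates []string) (clean, flagged []string) {
	for _, p := range candidates {
		leak, err := d.HasText(p)
		if err != nil || leak {
			flagged = append(flagged, p)
			continue
		}
		clean = append(clean, p)
	}
	return clean, flagged
}

// fakeDetector stands in for a real OCR call: listed paths "contain" text.
type fakeDetector map[string]bool

func (f fakeDetector) HasText(p string) (bool, error) { return f[p], nil }

func main() {
	d := fakeDetector{"card03-take2.png": true}
	clean, flagged := Screen(d, []string{"card03-take1.png", "card03-take2.png", "card03-take3.png"})
	fmt.Println("clean:", clean)
	fmt.Println("flagged for contact sheet / re-roll:", flagged)
}
```

Failing closed (an OCR error flags the take) keeps the human contact sheet as the fallback, which matches the intent that a leaked caption must never reach assembly unnoticed.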

3.2 How a reel gets made — the authoring loop

The Python scripts were driven by hand in a scratch dir, with a human (Matt) as the taste gate. keryx must keep that human-in-the-loop iteration for on-demand use while automating the bookkeeping that used to be manual. The observed loop (from the first reel, the Fable-5 op-ed that spawned keryx):

  1. Storyboard — the post's argument distilled to ~9 cards (one short line each; palette bg/fg/accent; *accent words*; cover bookends; a mono closer). This is the creative seed. keryx consumes a hand-authored storyboard.json and optionally drafts one from a post via the GTB chat client (keryx storyboard draft) for a human to edit — never auto-final.
  2. Silent draft first (keryx reel build --silent) — render cards + crossfades with no audio (storyboard dur timing) as a fast proof of format/pacing/typography, before spending on VO/music. (In the loop this caught accent-parsing and line-break bugs.)
  3. Voice — test one line to confirm the clone, then narrate all lines.
  4. Music — generate a tone-matched bed.
  5. Assemble (keryx reel build) — VO drives card timing; music ducked under VO (§3.1).
  6. Polish — per-line copy/voice/music tweaks, then re-build.
  7. Social — compose the per-platform supporting text + hashtags + link (keryx social).
  8. Lock & correlate — persist the winning settings as the theme/defaults; write the deliverables back to the associated bundle (§3.2): the reel-<slug>.mp4 and a human-readable caption file (reel-caption.md) holding the per-platform prose. The batch made this explicit — "write the prose for each reel into a file in the page bundle for me to use when posting" — so even with posting automation, the committed caption file is a first-class deliverable for manual/cross-checking use, alongside the machine social.json in the workspace.

The iteration model to preserve (this is the point): the three audio/copy layers iterate independently and cheaply — you never redo the whole reel to change one thing. Each generator is runnable on its own; the reel re-assembles from whatever's current. Concretely keryx must support:

  • Per-card / per-line addressability — regenerate just line 6's VO and card, or lines 3–9, and re-assemble. (Manually this meant a hand-cut lines-3to9.json.)
  • Take management — generate N candidate takes for a line or a music bed, audition them, and select one as the locked take. (Manually this was cp vo-takes/01-steady.mp3 vo/01.mp3 — keryx should track the chosen variant, not leave it to file copying.)
  • A per-reel workspace — one directory per reel holding the storyboard, vo/, music, takes, and output, so a reel is a resumable unit and nothing lives in an ad-hoc scratch dir.
  • Lock-the-winner — once a cut is approved, fold the settings that produced it (voice stability/similarity, music prompt) back into the theme (§6) so the next reel starts from the approved baseline, not the cold defaults.
  • Cached generation (cost control) — generation calls cost real API credit, so keryx caches by content: a VO line, card illustration, or bed is only re-generated when its input (text/scene/prompt + settings) changes. Re-running reel build re-assembles from cached takes; --force overrides. This is what makes per-line re-roll and re-assembly cheap rather than a full re-spend.
  • Cost accounting & safety: the batch's 8-hour idle hang prompted a "did it run away and spend?" scare that took forensic process/file inspection to dispel. keryx removes that doubt: (a) all generation is finite and bounded (generate N then stop, never an open loop); (b) every provider/ffmpeg call has a timeout (the scripts used timeout 240); (c) keryx keeps a usage tally per run/workspace (generations + estimated spend) surfaced in output and --json. Estimates are best-effort: per-provider unit rates (providers.<x>.rates) seed them, and where a provider exposes a pricing/usage API keryx refreshes the rates periodically (cached, with the configured rates as fallback) so the estimate stays as accurate as we can make it, not a hardcoded guess that drifts; (d) a spend guard (spend.confirm_above, global + per-project), defaulting to $10 of image/video and 50 000 ElevenLabs characters per run (voice is character-billed, so guarded in its native unit); a batch beyond the threshold prompts for confirmation (auto-yes under --yes/CI). It prevents runaways, not generation. So cost is visible and capped, not a thing to audit after the fact.
  • Reproducibility — capture the inputs, not just the outputs. The storyboard holds each card's scene, vo, and per-card voice override; the workspace also records the music prompt used (tone is bespoke per reel — warm for the leadership piece, wry for the debugging story). So a reel can be re-edited and regenerated from its committed inputs, not only replayed from locked takes.
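The "cache by content" bullet above amounts to deriving a key from the generation input plus its settings. A minimal sketch follows, assuming a SHA-256 over the JSON encoding of a request struct; VoiceInput and its field names are hypothetical, and the spec does not prescribe a hash or key format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// VoiceInput is a hypothetical bundle of everything that affects a VO take.
type VoiceInput struct {
	Text       string  `json:"text"`
	VoiceID    string  `json:"voice_id"`
	Stability  float64 `json:"stability"`
	Similarity float64 `json:"similarity"`
}

// CacheKey hashes the generator kind plus the JSON form of its input.
// Struct fields marshal in declaration order, so equal inputs give equal keys.
func CacheKey(kind string, input any) (string, error) {
	b, err := json.Marshal(input)
	if err != nil {
		return "", err
	}
	h := sha256.New()
	h.Write([]byte(kind))
	h.Write([]byte{0}) // separator so kind and payload cannot run together
	h.Write(b)
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	base := VoiceInput{Text: "Ship the boring fix.", VoiceID: "example-voice", Stability: 0.6, Similarity: 0.92}
	steadier := base
	steadier.Stability = 0.75 // a per-line override changes the key, so only this line regenerates

	k1, _ := CacheKey("voice", base)
	k2, _ := CacheKey("voice", steadier)
	fmt.Println("same input reuses take:", k1 == mustKey("voice", base))
	fmt.Println("changed setting re-generates:", k1 != k2)
}

// mustKey is a small helper for the example; it panics on a marshal error.
func mustKey(kind string, in any) string {
	k, err := CacheKey(kind, in)
	if err != nil {
		panic(err)
	}
	return k
}
```

With a key like this, reel build can look up an existing take before spending credit, and --force simply skips the lookup.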

On-demand vs scheduled. Locally this is the interactive loop above with the human gate. In GitLab scheduled pipelines there is no human gate: the inputs must already be locked — an approved storyboard.json, a theme, and the selected takes/assets in the reel workspace — and keryx renders + posts them deterministically. So the unit that a schedule consumes is an approved reel workspace, produced by the on-demand loop beforehand.

Ownership — keryx is a stateless tool. The reel workspaces, the config / themes, and the schedule itself all belong to the owning project (here, the blog repo), not to keryx. keryx is installed and invoked by that project's pipeline and keeps no per-project state of its own. A workspace is committed in the owning project alongside its post (the finished reel-<slug>.mp4 already lands in the post's page bundle, §3.2 step 8), so a scheduled pipeline checks it out from the repo and runs keryx post against it. This is also why themes are config-driven (§6): a second project that adopts keryx brings its own config, themes, workspaces, and schedule, and the same binary serves it.

Associated content (the bundle). A reel may optionally be associated with a content directory — a Hugo page bundle or any directory. The association (recorded in workspace.yaml) does two things: it tells keryx where the reel belongs (so the finished reel-<slug>.mp4 is written back there, the §3.2 step-8 correlation, generalised beyond Hugo), and it lets keryx read that directory's contents (the post markdown, the cover, other assets) to inform the chat/draft AI with real context. It is optional: a reel can stand alone with pasted source text instead.

3.3 Batch workflow

Reels are often made several at a time, so keryx supports a batch of reel workspaces and the efficient ordering the second reel established. What to make — which posts get a reel, whether to fold a series into one — is the caller's editorial decision (blog-side), not keryx's; keryx just constructs and publishes whatever reels it's given. The mechanics it provides:

  1. One reel sourced from one or more posts. A workspace's storyboard can be drafted from a single post or several (a consolidated series reel) — that's an input, not a policy keryx owns.
  2. Draft all storyboards first. The storyboard copy is the creative bit and is cheap to iterate as text, so draft every reel's cards/narration up front and tune wording in one pass before spending on VO/music/imagery (the rust batch saw heavy copy iteration here). keryx makes this cheap; the wording calls are the human's.
  3. Template-first. Build the first reel end-to-end as the template, validate it, then run the rest through the same proven path — "one reel at a time, validate between each". The reel is the review unit; the human flags images/lines to swap and keryx cheaply re-rolls/re-renders.

keryx supports this with a workspace per reel (§5.2) and theme selection per reel (§6).

3.4 Pluggable backends (provider interfaces)

Every external backend sits behind a narrow Go interface, with the concrete implementation chosen from user config at construction — so an adapter can be swapped or a new one added without touching call sites. This mirrors the Publisher pattern (§4.1) and GTB's multi-provider chat client. Five seams:

| Capability | Interface (illustrative) | Default adapter | Config key |
|---|---|---|---|
| Image generation | ImageProvider.Generate(ctx, ImageRequest) ([]Image, error) | Gemini / Imagen | providers.image |
| Video generation | VideoProvider.Generate(ctx, VideoRequest) ([]Clip, error) | Gemini (Omni / video) | providers.video |
| Voice / TTS | VoiceProvider.Synthesize(ctx, VoiceRequest) (Audio, error) | ElevenLabs | providers.voice |
| Music | MusicProvider.Compose(ctx, MusicRequest) (Audio, error) | ElevenLabs Music | providers.music |
| Rendering | Renderer.Render(ctx, Timeline) (Video, error) | local ffmpeg | providers.render |

The video seam generates short per-panel clips (§3.1) and is optional — the reel works fully with still images and uploaded media; it's enabled when a card asks for a generated video. Like every seam it is config-selected and swappable (Gemini Omni today, another model later).

Video generation is a deferred capability (the "video-shaped hole"). It is a real value-add beyond the Python parity, but lands after everything else works (Phase 5, §8). We design the hole now and leave it open — the VideoProvider seam, the card media {kind: image|video} schema, the renderer's "fit a clip to the card duration" path, and the editor's generate-video / upload-video affordance are all specified — but the gen of video and the ffmpeg video-compositing path are not built in the first passes. Uploaded video is the cheap early win (no provider needed); generated video follows. Full feature spec: 0003-video-panels.md.

  • Where they live. Interfaces in internal/gen (promote to a pkg/ provider package if they prove reusable); each adapter is its own package — internal/gen/voice/elevenlabs, internal/gen/image/gemini, internal/render/ffmpeg, … — so adding …/voice/openai or …/image/openai is purely additive.
  • Construction is config-driven. A small factory/registry per capability resolves providers.<capability> to a registered constructor and builds it from that provider's config block (endpoint, model, credentials via keychain/env — never committed). Blank/unknown → the default. Adding an adapter = implement the interface + register a constructor; no call-site changes.
  • Defaults, working out of the box: image = Gemini, voice = ElevenLabs, music = ElevenLabs, render = local ffmpeg. (video = Gemini Omni, but off until Phase 5; see the deferred seam above.)
  • Requests are provider-neutral. The request/response types carry keryx's intent (text, voice settings, a card's scene prompt + aspect, the timeline of cards + audio) — not vendor payloads; each adapter maps intent to its own API and back. Provider-specific identifiers (an ElevenLabs voice.id, a Gemini model name) live in the theme / provider config that the active adapter understands (§6), so swapping provider is a config change plus, where the identity differs, the provider-scoped fields.
  • Rendering stays local ffmpeg, but behind Renderer so a custom or remote renderer (cloud transcode, a different compositor) can drop in later. The ffmpeg adapter owns the xfade chain + audio mux + overlay compositing (§3.1).
  • Testability. Narrow interfaces let fakes stand in for every backend — no network in unit tests, consistent with the no-package-level-mocking-hooks rule (CLAUDE.md); inject the provider, don't reach for a global.
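The factory/registry idea in the bullets above can be sketched for one capability as follows. The VoiceProvider method matches the illustrative signature in the table; VoiceRegistry, VoiceConstructor, the request/response fields and the fake adapter are hypothetical names for this sketch. The registry is a value that gets injected, not a package-level global, in keeping with the testability rule.

```go
package main

import (
	"context"
	"fmt"
)

// Provider-neutral request/response types (fields are illustrative only).
type VoiceRequest struct{ Text string }

type Audio struct {
	Provider string
	Data     []byte
}

type VoiceProvider interface {
	Synthesize(ctx context.Context, req VoiceRequest) (Audio, error)
}

// VoiceConstructor builds an adapter from its config block.
type VoiceConstructor func(cfg map[string]string) (VoiceProvider, error)

// VoiceRegistry maps providers.voice values to constructors.
type VoiceRegistry struct {
	def   string
	ctors map[string]VoiceConstructor
}

func NewVoiceRegistry(def string) *VoiceRegistry {
	return &VoiceRegistry{def: def, ctors: map[string]VoiceConstructor{}}
}

func (r *VoiceRegistry) Register(name string, c VoiceConstructor) { r.ctors[name] = c }

// Build resolves the configured name; blank or unknown falls back to the default.
func (r *VoiceRegistry) Build(name string, cfg map[string]string) (VoiceProvider, error) {
	c, ok := r.ctors[name]
	if !ok {
		c, ok = r.ctors[r.def]
	}
	if !ok {
		return nil, fmt.Errorf("voice provider %q not registered and no default %q", name, r.def)
	}
	return c(cfg)
}

// fakeVoice is a network-free stand-in, as a unit test would use.
type fakeVoice struct{ name string }

func (f fakeVoice) Synthesize(_ context.Context, req VoiceRequest) (Audio, error) {
	return Audio{Provider: f.name, Data: []byte(req.Text)}, nil
}

func fakeCtor(name string) VoiceConstructor {
	return func(map[string]string) (VoiceProvider, error) { return fakeVoice{name}, nil }
}

// providerName synthesizes one line and reports which adapter answered.
func providerName(p VoiceProvider) string {
	a, err := p.Synthesize(context.Background(), VoiceRequest{Text: "test line"})
	if err != nil {
		return ""
	}
	return a.Provider
}

func main() {
	reg := NewVoiceRegistry("elevenlabs")
	reg.Register("elevenlabs", fakeCtor("elevenlabs"))
	reg.Register("openai", fakeCtor("openai"))
	for _, configured := range []string{"openai", "", "typo"} {
		p, err := reg.Build(configured, nil)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("providers.voice=%q -> %s\n", configured, providerName(p))
	}
}
```

Adding an adapter is then one Register call plus the implementation; call sites only ever see VoiceProvider.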

3.5 Projects & persistence (git-first)

A project is the owning repository (§3.2) — it holds the reels, config, themes, social data, and schedule. keryx supports many projects, and git is the persistence and portability layer, not an afterthought:

  • CLI scopes to the current folder. The project is the git working copy you are in; you switch projects by cd. No project picker in the CLI.
  • The studio switches between projects, and can open ones that are not local — a remote git repo. keryx uses go-tool-base's git components (on-disk and in-memory) and VCS auth (GitHub/GitLab) to clone / read / commit / push.
  • Saving is a git commit. Persisting a storyboard, social set, take selection, or asset writes files and commits them to the project repo with a descriptive message; for a remote project the studio works against an in-memory/temp clone and pushes. Auto-commit-on-save vs batched/explicit commit, and auto vs on-demand push, are configurable.
  • Commit the selected, not the candidates. Only the inputs needed to render and post are committed — storyboard.json, social.json, the selected media/VO/music, reel.mp4, workspace.yaml. Candidate takes/ and the generation cache are git-ignored (disposable, regenerable) so the repo doesn't bloat with every re-roll; keryx reel prune clears them. (Trade-off: takes are session-scoped on a remote/mobile workspace — you select, the selected is committed.)
  • Why git-first: history + rollback for free, and portability — the same project opens from any device by its remote, so the mobile studio can author, approve, and post against a remote repo with no local checkout (in-memory git). It also fits the stateless-tool model (§3.2): state lives in the repo, now reachable remotely. Credentials use the GTB keychain / CI variables; never committed.
  • Concurrency = git. keryx is single-writer per workspace per process; it does not add its own locking. Concurrent edits (a human in the studio + a CI run, or two devices) reconcile as ordinary git merges/conflicts — the repo is the boundary. Posting stays safe regardless via the idempotency record (social.json): a posted platform is never re-posted (§4.3).

Large-file persistence (pluggable — persistence.media)

Committing large binaries (the reel mp4, and especially Phase-5 video panels) to a base git repo bloats it. There is no single right answer, and the landscape moved — so keryx makes the strategy a config-selected seam, set globally and overridable per project (persistence.media.store), exactly like the provider seams (§3.4). Adapters:

| Store | What it does | Fits |
|---|---|---|
| git (default) | commit media straight into the repo | stills + the modest reel mp4 (a 30–45s 1080×1920 H.264 is only a few MB) |
| git-lfs | LFS-tracked patterns, pointers in git | teams already on LFS; works on GitHub/GitLab |
| external | blobs in an object store (S3 / R2 / GCS / SSH), a pointer/manifest committed to git (the DVC / git-sfs style) | host-agnostic; large/video-heavy reels; avoids LFS servers |
| none | don't persist large media in keryx; reference paths only | when another system owns the assets |

Landscape (investigated 2026-06): Git-LFS is not formally deprecated but is increasingly seen as legacy (file-level dedup, a central LFS server, host bandwidth/storage caps). Xet (Hugging Face, ex-XetHub) is the notable modern successor — chunk-level dedup, now the Hub default — but it's Hub-centric, not a turnkey backend for an arbitrary GitLab repo, so it's a forward-looking adapter, not today's default. The host-agnostic route is object-store + committed pointer (DVC, git-annex, git-sfs all do variants); keryx's external adapter captures that pattern with credentials via keychain/CI. Default stays git because keryx's committed footprint is small until video (Phase 5) — at which point a project flips persistence.media.store to external (or git-lfs) with no code change.

The external store: s3 is the primary backend (sub-selected by persistence.media.external.backend):

  • s3 (default for external, recommended): S3-compatible object storage, with Cloudflare R2 as the house choice (S3 API, zero egress fees, cheap); AWS S3 / GCS / MinIO equally supported. We design the first iteration around S3. The forward reason: planned features, namely long-form instructional video and in-browser webcam/mic capture (0004-future-long-form-and-capture.md), will routinely produce files well past GitLab's ~100 MB package cap, so a real object store is the right foundation now, not a retrofit. The bucket/credentials are provisioned in the infra repo (Terraform), referenced by keryx via config + keychain/CI.
  • gitlab-packages (configurable option): GitLab's built-in Generic Package Registry (durable, per-project, not counted against the storage quota), authed with the project access token keryx already holds (§4.2), so no extra infra for users who want it. Caveat: ~100 MB/file on gitlab.com, so unsuitable once long-form/capture lands; fine for reels.
  • plus git-lfs and (forward-looking) xet as further selectable backends (above).

So: keryx's own iteration targets s3 (R2), with gitlab-packages, git-lfs, and xet as configurable alternatives other users (or smaller projects) can select. All sit behind the one persistence.media seam; switching is config, not code, global or per project.

4. Posting — platform research (mid-2026)

Condensed from the 2026-06-14 feasibility study. Build order is set by how hard unattended, scheduled, headless posting is. Verify live docs for exact numeric limits before relying on them.

Build order: Instagram → YouTube → TikTok → LinkedIn

1. Instagram (easiest). Content Publishing API, container flow: POST /{ig-user-id}/media (media_type=REELS) → poll container status_code to FINISHED → media_publish. Prefer Instagram API with Instagram Login (graph.instagram.com, no Facebook Page needed). Account must be Professional (Business/Creator). Scope instagram_business_content_publish. Own-account posting runs under Standard Access: no App Review, no Business Verification. Long-lived token ≈60 days, refreshable headlessly (ig_refresh_token). Use the resumable direct upload (rupload.facebook.com), so no public URL is needed. Limit ≈100 posts / 24h.

2. YouTube (Shorts). No Shorts API — videos.insert (Data API v3) via resumable upload; vertical + ≤3min auto-classifies as a Short. Scope youtube.upload (restricted). Service accounts don't work — need a user OAuth refresh token captured once. Publish the OAuth app to "In production" (else refresh tokens die in 7 days). Public uploads require passing the Audit & Quota Extension + OAuth verification + CASA assessment — until then uploads lock to private. Long approval lead time; start early.

3. TikTok (hard). Content Posting API, Direct Post flow: creator_info/query (mandatory; read privacy_level from it) → video/init (use FILE_UPLOAD chunked to avoid URL domain verification) → upload → poll status/fetch. Scopes video.publish + user.info.basic. Access token 24h; refresh token ≈365 days but ROTATES on every refresh → must persist the new one each time (needs a writable store). Mandatory audit before any public post (private/SELF_ONLY until passed); content-UX guidelines assume a human picks privacy at post time — raise the headless single-owner case during audit.

4. LinkedIn (awkward for unattended). Videos API (/rest/videos init→upload 4MB parts→finalize→poll AVAILABLE) then Posts API (POST /rest/posts). Personal (w_member_social) is self-serve; org page (w_organization_social) needs the gated Community Management API (registered company + review). No refresh token unless you're an approved Marketing Developer Platform partner → otherwise a 60-day token and manual browser re-auth ≈every 55 days. Treat org/automated posting as a stretch goal.

4.1 Publisher interface

A single interface so each platform lands independently:

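import "context"

// Publisher is implemented once per platform (internal/publish/<platform>).
type Publisher interface {
    Name() string
    Publish(ctx context.Context, video Video, post PostMeta) (PostResult, error)
}

// Video describes the rendered reel file to upload.
type Video struct { Path string; Width, Height int; DurationSec float64 }

// PostMeta is the shared social copy plus per-platform overrides.
type PostMeta struct {
    Caption string; Tags []string; Title string; Link string
    PerPlatform map[string]PostMeta // platform-specific overrides
}

// PostResult is not specified in this section; it carries the returned post id/URL (§4.3).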

Each adapter lives in internal/publish/<platform>. A post all command fans out across the configured/enabled platforms and reports per-platform results. Social elements vary by platform — caption length, hashtag conventions, link handling (clickable vs "link in bio"), title vs description — so PostMeta carries per-platform overrides; the studio (§10.1) and keryx social --platform compose and steer these (see 0002-interface-contracts.md §4 for the per-platform constraints the UI enforces).
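How the per-platform overrides resolve is not spelled out here beyond "defaulting to the shared caption". One plausible reading, sketched below, is a field-by-field merge in which any non-empty override field replaces the shared value. The For method and the merge rule are assumptions of this sketch; PostMeta is copied from the interface above.

```go
package main

import "fmt"

type PostMeta struct {
	Caption     string
	Tags        []string
	Title       string
	Link        string
	PerPlatform map[string]PostMeta // platform-specific overrides
}

// For returns the effective metadata for one platform. Assumed rule: a
// non-empty override field wins, anything else falls back to the shared value.
func (m PostMeta) For(platform string) PostMeta {
	out := PostMeta{Caption: m.Caption, Tags: m.Tags, Title: m.Title, Link: m.Link}
	o, ok := m.PerPlatform[platform]
	if !ok {
		return out
	}
	if o.Caption != "" {
		out.Caption = o.Caption
	}
	if o.Tags != nil {
		out.Tags = o.Tags
	}
	if o.Title != "" {
		out.Title = o.Title
	}
	if o.Link != "" {
		out.Link = o.Link
	}
	return out
}

func main() {
	shared := PostMeta{
		Caption: "New post: a reel per article.",
		Tags:    []string{"php", "golang"},
		Link:    "https://phpboyscout.uk",
		PerPlatform: map[string]PostMeta{
			"youtube": {Title: "A reel per article"},
			"tiktok":  {Caption: "A reel per article. Link in bio."},
		},
	}
	for _, p := range []string{"instagram", "youtube", "tiktok"} {
		eff := shared.For(p)
		fmt.Printf("%s: title=%q caption=%q\n", p, eff.Title, eff.Caption)
	}
}
```

Resolving to a flat PostMeta before calling Publish keeps each adapter unaware of the override map.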

4.2 Auth & token storage (the infra wrinkle)

Tokens need a writable store — jobs can't natively write back to GitLab CI variables, yet TikTok rotates its refresh token every use and IG/YT need periodic refresh.

  • keryx auth <platform> — interactive OAuth (local browser) to capture the initial token; stored via the GTB keychain locally.
  • keryx auth refresh — a refresh job (own GitLab schedule, well inside each token window) that refreshes/rotates and writes the new tokens back — either to a GitLab project variable via the GitLab API (a project access token with api scope) or an external secret manager.
  • Alert on refresh failure — a silently stale token is the #1 unattended failure mode.
  • Per platform: TikTok persist the rotated refresh token every run; IG ≈every 1–2 weeks inside 60 days; YT just keep it used (and published); LinkedIn expect manual re-auth unless an MDP partner.
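The rotate-and-persist requirement can be sketched independently of any platform API. In this minimal sketch TokenStore, Refresher and memStore are hypothetical names: the store stands in for a GitLab project variable or secret manager, and the refresher for a platform's token endpoint. No real endpoint is called.

```go
package main

import (
	"errors"
	"fmt"
)

type Tokens struct{ Access, Refresh string }

// TokenStore is the writable store the refresh job needs (hypothetical interface).
type TokenStore interface {
	Load(platform string) (Tokens, error)
	Save(platform string, t Tokens) error
}

// Refresher exchanges a refresh token for new tokens (one per platform).
type Refresher func(refresh string) (Tokens, error)

// Refresh rotates the tokens and persists the result. Every failure is
// returned with context so the caller can raise an alert.
func Refresh(store TokenStore, platform string, do Refresher) error {
	old, err := store.Load(platform)
	if err != nil {
		return fmt.Errorf("%s: load tokens: %w", platform, err)
	}
	fresh, err := do(old.Refresh)
	if err != nil {
		return fmt.Errorf("%s: refresh: %w", platform, err)
	}
	if fresh.Refresh == "" {
		fresh.Refresh = old.Refresh // provider did not rotate; keep the old one
	}
	if err := store.Save(platform, fresh); err != nil {
		return fmt.Errorf("%s: persist rotated token: %w", platform, err)
	}
	return nil
}

// memStore is an in-memory store for the example.
type memStore map[string]Tokens

func (m memStore) Load(p string) (Tokens, error) {
	t, ok := m[p]
	if !ok {
		return Tokens{}, errors.New("no token stored")
	}
	return t, nil
}

func (m memStore) Save(p string, t Tokens) error { m[p] = t; return nil }

func main() {
	store := memStore{"tiktok": {Access: "a1", Refresh: "r1"}}
	rotate := func(refresh string) (Tokens, error) {
		return Tokens{Access: "a2", Refresh: refresh + "-rotated"}, nil
	}
	if err := Refresh(store, "tiktok", rotate); err != nil {
		fmt.Println("ALERT:", err)
		return
	}
	fmt.Println("stored refresh token:", store["tiktok"].Refresh)
	if err := Refresh(store, "linkedin", rotate); err != nil {
		fmt.Println("ALERT:", err)
	}
}
```

The important property is that a successful refresh is only reported once the new refresh token has been saved; otherwise a rotating provider such as TikTok would leave the job holding a dead token.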

4.3 Posting lifecycle, approval & scheduling

Posting runs three ways, with one safety gate. Each platform's social entry (the per-platform social record, §6 / 0002 §4.4) carries a status — draft → approved → posted — plus an optional per-outlet scheduled_at (date & time) and, once posted, posted_at + post_url.

  • Approval gate (prevents accidental posting). keryx post / post all and the studio "Post now" refuse any platform not approved; approval is a deliberate human act (CLI keryx approve, or the studio Publish panel).
  • On-demand posting is available from the CLI (keryx post …) and the web UI (human-initiated, requires approved). Unattended posting is CI-only — the scheduled pipeline runs keryx post due, which scans the project's reels and posts platforms that are approved and whose scheduled_at is due. This is the primary and only path for hands-off posting.
  • Scheduling. scheduled_at per outlet is what the scheduler consumes; approving with no time = "post on the next run / now", with a time = "at/after then". The schedule itself lives in the owning project (§3.2).
  • On success the entry becomes posted with posted_at + post_url; this is the idempotency record below.
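The lifecycle above (draft → approved → posted, with an optional scheduled_at) reduces to a small predicate that keryx post due could apply to each platform entry. The sketch below assumes Go field names for the status, scheduled_at, posted_at and post_url fields; SocialEntry, Due and MarkPosted are hypothetical names, and the URL is a placeholder.

```go
package main

import (
	"fmt"
	"time"
)

type Status string

const (
	Draft    Status = "draft"
	Approved Status = "approved"
	Posted   Status = "posted"
)

// SocialEntry is a hypothetical in-memory form of one platform's social record.
type SocialEntry struct {
	Platform    string
	Status      Status
	ScheduledAt *time.Time // nil means "post on the next run"
	PostedAt    *time.Time
	PostURL     string
}

// Due reports whether an entry should be posted at time now: it must be
// approved, and its scheduled time (if any) must have been reached.
func Due(e SocialEntry, now time.Time) bool {
	if e.Status != Approved {
		return false
	}
	return e.ScheduledAt == nil || !e.ScheduledAt.After(now)
}

// MarkPosted records the idempotency fields after a successful post.
func MarkPosted(e *SocialEntry, at time.Time, url string) {
	e.Status = Posted
	e.PostedAt = &at
	e.PostURL = url
}

// at parses an RFC 3339 timestamp for the example; it panics on bad input.
func at(s string) *time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return &t
}

func main() {
	now := *at("2026-06-14T09:00:00Z")
	entries := []SocialEntry{
		{Platform: "instagram", Status: Approved},
		{Platform: "youtube", Status: Approved, ScheduledAt: at("2026-06-15T09:00:00Z")},
		{Platform: "tiktok", Status: Draft, ScheduledAt: at("2026-06-01T09:00:00Z")},
	}
	for i := range entries {
		e := &entries[i]
		if Due(*e, now) {
			MarkPosted(e, now, "https://example.invalid/"+e.Platform)
		}
		fmt.Println(e.Platform, e.Status)
	}
}
```

Because a posted entry is no longer approved, the same predicate also keeps a re-run from picking it up again.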

Scheduled posting must be safe to retry and observable:

  • Idempotency ledger. keryx records each successful post (platform, reel, timestamp, returned post id/URL) in the reel workspace. A re-run (pipeline retry, partial failure) skips platforms already posted for that reel, so there is no double-posting; post all is resumable.
  • --dry-run. Validate inputs, auth, and the reel (dimensions / duration / file size against each platform's limits) and report what would post, without publishing. Used in CI to catch problems before a live run.
  • Partial failure. post all posts to each enabled platform independently, records successes in the ledger, reports per-platform results, and exits non-zero if any platform failed so the pipeline surfaces it.
  • Retry / rate limits. Network calls use bounded retry with backoff; per-platform rate limits (e.g. IG ≈100/24h) are respected and surfaced rather than hammered.
  • Alerting. Any post or token-refresh failure raises an alert via the GTB error/help channel (Slack/Teams); a silent failure is the main unattended risk (mirrors §4.2).
  • Per-platform caption/format. Caption/tags/title can vary per platform (length caps, hashtag norms, YouTube title-vs-description); PostMeta carries per-platform overrides, defaulting to the shared caption.
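The ledger and partial-failure behaviour can be sketched together. Here Ledger is a hypothetical in-memory stand-in for the posted records in social.json, PublishFunc stands in for a platform adapter's Publish call, and the URLs are placeholders. errors.Join needs Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// Ledger maps platform name to the returned post URL for one reel.
type Ledger map[string]string

// PublishFunc stands in for one platform adapter's publish call.
type PublishFunc func() (url string, err error)

// PostAll posts to each platform independently, skips any already in the
// ledger, records successes, and returns a joined error if anything failed.
func PostAll(ledger Ledger, order []string, pubs map[string]PublishFunc) error {
	var errs []error
	for _, name := range order {
		if _, done := ledger[name]; done {
			continue // already posted for this reel: never re-post
		}
		pub, ok := pubs[name]
		if !ok {
			errs = append(errs, fmt.Errorf("%s: no publisher configured", name))
			continue
		}
		url, err := pub()
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", name, err))
			continue
		}
		ledger[name] = url
	}
	return errors.Join(errs...) // nil when every platform succeeded or was skipped
}

func main() {
	igCalls, ttHealthy := 0, false
	pubs := map[string]PublishFunc{
		"instagram": func() (string, error) {
			igCalls++
			return "https://example.invalid/ig/1", nil
		},
		"tiktok": func() (string, error) {
			if !ttHealthy {
				return "", errors.New("upload timed out")
			}
			return "https://example.invalid/tt/1", nil
		},
	}
	ledger := Ledger{}
	order := []string{"instagram", "tiktok"}

	fmt.Println("run 1 error:", PostAll(ledger, order, pubs))
	ttHealthy = true // pipeline retry after the transient failure
	fmt.Println("run 2 error:", PostAll(ledger, order, pubs))
	fmt.Println("instagram publish calls:", igCalls, "ledger size:", len(ledger))
}
```

A non-nil return maps to the non-zero exit code, while the ledger keeps whatever succeeded so the retry only touches the platforms that failed.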

5. CLI surface

Every command is scaffolded with gtb generate command (never hand-written), so it is registered in .gtb/manifest.yaml and wired into the root command by the generator — keeping the command surface manifest-managed and regenerable. (See CLAUDE.md for the --ci/--agentless flags and the current update-check caveat.)

# reel = the reel lifecycle (a noun group)
keryx reel new <slug> [--from-post post.md] [--bundle <dir>]   # create a reel workspace (§3.2)
keryx reel list|open|rename|duplicate|rm <slug>  # manage the project's reels (CRUD)
keryx reel link <slug> <dir>                     # associate a reel with a content directory (§3.2)
keryx reel build [-w <slug> | --storyboard b.json --cover c.png] [--theme editorial] [--silent] [--only-line N] [-o reel.mp4]
keryx reel prune <slug>                          # drop candidate takes/cache (keep selected) (§3.5)
keryx storyboard draft <post.md> [-o board.json] # optional AI first draft (§3.2)
# generators (bare command generates; sub-commands manage takes)
keryx cover --scene "..." --theme editorial [--out cover.png]    # cover art
keryx portrait --ref photo.jpg [--theme default] [--out avatar.png]
keryx voice [-w <slug>] [--line N] [--text "..."] [--takes N] [--stability x] [--theme editorial]   # vo from storyboard (§3.1)
keryx voice select <line> <take>                 # lock a chosen take in a workspace
keryx music --prompt "..." [--length 35s] [--theme editorial] [--takes N] [-w <slug>|--out bed.mp3]
keryx music select <take>
keryx cards [--card N] [--takes N] [--video]     # per-card media: AI still or short video (§3.1)
keryx cards set <card> <file>                    # use a pre-rendered image/video instead of AI
keryx cards select <card> <take>                 # lock a clean (no text-leak) generated take
keryx cards sheet [-o sheet.png]                 # contact-sheet text-leak screen
# social lifecycle: compose → approve → post
keryx social [-w <slug>] [--platform <p>]        # compose per-platform text + hashtags + link + title (§4.3)
keryx approve <platform|all> [-w <slug>] [--at <datetime>] [--revoke]   # gate posting; set schedule (§4.3)
keryx post <instagram|youtube|tiktok|linkedin> [-w <slug>|<file>] [--dry-run]
keryx post all [-w <slug>|<file>] [--dry-run]    # on-demand; requires approved; idempotent (§4.3)
keryx post due [-w <slug>]                        # CI/scheduled: post approved platforms whose time is due
# auth & config
keryx auth <platform>                            # interactive OAuth capture
keryx auth refresh [--platform <p>]              # CI refresh/rotate + write-back
keryx theme <list|show|add|edit|rm> ...          # manage the theme catalog (§6)
keryx studio [--port N]                           # web UI: manage reels + author + social (§10)

Every generation command takes --theme <keyword>; if omitted, it uses the configured default theme for that command's type (§6). The GTB default commands (update, init, docs, doctor, changelog, config, keychain, mcp) are also available. keryx init seeds the theme catalog, and config holds platform enablement and defaults; secrets come from env / keychain / CI variables and are never committed.

The CLI always scopes to the current folder — the project is the git working copy you are in (§3.5); switch projects by cd. Multi-project switching and remote (git) projects are a studio capability, not a CLI one.

MCP is to be enabled (the GTB mcp feature). keryx was originally scaffolded with MCP off (props.Disable(props.McpCmd)); the Phase 0 correction (§9) re-generates the scaffold with mcp in the feature set so that keryx mcp runs an MCP server exposing keryx's commands as tools. This is a third interface surface alongside the CLI and the web UI, and it lets an AI assistant drive the authoring loop (draft storyboards, run takes/select, assemble) against a workspace. Its tool contracts mirror the CLI command contracts; see 0002-interface-contracts.md §5 (including which commands are safe to expose vs gated).

Per-command contracts (inputs/outputs/exit codes/side-effects + testable requirements), CLI conventions, the workspace layout, and the web UI functional requirements / UX / API are specified in 0002-interface-contracts.md — the basis for the §8 tests.

5.1 Generation commands

Each maps to one generator in internal/gen/ (§3) and is usable standalone or as a step in the reel pipeline:

  • keryx cover — generate cover art via Gemini Imagen (port of gen-cover.py). Resolves an article-type theme for the style prefix + palette, appends --scene (the per-post scene), and writes one or more PNG samples. --n controls sample count; review before use. The chosen cover is the bookend art fed to keryx reel build --cover.
  • keryx portrait — stylise reference photo(s) into the risograph avatar via the Gemini image model (port of gen-portrait.py). Resolves a portrait-type theme for the prompt + palette; --ref is repeatable for multiple reference photos, --n makes variants. Used for the blog logo / social avatar, not the per-post pipeline.
  • keryx reel build — assemble the 9:16 reel from a storyboard + cover (port of gen-reel.py); resolves a reel-type theme for palette, card fonts, music tone and voice. See §3.1. (reel itself is the lifecycle noun group — new/list/link/build/prune — not a bare action, to avoid overloading.)
  • keryx voice / keryx music — the reel's narration and music bed as standalone steps (ElevenLabs); both resolve the reel theme's voice / music settings unless overridden by flags.

cover and portrait are first-class on-demand commands (image generation is non-deterministic — you regenerate and pick), and their generators are also callable internally by future composite flows. Convention: a bare generator (cover/portrait/voice/music/cards) generates; its sub-commands (select/set/sheet) manage takes.

5.2 Authoring & iteration commands

These exist to make the §3.2 loop fast and bookkeeping-free — they are what keryx adds over the bare scripts:

  • keryx reel new <slug> — scaffold a per-reel workspace (storyboard, vo/, music, takes, output) so a reel is a resumable unit. --from-post seeds it with an AI-drafted storyboard.
  • keryx storyboard draft <post.md> — optional: draft a storyboard from a post via the GTB chat client for a human to edit. keryx never treats a draft as final; the edited storyboard.json is the input to everything else. Works without it (hand-author the JSON); only this command needs an LLM provider.
  • keryx reel build --silent — render the silent, dur-timed draft (no audio) for the fast format/pacing/typography proof before spending on VO/music.
  • keryx reel build --only-line N (workspace mode) — re-assemble after regenerating a single line's VO/card, reusing unchanged takes — the one-line-turnaround that made polish cheap.
  • keryx voice --takes N / keryx voice select <line> <take> — generate candidate takes (e.g. steadier vs more natural) and lock the chosen one in the workspace, replacing the manual cp take vo/NN.mp3 step. keryx music --takes N does the same for beds.
  • keryx cards [--card N] [--takes N] — generate the per-card overlay illustrations from each card's scene (in the reel theme's illustration style, §3.1), with the hardened wordless prompt; --takes N makes candidates, keryx cards select <card> <take> locks the clean one, and a contact-sheet (keryx cards sheet) screens them for text leaks before assembly. Re-roll a single card after a copy change without touching the rest.
  • keryx social — compose the per-platform social elements (supporting text + hashtags + link + title), steered to each platform's limits (§4.3, 0002 §4.4); writes social.json.

After a cut is approved, fold the winning voice/music settings into the theme (§6) via keryx theme edit so the next reel starts from the approved baseline.

6. Themes (config-driven aesthetics)

The thematic component of every generated artefact — image-prompt styles, palette, music tone, voice — is config, not hardcoded. The Python scripts held these as constants (STYLES in gen-cover.py, PALETTE in gen-reel.py, the portrait DEFAULT_PROMPT, the VO settings); keryx lifts them into a theme catalog in config so they can be added and edited without a rebuild. This keeps the tool flexible: re-theme, or theme a second brand, by editing config — never code.

6.1 Model

A theme is a self-contained, named aesthetic profile identified by a keyword and tagged with a type declaring the artefact it themes:

| type | Drives (generator) | Fields |
| --- | --- | --- |
| article | cover image (gen-cover) | palette, prompt (style prefix), aspect |
| reel | the 9:16 reel (gen-reel + music + voice) | palette, card (mode, fonts, scrim, illustration style prompt), music (prompt, gain), voice (id, stability, similarity) |
| portrait | avatar (gen-portrait) | palette, prompt |

Types are open-ended — a new generator adds a new type. Because a theme bundles everything its type needs, a second brand/blog is just a new set of themes.

Reel themes mirror the article-type taxonomy. A run of reels should not look like one identical block, so, exactly as covers do, reel themes come in the same three article-type flavours, sharing the palette but differing in card visual treatment + illustration style (the per-card scene art, §3.1, is generated in this style):

  • editorial (op-eds): flat colour-block / risograph treatment; the first reel's look.
  • clay (project / engineering): softer, rounded-panel, clay-render imagery.
  • blueprint (tutorials): draughtsman grid + thin construction-line imagery.

A post's article type picks both its cover (article theme) and its reel (reel theme) of the same keyword — they stay visually matched.
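The matching rule can be illustrated with a small lookup. This is only a sketch: the article-type labels ("op-ed", "project", "tutorial") and the MatchedThemes helper are hypothetical, since the spec does not say how a post declares its type; only the three theme keywords and the themes.<type>.<keyword> layout come from the text.

```go
package main

import "fmt"

// themeForArticleType maps a post's article type to the shared theme keyword.
// The article-type names on the left are hypothetical labels for this sketch.
var themeForArticleType = map[string]string{
	"op-ed":    "editorial",
	"project":  "clay",
	"tutorial": "blueprint",
}

// MatchedThemes returns the config paths of the cover (article) theme and the
// reel theme for an article type. Both use the same keyword, which is what
// keeps a post's cover and reel visually matched.
func MatchedThemes(articleType string) (cover, reel string, err error) {
	keyword, ok := themeForArticleType[articleType]
	if !ok {
		return "", "", fmt.Errorf("unknown article type %q", articleType)
	}
	return "themes.article." + keyword, "themes.reel." + keyword, nil
}

func main() {
	for _, t := range []string{"op-ed", "project", "tutorial"} {
		cover, reel, _ := MatchedThemes(t)
		fmt.Printf("%-9s cover=%s reel=%s\n", t, cover, reel)
	}
}
```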

Illustrative shape:

themes:
  defaults:                 # theme used when --theme is omitted, per type
    article: editorial
    reel: editorial
    portrait: default
  article:                  # catalog nested by type → keyword unique within type
    editorial:
      palette: {teal: "#14534F", amber: "#E8923B", cream: "#F2EAD8", charcoal: "#282A2C"}
      prompt: "Editorial conceptual illustration, flat screen-print / risograph..."
      aspect: "16:9"
    clay:      { palette: {...}, prompt: "Isometric 3D clay-render...",  aspect: "16:9" }
    blueprint: { palette: {...}, prompt: "Technical blueprint / schematic...", aspect: "16:9" }
  reel:
    editorial:              # same keyword as the article theme; type disambiguates
      palette: {teal: "#14534F", amber: "#E8923B", cream: "#F2EAD8", charcoal: "#282A2C"}
      card:
        mode: overlay       # full-bleed illustration + scrim + text (block also valid)
        scrim: {from: 0.52, color: charcoal}     # gradient over lower ~half
        font_bold: DejaVuSans-Bold
        font_mono: DejaVuSansMono-Bold
        style: "Editorial conceptual illustration, flat screen-print / risograph, wordless..."
      music: {prompt: "restrained editorial bed", gain: 0.16}
      voice: {id: MhaH9hcD2Ulcr80j28Z1, stability: 0.6, similarity: 0.92}
    clay:      { palette: {...}, card: {mode: overlay, style: "clay-render, rounded panels..."}, music: {...}, voice: {...} }
    blueprint: { palette: {...}, card: {mode: overlay, style: "blueprint grid, thin construction lines..."}, music: {...}, voice: {...} }
  portrait:
    default:   { palette: {...}, prompt: "Stylised editorial avatar..." }

Nesting by type makes "unique within type" structural and lets editorial name both an article and a reel theme without collision; keryx reel build --theme editorial resolves themes.reel.editorial.

Naming convention: short lowercase kebab-case keyword, unique within its type. The type is an explicit field and is usually implied by the command consuming the theme, so the keyword itself does not encode the type — editorial can name both an article theme and a reel theme, and keryx cover --theme editorial vs keryx reel build --theme editorial resolves by the command's type. Keep keywords descriptive of the look (clay, editorial, blueprint), not the brand, unless you run multiple brands.
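A keyword check for the naming convention might look like the following. The exact rule is an assumption of this sketch (digits allowed, no length cap, single hyphens only), and ValidKeyword is a hypothetical helper rather than part of the spec.

```go
package main

import (
	"fmt"
	"regexp"
)

// keywordPattern accepts lowercase kebab-case: groups of letters or digits
// joined by single hyphens. Allowing digits is an assumption of this sketch.
var keywordPattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)

// ValidKeyword reports whether s is usable as a theme keyword.
func ValidKeyword(s string) bool {
	return keywordPattern.MatchString(s)
}

func main() {
	for _, k := range []string{"editorial", "clay-soft", "Blueprint", "reel_editorial", "-clay"} {
		fmt.Printf("%-16s %v\n", k, ValidKeyword(k))
	}
}
```

Uniqueness within a type needs no separate check when the catalog is nested by type, because two themes of the same type cannot share a map key.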

6.2 Seeded defaults (via keryx init)

keryx init seeds the catalog with the current house set, values identical to the Python scripts (parity):

  • article themes clay, editorial, blueprint — the three verbatim cover-style prompt prefixes from gen-cover.py.
  • reel themes editorial (default), clay, blueprint — mirroring the article types, sharing the petrol-teal/amber/cream/charcoal palette and the voice clone MhaH9hcD2Ulcr80j28Z1 (stability 0.6 / similarity 0.92), but each with its own card treatment + illustration style. editorial is the first reel's look (overlay illustrations + scrim).
  • portrait theme default — the risograph avatar prompt.

The GTB init and config features must stay enabled for this seeding to work.

6.3 Theme command

keryx theme list [--type article|reel|portrait]      # catalog, grouped by type
keryx theme show <keyword> [--type ...]               # full definition
keryx theme add  <keyword> --type <type> [--from <existing>] [--set k=v ...]
keryx theme edit <keyword> [--type ...] [--set k=v ...]
keryx theme rm   <keyword> [--type ...]

Themes are read and written through the GTB config layer (pkg/config); add/edit/rm persist to the user config file. --from clones an existing theme as a starting point. Generators resolve a theme by --theme <keyword> (falling back to themes.defaults.<type>) and never carry hardcoded thematic constants.
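The resolution rule (explicit --theme keyword, else themes.defaults.<type>) can be sketched over a plain in-memory catalog. Catalog and Resolve are hypothetical names, and the flat string settings are a simplification; the real catalog is read through the GTB config layer, whose API is not shown here.

```go
package main

import "fmt"

// Catalog is an assumed in-memory view of the themes config:
// Defaults maps type -> keyword; Themes maps type -> keyword -> settings.
type Catalog struct {
	Defaults map[string]string
	Themes   map[string]map[string]map[string]string
}

// Resolve returns the theme for a generator type. An empty keyword (no
// --theme flag) falls back to the configured default for that type.
func (c Catalog) Resolve(themeType, keyword string) (string, map[string]string, error) {
	if keyword == "" {
		def, ok := c.Defaults[themeType]
		if !ok {
			return "", nil, fmt.Errorf("no default theme configured for type %q", themeType)
		}
		keyword = def
	}
	theme, ok := c.Themes[themeType][keyword]
	if !ok {
		return "", nil, fmt.Errorf("theme %q not found for type %q", keyword, themeType)
	}
	return keyword, theme, nil
}

func main() {
	catalog := Catalog{
		Defaults: map[string]string{"article": "editorial", "reel": "editorial"},
		Themes: map[string]map[string]map[string]string{
			"article": {"editorial": {"aspect": "16:9"}, "clay": {"aspect": "16:9"}},
			"reel":    {"editorial": {"card.mode": "overlay"}},
		},
	}
	// Same keyword, different type: the consuming command's type disambiguates.
	fmt.Println(catalog.Resolve("article", "editorial"))
	fmt.Println(catalog.Resolve("reel", ""))     // falls back to the reel default
	fmt.Println(catalog.Resolve("reel", "clay")) // not in this demo catalog: error
}
```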

7. Configuration & secrets

Config files (standardised). keryx reads, in precedence order (highest first): CLI flags → env vars → project config .keryx.yaml (repo root) → global config ~/.keryx/config.yaml → embedded defaults — the GTB hierarchical config layer (pkg/config, Viper). The project file holds what belongs to that project (themes §6, provider selection, platform enablement, defaults); the global file is GTB's default config for cross-project preferences. keryx init seeds .keryx.yaml. Secrets are never written to these files — credentials stay in the keychain / env / CI variables (above). The config is hot-reloadable: edits (by hand or via the studio Settings panel, §10.1) are picked up live through GTB's Observable interface and propagate to running components without a restart.
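The precedence order can be pictured as a first-match walk over layers. The sketch below is illustrative only: Layer and Lookup are hypothetical, the keys are made up for the demo, and in keryx the actual merging is done by the GTB config layer rather than by code like this.

```go
package main

import "fmt"

// Layer is one config source, flattened to key -> value for this sketch.
type Layer struct {
	Name   string
	Values map[string]string
}

// Lookup walks the layers from highest to lowest precedence and returns the
// first value found, along with the name of the layer that supplied it.
func Lookup(layers []Layer, key string) (value, source string, ok bool) {
	for _, l := range layers {
		if v, found := l.Values[key]; found {
			return v, l.Name, true
		}
	}
	return "", "", false
}

func main() {
	// Ordered highest precedence first, as in the paragraph above.
	layers := []Layer{
		{Name: "flags", Values: map[string]string{"themes.defaults.reel": "clay"}},
		{Name: "env", Values: map[string]string{}},
		{Name: "project (.keryx.yaml)", Values: map[string]string{"themes.defaults.reel": "editorial"}},
		{Name: "global (~/.keryx/config.yaml)", Values: map[string]string{"providers.voice": "elevenlabs"}},
		{Name: "embedded defaults", Values: map[string]string{"providers.render": "ffmpeg"}},
	}
	for _, key := range []string{"themes.defaults.reel", "providers.voice", "providers.render"} {
		v, src, _ := Lookup(layers, key)
		fmt.Printf("%s = %s (from %s)\n", key, v, src)
	}
}
```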

  • Backend selection (§3.4): providers.{image,video,voice,music,render} choose the adapter (defaults Gemini / ElevenLabs / ElevenLabs / ffmpeg), each with its own config block (endpoint, model, credentials). Switching backend is a config change, not a code change.
  • Large-file persistence (§3.5): persistence.media.store (git default / git-lfs / external / none) + the chosen store's config (LFS patterns, or object-store bucket/endpoint with creds via keychain). Global, per-project overridable.
  • Spend guard: spend.confirm_above — per-capability thresholds past which a batch prompts for confirmation, global + per-project. Defaults: image_video_usd: 10 (estimated $ of image+video generation per run) and voice_chars: 50000 (ElevenLabs is character-billed, so the voice guard is in its native unit — ≈ a dozen+ reels' narration, a clear runaway signal, not a cap on normal work). These prevent runaways, not generation (auto-yes under --yes/CI).
  • Cost rates: providers.<x>.rates seed the spend estimate; where a provider has a pricing/usage API, keryx refreshes them periodically (cached, configured rates as fallback) for best-effort accuracy.
  • GEMINI_API_KEY, ELEVENLABS_API_TOKEN for the default adapters; other adapters carry their own keys. Credentials always via env / keychain / CI variables — never committed.
  • An LLM provider for the optional keryx storyboard draft (GTB chat client — provider/key configurable; keryx runs fully without it).
  • Per-platform OAuth client id/secret + tokens (posting).
  • Project config (.keryx.yaml) and the reel workspaces live in the owning project's repo (here, the blog), not in keryx — keryx is stateless (§3.2); global config is ~/.keryx/config.yaml.
  • Local: GTB keychain. CI: masked/protected variables + the writable store for refreshed tokens — all owned by the project running the pipeline.
  • System dependencies. keryx shells to ffmpeg/ffprobe and needs the card fonts (DejaVu bold/mono, or theme-configured). keryx doctor (the GTB default command, extended) verifies their presence and versions, that the configured provider credentials resolve, and that each enabled platform has a non-stale token — run first in CI and on first local use.
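The spend guard from the list above can be sketched as a comparison of a batch estimate against the confirm_above thresholds. The Thresholds and Estimate types and both functions are hypothetical; only the default values (10 USD, 50000 characters) and the auto-yes behaviour under --yes/CI come from the text, and treating "above" as strictly greater is an assumption.

```go
package main

import "fmt"

// Thresholds mirrors spend.confirm_above; field names are illustrative.
type Thresholds struct {
	ImageVideoUSD float64
	VoiceChars    int
}

// Estimate is the projected spend for one batch.
type Estimate struct {
	ImageVideoUSD float64
	VoiceChars    int
}

// DefaultThresholds holds the defaults quoted in the spec.
var DefaultThresholds = Thresholds{ImageVideoUSD: 10, VoiceChars: 50000}

// Exceeded lists which thresholds the estimate passes (strictly above).
func Exceeded(e Estimate, t Thresholds) []string {
	var reasons []string
	if e.ImageVideoUSD > t.ImageVideoUSD {
		reasons = append(reasons, fmt.Sprintf("image+video estimate $%.2f is above $%.2f", e.ImageVideoUSD, t.ImageVideoUSD))
	}
	if e.VoiceChars > t.VoiceChars {
		reasons = append(reasons, fmt.Sprintf("voice estimate %d chars is above %d", e.VoiceChars, t.VoiceChars))
	}
	return reasons
}

// ShouldPrompt reports whether to ask for confirmation. autoYes models
// --yes or a CI run, where the guard answers yes automatically.
func ShouldPrompt(e Estimate, t Thresholds, autoYes bool) bool {
	return !autoYes && len(Exceeded(e, t)) > 0
}

func main() {
	batch := Estimate{ImageVideoUSD: 14.50, VoiceChars: 4200}
	fmt.Println(Exceeded(batch, DefaultThresholds))
	fmt.Println("prompt interactively:", ShouldPrompt(batch, DefaultThresholds, false))
	fmt.Println("prompt under --yes:  ", ShouldPrompt(batch, DefaultThresholds, true))
}
```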

8. Testing & quality

keryx follows TDD and BDD, mirroring go-tool-base (CLAUDE.md). The provider seams (§3.4) are what make this tractable: every external dependency is behind an interface, so the deterministic core is unit-tested with no network, no ffmpeg, and no API keys.

  • Unit tests (TDD). Write failing tests first from the spec's behaviour and edge cases. Table-driven with t.Parallel(); mocks generated by mockery into mocks/. New pkg/ code targets ≥90% coverage.
  • Deterministic core, faked edges. Unit-test the logic directly: storyboard parse/validation, the VO-driven timing maths (card start_i, xfade offsets, total length), orphan-controlled wrapping, theme resolution, config-driven provider construction, the idempotency/social record, and social composition. The image/voice/music providers are faked behind their interfaces; the ffmpeg Renderer is faked for logic tests and run for real only in integration.
  • BDD with godog (Gherkin). User-facing workflows get .feature files in features/, step definitions in test/e2e/steps/, driven by a dedicated e2e test binary (cmd/e2e) with all features enabled (the GTB pattern). New CLI commands and multi-step workflows must ship Gherkin scenarios. Priority scenarios: the authoring loop (reel new → storyboard draft → reel build --silent → voice --takes → select → reel build → social), per-line re-roll, card-illustration take selection + text-leak screen, post all against a faked platform (success, partial failure, idempotent retry, --dry-run), token refresh/rotation with write-back, and theme add/edit/resolve.
  • Rendering checks. Verify ffmpeg output by probing (duration ≈ Σ VO + leads/tails − xfades; dimensions 1080×1920; audio + video streams present) and optional golden-frame comparison of a rendered card — not brittle byte-equality on the MP4.
  • Gating. Integration and e2e tests are env-var gated (INT_TEST=1, INT_TEST_E2E=1, subsystem flags) — not build tags — for IDE discoverability, matching GTB. Tests needing ffmpeg/fonts or live APIs gate on a system-dependency / credentials check (doctor, §7).
  • Docs as you go (part of Done). A component/command is not done until its docs are written/updated in the matching section (docs/components/, docs/concepts/, docs/how-to/), cross-referenced with the code — maintained per component as the build progresses, never batched into a later pass.
  • Hygiene. No //nolint — address root causes. just ci (tidy, generate, test, test-race, lint) must be green before any PR.
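The duration check named under "Rendering checks" (duration ≈ Σ VO + leads/tails − xfades) can be sketched as below. The interpretation is an assumption: each card gets its own lead and tail padding, and one crossfade overlaps each adjacent pair of cards. ExpectedDuration and WithinTolerance are hypothetical helpers, and the probed value would come from ffprobe in a real integration test.

```go
package main

import (
	"fmt"
	"math"
)

// ExpectedDuration returns the expected reel length in seconds for the given
// per-line VO durations. Each card is padded by lead and tail, and every
// adjacent pair of cards overlaps by one crossfade of length xfade.
func ExpectedDuration(vo []float64, lead, tail, xfade float64) float64 {
	if len(vo) == 0 {
		return 0
	}
	total := 0.0
	for _, d := range vo {
		total += lead + d + tail
	}
	return total - xfade*float64(len(vo)-1)
}

// WithinTolerance reports whether a probed duration is close enough to the
// expected one, avoiding brittle exact comparison of encoder output.
func WithinTolerance(probed, expected, tol float64) bool {
	return math.Abs(probed-expected) <= tol
}

func main() {
	vo := []float64{3, 4, 5} // seconds of narration per card
	want := ExpectedDuration(vo, 0.25, 0.25, 0.5)
	probed := 12.54 // stand-in for a value read from ffprobe
	fmt.Printf("expected %.2fs, probed %.2fs, ok=%v\n", want, probed, WithinTolerance(probed, want, 0.1))
}
```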

Definition of Done, every unit of work: failing test(s) first → minimal code → green just ci (TDD); a .feature scenario for user-facing commands/workflows (BDD); and the component's docs page written/updated. All three, every time.

9. Roadmap

  • Phase 0 — scaffold. GTB project generated, builds. ✅ (Correction: the scaffold was generated with mcp disabled — re-generate with mcp in the feature set, §5, before building on it.)
  • Phase 1 — themes + provider seams + port reel-gen to Go. Land the config theme model and keryx theme / keryx init seeding (§6) first, so generators read theme from config from day one, and the provider interfaces (§3.4) so backends are config-selected from the start (Gemini / ElevenLabs / ffmpeg as the first adapters). Then voice + music (thin clients), reel assembly (cards + ffmpeg), then cover + portrait. Parity with the Python scripts; same storyboard, same seeded palette/styles.
  • Phase 1.5 — the authoring loop (§3.2). The reel workspace (reel new), per-line/per-take addressability (voice --takes / voice select, reel build --only-line), the silent-draft proof, social, and the optional storyboard draft. This turns the parity scripts into the fast, bookkeeping-free iteration loop and is the prerequisite for an approved reel workspace that Phase 3 schedules can consume unattended.
  • Phase 2 — posting adapters. Publisher interface, then Instagram → YouTube → TikTok → LinkedIn. post <platform> and post all.
  • Phase 3 — auth, refresh, scheduling. keryx auth + keryx auth refresh (rotate + GitLab write-back), failure alerting, and the GitLab scheduled pipelines that live in the owning project (the blog) running keryx post due against committed, approved reel workspaces (§3.2, §4.3).
  • Phase 4 (stretch) — storyboard studio. The keryx studio web UI (§10): multi-project + reels library, the storyboard editor, per-platform social + approve/schedule/post, the Settings panel (edit .keryx.yaml + hot reload), bundle association, and a chat-driven editor — all over the existing workspaces; no new pipeline. See §10.2 for the frontend-stack options.
  • Phase 5 (deferred) — video panels. Fill the video-shaped hole (§3.4): uploaded-video panels first (no provider), then generated short video (VideoProvider, e.g. Gemini Omni) and the ffmpeg video-compositing path. Comes only once stills-based reels + posting are solid. Full spec: 0003-video-panels.md.
  • Phase 6 (future / intent) — long-form video + in-browser capture. Short-to-medium YouTube-style instructional video as a second artefact type, and webcam/mic capture in the studio as a third media source. Not designed yet — but the architecture leaves room (S3-first storage, open media-source + voice-source enums, artefact-type neutrality). Gap captured in 0004-future-long-form-and-capture.md.

10. Future: storyboard studio (web UI)

Stretch / extended feature — Phase 4. A local web server, started from the CLI (keryx studio), serving a small single-user web UI for managing a project's reels and authoring each one. It is a richer front-end over the same workspaces (§3.2, §5.2) — not a new pipeline: it reads and writes the same files the CLI uses, so UI and CLI are interchangeable. Its functional requirements, UX, and HTTP API contract are in 0002-interface-contracts.md §4.

10.1 What it does

  • Switch projects (incl. remote) — a user has multiple projects, so the studio has a project switcher above the reels; a project may be a local dir or a remote git repo opened via GTB's git components (§3.5). The CLI, by contrast, always scopes to the current folder.
  • Manage many reels (CRUD) — within a project, the studio opens on a reels library: list / create / open / duplicate / rename / delete, with each reel's status (draft → approved → posted) and its associated content at a glance. The single-reel editor is the drill-in.
  • Compose the storyboard — add / reorder / edit cards (mode-adaptive: text, palette roles, accent words, scene, cover/mono flags); a live (approximate for overlay) card preview.
  • Upload images — cover art and portrait reference photos into the workspace.
  • Associate content (optional) — link the reel to a content directory (a page bundle or any directory, §3.2): records where the reel belongs and feeds the directory's text/assets to the chat AI as context. Or just paste source text.
  • Chat to adjust — the GTB chat client proposes storyboard patches (text and structural ops); the human accepts/rejects — nothing auto-applies.
  • Compose social elements per platform — generate and edit the platform-appropriate supporting text, hashtags, links, and titles for Instagram / YouTube / TikTok / LinkedIn, with the UI steering to each platform's conventions and limits (caption length, hashtag norms, link handling). Pairs with keryx social / PostMeta (§4.3, §4.1).
  • Approve, schedule & post on demand — per platform: set the approval gate, an optional scheduled date/time, see the status (draft → approved → posted), and post now (on-demand, human-initiated, requires approved). Unattended posting still happens only in CI (§4.3). Persistence is git (§3.5): saving/approving commits to the project repo.
  • Project settings — a Settings panel to edit the project's config (themes, AI providers, platform enablement, defaults). It writes the project's .keryx.yaml (§7) and relies on GTB config hot reload so changes propagate live; secrets are never written there (keychain/env only).
  • Natural extensions (v2): preview the silent draft, audition voice/music/card takes, trigger a render — all by calling the existing generators/commands.

10.2 Shape

  • New command keryx studio [--port N] [--workspace <slug>]; a GTB default-disabled feature, enabled for this tool.
  • Built on GTB service lifecycle and transport — pkg/controls for start / health / graceful-shutdown, pkg/http for the server, pkg/chat for the chat panel. The frontend is embedded via go:embed so the binary stays a single file.
  • Frontend stack (considered). Two viable shapes:
    • Server-rendered Go + HTMX (recommended). templ components rendered by pkg/http, HTMX for interactivity (card add/edit/reorder), and SSE for streaming the chat panel. No separate JS build/toolchain, all Go, trivially embedded: the best fit for a small single-user local tool, and it keeps the one-language/one-binary property GTB is built around.
    • Embedded SPA (Svelte or Preact). A small JS app built to static assets and embedded, talking to a thin keryx JSON + SSE API. Buys richer client interactivity (smooth drag-reorder, live preview) at the cost of a Node build step in CI. Start with HTMX + templ; reach for an SPA only if the interaction model outgrows it. Avoid a heavyweight React/Next stack for a localhost utility.
  • Mobile-first & responsive (important). The primary device in practice is a phone, so the studio is designed mobile-first: a single-column, tabbed layout (Cards / Editor / Chat / Source) on narrow screens — with chat as the primary authoring surface on mobile — expanding to the three-pane desktop layout with collapsible card-list and chat panes on wide screens. HTMX + templ + responsive CSS covers this; SSE chat works on mobile. Visual language is minimalist — a neutral light base with teal/amber as restrained accents, not blocks of brand colour. Full functional/UX requirements + the mobile layout are in 0002-interface-contracts.md §4.
  • Local, single-user, localhost-bound by default (matches the non-goals: not multi-tenant). It spends API credits (generation, chat) and edits the workspace, so it is not exposed publicly without explicit opt-in and the GTB HTTP server's auth.
  • Not a video timeline editor (§2 non-goals) — it drafts the storyboard; rendering stays the deterministic pipeline.
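Two of the points above, localhost binding by default and SSE for the chat panel, can be sketched with the standard library alone. This is not the GTB pkg/http API, which is not shown in this spec: ListenAddr, ChatStream, Render, and the /chat/stream path are hypothetical, and the sketch assumes each streamed token is a single line.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// ListenAddr binds to the loopback interface only, so the studio is not
// reachable from other machines unless the user explicitly opts in.
func ListenAddr(port int) string {
	return "127.0.0.1:" + strconv.Itoa(port)
}

// ChatStream returns a handler that streams tokens as Server-Sent Events.
// Each event is a "data: ..." line followed by a blank line, flushed at once.
func ChatStream(tokens []string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for _, tok := range tokens {
			fmt.Fprintf(w, "data: %s\n\n", tok)
			flusher.Flush()
		}
	}
}

// Render runs a handler against an in-memory recorder and returns the body,
// which keeps the demo free of real sockets.
func Render(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/chat/stream", nil))
	return rec.Body.String()
}

func main() {
	fmt.Println("would listen on", ListenAddr(8080))
	fmt.Print(Render(ChatStream([]string{"Tighten card 2", "Swap the accent word"})))
}
```

On the client side, HTMX's SSE support or a plain EventSource can consume such a stream without a JS build step, which is consistent with the recommended server-rendered shape.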

11. Platform setup prerequisites (the long poles — start early)

  • Instagram: phpboyscout IG as a Professional (Business/Creator) account; a Meta developer app (Standard Access is enough for own-account).
  • YouTube: a Google Cloud project; OAuth consent screen published "In production"; apply for the upload audit + CASA (else uploads lock private).
  • TikTok: a TikTok developer app; submit for the Content Posting audit (private-only until it passes — weeks).
  • LinkedIn: a developer app; personal posting self-serve; org/refresh-token needs MDP partner (company registration).

12. References

  • Interface contracts (CLI + web UI), the test basis: 0002-interface-contracts.md.
  • Video panels (deferred feature spec): 0003-video-panels.md.
  • Long-form video + capture (future directions): 0004-future-long-form-and-capture.md.
  • Blog Python reference: ~/workspace/phpboyscout/blog/scripts/ (+ README.md).
  • Blog reel policy: blog CLAUDE.md, "Promo reels: a reel per post".
  • Platform docs: Instagram Content Publishing (developers.facebook.com/docs/instagram-platform/content-publishing/), YouTube videos.insert (developers.google.com/youtube/v3/docs/videos/insert), TikTok Content Posting (developers.tiktok.com/doc/content-posting-api-get-started), LinkedIn Videos/Posts API (learn.microsoft.com/linkedin/marketing/community-management).