keryx: design spec
Status: draft / intent (scaffold generated, build not started)
Owner: Matt Cockayne
Last updated: 2026-06-14
1. Purpose
keryx generates short vertical promo reels for blog posts and publishes them to social platforms — on demand and on a schedule.
It is the home for the PHP Boy Scout content-marketing automation. The blog
(phpboyscout.uk) produces a lot of strong writing but leans on organic traffic;
the policy now is a reel per post (see the blog's CLAUDE.md, "Promo reels"),
pushed to the platforms where a cold account can still get algorithmic reach.
keryx turns that into one self-contained tool.
It supersedes the ad-hoc Python scripts in the blog repo (scripts/gen-*.py):
those are the working reference implementation; keryx ports them to Go and adds
the posting layer.
2. Goals / non-goals
Goals
- Generate the post's media assets: cover art and avatar (Gemini), and a reel from a storyboard + cover (text cards + voice-clone narration + a music bed, 9:16, on the publication palette).
- Publish a video to Instagram, YouTube, TikTok and LinkedIn.
- Run on demand from the CLI, and unattended from GitLab scheduled pipelines.
- Ship as a single Go binary (the go-tool-base / GTB ecosystem): a stateless tool installed and driven by the owning project; content, config, and schedule live in that project, not in keryx (§3.2).
Non-goals
- Not a video editor / timeline UI. keryx composes AI-generated and
uploaded/captured clips into a narrative; it does not offer a timeline,
keyframes, or clip-level editing. The one permitted exception is AI-driven
coarse edits by natural language — e.g. "trim the first 5s of clip 1 to drop
the awkward pause" → a single ffmpeg trim — and that is the hard ceiling.
For real editing, use a dedicated tool (e.g. kiru.app, a modern Rust video
editor); keryx will not reinvent one. (See 0004 for the long-form direction.)
- Not multi-tenant: it posts to Matt's own single set of accounts.
- Not a blog/site generator — it produces media assets (cover, avatar, reel)
and posts them; it does not author posts or build the site. It consumes a
post's storyboard and writes its cover/reel back for the blog to use.
- Not an analytics / performance tool. keryx posts and records the returned
post URL; measuring views, engagement, or reach is out of scope — far better
dedicated tooling exists for that.
3. Reference implementation (port from these)
The Python scripts in the blog repo (~/workspace/phpboyscout/blog/scripts/)
are the source of truth for the generation behaviour. Port faithfully, then
remove the dependency on them.
| Python (blog) | keryx (Go) | Notes |
|---|---|---|
| gen-vo.py | internal/gen/voice | ElevenLabs TTS, voice clone MhaH9hcD2Ulcr80j28Z1, settings stability 0.6 / similarity 0.92 |
| gen-music.py | internal/gen/music | ElevenLabs Music (/v1/music), prompt + music_length_ms |
| gen-reel.py | internal/gen/reel | the meat: storyboard → cards (PIL→Go image/font) → ffmpeg xfade + audio mux |
| gen-cover.py | internal/gen/cover | Gemini Imagen, the three house styles (clay/editorial/blueprint) |
| gen-portrait.py | internal/gen/portrait | Gemini image, image-to-image avatar |
| scripts/README.md | (none) | documents the pipeline + env vars |
Keep the storyboard schema identical to the Python version. The thematic constants the scripts hold inline — the publication palette (deep petrol-teal, warm amber-orange, soft cream, dark charcoal), the cover-style prompt prefixes, the portrait prompt, the music tone, and the voice settings — move into config-driven themes (§6) rather than staying hardcoded. The seeded values stay identical to the Python version; only their home changes.
3.1 Reel composition (the important detail)
Mirror gen-reel.py (as evolved on the second reel — see the card modes below):
- 1080×1920, ~30–45s, 30fps, H.264 + AAC.
- The post's cover art bookends (first/last cards). Accent words rendered in the
accent colour.
- Two card modes (per-card, chosen in the storyboard):
  - block: one short line over a solid palette background. The simplest
    mode; good for punch lines and the closer.
  - overlay: a full-bleed, theme-matched media panel (still image or a
    short video clip) with a bottom gradient scrim (transparent → charcoal
    over the lower ~half) and the line set over it (bold cream, accent word in
    amber). This is the evolved default: a run of block cards reads as one
    monotonous wall, so most body cards carry per-card media. The media is either:
    - generated from the card's scene prompt in the reel theme's
      illustration style (§6) at 9:16, as a still (image provider, Imagen/
      Gemini) or a short video (video provider, e.g. Gemini Omni, §3.4); or
    - user-supplied: a pre-rendered image or video the user uploads via
      the editor file picker (R-UI-29) or assigns on the CLI, used as-is
      (no AI call, and it skips the text-leak screen since the user owns it).

    A video panel is fit to the card's VO-driven duration (loop if shorter,
    trim if longer); the scrim + text overlay is composited identically over an
    image or a video.
- Dedicated URL closer: the final card is phpboyscout.uk on its own (mono),
not crowded with other copy.
- Orphan-controlled wrapping: no lone trailing words / single conjunctions on
a line — balance the breaks (awkward orphans were a flagged defect).
- VO drives timing: each card's on-screen duration = its narration clip
length + a lead (≈0.5s) + tail (≈0.7s). Cards crossfade (xfade ≈0.4s); card
i starts at sum(dur[:i]) - i*xfade, and its VO is delayed to
start_i + lead. The on-screen card text and the narration are distinct
inputs (the batch proved this): the card text is a tight distillation
(short, punchy, fits the panel), while the vo narration is fuller, may use a
different phrasing, and may carry provider control tags — e.g. ElevenLabs
SSML <break time="0.8s"/> for a beat, or a phonetic spelling to fix a
mispronunciation. keryx keeps them as separate per-card fields, not one shared
string.
- Audio = the music bed at ≈0.16 gain (with an end fade) mixed with the VO
clips at full gain, limited to avoid clipping. The bed is requested at the
reel's computed total length (derived from the VO-driven timeline), not a
hand-set value as in the scripts.
- Go specifics: render cards with a Go text/image lib (e.g. fogleman/gg or
golang.org/x/image/font), drive ffmpeg via os/exec for the xfade chain
and audio mux. Match the music tone to the piece (sober for op-eds).
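
The VO-driven timing rule in the list above can be sketched in Go as follows. This is a minimal sketch, not the port itself: the constants use the approximate lead, tail, and xfade values quoted above, the names CardTiming and Schedule are hypothetical, and the total-length formula (the end of the last card) is derived here from the start rule rather than taken from gen-reel.py.

```go
package main

import "fmt"

// Approximate values quoted in the spec; a theme may tune them.
const (
	lead  = 0.5 // seconds before the narration starts on a card
	tail  = 0.7 // seconds the card lingers after the narration ends
	xfade = 0.4 // crossfade overlap between adjacent cards
)

// CardTiming is a hypothetical result type for one card.
type CardTiming struct {
	Start   float64 // when the card appears on the reel timeline
	Dur     float64 // on-screen duration
	VODelay float64 // when the card's narration starts
}

// Schedule derives per-card timing from narration clip lengths (seconds).
// Card i starts at sum(dur[:i]) - i*xfade and its VO is delayed to start+lead.
func Schedule(voLens []float64) (cards []CardTiming, total float64) {
	sum := 0.0
	for i, vo := range voLens {
		dur := vo + lead + tail
		start := sum - float64(i)*xfade
		cards = append(cards, CardTiming{Start: start, Dur: dur, VODelay: start + lead})
		sum += dur
	}
	if n := len(voLens); n > 0 {
		// The reel ends when the last card ends (assumption derived from the start rule).
		total = sum - float64(n-1)*xfade
	}
	return cards, total
}

func main() {
	cards, total := Schedule([]float64{2.0, 3.0, 1.5})
	for i, c := range cards {
		fmt.Printf("card %d: start=%.2fs dur=%.2fs vo at %.2fs\n", i, c.Start, c.Dur, c.VODelay)
	}
	fmt.Printf("total=%.2fs\n", total)
}
```

The computed total is the length the music bed would be requested at, which is what replaces the hand-set value the scripts used.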
Storyboard schema (per card): text (the on-screen line), vo (the
narration — distinct from text, may include SSML <break>/phonetic spelling),
palette roles (bg/fg/accent), *accent words*, dur (fallback when no
VO), cover/mono flags, an optional per-card voice override (e.g.
stability for a line that needs steadying — the batch bumped individual lines
to 0.74–0.76), and — for overlay mode — mode: overlay, a scene (the
generation prompt, only when generating), and a resolved media object once
present: { kind: image|video, source: generated|uploaded, path }. Palette-role applicability is
mode-dependent: block uses bg/fg/accent; overlay ignores bg (the
illustration fills it) and uses fg/accent for the scrim-overlaid text. keryx
keeps the schema stable across the Python parity port and these additions,
publishes it as a versioned JSON Schema, and validates storyboard.json on
load with actionable errors (a hand-authored or AI-drafted board is the main user
input — it must fail loudly, not render garbage). The concrete validation rules
are in 0002-interface-contracts.md §3.1 (R-WS-9..14).
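
A load-and-validate pass over the schema above might look like the sketch below. It is illustrative only: the JSON key names, the assumption that a storyboard is a plain JSON array of cards, and the specific checks are guesses for the sake of the example. The Python storyboard schema and the R-WS-9..14 rules in 0002-interface-contracts.md remain the authority.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Media is the resolved per-card media object.
type Media struct {
	Kind   string `json:"kind"`   // image | video
	Source string `json:"source"` // generated | uploaded
	Path   string `json:"path"`
}

// Card holds a subset of the per-card fields (key names are assumptions).
type Card struct {
	Text  string  `json:"text"`            // on-screen line
	VO    string  `json:"vo,omitempty"`    // narration, distinct from Text
	Dur   float64 `json:"dur,omitempty"`   // fallback duration when there is no VO
	Mode  string  `json:"mode,omitempty"`  // "" or "block", or "overlay"
	Scene string  `json:"scene,omitempty"` // generation prompt (overlay only)
	Media *Media  `json:"media,omitempty"` // resolved media, once present
}

// Validate returns one actionable message per problem (illustrative rules).
func Validate(cards []Card) []string {
	var errs []string
	for i, c := range cards {
		at := fmt.Sprintf("card %d", i+1)
		if c.Text == "" {
			errs = append(errs, at+": text is empty")
		}
		if c.VO == "" && c.Dur <= 0 {
			errs = append(errs, at+": no vo, so dur must be > 0")
		}
		switch c.Mode {
		case "", "block":
		case "overlay":
			if c.Scene == "" && c.Media == nil {
				errs = append(errs, at+": overlay needs a scene or a media object")
			}
		default:
			errs = append(errs, fmt.Sprintf("%s: unknown mode %q", at, c.Mode))
		}
		if m := c.Media; m != nil {
			if m.Kind != "image" && m.Kind != "video" {
				errs = append(errs, fmt.Sprintf("%s: media.kind %q must be image or video", at, m.Kind))
			}
			if m.Source != "generated" && m.Source != "uploaded" {
				errs = append(errs, fmt.Sprintf("%s: media.source %q must be generated or uploaded", at, m.Source))
			}
		}
	}
	return errs
}

// Load decodes a storyboard and validates it; problems are returned, not hidden.
func Load(data []byte) ([]Card, []string) {
	var cards []Card
	if err := json.Unmarshal(data, &cards); err != nil {
		return nil, []string{"storyboard.json: " + err.Error()}
	}
	return cards, Validate(cards)
}

func main() {
	board := []byte(`[
	  {"text": "Ship it.", "vo": "Ship it. <break time=\"0.8s\"/> Today.", "mode": "block"},
	  {"text": "", "mode": "overlay"},
	  {"text": "phpboyscout.uk", "dur": 3, "mode": "overlay",
	   "media": {"kind": "gif", "source": "uploaded", "path": "x.gif"}}
	]`)
	_, problems := Load(board)
	for _, p := range problems {
		fmt.Println(p)
	}
}
```

Collecting every problem in one pass, rather than stopping at the first, is a design choice that suits hand-authored or AI-drafted boards: the author fixes the whole file in one edit.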
Generated card imagery — the text-leak problem. Image models leak mangled
lettering ("MIGERTING TO FIRMER GROUND") even when told not to, which is fatal
when keryx overlays its own text. The four-reel batch sharpened the mitigation:
- a hardened wordless instruction appended to every scene;
- the scene must be pure visual description — directive / metaphor clauses
leak: the model rendered "One central metaphor: …" verbatim as a caption, so
keryx keeps the scene description and the wordless instruction strictly
separate and never feeds it editorial directions;
- N candidates per card, keep a clean one; re-roll any that leak (some
styles are worse — blueprint wants annotation labels and leaked most, even
returning an off-style stock photo once);
- a screening pass before assembly. The batch did this by human contact-sheet
review; keryx SHOULD automate it with OCR — run text detection over each
candidate and auto-flag/reject any with detected glyphs, falling back to the
contact sheet for the human. This reuses the take model (§5.2): card media are
takes like VO/music.
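
The automated screening pass could sit behind a small seam like the one sketched here. TextDetector and Screen are hypothetical names, and no particular OCR library is assumed; the fake detector only stands in for one. A detector error flags the candidate so it falls back to the human contact sheet rather than slipping through.

```go
package main

import "fmt"

// TextDetector is a hypothetical seam over whichever OCR or text-detection
// backend is chosen; it reports whether an image contains glyphs.
type TextDetector interface {
	HasText(imagePath string) (bool, error)
}

// Screen splits one card's candidate takes into clean and flagged sets.
// Anything the detector cannot judge is flagged for human review.
func Screen(candidates []string, d TextDetector) (clean, flagged []string) {
	for _, path := range candidates {
		leaked, err := d.HasText(path)
		if err != nil || leaked {
			flagged = append(flagged, path)
			continue
		}
		clean = append(clean, path)
	}
	return clean, flagged
}

// fakeDetector stands in for a real OCR backend in this sketch.
type fakeDetector map[string]bool

func (f fakeDetector) HasText(p string) (bool, error) {
	leaked, known := f[p]
	if !known {
		return false, fmt.Errorf("no verdict for %s", p)
	}
	return leaked, nil
}

func main() {
	d := fakeDetector{"take-1.png": true, "take-2.png": false}
	clean, flagged := Screen([]string{"take-1.png", "take-2.png", "take-3.png"}, d)
	fmt.Println("clean:", clean)
	fmt.Println("flagged for the contact sheet:", flagged)
}
```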
3.2 How a reel gets made: the authoring loop
The Python scripts were driven by hand in a scratch dir, with a human (Matt) as the taste gate. keryx must keep that human-in-the-loop iteration for on-demand use while automating the bookkeeping that was previously manual. The observed loop (from the first reel, the Fable-5 op-ed that spawned keryx):
1. Storyboard: the post's argument distilled to ~9 cards (one short line each; palette bg/fg/accent; *accent words*; cover bookends; a mono closer). This is the creative seed. keryx consumes a hand-authored storyboard.json and optionally drafts one from a post via the GTB chat client (keryx storyboard draft) for a human to edit; never auto-final.
2. Silent draft first (keryx reel build --silent): render cards + crossfades with no audio (storyboard dur timing) as a fast proof of format/pacing/typography, before spending on VO/music. (In the loop this caught accent-parsing and line-break bugs.)
3. Voice: test one line to confirm the clone, then narrate all lines.
4. Music: generate a tone-matched bed.
5. Assemble (keryx reel build): VO drives card timing; music ducked under VO (§3.1).
6. Polish: per-line copy/voice/music tweaks, then re-build.
7. Social: compose the per-platform supporting text + hashtags + link (keryx social).
8. Lock & correlate: persist the winning settings as the theme/defaults; write the deliverables back to the associated bundle (§3.2): the reel-<slug>.mp4 and a human-readable caption file (reel-caption.md) holding the per-platform prose. The batch made this explicit ("write the prose for each reel into a file in the page bundle for me to use when posting"), so even with posting automation, the committed caption file is a first-class deliverable for manual/cross-checking use, alongside the machine social.json in the workspace.
The iteration model to preserve (this is the point): the three audio/copy layers iterate independently and cheaply — you never redo the whole reel to change one thing. Each generator is runnable on its own; the reel re-assembles from whatever's current. Concretely keryx must support:
- Per-card / per-line addressability: regenerate just line 6's VO and card, or lines 3–9, and re-assemble. (Manually this meant a hand-cut lines-3to9.json.)
- Take management: generate N candidate takes for a line or a music bed, audition them, and select one as the locked take. (Manually this was cp vo-takes/01-steady.mp3 vo/01.mp3; keryx should track the chosen variant, not leave it to file copying.)
- A per-reel workspace: one directory per reel holding the storyboard, vo/, music, takes, and output, so a reel is a resumable unit and nothing lives in an ad-hoc scratch dir.
- Lock-the-winner: once a cut is approved, fold the settings that produced it (voice stability/similarity, music prompt) back into the theme (§6) so the next reel starts from the approved baseline, not the cold defaults.
- Cached generation (cost control): generation calls cost real API credit, so keryx caches by content. A VO line, card illustration, or bed is only re-generated when its input (text/scene/prompt + settings) changes. Re-running reel build re-assembles from cached takes; --force overrides. This is what makes per-line re-roll and re-assembly cheap rather than a full re-spend.
- Cost accounting & safety: the batch's 8-hour idle hang prompted a "did it run away and spend?" scare that took forensic process/file inspection to dispel. keryx removes that doubt: (a) all generation is finite and bounded (generate N then stop, never an open loop); (b) every provider/ffmpeg call has a timeout (the scripts used timeout 240); (c) keryx keeps a usage tally per run/workspace (generations + estimated spend) surfaced in output and --json. Estimates are best-effort: per-provider unit rates (providers.<x>.rates) seed them, and where a provider exposes a pricing/usage API keryx refreshes the rates periodically (cached, with the configured rates as fallback) so the estimate stays as accurate as we can make it, not a hardcoded guess that drifts; (d) a spend guard (spend.confirm_above, global + per-project), defaulting to $10 of image/video and 50 000 ElevenLabs characters per run (voice is character-billed, so guarded in its native unit); a batch beyond the threshold prompts for confirmation (auto-yes under --yes/CI). It prevents runaways, not generation. So cost is visible and capped, not a thing to audit after the fact.
- Reproducibility: capture the inputs, not just the outputs. The storyboard holds each card's scene, vo, and per-card voice override; the workspace also records the music prompt used (tone is bespoke per reel: warm for the leadership piece, wry for the debugging story). So a reel can be re-edited and regenerated from its committed inputs, not only replayed from locked takes.
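
Caching by content, as described in the list above, usually comes down to hashing the generation input together with its settings. The sketch below shows one way to do that; CacheKey, NeedsGeneration, and the VoiceInput fields are hypothetical names, and SHA-256 over a JSON encoding is an assumed choice rather than anything the Python scripts do.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// VoiceInput is everything that influences one narrated line (hypothetical shape).
type VoiceInput struct {
	Text       string  `json:"text"`
	VoiceID    string  `json:"voice_id"`
	Stability  float64 `json:"stability"`
	Similarity float64 `json:"similarity"`
}

// CacheKey hashes the generator kind plus its input. encoding/json writes
// struct fields in declaration order, so equal inputs give equal keys.
func CacheKey(kind string, input any) (string, error) {
	payload, err := json.Marshal(input)
	if err != nil {
		return "", fmt.Errorf("cache key for %s: %w", kind, err)
	}
	h := sha256.New()
	h.Write([]byte(kind))
	h.Write([]byte{0}) // separator so kind and payload cannot run together
	h.Write(payload)
	return hex.EncodeToString(h.Sum(nil)), nil
}

// NeedsGeneration reports whether a provider call is required: either no
// cached take exists for the key, or the caller forced a re-roll (--force).
func NeedsGeneration(cache map[string]string, key string, force bool) bool {
	_, hit := cache[key]
	return force || !hit
}

func main() {
	line := VoiceInput{Text: "Ship it.", VoiceID: "voice-id-placeholder", Stability: 0.6, Similarity: 0.92}
	key, err := CacheKey("vo", line)
	if err != nil {
		panic(err)
	}
	cache := map[string]string{key: "vo/01.mp3"}
	fmt.Println("unchanged line needs a call:", NeedsGeneration(cache, key, false))

	line.Stability = 0.74 // steadying one line changes the key
	bumped, _ := CacheKey("vo", line)
	fmt.Println("bumped line needs a call:", NeedsGeneration(cache, bumped, false))
}
```

Because the settings are part of the key, a per-card voice override re-generates only that line; every other card keeps its cached take.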
On-demand vs scheduled. Locally this is the interactive loop above with the
human gate. In GitLab scheduled pipelines there is no human gate: the
inputs must already be locked — an approved storyboard.json, a theme, and the
selected takes/assets in the reel workspace — and keryx renders + posts them
deterministically. So the unit that a schedule consumes is an approved reel
workspace, produced by the on-demand loop beforehand.
Ownership — keryx is a stateless tool. The reel workspaces, the config /
themes, and the schedule itself all belong to the owning project (here, the
blog repo), not to keryx. keryx is installed and invoked by that project's
pipeline and keeps no per-project state of its own. A workspace is committed
in the owning project alongside its post (the finished reel-<slug>.mp4 already
lands in the post's page bundle, §3.2 step 8), so a scheduled pipeline checks it
out from the repo and runs keryx post against it. This is also why themes are
config-driven (§6): a second project that adopts keryx brings its own config,
themes, workspaces, and schedule, and the same binary serves it.
Associated content (the bundle). A reel may optionally be associated with a
content directory — a Hugo page bundle or any directory. The association
(recorded in workspace.yaml) does two things: it tells keryx where the reel
belongs (so the finished reel-<slug>.mp4 is written back there, the §3.2
step-8 correlation, generalised beyond Hugo), and it lets keryx read that
directory's contents (the post markdown, the cover, other assets) to inform
the chat/draft AI with real context. It is optional: a reel can stand alone
with pasted source text instead.
3.3 Batch workflow
Reels are often made several at a time, so keryx supports a batch of reel workspaces and the efficient ordering the second reel established. What to make — which posts get a reel, whether to fold a series into one — is the caller's editorial decision (blog-side), not keryx's; keryx just constructs and publishes whatever reels it's given. The mechanics it provides:
- One reel sourced from one or more posts. A workspace's storyboard can be drafted from a single post or several (a consolidated series reel) — that's an input, not a policy keryx owns.
- Draft all storyboards first. The storyboard copy is the creative bit and is cheap to iterate as text, so draft every reel's cards/narration up front and tune wording in one pass before spending on VO/music/imagery (the rust batch saw heavy copy iteration here). keryx makes this cheap; the wording calls are the human's.
- Template-first. Build the first reel end-to-end as the template, validate it, then run the rest through the same proven path — "one reel at a time, validate between each". The reel is the review unit; the human flags images/lines to swap and keryx cheaply re-rolls/re-renders.
keryx supports this with a workspace per reel (§5.2) and theme selection per reel (§6).
3.4 Pluggable backends (provider interfaces)
Every external backend sits behind a narrow Go interface, with the concrete
implementation chosen from user config at construction — so an adapter can
be swapped or a new one added without touching call sites. This mirrors the
Publisher pattern (§4.1) and GTB's multi-provider chat client. Five seams:
| Capability | Interface (illustrative) | Default adapter | Config key |
|---|---|---|---|
| Image generation | ImageProvider.Generate(ctx, ImageRequest) ([]Image, error) | Gemini / Imagen | providers.image |
| Video generation | VideoProvider.Generate(ctx, VideoRequest) ([]Clip, error) | Gemini (Omni / video) | providers.video |
| Voice / TTS | VoiceProvider.Synthesize(ctx, VoiceRequest) (Audio, error) | ElevenLabs | providers.voice |
| Music | MusicProvider.Compose(ctx, MusicRequest) (Audio, error) | ElevenLabs Music | providers.music |
| Rendering | Renderer.Render(ctx, Timeline) (Video, error) | local ffmpeg | providers.render |
The video seam generates short per-panel clips (§3.1) and is optional — the reel works fully with still images and uploaded media; it's enabled when a card asks for a generated video. Like every seam it is config-selected and swappable (Gemini Omni today, another model later).
Video generation is a deferred capability (the "video-shaped hole"). It is a real value-add beyond the Python parity, but lands after everything else works (Phase 5, §8). We design the hole now and leave it open: the VideoProvider seam, the card media {kind: image|video} schema, the renderer's "fit a clip to the card duration" path, and the editor's generate-video / upload-video affordance are all specified, but the generation of video and the ffmpeg video-compositing path are not built in the first passes. Uploaded video is the cheap early win (no provider needed); generated video follows. Full feature spec: 0003-video-panels.md.
- Where they live. Interfaces in internal/gen (promote to a pkg/ provider package if they prove reusable); each adapter is its own package (internal/gen/voice/elevenlabs, internal/gen/image/gemini, internal/render/ffmpeg, …), so adding …/voice/openai or …/image/openai is purely additive.
- Construction is config-driven. A small factory/registry per capability resolves providers.<capability> to a registered constructor and builds it from that provider's config block (endpoint, model, credentials via keychain/env, never committed). Blank/unknown → the default. Adding an adapter = implement the interface + register a constructor; no call-site changes.
- Defaults, working out of the box: image = Gemini, voice = ElevenLabs, music = ElevenLabs, render = local ffmpeg. (video = Gemini Omni, but off until Phase 5; see the deferred seam above.)
- Requests are provider-neutral. The request/response types carry keryx's intent (text, voice settings, a card's scene prompt + aspect, the timeline of cards + audio), not vendor payloads; each adapter maps intent to its own API and back. Provider-specific identifiers (an ElevenLabs voice.id, a Gemini model name) live in the theme / provider config that the active adapter understands (§6), so swapping provider is a config change plus, where the identity differs, the provider-scoped fields.
- Rendering stays local ffmpeg, but behind Renderer so a custom or remote renderer (cloud transcode, a different compositor) can drop in later. The ffmpeg adapter owns the xfade chain + audio mux + overlay compositing (§3.1).
- Testability. Narrow interfaces let fakes stand in for every backend: no network in unit tests, consistent with the no-package-level-mocking-hooks rule (CLAUDE.md); inject the provider, don't reach for a global.
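
The per-capability registry described in the list above can be sketched for the voice seam as follows. Registry, Constructor, and the Say helper are hypothetical names, the request and audio types are pared down to a single field, and the fake adapter stands in for a real one; only the VoiceProvider.Synthesize shape comes from the table.

```go
package main

import (
	"context"
	"fmt"
)

type VoiceRequest struct{ Text string }
type Audio struct{ Data []byte }

// VoiceProvider mirrors the illustrative interface in the table above.
type VoiceProvider interface {
	Synthesize(ctx context.Context, req VoiceRequest) (Audio, error)
}

// Constructor builds an adapter from its provider config block.
type Constructor func(cfg map[string]string) (VoiceProvider, error)

// Registry maps providers.voice values to constructors.
type Registry struct {
	def   string
	ctors map[string]Constructor
}

func NewRegistry(def string) *Registry {
	return &Registry{def: def, ctors: map[string]Constructor{}}
}

func (r *Registry) Register(name string, c Constructor) { r.ctors[name] = c }

// Build resolves a configured name; blank or unknown names fall back to the default.
func (r *Registry) Build(name string, cfg map[string]string) (VoiceProvider, error) {
	ctor, ok := r.ctors[name]
	if !ok {
		ctor, ok = r.ctors[r.def]
		if !ok {
			return nil, fmt.Errorf("default voice provider %q is not registered", r.def)
		}
	}
	return ctor(cfg)
}

// fakeVoice is a test double: injected, never reached through a global.
type fakeVoice struct{ label string }

func (f fakeVoice) Synthesize(_ context.Context, req VoiceRequest) (Audio, error) {
	return Audio{Data: []byte(f.label + ":" + req.Text)}, nil
}

// Say is a small helper for the demo: synthesize and return the bytes as text.
func Say(p VoiceProvider, text string) (string, error) {
	a, err := p.Synthesize(context.Background(), VoiceRequest{Text: text})
	return string(a.Data), err
}

func main() {
	reg := NewRegistry("elevenlabs")
	reg.Register("elevenlabs", func(map[string]string) (VoiceProvider, error) {
		return fakeVoice{label: "elevenlabs"}, nil
	})
	p, err := reg.Build("", nil) // blank config value resolves to the default
	if err != nil {
		panic(err)
	}
	out, _ := Say(p, "Ship it.")
	fmt.Println(out)
}
```

Call sites only ever see VoiceProvider, so adding an adapter is one Register call and no other change.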
3.5 Projects & persistence (git-first)
A project is the owning repository (§3.2) — it holds the reels, config, themes, social data, and schedule. keryx supports many projects, and git is the persistence and portability layer, not an afterthought:
- CLI scopes to the current folder. The project is the git working copy you are in; you switch projects by cd. No project picker in the CLI.
- The studio switches between projects, and can open ones that are not local (a remote git repo). keryx uses go-tool-base's git components (on-disk and in-memory) and VCS auth (GitHub/GitLab) to clone / read / commit / push.
- Saving is a git commit. Persisting a storyboard, social set, take selection, or asset writes files and commits them to the project repo with a descriptive message; for a remote project the studio works against an in-memory/temp clone and pushes. Auto-commit-on-save vs batched/explicit commit, and auto vs on-demand push, are configurable.
- Commit the selected, not the candidates. Only the inputs needed to render and post are committed: storyboard.json, social.json, the selected media/VO/music, reel.mp4, workspace.yaml. Candidate takes/ and the generation cache are git-ignored (disposable, regenerable) so the repo doesn't bloat with every re-roll; keryx reel prune clears them. (Trade-off: takes are session-scoped on a remote/mobile workspace; you select, and the selected take is committed.)
- Why git-first: history + rollback for free, and portability. The same project opens from any device by its remote, so the mobile studio can author, approve, and post against a remote repo with no local checkout (in-memory git). It also fits the stateless-tool model (§3.2): state lives in the repo, now reachable remotely. Credentials use the GTB keychain / CI variables; never committed.
- Concurrency = git. keryx is single-writer per workspace per process; it does not add its own locking. Concurrent edits (a human in the studio + a CI run, or two devices) reconcile as ordinary git merges/conflicts; the repo is the boundary. Posting stays safe regardless via the idempotency record (social.json): a posted platform is never re-posted (§4.3).
Large-file persistence (pluggable: persistence.media)
Committing large binaries (the reel mp4, and especially Phase-5 video panels)
to a base git repo bloats it. There is no single right answer, and the
landscape moved — so keryx makes the strategy a config-selected seam, set
globally and overridable per project (persistence.media.store), exactly
like the provider seams (§3.4). Adapters:
| Store | What it does | Fits |
|---|---|---|
| git (default) | commit media straight into the repo | stills + the modest reel mp4 (a 30–45s 1080×1920 H.264 is only a few MB) |
| git-lfs | LFS-tracked patterns, pointers in git | teams already on LFS; works on GitHub/GitLab |
| external | blobs in an object store (S3 / R2 / GCS / SSH), a pointer/manifest committed to git (the DVC / git-sfs style) | host-agnostic; large/video-heavy reels; avoids LFS servers |
| none | don't persist large media in keryx; reference paths only | when another system owns the assets |
Landscape (investigated 2026-06): Git-LFS is not formally deprecated but is
increasingly seen as legacy (file-level dedup, a central LFS server, host
bandwidth/storage caps). Xet (Hugging Face, ex-XetHub) is the notable modern
successor — chunk-level dedup, now the Hub default — but it's Hub-centric, not
a turnkey backend for an arbitrary GitLab repo, so it's a forward-looking
adapter, not today's default. The host-agnostic route is object-store +
committed pointer (DVC, git-annex, git-sfs all do variants); keryx's
external adapter captures that pattern with credentials via keychain/CI.
Default stays git because keryx's committed footprint is small until video
(Phase 5) — at which point a project flips persistence.media.store to
external (or git-lfs) with no code change.
The external store — s3 is the primary backend (sub-selected by
persistence.media.external.backend):
- s3 (default for external, recommended) — S3-compatible object storage,
Cloudflare R2 as the house choice (S3 API, zero egress fees, cheap),
AWS S3 / GCS / MinIO equally supported. We design the first iteration around
S3. The forward reason: planned features — long-form instructional video
and in-browser webcam/mic capture (0004-future-long-form-and-capture.md)
— will routinely produce files well past GitLab's ~100 MB package cap, so a
real object store is the right foundation now, not a retrofit. The bucket/
credentials are provisioned in the infra repo (Terraform), referenced by
keryx via config + keychain/CI.
- gitlab-packages (configurable option) — GitLab's built-in Generic
Package Registry (durable, per-project, not counted against the storage
quota), authed with the project access token keryx already holds (§4.2), so
no extra infra for users who want it. Caveat: ~100 MB/file on
gitlab.com, so unsuitable once long-form/capture lands — fine for reels.
- plus git-lfs and (forward-looking) xet as further selectable
backends (above).
So: keryx's own iteration targets s3 (R2), with gitlab-packages,
git-lfs, and xet as configurable alternatives other users (or smaller
projects) can select. All sit behind the one persistence.media seam; switching
is config, not code, global or per project.
4. Posting: platform research (mid-2026)
Condensed from the 2026-06-14 feasibility study. Build order is set by how hard unattended, scheduled, headless posting is. Verify live docs for exact numeric limits before relying on them.
Build order: Instagram → YouTube → TikTok → LinkedIn
1. Instagram (easiest). Content Publishing API, container flow:
POST /{ig-user-id}/media (media_type=REELS) → poll container status_code
to FINISHED → media_publish. Prefer Instagram API with Instagram Login
(graph.instagram.com, no Facebook Page needed). Account must be Professional
(Business/Creator). Scope instagram_business_content_publish. Own-account
posting runs under Standard Access — no App Review, no Business Verification.
Long-lived token ≈60 days, refreshable headlessly (ig_refresh_token). Use
the resumable direct upload (rupload.facebook.com) — no public URL needed.
Limit ≈100 posts / 24h.
2. YouTube (Shorts). No Shorts API — videos.insert (Data API v3) via
resumable upload; vertical + ≤3min auto-classifies as a Short. Scope
youtube.upload (restricted). Service accounts don't work — need a user
OAuth refresh token captured once. Publish the OAuth app to "In production"
(else refresh tokens die in 7 days). Public uploads require passing the
Audit & Quota Extension + OAuth verification + CASA assessment — until then
uploads lock to private. Long approval lead time; start early.
3. TikTok (hard). Content Posting API, Direct Post flow:
creator_info/query (mandatory; read privacy_level from it) → video/init
(use FILE_UPLOAD chunked to avoid URL domain verification) → upload → poll
status/fetch. Scopes video.publish + user.info.basic. Access token 24h;
refresh token ≈365 days but ROTATES on every refresh → must persist the new
one each time (needs a writable store). Mandatory audit before any public
post (private/SELF_ONLY until passed); content-UX guidelines assume a human
picks privacy at post time — raise the headless single-owner case during audit.
4. LinkedIn (awkward for unattended). Videos API (/rest/videos
init→upload 4MB parts→finalize→poll AVAILABLE) then Posts API
(POST /rest/posts). Personal (w_member_social) is self-serve; org page
(w_organization_social) needs the gated Community Management API
(registered company + review). No refresh token unless you're an approved
Marketing Developer Platform partner → otherwise a 60-day token and manual
browser re-auth ≈every 55 days. Treat org/automated posting as a stretch goal.
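
Each platform flow above ends in a "poll until ready" step (the Instagram container's status_code, TikTok's status/fetch, LinkedIn's AVAILABLE). For unattended runs that poll has to be bounded. The sketch below shows the pattern for the Instagram case with the HTTP call abstracted behind a hypothetical StatusFunc. Only FINISHED comes from the text above; treating IN_PROGRESS as "keep waiting" and ERROR as terminal is an assumption to check against the live docs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// StatusFunc is a hypothetical seam that returns the container's status_code.
type StatusFunc func(ctx context.Context) (string, error)

var ErrContainerFailed = errors.New("container reported ERROR")

// WaitFinished polls until the status is FINISHED, the container fails, or
// ctx expires, so a scheduled pipeline can never wait forever.
func WaitFinished(ctx context.Context, status StatusFunc, every time.Duration) error {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		s, err := status(ctx)
		if err != nil {
			return err
		}
		switch s {
		case "FINISHED":
			return nil
		case "ERROR":
			return ErrContainerFailed
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("waiting for container: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// scripted replays the given statuses in order, repeating the last one.
func scripted(statuses ...string) StatusFunc {
	i := 0
	return func(context.Context) (string, error) {
		s := statuses[i]
		if i < len(statuses)-1 {
			i++
		}
		return s, nil
	}
}

// waitDemo runs WaitFinished against scripted statuses with a short deadline.
func waitDemo(statuses ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	return WaitFinished(ctx, scripted(statuses...), time.Millisecond)
}

func main() {
	fmt.Println("finishes:", waitDemo("IN_PROGRESS", "IN_PROGRESS", "FINISHED"))
	fmt.Println("fails:", waitDemo("IN_PROGRESS", "ERROR"))
	fmt.Println("times out:", waitDemo("IN_PROGRESS"))
}
```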
4.1 Publisher interface
A single interface so each platform lands independently:
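// Publisher is implemented once per platform (internal/publish/<platform>).
type Publisher interface {
	Name() string
	Publish(ctx context.Context, video Video, post PostMeta) (PostResult, error)
}
// Video describes the rendered reel handed to a publisher.
type Video struct { Path string; Width, Height int; DurationSec float64 }
// PostMeta is the shared social copy for a reel.
type PostMeta struct {
	Caption     string
	Tags        []string
	Title       string
	Link        string
	PerPlatform map[string]PostMeta // platform-specific overrides
}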
Each adapter lives in internal/publish/<platform>. A post all command fans
out across the configured/enabled platforms and reports per-platform results.
Social elements vary by platform — caption length, hashtag conventions, link
handling (clickable vs "link in bio"), title vs description — so PostMeta
carries per-platform overrides; the studio (§10.1) and keryx social
--platform compose and steer these (see 0002-interface-contracts.md §4 for
the per-platform constraints the UI enforces).
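
Resolving the effective metadata for one platform could look like the sketch below. The method name For and the field-by-field fallback (a set override field wins, an empty one inherits the shared value) are assumptions; the spec only says overrides default to the shared caption.

```go
package main

import "fmt"

// PostMeta repeats the struct from §4.1 so the sketch is self-contained.
type PostMeta struct {
	Caption     string
	Tags        []string
	Title       string
	Link        string
	PerPlatform map[string]PostMeta // platform-specific overrides
}

// For returns the metadata to post on one platform: each field set in the
// platform override wins, everything else falls back to the shared value.
func (m PostMeta) For(platform string) PostMeta {
	out := PostMeta{Caption: m.Caption, Tags: m.Tags, Title: m.Title, Link: m.Link}
	o, ok := m.PerPlatform[platform]
	if !ok {
		return out
	}
	if o.Caption != "" {
		out.Caption = o.Caption
	}
	if o.Tags != nil {
		out.Tags = o.Tags
	}
	if o.Title != "" {
		out.Title = o.Title
	}
	if o.Link != "" {
		out.Link = o.Link
	}
	return out
}

func main() {
	meta := PostMeta{
		Caption: "A reel per post.",
		Tags:    []string{"#php"},
		Link:    "https://phpboyscout.uk",
		PerPlatform: map[string]PostMeta{
			"youtube": {Title: "A reel per post", Caption: "Longer description for YouTube."},
		},
	}
	fmt.Printf("%+v\n", meta.For("youtube"))
	fmt.Printf("%+v\n", meta.For("instagram"))
}
```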
4.2 Auth & token storage (the infra wrinkle)
Tokens need a writable store — jobs can't natively write back to GitLab CI variables, yet TikTok rotates its refresh token every use and IG/YT need periodic refresh.
- keryx auth <platform>: interactive OAuth (local browser) to capture the initial token; stored via the GTB keychain locally.
- keryx auth refresh: a refresh job (own GitLab schedule, well inside each token window) that refreshes/rotates and writes the new tokens back, either to a GitLab project variable via the GitLab API (a project access token with api scope) or an external secret manager.
- Alert on refresh failure: a silently stale token is the #1 unattended failure mode.
- Per platform: TikTok persist the rotated refresh token every run; IG ≈every 1–2 weeks inside 60 days; YT just keep it used (and published); LinkedIn expect manual re-auth unless an MDP partner.
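
The refresh job's core obligation, persisting whatever the provider hands back before doing anything else, can be sketched as below. TokenStore, RefreshFunc, and Rotate are hypothetical names, and the in-memory store stands in for a GitLab project variable or secret manager. Every returned error is the kind that should raise an alert.

```go
package main

import (
	"errors"
	"fmt"
)

type Tokens struct{ Access, Refresh string }

// TokenStore is a hypothetical seam over the writable token store.
type TokenStore interface {
	Load(platform string) (Tokens, error)
	Save(platform string, t Tokens) error
}

// RefreshFunc exchanges a refresh token for a new token pair.
type RefreshFunc func(refresh string) (Tokens, error)

// Rotate refreshes a platform's tokens and persists the result. With a
// rotating provider (TikTok), losing the new refresh token locks the job out.
func Rotate(platform string, store TokenStore, refresh RefreshFunc) error {
	cur, err := store.Load(platform)
	if err != nil {
		return fmt.Errorf("%s: load tokens: %w", platform, err)
	}
	next, err := refresh(cur.Refresh)
	if err != nil {
		return fmt.Errorf("%s: refresh: %w", platform, err)
	}
	if next.Refresh == "" {
		next.Refresh = cur.Refresh // provider did not rotate; keep the old one
	}
	if err := store.Save(platform, next); err != nil {
		return fmt.Errorf("%s: persist rotated tokens: %w", platform, err)
	}
	return nil
}

// memStore is an in-memory TokenStore for this sketch.
type memStore map[string]Tokens

func (m memStore) Load(p string) (Tokens, error) {
	t, ok := m[p]
	if !ok {
		return Tokens{}, errors.New("no tokens stored")
	}
	return t, nil
}

func (m memStore) Save(p string, t Tokens) error { m[p] = t; return nil }

func main() {
	store := memStore{"tiktok": {Access: "a1", Refresh: "r1"}}
	rotating := func(old string) (Tokens, error) {
		return Tokens{Access: "a2", Refresh: old + "-rotated"}, nil
	}
	if err := Rotate("tiktok", store, rotating); err != nil {
		fmt.Println("ALERT:", err)
		return
	}
	fmt.Printf("%+v\n", store["tiktok"])
}
```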
4.3 Posting lifecycle, approval & scheduling
Posting runs three ways, with one safety gate. Each platform's social entry
(the per-platform social record, §6 / 0002 §4.4) carries a status —
draft → approved → posted — plus an optional per-outlet scheduled_at
(date & time) and, once posted, posted_at + post_url.
- Approval gate (prevents accidental posting). keryx post / post all and the studio "Post now" refuse any platform not approved; approval is a deliberate human act (CLI keryx approve, or the studio Publish panel).
- On-demand posting is available from the CLI (keryx post …) and the web UI (human-initiated, requires approved). Unattended posting is CI-only: the scheduled pipeline runs keryx post due, which scans the project's reels and posts platforms that are approved and whose scheduled_at is due. This is the only path for hands-off posting.
- Scheduling. scheduled_at per outlet is what the scheduler consumes; approving with no time = "post on the next run / now", with a time = "at/after then". The schedule itself lives in the owning project (§3.2).
- On success the entry becomes posted with posted_at + post_url; this is the idempotency record below.
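
The gate and the scheduler's due check reduce to two small predicates. The sketch below assumes a hypothetical Entry type in which a zero ScheduledAt means "approved with no time"; the status values are the ones named above.

```go
package main

import (
	"fmt"
	"time"
)

type Status string

const (
	Draft    Status = "draft"
	Approved Status = "approved"
	Posted   Status = "posted"
)

// Entry is one platform's social record (hypothetical shape).
type Entry struct {
	Platform    string
	Status      Status
	ScheduledAt time.Time // zero value: approved with no time set
}

// CanPostNow is the approval gate for human-initiated posting.
func CanPostNow(e Entry) bool { return e.Status == Approved }

// Due is what a scheduled run checks: approved, and either unscheduled
// ("post on the next run") or scheduled at or before now.
func Due(e Entry, now time.Time) bool {
	if e.Status != Approved {
		return false
	}
	return e.ScheduledAt.IsZero() || !now.Before(e.ScheduledAt)
}

// at parses an RFC 3339 timestamp and panics on bad input (sketch-only helper).
func at(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	now := at("2026-06-15T09:00:00Z")
	entries := []Entry{
		{Platform: "instagram", Status: Approved},
		{Platform: "youtube", Status: Approved, ScheduledAt: at("2026-06-16T09:00:00Z")},
		{Platform: "tiktok", Status: Draft},
		{Platform: "linkedin", Status: Posted},
	}
	for _, e := range entries {
		fmt.Printf("%-9s due=%v\n", e.Platform, Due(e, now))
	}
}
```

A posted entry is never due again, which is the same property the idempotency ledger relies on.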
Scheduled posting must be safe to retry and observable:
- Idempotency ledger. keryx records each successful post (platform, reel,
timestamp, returned post id/URL) in the reel workspace. A re-run (pipeline
retry, partial failure) skips platforms already posted for that reel — no
double-posting; post all is resumable.
- --dry-run. Validate inputs, auth, and the reel (dimensions / duration /
file size against each platform's limits) and report what would post,
without publishing — used in CI to catch problems before a live run.
- Partial failure. post all posts to each enabled platform independently,
records successes in the ledger, reports per-platform results, and exits
non-zero if any platform failed so the pipeline surfaces it.
- Retry / rate limits. Network calls use bounded retry with backoff;
per-platform rate limits (e.g. IG ≈100/24h) are respected and surfaced rather
than hammered.
- Alerting. Any post or token-refresh failure raises an alert via the GTB
error/help channel (Slack/Teams) — a silent failure is the main unattended
risk (mirrors §4.2).
- Per-platform caption/format. Caption/tags/title can vary per platform
(length caps, hashtag norms, YouTube title-vs-description); PostMeta carries
per-platform overrides, defaulting to the shared caption.
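
The ledger and partial-failure rules above combine into a short fan-out loop. This is a sketch under stated assumptions: the Publisher interface is simplified from §4.1 (no video or metadata arguments), Ledger is a hypothetical in-memory stand-in for the record in the reel workspace, and the fake publisher replaces real platform adapters.

```go
package main

import (
	"errors"
	"fmt"
)

// Publisher is a simplified stand-in for the §4.1 interface.
type Publisher interface {
	Name() string
	Publish() (postURL string, err error)
}

// Ledger maps a platform name to the URL of its successful post.
type Ledger map[string]string

// PostAll posts to each platform independently. Platforms already in the
// ledger are skipped, successes are recorded, and failures are returned so
// the caller can report them and exit non-zero.
func PostAll(pubs []Publisher, ledger Ledger) map[string]error {
	failed := map[string]error{}
	for _, p := range pubs {
		if _, done := ledger[p.Name()]; done {
			continue // never double-post
		}
		url, err := p.Publish()
		if err != nil {
			failed[p.Name()] = err
			continue
		}
		ledger[p.Name()] = url
	}
	return failed
}

// fakePub is a test double that counts how often it was asked to publish.
type fakePub struct {
	name  string
	fail  bool
	calls int
}

func (f *fakePub) Name() string { return f.name }

func (f *fakePub) Publish() (string, error) {
	f.calls++
	if f.fail {
		return "", errors.New(f.name + ": upload failed")
	}
	return "https://example.invalid/" + f.name, nil
}

func main() {
	ig := &fakePub{name: "instagram"}
	tt := &fakePub{name: "tiktok", fail: true}
	ledger := Ledger{}

	fmt.Println("first run failures:", PostAll([]Publisher{ig, tt}, ledger))

	tt.fail = false // the pipeline retry succeeds for TikTok
	fmt.Println("retry failures:", PostAll([]Publisher{ig, tt}, ledger))
	fmt.Println("instagram publish calls:", ig.calls)
}
```

In a real command the returned map would drive both the per-platform report and the process exit code.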
5. CLI surface
Every command is scaffolded with gtb generate command (never hand-written),
so it is registered in .gtb/manifest.yaml and wired into the root command by
the generator — keeping the command surface manifest-managed and regenerable.
(See CLAUDE.md for the --ci/--agentless flags and the current update-check
caveat.)
# reel = the reel lifecycle (a noun group)
keryx reel new <slug> [--from-post post.md] [--bundle <dir>] # create a reel workspace (§3.2)
keryx reel list|open|rename|duplicate|rm <slug> # manage the project's reels (CRUD)
keryx reel link <slug> <dir> # associate a reel with a content directory (§3.2)
keryx reel build [-w <slug> | --storyboard b.json --cover c.png] [--theme editorial] [--silent] [--only-line N] [-o reel.mp4]
keryx reel prune <slug> # drop candidate takes/cache (keep selected) (§3.5)
keryx storyboard draft <post.md> [-o board.json] # optional AI first draft (§3.2)
# generators (bare command generates; sub-commands manage takes)
keryx cover --scene "..." --theme editorial [--out cover.png] # cover art
keryx portrait --ref photo.jpg [--theme default] [--out avatar.png]
keryx voice [-w <slug>] [--line N] [--text "..."] [--takes N] [--stability x] [--theme editorial] # vo from storyboard (§3.1)
keryx voice select <line> <take> # lock a chosen take in a workspace
keryx music --prompt "..." [--length 35s] [--theme editorial] [--takes N] [-w <slug>|--out bed.mp3]
keryx music select <take>
keryx cards [--card N] [--takes N] [--video] # per-card media: AI still or short video (§3.1)
keryx cards set <card> <file> # use a pre-rendered image/video instead of AI
keryx cards select <card> <take> # lock a clean (no text-leak) generated take
keryx cards sheet [-o sheet.png] # contact-sheet text-leak screen
# social lifecycle: compose → approve → post
keryx social [-w <slug>] [--platform <p>] # compose per-platform text + hashtags + link + title (§4.3)
keryx approve <platform|all> [-w <slug>] [--at <datetime>] [--revoke] # gate posting; set schedule (§4.3)
keryx post <instagram|youtube|tiktok|linkedin> [-w <slug>|<file>] [--dry-run]
keryx post all [-w <slug>|<file>] [--dry-run] # on-demand; requires approved; idempotent (§4.3)
keryx post due [-w <slug>] # CI/scheduled: post approved platforms whose time is due
# auth & config
keryx auth <platform> # interactive OAuth capture
keryx auth refresh [--platform <p>] # CI refresh/rotate + write-back
keryx theme <list|show|add|edit|rm> ... # manage the theme catalog (§6)
keryx studio [--port N] # web UI: manage reels + author + social (§10)
Every generation command takes --theme <keyword>; when omitted, it uses the
configured default theme for that command's type (§6). The GTB default commands
(update, init, docs, doctor, changelog, config, keychain, mcp) are also available.
keryx init seeds the theme catalog; config holds platform enablement and
defaults; secrets come from env / keychain / CI variables and are never committed.
The CLI always scopes to the current folder — the project is the git working
copy you are in (§3.5); switch projects by cd. Multi-project switching and
remote (git) projects are a studio capability, not a CLI one.
MCP is enabled (the GTB mcp feature). keryx was scaffolded with MCP off
(props.Disable(props.McpCmd)); Phase 0 re-generates the scaffold with mcp in
the feature set so keryx mcp runs an MCP server exposing keryx's commands as
tools. This is a third interface surface alongside the CLI and the web UI,
and it lets an AI assistant drive the authoring loop (draft storyboards, run
takes/select, assemble) against a workspace. Its tool contracts mirror the CLI
command contracts — see 0002-interface-contracts.md §5 (including which
commands are safe to expose vs gated).
Per-command contracts (inputs/outputs/exit codes/side-effects + testable requirements), CLI conventions, the workspace layout, and the web UI functional requirements / UX / API are specified in 0002-interface-contracts.md, the basis for the §8 tests.
5.1 Generation commands
Each maps to one generator in internal/gen/ (§3) and is usable standalone or
as a step in the reel pipeline:
- keryx cover: generate cover art via Gemini Imagen (port of gen-cover.py). Resolves an article-type theme for the style prefix + palette, appends --scene (the per-post scene), and writes one or more PNG samples. --n controls sample count; review before use. The chosen cover is the bookend art fed to keryx reel build --cover.
- keryx portrait: stylise reference photo(s) into the risograph avatar via the Gemini image model (port of gen-portrait.py). Resolves a portrait-type theme for the prompt + palette; --ref is repeatable for multiple reference photos, --n makes variants. Used for the blog logo / social avatar, not the per-post pipeline.
- keryx reel build: assemble the 9:16 reel from a storyboard + cover (port of gen-reel.py); resolves a reel-type theme for palette, card fonts, music tone and voice. See §3.1. (reel itself is the lifecycle noun group, with new / list / link / build / prune, not a bare action, to avoid overloading.)
- keryx voice / keryx music: the reel's narration and music bed as standalone steps (ElevenLabs); both resolve the reel theme's voice / music settings unless overridden by flags.
cover and portrait are first-class on-demand commands (image generation is
non-deterministic — you regenerate and pick), and their generators are also
callable internally by future composite flows. Convention: a bare generator
(cover/portrait/voice/music/cards) generates; its sub-commands
(select/set/sheet) manage takes.
5.2 Authoring & iteration commands
These exist to make the §3.2 loop fast and bookkeeping-free — they are what keryx adds over the bare scripts:
- keryx reel new <slug>: scaffold a per-reel workspace (storyboard, vo/, music, takes, output) so a reel is a resumable unit. --from-post seeds it with an AI-drafted storyboard.
- keryx storyboard draft <post.md>: optional; draft a storyboard from a post via the GTB chat client for a human to edit. keryx never treats a draft as final; the edited storyboard.json is the input to everything else. Works without it (hand-author the JSON); only this command needs an LLM provider.
- keryx reel build --silent: render the silent, dur-timed draft (no audio) for the fast format/pacing/typography proof before spending on VO/music.
- keryx reel build --only-line N (workspace mode): re-assemble after regenerating a single line's VO/card, reusing unchanged takes; this is the one-line turnaround that made polish cheap.
- keryx voice --takes N / keryx voice select <line> <take>: generate candidate takes (e.g. steadier vs more natural) and lock the chosen one in the workspace, replacing the manual cp take vo/NN.mp3 step. keryx music --takes N does the same for beds.
- keryx cards [--card N] [--takes N]: generate the per-card overlay illustrations from each card's scene (in the reel theme's illustration style, §3.1), with the hardened wordless prompt; --takes N makes candidates, keryx cards select <card> <take> locks the clean one, and a contact-sheet (keryx cards sheet) screens them for text leaks before assembly. Re-roll a single card after a copy change without touching the rest.
- keryx social: compose the per-platform social elements (supporting text + hashtags + link + title), steered to each platform's limits (§4.3, 0002 §4.4); writes social.json.
After a cut is approved, fold the winning voice/music settings into the theme
(§6) via keryx theme edit so the next reel starts from the approved baseline.
6. Themes (config-driven aesthetics)
The thematic component of every generated artefact — image-prompt styles,
palette, music tone, voice — is config, not hardcoded. The Python scripts
held these as constants (STYLES in gen-cover.py, PALETTE in gen-reel.py,
the portrait DEFAULT_PROMPT, the VO settings); keryx lifts them into a
theme catalog in config so they can be added and edited without a rebuild.
This keeps the tool flexible: re-theme, or theme a second brand, by editing
config — never code.
6.1 Model
A theme is a self-contained, named aesthetic profile identified by a keyword and tagged with a type declaring the artefact it themes:
| type | Drives (generator) | Fields |
|---|---|---|
| article | cover image (gen-cover) | palette, prompt (style prefix), aspect |
| reel | the 9:16 reel (gen-reel + music + voice) | palette, card (mode, fonts, scrim, illustration style prompt), music (prompt, gain), voice (id, stability, similarity) |
| portrait | avatar (gen-portrait) | palette, prompt |
Types are open-ended — a new generator adds a new type. Because a theme bundles everything its type needs, a second brand/blog is just a new set of themes.
Reel themes mirror the article-type taxonomy. A run of reels should not
look like one identical block, so — exactly as covers do — reel themes come in
the same three article-type flavours, sharing the palette but differing in
card visual treatment + illustration style (the per-card scene art, §3.1,
is generated in this style):
- editorial (op-eds) — flat colour-block / risograph treatment; the first
reel's look.
- clay (project / engineering) — softer, rounded-panel, clay-render imagery.
- blueprint (tutorials) — draughtsman grid + thin construction-line imagery.
A post's article type picks both its cover (article theme) and its reel
(reel theme) of the same keyword — they stay visually matched.
Illustrative shape:
themes:
  defaults:                 # theme used when --theme is omitted, per type
    article: editorial
    reel: editorial
    portrait: default
  article:                  # catalog nested by type → keyword unique within type
    editorial:
      palette: {teal: "#14534F", amber: "#E8923B", cream: "#F2EAD8", charcoal: "#282A2C"}
      prompt: "Editorial conceptual illustration, flat screen-print / risograph..."
      aspect: "16:9"
    clay: { palette: {...}, prompt: "Isometric 3D clay-render...", aspect: "16:9" }
    blueprint: { palette: {...}, prompt: "Technical blueprint / schematic...", aspect: "16:9" }
  reel:
    editorial:              # same keyword as the article theme; type disambiguates
      palette: {teal: "#14534F", amber: "#E8923B", cream: "#F2EAD8", charcoal: "#282A2C"}
      card:
        mode: overlay       # full-bleed illustration + scrim + text (block also valid)
        scrim: {from: 0.52, color: charcoal}   # gradient over lower ~half
        font_bold: DejaVuSans-Bold
        font_mono: DejaVuSansMono-Bold
        style: "Editorial conceptual illustration, flat screen-print / risograph, wordless..."
      music: {prompt: "restrained editorial bed", gain: 0.16}
      voice: {id: MhaH9hcD2Ulcr80j28Z1, stability: 0.6, similarity: 0.92}
    clay: { palette: {...}, card: {mode: overlay, style: "clay-render, rounded panels..."}, music: {...}, voice: {...} }
    blueprint: { palette: {...}, card: {mode: overlay, style: "blueprint grid, thin construction lines..."}, music: {...}, voice: {...} }
  portrait:
    default: { palette: {...}, prompt: "Stylised editorial avatar..." }
Nesting by type makes "unique within type" structural and lets editorial
name both an article and a reel theme without collision; keryx reel build
--theme editorial resolves themes.reel.editorial.
Naming convention: short lowercase kebab-case keyword, unique within its
type. The type is an explicit field and is usually implied by the command
consuming the theme, so the keyword itself does not encode the type —
editorial can name both an article theme and a reel theme, and
keryx cover --theme editorial vs keryx reel build --theme editorial
resolves by the command's type. Keep keywords descriptive of the look (clay, editorial,
blueprint), not the brand, unless you run multiple brands.
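The naming convention can be checked mechanically. The sketch below is one possible validator, assuming "kebab-case" means lowercase alphanumeric segments joined by single hyphens and starting with a letter; the 32-character cap is a hypothetical reading of "short", not a value from the spec.

```go
package main

import (
	"fmt"
	"regexp"
)

// kebab matches lowercase kebab-case: segments of [a-z0-9] joined by single
// hyphens, starting with a letter. This is an assumed interpretation.
var kebab = regexp.MustCompile(`^[a-z][a-z0-9]*(-[a-z0-9]+)*$`)

// maxKeywordLen is a hypothetical cap standing in for "short".
const maxKeywordLen = 32

// ValidKeyword reports whether k is acceptable as a theme keyword.
func ValidKeyword(k string) bool {
	return len(k) <= maxKeywordLen && kebab.MatchString(k)
}

func main() {
	for _, k := range []string{"editorial", "dark-clay", "Editorial", "clay_render", "clay-"} {
		fmt.Printf("%q valid: %v\n", k, ValidKeyword(k))
	}
}
```

Uniqueness within a type is not checked here; with the catalog nested by type, a duplicate keyword would simply overwrite the existing entry, so theme add would need its own existence check.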
6.2 Seeded defaults (via keryx init)
keryx init seeds the catalog with the current house set, values identical
to the Python scripts (parity):
- article themes clay, editorial, blueprint: the three verbatim cover-style prompt prefixes from gen-cover.py.
- reel themes editorial (default), clay, blueprint: mirroring the article types, sharing the petrol-teal/amber/cream/charcoal palette and the voice clone MhaH9hcD2Ulcr80j28Z1 (stability 0.6 / similarity 0.92), but each with its own card treatment + illustration style. editorial is the first reel's look (overlay illustrations + scrim).
- portrait theme default: the risograph avatar prompt.
init and config features stay enabled for this to work.
6.3 Theme command
keryx theme list [--type article|reel|portrait] # catalog, grouped by type
keryx theme show <keyword> [--type ...] # full definition
keryx theme add <keyword> --type <type> [--from <existing>] [--set k=v ...]
keryx theme edit <keyword> [--type ...] [--set k=v ...]
keryx theme rm <keyword> [--type ...]
Themes are read and written through the GTB config layer (pkg/config);
add/edit/rm persist to the user config file. --from clones an existing
theme as a starting point. Generators resolve a theme by --theme <keyword>
(falling back to themes.defaults.<type>) and never carry hardcoded
thematic constants.
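The resolution rule (explicit keyword, else the per-type default, scoped by the consuming command's type) can be sketched as below. Catalog, Theme and Resolve are hypothetical names for illustration; the real implementation reads through the GTB config layer rather than plain maps.

```go
package main

import "fmt"

// Theme is a trimmed, hypothetical theme record for the example.
type Theme struct{ Prompt string }

// Catalog mirrors the nested config shape: type -> keyword -> theme,
// plus a default keyword per type (themes.defaults.<type>).
type Catalog struct {
	Defaults map[string]string
	Themes   map[string]map[string]Theme
}

// Resolve returns the theme for a type. An empty keyword falls back to the
// configured default for that type.
func (c Catalog) Resolve(typ, keyword string) (Theme, error) {
	if keyword == "" {
		keyword = c.Defaults[typ]
	}
	t, ok := c.Themes[typ][keyword]
	if !ok {
		return Theme{}, fmt.Errorf("no %s theme named %q", typ, keyword)
	}
	return t, nil
}

func main() {
	c := Catalog{
		Defaults: map[string]string{"article": "editorial", "reel": "editorial"},
		Themes: map[string]map[string]Theme{
			"article": {"editorial": {Prompt: "cover style"}},
			"reel":    {"editorial": {Prompt: "card style"}, "clay": {Prompt: "clay style"}},
		},
	}
	a, _ := c.Resolve("article", "editorial") // as keryx cover would
	r, _ := c.Resolve("reel", "")             // --theme omitted: default
	fmt.Println(a.Prompt, "|", r.Prompt)
	_, err := c.Resolve("portrait", "")
	fmt.Println(err)
}
```

The same keyword resolves to different themes depending on the type passed in, which is why the keyword does not need to encode the type.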
7. Configuration & secrets
Config files (standardised). keryx reads, in precedence order (highest
first): CLI flags → env vars → project config .keryx.yaml (repo root) →
global config ~/.keryx/config.yaml → embedded defaults — the GTB
hierarchical config layer (pkg/config, Viper). The project file holds what
belongs to that project (themes §6, provider selection, platform enablement,
defaults); the global file is GTB's default config for cross-project
preferences. keryx init seeds .keryx.yaml. Secrets are never written to
these files — credentials stay in the keychain / env / CI variables (above).
The config is hot-reloadable: edits (by hand or via the studio Settings
panel, §10.1) are picked up live through GTB's Observable interface and
propagate to running components without a restart.
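The precedence order can be pictured as an ordered list of layers where the first layer that defines a key wins. The sketch below illustrates only that rule with plain maps; Layer and Lookup are hypothetical names, and the real lookup is done by GTB's pkg/config (Viper), not by this code.

```go
package main

import "fmt"

// Layer is one hypothetical configuration source with its resolved values.
type Layer struct {
	Name   string
	Values map[string]string
}

// Lookup returns the value from the first (highest-precedence) layer that
// defines key, along with the name of that layer.
func Lookup(layers []Layer, key string) (value, source string, ok bool) {
	for _, l := range layers {
		if v, found := l.Values[key]; found {
			return v, l.Name, true
		}
	}
	return "", "", false
}

func main() {
	// Highest precedence first, matching the order in the text above.
	layers := []Layer{
		{Name: "flags", Values: map[string]string{}},
		{Name: "env", Values: map[string]string{"providers.voice": "elevenlabs"}},
		{Name: ".keryx.yaml", Values: map[string]string{"themes.defaults.reel": "clay", "providers.voice": "other"}},
		{Name: "~/.keryx/config.yaml", Values: map[string]string{"themes.defaults.reel": "editorial"}},
		{Name: "embedded defaults", Values: map[string]string{"providers.image": "gemini"}},
	}
	for _, key := range []string{"providers.voice", "themes.defaults.reel", "providers.image"} {
		v, src, _ := Lookup(layers, key)
		fmt.Printf("%s = %s (from %s)\n", key, v, src)
	}
}
```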
- Backend selection (§3.4): providers.{image,video,voice,music,render} choose the adapter (defaults Gemini / ElevenLabs / ElevenLabs / ffmpeg), each with its own config block (endpoint, model, credentials). Switching backend is a config change, not a code change.
- Large-file persistence (§3.5): persistence.media.store (git default / git-lfs / external / none) + the chosen store's config (LFS patterns, or object-store bucket/endpoint with creds via keychain). Global, per-project overridable.
- Spend guard: spend.confirm_above holds per-capability thresholds past which a batch prompts for confirmation, global + per-project. Defaults: image_video_usd: 10 (estimated $ of image+video generation per run) and voice_chars: 50000 (ElevenLabs is character-billed, so the voice guard is in its native unit; ≈ a dozen+ reels' narration is a clear runaway signal, not a cap on normal work). These prevent runaways, not generation (auto-yes under --yes/CI).
- Cost rates: providers.<x>.rates seed the spend estimate; where a provider has a pricing/usage API, keryx refreshes them periodically (cached, configured rates as fallback) for best-effort accuracy.
- GEMINI_API_KEY, ELEVENLABS_API_TOKEN for the default adapters; other adapters carry their own keys. Credentials always via env / keychain / CI variables, never committed.
- An LLM provider for the optional keryx storyboard draft (GTB chat client; provider/key configurable; keryx runs fully without it).
- Per-platform OAuth client id/secret + tokens (posting).
- Project config (.keryx.yaml) and the reel workspaces live in the owning project's repo (here, the blog), not in keryx: keryx is stateless (§3.2); global config is ~/.keryx/config.yaml.
- Local: GTB keychain. CI: masked/protected variables + the writable store for refreshed tokens, all owned by the project running the pipeline.
- System dependencies. keryx shells to ffmpeg/ffprobe and needs the card fonts (DejaVu bold/mono, or theme-configured). keryx doctor (the GTB default command, extended) verifies their presence and versions, that the configured provider credentials resolve, and that each enabled platform has a non-stale token; run it first in CI and on first local use.
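The spend guard in the list above reduces to a small comparison. The sketch below assumes "past which" means strictly greater than the threshold and uses hypothetical type and function names; the default values are the ones stated in the text.

```go
package main

import "fmt"

// Estimate is the hypothetical projected spend of one batch.
type Estimate struct {
	ImageVideoUSD float64 // estimated $ of image + video generation
	VoiceChars    int     // characters sent for narration
}

// Thresholds mirrors spend.confirm_above.
type Thresholds struct {
	ImageVideoUSD float64
	VoiceChars    int
}

// DefaultThresholds holds the defaults stated in the spec.
var DefaultThresholds = Thresholds{ImageVideoUSD: 10, VoiceChars: 50000}

// NeedsConfirm reports whether the batch should prompt before running.
// autoYes models --yes / CI, where the guard never blocks.
func NeedsConfirm(e Estimate, t Thresholds, autoYes bool) bool {
	if autoYes {
		return false
	}
	return e.ImageVideoUSD > t.ImageVideoUSD || e.VoiceChars > t.VoiceChars
}

func main() {
	normal := Estimate{ImageVideoUSD: 1.20, VoiceChars: 3500}
	runaway := Estimate{ImageVideoUSD: 2.00, VoiceChars: 120000}
	fmt.Println(NeedsConfirm(normal, DefaultThresholds, false))
	fmt.Println(NeedsConfirm(runaway, DefaultThresholds, false))
	fmt.Println(NeedsConfirm(runaway, DefaultThresholds, true))
}
```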
8. Testing & quality
keryx follows TDD and BDD, mirroring go-tool-base (CLAUDE.md). The
provider seams (§3.4) are what make this tractable: every external dependency is
behind an interface, so the deterministic core is unit-tested with no network,
no ffmpeg, and no API keys.
- Unit tests (TDD). Write failing tests first from the spec's behaviour and edge cases. Table-driven with t.Parallel(); mocks generated by mockery into mocks/. New pkg/ code targets ≥90% coverage.
- Deterministic core, faked edges. Unit-test the logic directly: storyboard parse/validation, the VO-driven timing maths (card start_i, xfade offsets, total length), orphan-controlled wrapping, theme resolution, config-driven provider construction, the idempotency/social record, and social composition. The image/voice/music providers are faked behind their interfaces; the ffmpeg Renderer is faked for logic tests and run for real only in integration.
- BDD with godog (Gherkin). User-facing workflows get .feature files in features/, step definitions in test/e2e/steps/, driven by a dedicated e2e test binary (cmd/e2e) with all features enabled (the GTB pattern). New CLI commands and multi-step workflows must ship Gherkin scenarios. Priority scenarios: the authoring loop (reel new → storyboard draft → reel build --silent → voice --takes → select → reel build → social), per-line re-roll, card-illustration take selection + text-leak screen, post all against a faked platform (success, partial failure, idempotent retry, --dry-run), token refresh/rotation with write-back, and theme add/edit/resolve.
- Rendering checks. Verify ffmpeg output by probing (duration ≈ Σ VO + leads/tails − xfades; dimensions 1080×1920; audio + video streams present) and optional golden-frame comparison of a rendered card, not brittle byte-equality on the MP4.
- Gating. Integration and e2e tests are env-var gated (INT_TEST=1, INT_TEST_E2E=1, subsystem flags), not by build tags, for IDE discoverability, matching GTB. Tests needing ffmpeg/fonts or live APIs gate on a system-dependency / credentials check (doctor, §7).
- Docs as you go (part of Done). A component/command is not done until its docs are written/updated in the matching section (docs/components/, docs/concepts/, docs/how-to/), cross-referenced with the code; they are maintained per component as the build progresses, never batched into a later pass.
- Hygiene. No //nolint; address root causes. just ci (tidy, generate, test, test-race, lint) must be green before any PR.
Definition of Done, every unit of work: failing test(s) first → minimal code → green just ci (TDD); a .feature scenario for user-facing commands/workflows (BDD); and the component's docs page written/updated. All three, every time.
9. Roadmap
- Phase 0: scaffold. GTB project generated, builds. ✅ (Correction: the scaffold was generated with mcp disabled; re-generate with mcp in the feature set, §5, before building on it.)
- Phase 1: themes + provider seams + port reel-gen to Go. Land the config theme model and keryx theme / keryx init seeding (§6) first, so generators read theme from config from day one, and the provider interfaces (§3.4) so backends are config-selected from the start (Gemini / ElevenLabs / ffmpeg as the first adapters). Then voice + music (thin clients), reel assembly (cards + ffmpeg), then cover + portrait. Parity with the Python scripts; same storyboard, same seeded palette/styles.
- Phase 1.5: the authoring loop (§3.2). The reel workspace (reel new), per-line/per-take addressability (voice --takes / voice select, reel build --only-line), the silent-draft proof, social, and the optional storyboard draft. This turns the parity scripts into the fast, bookkeeping-free iteration loop and is the prerequisite for an approved reel workspace that Phase 3 schedules can consume unattended.
- Phase 2: posting adapters. Publisher interface, then Instagram → YouTube → TikTok → LinkedIn. post <platform> and post all.
- Phase 3: auth, refresh, scheduling. keryx auth + keryx auth refresh (rotate + GitLab write-back), failure alerting, and the GitLab scheduled pipelines that live in the owning project (the blog) running keryx post due against committed, approved reel workspaces (§3.2, §4.3).
- Phase 4 (stretch): storyboard studio. The keryx studio web UI (§10): multi-project + reels library, the storyboard editor, per-platform social + approve/schedule/post, the Settings panel (edit .keryx.yaml + hot reload), bundle association, and a chat-driven editor, all over the existing workspaces; no new pipeline. See §10.2 for the frontend-stack options.
- Phase 5 (deferred): video panels. Fill the video-shaped hole (§3.4): uploaded-video panels first (no provider), then generated short video (VideoProvider, e.g. Gemini Omni) and the ffmpeg video-compositing path. Comes only once stills-based reels + posting are solid. Full spec: 0003-video-panels.md.
- Phase 6 (future / intent): long-form video + in-browser capture. Short-to-medium YouTube-style instructional video as a second artefact type, and webcam/mic capture in the studio as a third media source. Not designed yet, but the architecture leaves room (S3-first storage, open media-source + voice-source enums, artefact-type neutrality). Gap captured in 0004-future-long-form-and-capture.md.
10. Future: storyboard studio (web UI)
Stretch / extended feature — Phase 4. A local web server, started from the
CLI (keryx studio), serving a small single-user web UI for managing a
project's reels and authoring each one. It is a richer front-end over the same
workspaces (§3.2, §5.2) — not a new pipeline: it reads and writes the same
files the CLI uses, so UI and CLI are interchangeable. Its functional
requirements, UX, and HTTP API contract are in
0002-interface-contracts.md §4.
10.1 What it does
- Switch projects (incl. remote) — a user has multiple projects, so the studio has a project switcher above the reels; a project may be a local dir or a remote git repo opened via GTB's git components (§3.5). The CLI, by contrast, always scopes to the current folder.
- Manage many reels (CRUD) — within a project, the studio opens on a reels library: list / create / open / duplicate / rename / delete, with each reel's status (draft → approved → posted) and its associated content at a glance. The single-reel editor is the drill-in.
- Compose the storyboard — add / reorder / edit cards (mode-adaptive: text, palette roles, accent words, scene, cover/mono flags); a live (approximate for overlay) card preview.
- Upload images — cover art and portrait reference photos into the workspace.
- Associate content (optional) — link the reel to a content directory (a page bundle or any directory, §3.2): records where the reel belongs and feeds the directory's text/assets to the chat AI as context. Or just paste source text.
- Chat to adjust — the GTB chat client proposes storyboard patches (text and structural ops); the human accepts/rejects — nothing auto-applies.
- Compose social elements per platform — generate and edit the
platform-appropriate supporting text, hashtags, links, and titles for
Instagram / YouTube / TikTok / LinkedIn, with the UI steering to each
platform's conventions and limits (caption length, hashtag norms, link
handling). Pairs with keryx social / PostMeta (§4.3, §4.1).
- Approve, schedule & post on demand: per platform, set the approval gate and an optional scheduled date/time, see the status (draft → approved → posted), and post now (on-demand, human-initiated, requires approved). Unattended posting still happens only in CI (§4.3). Persistence is git (§3.5): saving/approving commits to the project repo.
- Project settings: a Settings panel to edit the project's config (themes, AI providers, platform enablement, defaults). It writes the project's .keryx.yaml (§7) and relies on GTB config hot reload so changes propagate live; secrets are never written there (keychain/env only).
- Natural extensions (v2): preview the silent draft, audition voice/music/card takes, trigger a render, all by calling the existing generators/commands.
10.2 Shape
- New command keryx studio [--port N] [--workspace <slug>]; a GTB default-disabled feature, enabled for this tool.
- Built on GTB service lifecycle and transport: pkg/controls for start / health / graceful-shutdown, pkg/http for the server, pkg/chat for the chat panel. The frontend is embedded via go:embed so the binary stays a single file.
- Frontend stack (considered). Two viable shapes:
  - Server-rendered Go + HTMX (recommended). templ components rendered by pkg/http, HTMX for interactivity (card add/edit/reorder), and SSE for streaming the chat panel. No separate JS build/toolchain, all Go, trivially embedded: the best fit for a small single-user local tool, and it keeps the one-language/one-binary property GTB is built around.
  - Embedded SPA (Svelte or Preact). A small JS app built to static assets and embedded, talking to a thin keryx JSON + SSE API. Buys richer client interactivity (smooth drag-reorder, live preview) at the cost of a Node build step in CI.

  Start with HTMX + templ; reach for an SPA only if the interaction model outgrows it. Avoid a heavyweight React/Next stack for a localhost utility.
- Mobile-first & responsive (important). The primary device in practice is a phone, so the studio is designed mobile-first: a single-column, tabbed layout (Cards / Editor / Chat / Source) on narrow screens, with chat as the primary authoring surface on mobile, expanding to the three-pane desktop layout with collapsible card-list and chat panes on wide screens. HTMX + templ + responsive CSS covers this; SSE chat works on mobile. Visual language is minimalist: a neutral light base with teal/amber as restrained accents, not blocks of brand colour. Full functional/UX requirements + the mobile layout are in 0002-interface-contracts.md §4.
- Local, single-user, localhost-bound by default (matches the non-goals: not multi-tenant). It spends API credits (generation, chat) and edits the workspace, so it is not exposed publicly without explicit opt-in and the GTB HTTP server's auth.
- Not a video timeline editor (§2 non-goals): it drafts the storyboard; rendering stays the deterministic pipeline.
11. Platform setup prerequisites (the long poles: start early)
- Instagram: phpboyscout IG as a Professional (Business/Creator) account; a Meta developer app (Standard Access is enough for own-account).
- YouTube: a Google Cloud project; OAuth consent screen published "In production"; apply for the upload audit + CASA (else uploads lock private).
- TikTok: a TikTok developer app; submit for the Content Posting audit (private-only until it passes — weeks).
- LinkedIn: a developer app; personal posting self-serve; org/refresh-token needs MDP partner (company registration).
12. References
- Interface contracts (CLI + web UI), the test basis: 0002-interface-contracts.md.
- Video panels (deferred feature spec): 0003-video-panels.md.
- Long-form video + capture (future directions): 0004-future-long-form-and-capture.md.
- Blog Python reference: ~/workspace/phpboyscout/blog/scripts/ (+ README.md).
- Blog reel policy: blog CLAUDE.md, "Promo reels: a reel per post".
- Platform docs: Instagram Content Publishing (developers.facebook.com/docs/instagram-platform/content-publishing/), YouTube videos.insert (developers.google.com/youtube/v3/docs/videos/insert), TikTok Content Posting (developers.tiktok.com/doc/content-posting-api-get-started), LinkedIn Videos/Posts API (learn.microsoft.com/linkedin/marketing/community-management).