Providers (pluggable backends)

Every external backend keryx uses — image, video, voice and music generation, and rendering — sits behind a narrow Go interface, with the concrete implementation chosen from user config at construction. Swapping a backend, or adding a new one, is a config change plus an additive adapter package — never a call-site change (design spec §3.4).

The seams

Capability	Interface	Default adapter	Config key
Image generation	`ImageProvider.Generate`	Gemini / Imagen	`providers.image`
Video generation	`VideoProvider.Generate`	Gemini (Omni)	`providers.video`
Voice / TTS	`VoiceProvider.Synthesize`	ElevenLabs	`providers.voice`
Music	`MusicProvider.Compose`	ElevenLabs Music	`providers.music`
Rendering	`Renderer.Render`	local ffmpeg	`providers.render`

Interfaces and the provider-neutral request/response types live in pkg/provider; mocks are generated into mocks/pkg/provider for tests.

Render backends

Two Renderer adapters ship, selected with providers.render:

`providers.render`	How it renders
`ffmpeg` (default)	shells out to a system `ffmpeg` binary
`afmpeg`	FFmpeg compiled to WebAssembly (the `ffmpeg-wasi` engine), driven via the `afmpeg` library — the whole reel renders in memory, no system `ffmpeg`

The afmpeg backend requires an ffmpeg-wasi module but needs no configuration by default: it downloads a pinned published module (the gpl variant, identified by URL and SHA-256), caches it under the OS user-cache dir, and reuses it thereafter. Setting providers.render to afmpeg is therefore enough.

To override the module — a locally-built .wasm, a different release, or an air-gapped mirror — set providers.render.module:

Key	Value
`providers.render.module`	a host path to a `.wasm`, or an `https://` URL (a `.gz` URL is decompressed on download)
`providers.render.module_sha256`	hex SHA-256 to verify a URL module against (the decompressed `.wasm`) — strongly recommended, it is executable code
`providers.render.cache_dir`	override where a downloaded module is cached

$KERYX_FFMPEG_WASI is an env fallback for providers.render.module (path or URL).
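The download-time check described above can be sketched roughly as follows. This is a minimal illustration, not keryx's code: `loadModule`, `gzipBytes` and `hexSHA256` are hypothetical names, and treating an empty hash as "skip verification" is an assumption drawn from the hash being recommended rather than required. The one detail taken from the table is that the hash is checked against the decompressed `.wasm`.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// loadModule (hypothetical) decompresses body when gz is true, then compares
// the SHA-256 of the decompressed bytes with wantSHA256 (hex).
// Assumption: an empty wantSHA256 skips verification.
func loadModule(body []byte, gz bool, wantSHA256 string) ([]byte, error) {
	wasm := body
	if gz {
		zr, err := gzip.NewReader(bytes.NewReader(body))
		if err != nil {
			return nil, fmt.Errorf("module: open gzip: %w", err)
		}
		defer zr.Close()
		if wasm, err = io.ReadAll(zr); err != nil {
			return nil, fmt.Errorf("module: decompress: %w", err)
		}
	}
	if wantSHA256 == "" {
		return wasm, nil
	}
	if got := hexSHA256(wasm); !strings.EqualFold(got, wantSHA256) {
		return nil, fmt.Errorf("module: sha256 mismatch: got %s, want %s", got, wantSHA256)
	}
	return wasm, nil
}

// hexSHA256 returns the lowercase hex SHA-256 of p.
func hexSHA256(p []byte) string {
	sum := sha256.Sum256(p)
	return hex.EncodeToString(sum[:])
}

// gzipBytes compresses p in memory; it exists only to feed the demo.
// Writes to a bytes.Buffer cannot fail, so errors are not checked here.
func gzipBytes(p []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(p)
	zw.Close()
	return buf.Bytes()
}

func main() {
	wasm := []byte("\x00asm demo bytes") // stand-in for a real module
	out, err := loadModule(gzipBytes(wasm), true, hexSHA256(wasm))
	fmt.Println(len(out), err)

	_, err = loadModule(wasm, false, strings.Repeat("0", 64))
	fmt.Println(err)
}
```

Hashing the decompressed bytes means the same `module_sha256` stays valid whether the module is fetched as `.wasm` or `.wasm.gz`.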

lgpl vs gpl. ffmpeg-wasi ships two variants: gpl carries libx264 and can encode H.264 (what reels need — keryx pins the gpl module by default); lgpl decodes H.264 but cannot encode it. Choosing the gpl module accepts GPL terms for that artefact.

The reel logic (still-loop, xfade-concat, audio mix) stays in keryx; afmpeg/ffmpeg-wasi remain generic tools.

Render works over any filesystem (spec 0021). The render core reads inputs from and writes the mp4 into an afero.Fs — so a project that lives in memory (the studio's in-memory/RAM-worktree remotes) renders too, not just an on-disk checkout. afmpeg renders any fs natively; the shell-out ffmpeg (its binary needs real files) transparently materialises a non-OS fs to a temp dir and copies the result back. The studio's former "render is local-only" gate is gone.
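To make the "materialise to a temp dir" step concrete, here is a rough sketch of its input half. It deliberately uses the standard library's `io/fs` and `fstest.MapFS` in place of afero so that it runs without third-party modules; `materialise` and `demo` are hypothetical names, and the real render core may be structured quite differently.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"testing/fstest"
)

// materialise (hypothetical) copies every file in src into a fresh temp
// directory so that an external binary can read real files. The caller is
// responsible for removing the returned directory.
func materialise(src fs.FS) (string, error) {
	dir, err := os.MkdirTemp("", "keryx-render-*")
	if err != nil {
		return "", err
	}
	err = fs.WalkDir(src, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		dst := filepath.Join(dir, filepath.FromSlash(p))
		if d.IsDir() {
			return os.MkdirAll(dst, 0o755)
		}
		data, err := fs.ReadFile(src, p)
		if err != nil {
			return err
		}
		return os.WriteFile(dst, data, 0o644)
	})
	if err != nil {
		os.RemoveAll(dir) // do not leave a half-copied tree behind
		return "", err
	}
	return dir, nil
}

// demo materialises a one-file in-memory project and reads the file back
// from disk, then cleans up.
func demo() (string, error) {
	mem := fstest.MapFS{"assets/cover.png": {Data: []byte("png-bytes")}}
	dir, err := materialise(mem)
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	data, err := os.ReadFile(filepath.Join(dir, "assets", "cover.png"))
	return string(data), err
}

func main() {
	fmt.Println(demo())
}
```

Copying the rendered mp4 back into the source filesystem would be the mirror image of this step and is omitted here.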

Provider-neutral requests

Requests carry keryx's intent, not vendor payloads — a prompt + aspect for an image, the narration text + clone settings for voice, a Timeline of segments + an audio mix for the renderer. Each adapter maps that intent to its own API and back, so call sites never mention a vendor. Provider-specific identifiers (an ElevenLabs voice id, a Gemini model name) live in the theme / provider config the active adapter understands.
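As an illustration of that mapping, the sketch below turns a neutral image request into the payload of a made-up vendor. Everything here is assumed for the example: the `ImageRequest` fields, the `acmePayload` wire format, and the pixel sizes chosen per aspect. None of it is a real API or keryx's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageRequest is a provider-neutral request (hypothetical field names):
// it states intent only, a prompt and an aspect ratio.
type ImageRequest struct {
	Prompt string
	Aspect string // e.g. "16:9"
}

// acmePayload is the wire format of an invented vendor that wants pixel
// dimensions instead of an aspect ratio.
type acmePayload struct {
	Text   string `json:"text"`
	Width  int    `json:"width"`
	Height int    `json:"height"`
}

// toAcme is the adapter-side translation from intent to vendor payload.
// Call sites only ever build ImageRequest values.
func toAcme(req ImageRequest) (acmePayload, error) {
	sizes := map[string][2]int{
		"1:1":  {1024, 1024},
		"16:9": {1280, 720},
		"9:16": {720, 1280},
	}
	wh, ok := sizes[req.Aspect]
	if !ok {
		return acmePayload{}, fmt.Errorf("acme: unsupported aspect %q", req.Aspect)
	}
	return acmePayload{Text: req.Prompt, Width: wh[0], Height: wh[1]}, nil
}

func main() {
	p, err := toAcme(ImageRequest{Prompt: "a lighthouse at dusk", Aspect: "9:16"})
	if err != nil {
		fmt.Println(err)
		return
	}
	b, _ := json.Marshal(p)
	fmt.Println(string(b))
}
```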

Config-driven construction

A per-capability Factory[T] resolves providers.<capability> to a registered constructor and builds it from that provider's config block (endpoint, model, credentials via keychain/env — never committed). A blank or unset value falls back to the default adapter; an unknown value is an error listing what's available.

// call site — never names a vendor
voice, err := provider.VoiceFactory.Resolve(cfg)
audio, err := voice.Synthesize(ctx, provider.VoiceRequest{Text: line, VoiceID: id})

Adding an adapter is purely additive — implement the interface and register a constructor from the adapter package's init():

func init() {
    provider.VoiceFactory.Register("openai", newOpenAIVoice)
}

…then providers.voice: openai selects it. No other code changes.
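For readers curious how such a registry can work, here is a minimal sketch of a generic factory with the three documented behaviours: blank falls back to the default, a known name builds its adapter, and an unknown name errors with the available names. It is a simplification under stated assumptions: here `Resolve` takes the configured name directly rather than a config object, constructors take no config block, and `Voice`, `named` and `newVoiceFactory` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Factory maps a configured provider name to a constructor for one
// capability T.
type Factory[T any] struct {
	def   string
	ctors map[string]func() (T, error)
}

// NewFactory returns an empty registry whose default adapter name is def.
func NewFactory[T any](def string) *Factory[T] {
	return &Factory[T]{def: def, ctors: map[string]func() (T, error){}}
}

// Register adds a constructor; adapter packages would call this from init().
func (f *Factory[T]) Register(name string, ctor func() (T, error)) {
	f.ctors[name] = ctor
}

// Resolve builds the named adapter. A blank name selects the default; an
// unknown name is an error that lists the registered names.
func (f *Factory[T]) Resolve(name string) (T, error) {
	if strings.TrimSpace(name) == "" {
		name = f.def
	}
	ctor, ok := f.ctors[name]
	if !ok {
		var zero T
		names := make([]string, 0, len(f.ctors))
		for n := range f.ctors {
			names = append(names, n)
		}
		sort.Strings(names) // stable, readable error message
		return zero, fmt.Errorf("unknown provider %q (available: %s)", name, strings.Join(names, ", "))
	}
	return ctor()
}

// Voice and named are stand-ins so the demo has something to construct.
type Voice interface{ Name() string }

type named string

func (n named) Name() string { return string(n) }

func newVoiceFactory() *Factory[Voice] {
	f := NewFactory[Voice]("elevenlabs")
	f.Register("elevenlabs", func() (Voice, error) { return named("elevenlabs"), nil })
	f.Register("openai", func() (Voice, error) { return named("openai"), nil })
	return f
}

func main() {
	f := newVoiceFactory()
	for _, name := range []string{"", "openai", "nope"} {
		v, err := f.Resolve(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(v.Name())
	}
}
```

Keeping one factory per capability, rather than one global registry, lets the type parameter guarantee that a voice name can never resolve to, say, a renderer.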

Provider config blocks also carry adapter settings. For Gemini, providers.gemini.model forces a specific image model id (else the adapter tries its built-in Imagen→Gemini fallback chain); a per-run --model on cover/portrait overrides it. The model is routed by id prefix — imagen* uses the Imagen :predict endpoint, others use :generateContent.
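The selection and routing rules in that paragraph amount to something like the sketch below. The function names and the model ids (`imagen-example`, `gemini-example`) are placeholders, and the sketch assumes ids are lower-case; only the precedence (per-run flag over config, else the built-in chain) and the prefix rule come from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// chooseModel (hypothetical) applies the documented precedence: a per-run
// --model value wins over providers.gemini.model. An empty result means
// "no model forced, use the adapter's built-in fallback chain".
func chooseModel(flagModel, configModel string) string {
	if flagModel != "" {
		return flagModel
	}
	return configModel
}

// endpointSuffix (hypothetical) routes by id prefix: imagen* ids use the
// :predict endpoint, everything else uses :generateContent.
func endpointSuffix(model string) string {
	if strings.HasPrefix(model, "imagen") {
		return ":predict"
	}
	return ":generateContent"
}

func main() {
	// Placeholder ids, not real model names.
	for _, m := range []string{"imagen-example", "gemini-example"} {
		fmt.Println(m + endpointSuffix(m))
	}
	fmt.Println(chooseModel("", "imagen-example"))
}
```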

Credentials & config from the environment

Generation API keys are read from the environment, never from config (so they're never committed):

Provider	Env var
Gemini (image / cards / chat draft)	`GEMINI_API_KEY`
ElevenLabs (voice + music)	`ELEVENLABS_API_TOKEN`
Anthropic (chat)	`ANTHROPIC_API_KEY`
OpenAI (chat)	`OPENAI_API_KEY`

Any config key can also be overridden by an environment variable. The variable name is the config key upper-cased, with each . replaced by _ and no tool prefix:

Config key	Environment variable
`providers.chat.provider`	`PROVIDERS_CHAT_PROVIDER`
`providers.image`	`PROVIDERS_IMAGE`
`studio.take_count`	`STUDIO_TAKE_COUNT`

keryx sets no env prefix, so it's PROVIDERS_CHAT_PROVIDER, not KERYX_PROVIDERS_CHAT_PROVIDER. The env value wins over the config file.
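The naming rule is mechanical enough to state in a few lines. This is a sketch only: `envName` and `resolve` are hypothetical helpers, and whether a variable that is set but empty still overrides the file is an assumption made here, not something stated above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envName derives the environment variable for a config key: upper-case,
// dots replaced by underscores, no tool prefix.
func envName(key string) string {
	return strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
}

// resolve returns the environment value when the variable is set, otherwise
// the value from the config file. lookup is injected (os.LookupEnv in real
// use) so the rule can be exercised without touching the process environment.
// Assumption: a variable that is set but empty still wins.
func resolve(key, fileValue string, lookup func(string) (string, bool)) string {
	if v, ok := lookup(envName(key)); ok {
		return v
	}
	return fileValue
}

func main() {
	fmt.Println(envName("providers.chat.provider"))
	fmt.Println(resolve("providers.image", "gemini", os.LookupEnv))
}
```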

Where the config file is read from

Non-secret config (themes, provider choice, defaults) is read from two layers and deep-merged, project over global:

Layer	File	Scope
Global	`~/.keryx/config.yaml`	per-user, seeded by `keryx init`
Project	`.keryx.yaml` at the repo root	committed with the owning project; overrides the global

The project .keryx.yaml is discovered by walking up from the working directory, so running keryx from anywhere inside the repo picks it up. This is what lets config live in the owning project (the blog brings its own themes + provider choices). Full precedence, highest first: flags → env var → project .keryx.yaml → global config.yaml → built-in defaults.
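"Deep-merged, project over global" can be pictured with the following sketch, which merges two already-parsed config trees. `deepMerge` is a hypothetical helper operating on plain `map[string]any` values; how keryx actually parses and merges its YAML is not shown in this document.

```go
package main

import "fmt"

// deepMerge returns a new map containing base overlaid with over. Where both
// sides hold a nested map under the same key, the two are merged
// recursively; in every other case the value from over replaces the one from
// base. Neither input map is modified.
func deepMerge(base, over map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range over {
		baseMap, baseIsMap := out[k].(map[string]any)
		overMap, overIsMap := v.(map[string]any)
		if baseIsMap && overIsMap {
			out[k] = deepMerge(baseMap, overMap)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	global := map[string]any{
		"providers": map[string]any{"image": "gemini", "voice": "elevenlabs"},
		"studio":    map[string]any{"take_count": 3},
	}
	project := map[string]any{
		"providers": map[string]any{"voice": "openai"},
	}
	fmt.Println(deepMerge(global, project))
}
```

The point of merging deeply rather than replacing whole top-level blocks is visible in the demo: the project file changes only `providers.voice`, and `providers.image` from the global file survives.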

Testability

Because every backend is an interface, the deterministic core is unit-tested with no network, no ffmpeg, and no API keys — fakes (or the generated mocks) stand in. This is what keeps the timing maths, wrapping, theme/provider resolution, and the posting ledger testable in isolation (spec §8).
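A hand-written fake is often all such a test needs. The sketch below shows the pattern with assumed shapes: the `[]byte` return type of `Synthesize`, the `narrateAll` function standing in for "core logic", and the `run` helper are all hypothetical, chosen only to show a provider-dependent function being exercised with no network and no API key.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// VoiceRequest and VoiceProvider mirror the names used in this document;
// the exact fields and return type are assumptions.
type VoiceRequest struct{ Text, VoiceID string }

type VoiceProvider interface {
	Synthesize(ctx context.Context, req VoiceRequest) ([]byte, error)
}

// fakeVoice records every request and returns canned audio, or an error
// when fail is set.
type fakeVoice struct {
	calls []VoiceRequest
	fail  bool
}

func (f *fakeVoice) Synthesize(_ context.Context, req VoiceRequest) ([]byte, error) {
	f.calls = append(f.calls, req)
	if f.fail {
		return nil, errors.New("fake: synthesis failed")
	}
	return []byte("audio:" + req.Text), nil
}

// narrateAll stands in for core logic: it synthesizes each line in order and
// stops at the first error, reporting the 1-based line number.
func narrateAll(ctx context.Context, v VoiceProvider, voiceID string, lines []string) ([][]byte, error) {
	clips := make([][]byte, 0, len(lines))
	for i, line := range lines {
		clip, err := v.Synthesize(ctx, VoiceRequest{Text: line, VoiceID: voiceID})
		if err != nil {
			return nil, fmt.Errorf("line %d: %w", i+1, err)
		}
		clips = append(clips, clip)
	}
	return clips, nil
}

// run drives narrateAll with a fake and reports how many clips came back and
// how many provider calls were made.
func run(fail bool, lines ...string) (clips, calls int, err error) {
	fake := &fakeVoice{fail: fail}
	out, err := narrateAll(context.Background(), fake, "voice-1", lines)
	return len(out), len(fake.calls), err
}

func main() {
	fmt.Println(run(false, "one", "two"))
	fmt.Println(run(true, "one", "two"))
}
```

Because the fake records its calls, a test can assert not only on the result but also on how the core used the provider, for example that it stops after the first failure.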

Status: the seams, the registry, and the Gemini image / ElevenLabs voice+music / ffmpeg-render adapters are implemented and drive the working pipeline. Video generation is deferred to a later phase.