Providers (pluggable backends)
Every external backend keryx uses — image, video, voice and music generation, and rendering — sits behind a narrow Go interface, with the concrete implementation chosen from user config at construction. Swapping a backend, or adding a new one, is a config change plus an additive adapter package — never a call-site change (design spec §3.4).
The seams
| Capability | Interface | Default adapter | Config key |
|---|---|---|---|
| Image generation | ImageProvider.Generate | Gemini / Imagen | providers.image |
| Video generation | VideoProvider.Generate | Gemini (Omni) | providers.video |
| Voice / TTS | VoiceProvider.Synthesize | ElevenLabs | providers.voice |
| Music | MusicProvider.Compose | ElevenLabs Music | providers.music |
| Rendering | Renderer.Render | local ffmpeg | providers.render |
Interfaces and the provider-neutral request/response types live in pkg/provider; mocks are generated into mocks/pkg/provider for tests.
Provider-neutral requests
Requests carry keryx's intent, not vendor payloads — a prompt + aspect for an
image, the narration text + clone settings for voice, a Timeline of segments +
an audio mix for the renderer. Each adapter maps that intent to its own API and
back, so call sites never mention a vendor. Provider-specific identifiers (an
ElevenLabs voice id, a Gemini model name) live in the theme / provider config the
active adapter understands.
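As an illustration of the intent-versus-payload split described above, here is a minimal sketch of an adapter translating a provider-neutral image request into a vendor request body. Everything in it is an assumption made for the example: the `ImageRequest` fields, the `exampleAdapter` type, and the `vendorPayload` wire format are invented and do not describe keryx's real types or any real vendor API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageRequest is a hypothetical provider-neutral request: it carries
// intent only (what to draw, in which aspect), never vendor fields.
type ImageRequest struct {
	Prompt string
	Aspect string // for example "9:16"
}

// vendorPayload is an invented wire format for an imaginary vendor.
// It is not the request shape of any real API.
type vendorPayload struct {
	Model       string `json:"model"`
	Text        string `json:"text"`
	AspectRatio string `json:"aspect_ratio"`
}

// exampleAdapter holds the vendor-specific identifier (here a model
// name) that would come from the provider config, not from call sites.
type exampleAdapter struct {
	model string
}

// toPayload maps the neutral request onto the vendor's own shape.
func (a exampleAdapter) toPayload(req ImageRequest) vendorPayload {
	return vendorPayload{
		Model:       a.model,
		Text:        req.Prompt,
		AspectRatio: req.Aspect,
	}
}

func main() {
	adapter := exampleAdapter{model: "example-model-v1"}
	req := ImageRequest{Prompt: "a lighthouse at dusk", Aspect: "9:16"}

	body, err := json.Marshal(adapter.toPayload(req))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The call site only ever constructs `ImageRequest`; the model name reaches the payload through the adapter's own configuration.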
Config-driven construction
A per-capability Factory[T] resolves providers.<capability> to a registered
constructor and builds it from that provider's config block (endpoint, model,
credentials via keychain/env — never committed). A blank or unset value falls
back to the default adapter; an unknown value is an error listing what's
available.
```go
// call site: never names a vendor
voice, err := provider.VoiceFactory.Resolve(cfg)
if err != nil {
	return err
}
audio, err := voice.Synthesize(ctx, provider.VoiceRequest{Text: line, VoiceID: id})
```
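The resolution rules described above (blank falls back to the default, unknown names fail with a list of what is available) could be sketched roughly as follows. This is not keryx's actual `Factory[T]`: the `Config` shape, the `Constructor` type, the `NewFactory` and `Register` names, and the simplified `VoiceProvider` interface are all assumptions made to keep the example self-contained.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Config is a hypothetical stand-in for user config; Providers maps a
// capability ("voice", "image", ...) to the selected adapter name.
type Config struct {
	Providers map[string]string
}

// Constructor builds a provider of type T from config.
type Constructor[T any] func(cfg Config) (T, error)

// Factory is an illustrative per-capability registry.
type Factory[T any] struct {
	capability   string
	defaultName  string
	constructors map[string]Constructor[T]
}

func NewFactory[T any](capability, defaultName string) *Factory[T] {
	return &Factory[T]{
		capability:   capability,
		defaultName:  defaultName,
		constructors: map[string]Constructor[T]{},
	}
}

// Register adds a named constructor to the factory.
func (f *Factory[T]) Register(name string, c Constructor[T]) {
	f.constructors[name] = c
}

// Resolve looks up providers.<capability>, falls back to the default
// when the value is blank or unset, and reports the registered names
// when the value is unknown.
func (f *Factory[T]) Resolve(cfg Config) (T, error) {
	name := strings.TrimSpace(cfg.Providers[f.capability])
	if name == "" {
		name = f.defaultName
	}
	c, ok := f.constructors[name]
	if !ok {
		var zero T
		names := make([]string, 0, len(f.constructors))
		for n := range f.constructors {
			names = append(names, n)
		}
		sort.Strings(names)
		return zero, fmt.Errorf("unknown %s provider %q (available: %s)",
			f.capability, name, strings.Join(names, ", "))
	}
	return c(cfg)
}

// VoiceProvider is a simplified stand-in for the real interface.
type VoiceProvider interface {
	Synthesize(text string) string
}

// stubVoice tags its output with the adapter name so the choice is visible.
type stubVoice struct{ vendor string }

func (s stubVoice) Synthesize(text string) string { return s.vendor + ":" + text }

func main() {
	voices := NewFactory[VoiceProvider]("voice", "elevenlabs")
	voices.Register("elevenlabs", func(Config) (VoiceProvider, error) {
		return stubVoice{"elevenlabs"}, nil
	})
	voices.Register("openai", func(Config) (VoiceProvider, error) {
		return stubVoice{"openai"}, nil
	})

	// Unset config falls back to the default adapter.
	v, err := voices.Resolve(Config{})
	if err != nil {
		panic(err)
	}
	fmt.Println(v.Synthesize("hi"))

	// An unknown name is an error that lists the registered adapters.
	_, err = voices.Resolve(Config{Providers: map[string]string{"voice": "acme"}})
	fmt.Println(err)
}
```

Sorting the names before formatting the error keeps the message stable across runs, since Go map iteration order is not deterministic.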
Adding an adapter is purely additive — implement the interface and register a
constructor from the adapter package's init():
…then providers.voice: openai selects it. No other code changes.
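For illustration only, registration from `init()` might look something like the sketch below. The registry map, the `RegisterVoice` helper, the `openaiVoice` adapter, and the simplified `VoiceProvider` interface are hypothetical names invented here, collapsed into one file so the example runs on its own; they are not keryx's actual registration API.

```go
package main

import "fmt"

// VoiceProvider is a simplified stand-in for the interface in pkg/provider.
type VoiceProvider interface {
	Synthesize(text string) ([]byte, error)
}

// voiceConstructors is a hypothetical registry. Package-level variables
// are initialised before any init() function runs, so registering from
// init() is safe.
var voiceConstructors = map[string]func() VoiceProvider{}

// RegisterVoice is an assumed helper that adds a named constructor.
func RegisterVoice(name string, ctor func() VoiceProvider) {
	voiceConstructors[name] = ctor
}

// openaiVoice is an illustrative adapter. A real one would call the
// vendor API; this one returns canned bytes.
type openaiVoice struct{}

func (openaiVoice) Synthesize(text string) ([]byte, error) {
	return []byte("openai-audio:" + text), nil
}

// In a real layout this init() would live in the adapter's own package.
func init() {
	RegisterVoice("openai", func() VoiceProvider { return openaiVoice{} })
}

func main() {
	ctor, ok := voiceConstructors["openai"]
	if !ok {
		panic("openai adapter not registered")
	}
	audio, err := ctor().Synthesize("hello")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", audio)
}
```

When the adapter lives in its own package, its `init()` only runs if that package is linked into the binary, which in Go usually means a blank import (`import _ "…"`) somewhere in the main program.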
Testability
Because every backend is an interface, the deterministic core is unit-tested with no network, no ffmpeg, and no API keys — fakes (or the generated mocks) stand in. This is what keeps the timing maths, wrapping, theme/provider resolution, and the posting ledger testable in isolation (spec §8).
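A small sketch of what a hand-written fake can look like: core logic depends only on an interface, and a fake records calls and returns canned audio, so nothing touches the network. The `narrate` function, the `fakeVoice` type, and the trimmed-down `VoiceProvider` and `VoiceRequest` shapes (no context argument, a single field) are assumptions for this example rather than keryx's real code.

```go
package main

import "fmt"

// VoiceRequest and VoiceProvider are simplified stand-ins for the
// provider-neutral types; the real ones carry more fields.
type VoiceRequest struct{ Text string }

type VoiceProvider interface {
	Synthesize(req VoiceRequest) ([]byte, error)
}

// fakeVoice records every request and returns fixed bytes, so tests
// need no network access and no API keys.
type fakeVoice struct {
	calls []string
}

func (f *fakeVoice) Synthesize(req VoiceRequest) ([]byte, error) {
	f.calls = append(f.calls, req.Text)
	return []byte("pcm"), nil
}

// narrate is a hypothetical piece of core logic: it synthesizes each
// non-empty line and returns the total number of audio bytes.
func narrate(v VoiceProvider, lines []string) (int, error) {
	total := 0
	for _, line := range lines {
		if line == "" {
			continue // blank lines produce no audio
		}
		audio, err := v.Synthesize(VoiceRequest{Text: line})
		if err != nil {
			return 0, fmt.Errorf("synthesize %q: %w", line, err)
		}
		total += len(audio)
	}
	return total, nil
}

func main() {
	fake := &fakeVoice{}
	n, err := narrate(fake, []string{"one", "", "two"})
	if err != nil {
		panic(err)
	}
	fmt.Println(n, fake.calls)
}
```

The same shape works with the generated mocks; a hand-written fake is simply the smallest thing that satisfies the interface.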
Status: the seams, the registry, and the request/response types land in Phase 1b. The concrete adapters (Gemini image, ElevenLabs voice/music, ffmpeg renderer) register against these factories in later Phase-1 MRs; video is deferred to Phase 5.