Providers (pluggable backends)

Every external backend keryx uses — image, video, voice and music generation, and rendering — sits behind a narrow Go interface, with the concrete implementation chosen from user config at construction. Swapping a backend, or adding a new one, is a config change plus an additive adapter package — never a call-site change (design spec §3.4).

The seams

| Capability | Interface | Default adapter | Config key |
| --- | --- | --- | --- |
| Image generation | ImageProvider.Generate | Gemini / Imagen | providers.image |
| Video generation | VideoProvider.Generate | Gemini (Omni) | providers.video |
| Voice / TTS | VoiceProvider.Synthesize | ElevenLabs | providers.voice |
| Music | MusicProvider.Compose | ElevenLabs Music | providers.music |
| Rendering | Renderer.Render | local ffmpeg | providers.render |

Interfaces and the provider-neutral request/response types live in pkg/provider; mocks are generated into mocks/pkg/provider for tests.

Provider-neutral requests

Requests carry keryx's intent, not vendor payloads — a prompt + aspect for an image, the narration text + clone settings for voice, a Timeline of segments + an audio mix for the renderer. Each adapter maps that intent to its own API and back, so call sites never mention a vendor. Provider-specific identifiers (an ElevenLabs voice id, a Gemini model name) live in the theme / provider config the active adapter understands.

Config-driven construction

A per-capability Factory[T] resolves providers.<capability> to a registered constructor and builds it from that provider's config block (endpoint, model, credentials via keychain/env — never committed). A blank or unset value falls back to the default adapter; an unknown value is an error listing what's available.

// call site: never names a vendor
voice, err := provider.VoiceFactory.Resolve(cfg)
if err != nil {
    return err // e.g. an unknown providers.voice value
}
audio, err := voice.Synthesize(ctx, provider.VoiceRequest{Text: line, VoiceID: id})

Adding an adapter is purely additive — implement the interface and register a constructor from the adapter package's init():

func init() {
    provider.VoiceFactory.Register("openai", newOpenAIVoice)
}

…then providers.voice: openai selects it. No other code changes.

Testability

Because every backend is an interface, the deterministic core is unit-tested with no network, no ffmpeg, and no API keys — fakes (or the generated mocks) stand in. This is what keeps the timing maths, wrapping, theme/provider resolution, and the posting ledger testable in isolation (spec §8).

Status: the seams, the registry, and the request/response types land in Phase 1b. The concrete adapters (Gemini image, ElevenLabs voice/music, ffmpeg renderer) register against these factories in later Phase-1 MRs; video is deferred to Phase 5.