Voice providers — bring your own STT/TTS

Let Chamade run hosted STT (transcripts) and hosted TTS (chamade_call_say) on your meetings using your own API keys. Configure once in the dashboard; every new call picks it up automatically.

What voice providers do

A voice gateway like Chamade bridges a meeting's audio to your agent. On each call you can either:

BYO-audio — you receive raw PCM on the call's audio WebSocket and run your own STT/TTS locally (OpenAI Realtime, LiveKit, Pipecat, whatever).
Hosted STT/TTS — you add an API key for a provider once in the dashboard. Chamade handles transcription and speech synthesis on its side using your key: transcripts arrive as call_transcript events, chamade_call_say speaks text in the meeting.

The two modes are independent per side: hosted STT with BYO TTS is fine, and vice versa. If no preset is enabled for a given side, that side stays BYO-audio on the WebSocket.

Add a preset

Go to your dashboard → Voice providers (STT / TTS) and click + Add in the STT or TTS column, then:

Pick a provider.
Paste your API key. Chamade stores it Fernet-encrypted and never returns it over the API.
Click Save. Chamade probes the key against the provider's auth endpoint; bad keys are rejected immediately.

Every other option (model, voice, language, stability, …) has a sensible default — you only need to open Advanced options if you want to customize. For TTS, use the Load voices from provider button to pick a specific voice ID from the provider's catalog.

Enabled = the one being used

At most one preset per kind (STT, TTS) can be enabled at a time. Enabling a new preset automatically disables the previously enabled one of the same kind. Disabled presets keep their data (key, options), which is useful for quickly swapping between a "cheap" Deepgram and a "premium" ElevenLabs without re-entering keys.

No enabled preset for a given kind = that side stays BYO-audio on the next call.
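The one-enabled-preset-per-kind rule can be pictured with a small in-memory model. This is only an illustration of the behavior described above, not Chamade's implementation; the `Preset` class, its field names, and `enable_preset` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Preset:
    # Hypothetical local model of a preset; field names are assumptions.
    id: str
    kind: str        # "stt" or "tts"
    provider: str
    enabled: bool = False


def enable_preset(presets: List[Preset], preset_id: str) -> List[Preset]:
    """Return a new list where preset_id is enabled and its siblings of the
    same kind are disabled. Presets of the other kind are left untouched."""
    target = next((p for p in presets if p.id == preset_id), None)
    if target is None:
        raise KeyError(f"unknown preset: {preset_id}")
    return [
        replace(p, enabled=(p.id == preset_id)) if p.kind == target.kind else p
        for p in presets
    ]
```

Enabling a TTS preset in this model never touches the STT side, which mirrors the per-kind independence described earlier.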

Capabilities on each call

When you create a call with POST /api/call (or chamade_call_join), the capabilities array in the response advertises hosted_stt / hosted_tts if a preset is enabled for that kind. Your agent can branch on that:

"capabilities": ["audio_in", "audio_out", "read", "write", "typing", "hosted_stt", "hosted_tts"]

hosted_stt present → call_transcript events will arrive on the WS (and in chamade_call_status).
hosted_tts present → chamade_call_say works; otherwise it returns 400 and you should synthesize locally and send PCM on the audio WS.
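The two rules above can be sketched as a small helper that an agent runs once on the call-creation response. The function name and the returned dictionary shape are illustrative assumptions; only the hosted_stt and hosted_tts capability names come from this page.

```python
from typing import Dict, Iterable


def plan_voice_pipeline(capabilities: Iterable[str]) -> Dict[str, str]:
    """Decide, per side, whether to rely on Chamade's hosted path or on
    local (BYO) audio processing. Hypothetical helper for illustration."""
    caps = set(capabilities)
    return {
        # hosted_stt: expect call_transcript events instead of transcribing PCM locally
        "stt": "hosted" if "hosted_stt" in caps else "byo",
        # hosted_tts: chamade_call_say is usable; otherwise synthesize locally
        # and send PCM on the audio WebSocket
        "tts": "hosted" if "hosted_tts" in caps else "byo",
    }
```

Checking the array up front avoids discovering a missing TTS preset only when chamade_call_say returns 400 mid-call.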

Per-call override

If you keep multiple presets around (say, a premium and a cheap TTS) and you want to force a specific one for one call, pass the preset UUID at creation time:

POST /api/call
{
  "platform": "teams",
  "meeting_url": "...",
  "tts_voice_config_id": "<uuid-of-a-TTS-preset>",
  "stt_voice_config_id": "<uuid-of-an-STT-preset>"
}

The UUID comes from GET /api/voice-configs. Trying to use a disabled preset returns 400 — enable it in the dashboard first.
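A minimal sketch of assembling that request body in Python follows. The `build_call_body` helper is hypothetical; the four field names are the ones shown in the example above. Leaving an override out simply omits the key, so the preset enabled in the dashboard (if any) applies.

```python
from typing import Dict, Optional


def build_call_body(
    platform: str,
    meeting_url: str,
    tts_preset_id: Optional[str] = None,
    stt_preset_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build the JSON body for POST /api/call, adding per-call preset
    overrides only when a UUID is actually supplied."""
    body = {"platform": platform, "meeting_url": meeting_url}
    if tts_preset_id:
        body["tts_voice_config_id"] = tts_preset_id
    if stt_preset_id:
        body["stt_voice_config_id"] = stt_preset_id
    return body
```

Serialize the result with `json.dumps` before sending it; the UUIDs themselves still have to come from GET /api/voice-configs.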

Supported providers

Provider	STT	TTS	Notes
ElevenLabs	✓ (Scribe, HTTP-segmented — bills only on real speech)	✓ (WebSocket streaming — voice catalog via Load voices)	Default voice: Rachel. Model: eleven_turbo_v2_5.
Deepgram	✓ (Nova-3 WebSocket — smart_format, diarize, interim results)	✓ (Aura-2 WebSocket)	Default STT model: nova-3. Default Aura voice: thalia-en.
Cartesia	✓ (Sonic WebSocket — sub-100ms transcripts)	✓ (Sonic WebSocket — sub-100ms TTFB, voice catalog via Load voices)	Default TTS model: sonic-3. Pick a voice id in your Cartesia account.
AssemblyAI	✓ (Universal-Streaming WebSocket — turn-formatted finals, optional diarization)	—	Defaults: format_turns on, speaker_labels off.
OpenAI	✓ (Whisper / gpt-4o-transcribe — HTTP batch, not streaming)	✓ (gpt-4o-mini-tts default; tts-1 / tts-1-hd available)	OpenAI TTS prefers 24 kHz output; resampling handled automatically but expect a small quality hit on 8/16 kHz platforms (SIP, Telegram).

More providers (Azure, Google, Speechmatics, Gladia, …) can be added on request; contact the Chamade team if you need a specific one.

REST API

The dashboard is a thin wrapper over these endpoints — call them directly from your own tooling if needed:

Method	Path	Purpose
GET	`/api/voice-configs/schema`	Provider catalog + option fields
GET	`/api/voice-configs`	List your presets (key never returned)
POST	`/api/voice-configs`	Create a preset (probed on save)
PATCH	`/api/voice-configs/{id}`	Update a preset; `enabled=true` re-probes the key and disables the sibling preset of the same kind
DELETE	`/api/voice-configs/{id}`	Delete a preset
POST	`/api/providers/{name}/voices`	Proxy the provider's voice catalog (TTS picker)
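
For scripting against these endpoints, a rough sketch using only the Python standard library might look like the following. Several things here are assumptions that this page does not confirm: the `BASE_URL` placeholder host, the Bearer-token `Authorization` header, and sending `enabled` as a JSON body field. Check them against your account's API settings before relying on this.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://chamade.example"  # placeholder host, not the real one


def build_request(
    method: str, path: str, token: str, payload: Optional[dict] = None
) -> urllib.request.Request:
    """Prepare (but do not send) a request to one of the endpoints above."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    # Bearer auth is an assumption; substitute whatever scheme your account uses.
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def enable_preset_request(preset_id: str, token: str) -> urllib.request.Request:
    """PATCH a preset with enabled=true (JSON body shape is assumed)."""
    return build_request(
        "PATCH", f"/api/voice-configs/{preset_id}", token, {"enabled": True}
    )
```

Pass the returned object to `urllib.request.urlopen` to actually send it; building the request separately keeps the sketch easy to inspect and test without network access.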