Voice providers — bring your own STT/TTS
Let Chamade run hosted STT (transcripts) and hosted TTS (chamade_call_say) on your meetings using your own API keys. Configure once in the dashboard; every new call picks it up automatically.
What voice providers do
A voice gateway like Chamade bridges a meeting's audio to your agent. On each call you can either:
- BYO-audio: you receive raw PCM on the call's audio WebSocket and run your own STT/TTS locally (OpenAI Realtime, LiveKit, Pipecat, or any other stack).
- Hosted STT/TTS: you add a provider API key once in the dashboard. Chamade then runs transcription and speech synthesis on its side with your key. Transcripts arrive as call_transcript events, and chamade_call_say speaks text in the meeting.
The two modes are independent per side: hosted STT with BYO TTS is fine, and vice versa. If a side has no enabled preset, it stays BYO-audio on the WebSocket.
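A minimal sketch of how an agent might decide, per side, which mode applies to a call. It assumes the agent reads the call's capabilities array (covered under Capabilities on each call) and treats hosted_stt / hosted_tts as the only signals; the function name is hypothetical.

```python
from typing import Dict, Iterable


def resolve_audio_modes(capabilities: Iterable[str]) -> Dict[str, str]:
    """Map each side (STT, TTS) to "hosted" or "byo" from a call's capabilities."""
    caps = set(capabilities)
    return {
        # hosted_stt advertised: transcripts come from Chamade as call_transcript events.
        # Otherwise the agent runs its own STT on the PCM from the audio WebSocket.
        "stt": "hosted" if "hosted_stt" in caps else "byo",
        # hosted_tts advertised: chamade_call_say can speak text in the meeting.
        # Otherwise the agent synthesizes locally and sends PCM on the audio WebSocket.
        "tts": "hosted" if "hosted_tts" in caps else "byo",
    }


if __name__ == "__main__":
    # Hosted STT with BYO TTS, a valid mixed setup.
    print(resolve_audio_modes(["audio_in", "audio_out", "hosted_stt"]))
    # prints {'stt': 'hosted', 'tts': 'byo'}
```

Because each side is resolved independently, all four combinations (hosted/hosted, hosted/BYO, BYO/hosted, BYO/BYO) fall out of the same two checks.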
Add a preset
Go to your dashboard → Voice providers (STT / TTS) and click + Add in the STT or TTS column, then:
- Pick a provider.
- Paste your API key. Chamade stores it Fernet-encrypted and never returns it over the API.
- Click Save. Chamade probes the key against the provider's auth endpoint; bad keys are rejected immediately.
Every other option (model, voice, language, stability, …) has a sensible default — you only need to open Advanced options if you want to customize. For TTS, use the Load voices from provider button to pick a specific voice ID from the provider's catalog.
Enabled = the one being used
At most one preset per kind (STT, TTS) can be enabled at a time. Enabling a new preset automatically disables the previously enabled one. Disabled presets keep their data (key, options), which makes it easy to swap between a "cheap" Deepgram preset and a "premium" ElevenLabs preset without re-entering keys.
If no preset is enabled for a given kind, that side stays BYO-audio on the next call.
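The "one enabled preset per kind" rule can be mirrored client-side, for example to keep your own tooling's view in sync with the dashboard. This is a sketch of the documented behavior, not Chamade's server code; the Preset fields and helper names are hypothetical, and the API key is deliberately left out because the API never returns it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Preset:
    id: str
    kind: str  # "stt" or "tts"
    provider: str
    options: Dict[str, str] = field(default_factory=dict)  # kept even when disabled
    enabled: bool = False


def enable(presets: List[Preset], preset_id: str) -> None:
    """Enable one preset and disable its siblings of the same kind; data is untouched."""
    target = next((p for p in presets if p.id == preset_id), None)
    if target is None:
        raise KeyError(preset_id)
    for p in presets:
        if p.kind == target.kind:
            p.enabled = p.id == preset_id


def active(presets: List[Preset], kind: str) -> Optional[Preset]:
    """Return the enabled preset for a kind, or None (that side stays BYO-audio)."""
    return next((p for p in presets if p.kind == kind and p.enabled), None)


if __name__ == "__main__":
    presets = [
        Preset("cheap", "tts", "deepgram", enabled=True),
        Preset("premium", "tts", "elevenlabs"),
        Preset("stt-main", "stt", "deepgram", enabled=True),
    ]
    enable(presets, "premium")  # swaps TTS presets; the STT preset is unaffected
    print(active(presets, "tts").id)  # prints premium
```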
Capabilities on each call
When you create a call with POST /api/call (or chamade_call_join), the response capabilities array advertises hosted_stt / hosted_tts if a preset is enabled for that kind. Your agent can branch on that:
```
"capabilities": ["audio_in", "audio_out", "read", "write", "typing", "hosted_stt", "hosted_tts"]
```
- hosted_stt present → call_transcript events arrive on the WS (and in chamade_call_status).
- hosted_tts present → chamade_call_say works. Otherwise it returns 400, and you should synthesize locally and send PCM on the audio WS.
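A sketch of the TTS branch with a fallback, assuming you wrap chamade_call_say in your own function that returns an HTTP status code. The three callables (hosted_say, synthesize, send_pcm) are hypothetical hooks you would supply; only the capability name and the 400 behavior come from this page.

```python
from typing import Callable, Iterable


def speak(
    text: str,
    capabilities: Iterable[str],
    hosted_say: Callable[[str], int],     # hypothetical wrapper around chamade_call_say; returns HTTP status
    synthesize: Callable[[str], bytes],   # your own local TTS, returning PCM
    send_pcm: Callable[[bytes], None],    # writes PCM to the call's audio WebSocket
) -> str:
    """Speak via hosted TTS when advertised; otherwise synthesize locally and send PCM."""
    if "hosted_tts" in set(capabilities):
        status = hosted_say(text)
        if 200 <= status < 300:
            return "hosted"
        if status != 400:
            # Anything other than the documented 400 is treated as a real error.
            raise RuntimeError(f"chamade_call_say failed with HTTP {status}")
        # 400: no hosted TTS for this call, so fall through to local synthesis.
    send_pcm(synthesize(text))
    return "local"
```

Checking the capability first avoids a wasted round trip; handling the 400 as well keeps the agent working if the preset changes between call creation and speaking.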
Per-call override
If you keep multiple presets around (say, a premium and a cheap TTS) and you want to force a specific one for one call, pass the preset UUID at creation time:
```
POST /api/call
{
  "platform": "teams",
  "meeting_url": "...",
  "tts_voice_config_id": "<uuid-of-a-TTS-preset>",
  "stt_voice_config_id": "<uuid-of-an-STT-preset>"
}
```
The UUID comes from GET /api/voice-configs. Trying to use a disabled preset returns 400 — enable it in the dashboard first.
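A sketch of building the override body while failing early on a disabled preset, instead of waiting for the 400. It assumes `presets` is the parsed GET /api/voice-configs response and that each item carries "id" and "enabled" fields; those field names are assumptions, not confirmed by this page.

```python
import json
from typing import Any, Dict, List, Optional


def build_call_body(
    platform: str,
    meeting_url: str,
    presets: List[Dict[str, Any]],
    tts_id: Optional[str] = None,
    stt_id: Optional[str] = None,
) -> str:
    """Build a POST /api/call JSON body with optional per-call preset overrides."""
    by_id = {p["id"]: p for p in presets}  # assumed "id" field
    body: Dict[str, Any] = {"platform": platform, "meeting_url": meeting_url}
    for field_name, preset_id in (("tts_voice_config_id", tts_id), ("stt_voice_config_id", stt_id)):
        if preset_id is None:
            continue  # no override: the enabled preset for that kind applies
        preset = by_id.get(preset_id)
        if preset is None:
            raise ValueError(f"unknown preset {preset_id}")
        if not preset.get("enabled", False):  # assumed "enabled" field
            # The API rejects disabled presets with 400; fail before sending.
            raise ValueError(f"preset {preset_id} is disabled; enable it in the dashboard first")
        body[field_name] = preset_id
    return json.dumps(body)
```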
Supported providers
| Provider | STT | TTS | Notes |
|---|---|---|---|
| ElevenLabs | ✓ (Scribe, HTTP-segmented — bills only on real speech) | ✓ (WebSocket streaming — voice catalog via Load voices) | Default voice: Rachel. Model: eleven_turbo_v2_5. |
| Deepgram | ✓ (Nova-3 WebSocket — smart_format, diarize, interim results) | ✓ (Aura-2 WebSocket) | Default STT model: nova-3. Default Aura voice: thalia-en. |
| Cartesia | ✓ (Sonic WebSocket — sub-100ms transcripts) | ✓ (Sonic WebSocket — sub-100ms TTFB, voice catalog via Load voices) | Default TTS model: sonic-3. Pick a voice id in your Cartesia account. |
| AssemblyAI | ✓ (Universal-Streaming WebSocket — turn-formatted finals, optional diarization) | — | Defaults: format_turns on, speaker_labels off. |
| OpenAI | ✓ (Whisper / gpt-4o-transcribe — HTTP batch, not streaming) | ✓ (gpt-4o-mini-tts default; tts-1 / tts-1-hd available) | OpenAI TTS prefers 24 kHz output. Resampling is handled automatically, but expect a small quality hit on 8/16 kHz platforms (SIP, Telegram). |
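Some generic sample-rate arithmetic shows where the OpenAI quality hit comes from: a sample rate can only represent frequencies up to half its value (the Nyquist limit), so downsampling 24 kHz audio to 8 kHz discards everything above 4 kHz. This is not Chamade-specific, and the 20 ms frame size is only a common choice assumed for illustration.

```python
def frame_samples(sample_rate_hz: int, frame_ms: int = 20) -> int:
    """Number of mono samples in one frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


def max_audio_bandwidth_hz(sample_rate_hz: int) -> float:
    """Nyquist limit: the highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    for rate in (24_000, 16_000, 8_000):
        print(rate, frame_samples(rate), max_audio_bandwidth_hz(rate))
    # prints:
    # 24000 480 12000.0
    # 16000 320 8000.0
    # 8000 160 4000.0
```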
More providers (Azure, Google, Speechmatics, Gladia, …) can be added on request; email the Chamade team if you need a specific one.
REST API
The dashboard is a thin wrapper over these endpoints — call them directly from your own tooling if needed:
| Method | Path | Purpose |
|---|---|---|
| GET | /api/voice-configs/schema | Provider catalog + option fields |
| GET | /api/voice-configs | List your presets (key never returned) |
| POST | /api/voice-configs | Create a preset (probed on save) |
| PATCH | /api/voice-configs/{id} | Update a preset; setting enabled=true re-probes the key and disables the sibling preset of the same kind |
| DELETE | /api/voice-configs/{id} | Delete a preset |
| POST | /api/providers/{name}/voices | Proxy the provider's voice catalog (TTS picker) |
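A minimal standard-library sketch for calling these endpoints from your own tooling. The base URL and the Bearer-token auth header are hypothetical placeholders, since this page does not document authentication; the only request field taken from the table is enabled on PATCH. The helper builds requests without sending them, so you can inspect them first.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://chamade.example"  # hypothetical: replace with your Chamade base URL
API_KEY = "YOUR_CHAMADE_API_KEY"      # hypothetical auth scheme (Bearer token)


def voice_config_request(
    method: str,
    path: str,
    body: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request against the voice-config endpoints."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {API_KEY}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# List presets (the key is never returned).
list_req = voice_config_request("GET", "/api/voice-configs")

# Enable a preset: per the table, this re-probes its key and disables the sibling.
enable_req = voice_config_request("PATCH", "/api/voice-configs/<preset-id>", {"enabled": True})

# To actually send a request:
# with urllib.request.urlopen(enable_req) as resp:
#     print(json.load(resp))
```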
