Voice providers — bring your own STT/TTS

Let Chamade run hosted STT (transcripts) and hosted TTS (chamade_call_say) on your meetings using your own API keys. Configure once in the dashboard; every new call picks it up automatically.

What voice providers do

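A voice gateway like Chamade bridges a meeting's audio to your agent. On each call, each side (STT and TTS) runs in one of two modes: hosted, where Chamade runs the provider with your API key, or BYO-audio, where your agent handles the raw audio itself over the WebSocket.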

The two modes are independent per side: hosted STT + BYO TTS is fine, and vice versa. No preset for a given side = that side stays BYO-audio on the WebSocket.

Add a preset

Go to your dashboard → Voice providers (STT / TTS), then click + Add in the STT or TTS column:

  1. Pick a provider.
  2. Paste your API key. Chamade stores it Fernet-encrypted and never returns it over the API.
  3. Click Save. Chamade probes the key against the provider's auth endpoint; bad keys are rejected immediately.

Every other option (model, voice, language, stability, …) has a sensible default — you only need to open Advanced options if you want to customize. For TTS, use the Load voices from provider button to pick a specific voice ID from the provider's catalog.

Enabled = the one being used

At most one preset per kind (STT, TTS) can be enabled at a time. Enabling a new preset automatically disables the previous one. Disabled presets keep their data (key, options), which is useful for quickly swapping between a "cheap" Deepgram and a "premium" ElevenLabs without re-entering keys.

No enabled preset for a given kind = that side stays BYO-audio on the next call.
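
The enable rule above can be modelled locally as a small sketch. The `Preset` class and its field names are hypothetical stand-ins for illustration, not Chamade's actual data model:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Preset:
    # Hypothetical local model of a preset; field names are assumptions.
    id: str
    kind: str        # "stt" or "tts"
    provider: str
    enabled: bool = False


def enable(presets: List[Preset], preset_id: str) -> None:
    """Enable one preset and disable every sibling of the same kind."""
    target = next((p for p in presets if p.id == preset_id), None)
    if target is None:
        raise KeyError(preset_id)
    for p in presets:
        if p.kind == target.kind:
            # Only the target stays enabled; other kinds are untouched.
            p.enabled = (p.id == preset_id)


def active(presets: List[Preset], kind: str) -> Optional[Preset]:
    """Return the enabled preset for a kind, or None (that side stays BYO-audio)."""
    return next((p for p in presets if p.kind == kind and p.enabled), None)
```

Calling `enable` on a second STT preset leaves the first one in the list with its data intact but `enabled=False`, which mirrors the swap behaviour described above.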

Capabilities on each call

When you create a call with POST /api/call (or chamade_call_join), the response capabilities array advertises hosted_stt / hosted_tts if a preset is enabled for that kind. Your agent can branch on that:

"capabilities": ["audio_in", "audio_out", "read", "write", "typing", "hosted_stt", "hosted_tts"]

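A minimal sketch of branching on the capabilities array, assuming the call response has already been parsed into a dict with a top-level `capabilities` key. The function name and the `"hosted"` / `"byo_audio"` labels are illustrative only:

```python
import json
from typing import Dict


def audio_modes(call_response: dict) -> Dict[str, str]:
    """Decide, per side, whether Chamade hosts it or the agent handles raw audio."""
    caps = set(call_response.get("capabilities", []))
    return {
        "stt": "hosted" if "hosted_stt" in caps else "byo_audio",
        "tts": "hosted" if "hosted_tts" in caps else "byo_audio",
    }


# Example response fragment with an STT preset enabled and no TTS preset.
example = json.loads(
    '{"capabilities": ["audio_in", "audio_out", "read", "write", "typing", "hosted_stt"]}'
)
modes = audio_modes(example)
```

With this example, the agent would read transcripts from Chamade but still send its own synthesized audio on the WebSocket.
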
Per-call override

If you keep multiple presets around (say, a premium and a cheap TTS) and you want to force a specific one for one call, pass the preset UUID at creation time:

POST /api/call
{
  "platform": "teams",
  "meeting_url": "...",
  "tts_voice_config_id": "<uuid-of-a-TTS-preset>",
  "stt_voice_config_id": "<uuid-of-an-STT-preset>"
}

The UUID comes from GET /api/voice-configs. Trying to use a disabled preset returns 400 — enable it in the dashboard first.
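
As a sketch, the request body can be assembled with a client-side check that catches a disabled preset before the API returns 400. The shape of each preset dict (`id`, `kind`, `enabled`) is an assumption about what GET /api/voice-configs returns, and `build_call_body` is a hypothetical helper:

```python
from typing import List, Optional


def build_call_body(
    platform: str,
    meeting_url: str,
    presets: List[dict],
    tts_id: Optional[str] = None,
    stt_id: Optional[str] = None,
) -> dict:
    """Build the POST /api/call body, validating any per-call preset override."""
    body = {"platform": platform, "meeting_url": meeting_url}
    by_id = {p["id"]: p for p in presets}
    overrides = (
        ("tts_voice_config_id", "tts", tts_id),
        ("stt_voice_config_id", "stt", stt_id),
    )
    for field_name, kind, preset_id in overrides:
        if preset_id is None:
            continue  # no override for this side
        preset = by_id.get(preset_id)
        if preset is None:
            raise ValueError(f"unknown preset: {preset_id}")
        if preset["kind"] != kind:
            raise ValueError(f"preset {preset_id} is not a {kind.upper()} preset")
        if not preset["enabled"]:
            # The API would answer 400 here; fail early instead.
            raise ValueError(f"preset {preset_id} is disabled; enable it first")
        body[field_name] = preset_id
    return body
```

Omitting both IDs yields the plain body, so the same helper covers calls with and without an override.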

Supported providers

| Provider | STT | TTS | Notes |
|---|---|---|---|
| ElevenLabs | ✓ (Scribe, HTTP-segmented; bills only on real speech) | ✓ (WebSocket streaming; voice catalog via Load voices) | Default voice: Rachel. Model: eleven_turbo_v2_5. |
| Deepgram | ✓ (Nova-3 WebSocket; smart_format, diarize, interim results) | ✓ (Aura-2 WebSocket) | Default STT model: nova-3. Default Aura voice: thalia-en. |
| Cartesia | ✓ (Sonic WebSocket; sub-100ms transcripts) | ✓ (Sonic WebSocket; sub-100ms TTFB, voice catalog via Load voices) | Default TTS model: sonic-3. Pick a voice id in your Cartesia account. |
| AssemblyAI | ✓ (Universal-Streaming WebSocket; turn-formatted finals, optional diarization) | | Defaults: format_turns on, speaker_labels off. |
| OpenAI | ✓ (Whisper / gpt-4o-transcribe; HTTP batch, not streaming) | ✓ (gpt-4o-mini-tts default; tts-1 / tts-1-hd available) | OpenAI TTS prefers 24 kHz output; resampling handled automatically but expect a small quality hit on 8/16 kHz platforms (SIP, Telegram). |
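
One plausible reason for the quality hit in the OpenAI note is bandwidth: a sampled signal can only carry frequencies up to half its sample rate, so downsampling discards everything above the target's limit. The arithmetic below is a general signal-processing sketch, not a description of how Chamade's resampler works:

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a signal at this sample rate can represent."""
    return sample_rate_hz / 2


def retained_fraction(source_rate_hz: int, target_rate_hz: int) -> float:
    """Fraction of the source's audio bandwidth that survives resampling."""
    return min(1.0, nyquist_hz(target_rate_hz) / nyquist_hz(source_rate_hz))


# 24 kHz audio carries up to 12 kHz of bandwidth.
to_8k = retained_fraction(24_000, 8_000)    # 4 kHz of 12 kHz survives
to_16k = retained_fraction(24_000, 16_000)  # 8 kHz of 12 kHz survives
```

Speech stays intelligible at either rate, which is consistent with the table describing the hit as small.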

More providers (Azure, Google, Speechmatics, Gladia, …) can be added on request — drop a note to [email protected] if you need a specific one.

REST API

The dashboard is a thin wrapper over these endpoints — call them directly from your own tooling if needed:

| Method | Path | Purpose |
|---|---|---|
| GET | /api/voice-configs/schema | Provider catalog + option fields |
| GET | /api/voice-configs | List your presets (key never returned) |
| POST | /api/voice-configs | Create a preset (probed on save) |
| PATCH | /api/voice-configs/{id} | Update; enabled=true re-probes the key and disables the sibling |
| DELETE | /api/voice-configs/{id} | Delete a preset |
| POST | /api/providers/{name}/voices | Proxy the provider's voice catalog (TTS picker) |
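
A minimal sketch of calling these endpoints from your own tooling, using only the Python standard library. The base URL, the bearer-token header, and the helper names are assumptions for illustration, so check your account's actual host and authentication scheme. The only request body shown is `enabled=true`, which comes from the table above:

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://chamade.example"   # hypothetical host, replace with the real one
API_KEY = "YOUR_CHAMADE_API_KEY"       # placeholder; auth scheme below is assumed


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for one of the voice-config endpoints."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {API_KEY}"}  # assumption: bearer token auth
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def list_presets_request() -> urllib.request.Request:
    return build_request("GET", "/api/voice-configs")


def enable_preset_request(preset_id: str) -> urllib.request.Request:
    # enabled=true re-probes the key and disables the sibling preset.
    return build_request("PATCH", f"/api/voice-configs/{preset_id}", {"enabled": True})


def delete_preset_request(preset_id: str) -> urllib.request.Request:
    return build_request("DELETE", f"/api/voice-configs/{preset_id}")

# To send a request: urllib.request.urlopen(list_presets_request())
```

Keeping request construction separate from sending makes the calls easy to inspect or unit-test without touching the network.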