Why it fits
Everything a voice pipeline needs
Speech-to-text, text-to-speech, and the LLM in between — all on one platform.
Speech-to-text
Transcribe audio with Whisper Large v3 or Qwen3-ASR via /v1/audio/transcriptions, billed per minute.
Text-to-speech
Synthesize natural speech with Kokoro-82M or Qwen3-TTS via /v1/audio/speech, with selectable voices.
Full voice loop
Chain speech-to-text → an open LLM → text-to-speech behind one API key for end-to-end voice assistants.
Multilingual
Transcribe and speak across many languages.
OpenAI-compatible audio API
Drop-in /v1/audio endpoints — reuse the OpenAI client you already have.
Multi-region
Requests route to a healthy region automatically.
Quickstart
Transcribe and synthesize
Standard OpenAI audio endpoints — only the model names change.
# transcribe, then speak the result back
transcript = client.audio.transcriptions.create(
model="whisper-large-v3",
file=open("call.mp3", "rb"),
)
speech = client.audio.speech.create(
model="kokoro-82m",
voice="af_sky",
input=transcript.text,
)