Skip to main content

Use case · Voice & Speech

Build voice experiences, speech in, speech out

Transcribe audio and synthesize natural speech with open models — and chain them into complete voice pipelines behind a single API key.

Why it fits

Everything a voice pipeline needs

Speech-to-text, text-to-speech, and the LLM in between — all on one platform.

Speech-to-text

Transcribe audio with Whisper Large v3 or Qwen3-ASR via /v1/audio/transcriptions, billed per minute.

Text-to-speech

Synthesize natural speech with Kokoro-82M or Qwen3-TTS via /v1/audio/speech, with selectable voices.

Full voice loop

Chain speech-to-text → an open LLM → text-to-speech behind one API key for end-to-end voice assistants.

Multilingual

Transcribe and speak across many languages.

OpenAI-compatible audio API

Drop-in /v1/audio endpoints — reuse the OpenAI client you already have.

Multi-region

Requests route to a healthy region automatically.

Quickstart

Transcribe and synthesize

Standard OpenAI audio endpoints — only the model names change.

# transcribe, then speak the result back
transcript = client.audio.transcriptions.create(
    model="whisper-large-v3",
    file=open("call.mp3", "rb"),
)

speech = client.audio.speech.create(
    model="kokoro-82m",
    voice="af_sky",
    input=transcript.text,
)

Give your app a voice

Start with open speech models and one API key.