Speech-to-Text Benchmark

12 providers · independently tested · April 2026

| Provider | WER | Noisy WER | Latency | Cost/min | Languages | Source |
|---|---|---|---|---|---|---|
| Deepgram Nova-3 | 7.8% | 10.4% | 2.4s | $0.0043 | 36 | Deepgram |
| ElevenLabs Scribe v2 Realtime | 3.5% | 5.1% | 1.0s | $0.0067 | 99 | ElevenLabs API |
| ElevenLabs Scribe v1 | 5.4% | 6.9% | 1.1s | $0.0067 | 99 | ElevenLabs API |
| Alibaba qwen3-asr-flash | 3.5% | 5.4% | 0.6s | $0.0021 | 90 | Alibaba DashScope Qwen3-ASR-Flash |
| OpenAI gpt-4o-transcribe | 19.4% | 33.7% | 1.9s | $0.0060 | 50 | OpenAI API pricing |
| OpenAI gpt-4o-mini-transcribe | 8.2% | 17.1% | 2.0s | $0.0030 | 50 | OpenAI API pricing |
| OpenAI whisper-1 | 11.6% | 12.3% | 2.6s | $0.0060 | 57 | OpenAI API pricing |
| xAI Grok STT | 16.75% | 16.1% | 0.9s | $0.0017 | 25 | xAI Grok STT and TTS APIs announcement |
| AssemblyAI Universal-2 | 6.22% | 9.5% | 3.7s | $0.0062 | 99 | AssemblyAI |
| AssemblyAI Universal-3 Pro | 5.06% | 7.9% | 4.2s | $0.0067 | 6 | AssemblyAI |
| Google Cloud Chirp 2 | 5.37% | 17.9% | 4.5s | $0.0240 | 125 | Google Cloud Speech-to-Text v2 |
| Google Gemini 2.5 Flash (STT) | 6.03% | 18.9% | 2.5s | $0.0002 | 100 | Gemini API |
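The WER columns use the standard definition: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch in plain Python, with illustrative sentences (this shows the metric itself, not the benchmark's exact scoring pipeline, which may also normalize punctuation and casing differently):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over word tokens,
    divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six reference words → WER ≈ 16.7%
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why noisy-condition scores above can climb past the clean-condition ones so quickly.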

Noise Robustness

How accuracy holds up under pressure.

Real-world audio is noisy. We tested each provider across 5 conditions to see how much accuracy degrades.

[Interactive chart: WER by language and noise condition for each of the 12 providers above]

Methodology

- 165 API calls
- 58 minutes tested
- $0.89 total cost
- 12 providers benchmarked

STT accuracy, latency, noise robustness, and multi-language performance were measured with the Speko Bench CLI (speko-bench-cli v0.1.0) on the LibriSpeech test-clean and Google FLEURS datasets. Pricing is taken from official provider pages.
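Because every provider in the table publishes a per-minute price, the cost of a benchmark run is just minutes tested times price, summed over providers. A sketch using three illustrative prices from the table (the provider keys and helper are hypothetical, not part of any CLI):

```python
# Prices ($/min) taken from the comparison table above (illustrative subset).
PRICE_PER_MIN = {
    "deepgram-nova-3": 0.0043,
    "elevenlabs-scribe-v2-realtime": 0.0067,
    "openai-whisper-1": 0.0060,
}

def run_cost(minutes: float, providers) -> float:
    """Total cost of sending the same audio to each listed provider."""
    return sum(minutes * PRICE_PER_MIN[p] for p in providers)

# 58 minutes of audio across these three providers ≈ $0.99
print(round(run_cost(58, PRICE_PER_MIN), 2))
```

This is also why sub-dollar totals are realistic even for dozens of minutes of audio per provider.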

Stop guessing. Start benchmarking.

Independent, data-driven comparisons to help you pick the right voice AI stack.

Get Started