Best speech-to-text API, by language
English leaderboards do not transfer to other languages. We benchmark speech-to-text (STT) providers per language on FLEURS read speech, route every provider through the same Speko gateway, and publish the measured numbers. On the 2026-06-03 run, two different providers win across four languages, which is exactly why routing per language beats picking one vendor.
English
OpenAI GPT-4o Transcribe: 2.4% WER
Full English leaderboard: accuracy, latency, list price, and which providers do not support it.
Thai
ElevenLabs Scribe v2: 4.1% CER
Full Thai leaderboard: accuracy, latency, list price, and which providers do not support it.
Indonesian
OpenAI GPT-4o Transcribe: 2.4% WER
Full Indonesian leaderboard: accuracy, latency, list price, and which providers do not support it.
Vietnamese
ElevenLabs Scribe v2: 1.9% WER
Full Vietnamese leaderboard: accuracy, latency, list price, and which providers do not support it.
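To make the per-language routing argument concrete, here is a minimal sketch of a lookup table built from the four winners listed above. The names `BEST_BY_LANGUAGE` and `pick_provider` are hypothetical and used only for illustration. This is not the Speko gateway's API, and the ISO 639-1 language codes are an assumed keying scheme.

```python
# Winners from the 2026-06-03 run, keyed by ISO 639-1 language code.
# Each value is (provider, metric, error rate in percent).
# Illustrative only: this table and pick_provider are hypothetical names,
# not part of any published gateway API.
BEST_BY_LANGUAGE = {
    "en": ("OpenAI GPT-4o Transcribe", "WER", 2.4),
    "th": ("ElevenLabs Scribe v2", "CER", 4.1),
    "id": ("OpenAI GPT-4o Transcribe", "WER", 2.4),
    "vi": ("ElevenLabs Scribe v2", "WER", 1.9),
}


def pick_provider(language: str) -> str:
    """Return the best-scoring provider for a benchmarked language."""
    try:
        provider, _metric, _error_rate = BEST_BY_LANGUAGE[language]
    except KeyError:
        # No fallback is assumed: an unbenchmarked language is an error here.
        raise ValueError(f"no benchmark result for language {language!r}")
    return provider


if __name__ == "__main__":
    for code in BEST_BY_LANGUAGE:
        print(code, "->", pick_provider(code))
```

A single-vendor setup would return the same provider for every key. The table shows that doing so would give up the top result in two of the four languages, whichever vendor is chosen.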
Methodology in one line
FLEURS read-speech clips per language, loudness-normalized to -16 LUFS, every provider measured through the same gateway, scored as word error rate (WER), or character error rate (CER) for Thai, which is written without spaces between words. Full methodology and interactive tables live at benchmarks.speko.ai.
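For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length, counted in words or in characters. It is a minimal illustration, not the benchmark's scoring code. In particular it applies no text normalization (casing, punctuation, numerals), and the choice to drop whitespace before computing CER is an assumption.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring whitespace (an assumption of this sketch)."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sat on mat"))
    print(cer("abcd", "abed"))
```

Because `wer` splits on whitespace, it would treat an unspaced Thai sentence as a single word, so any error would score the whole sentence as wrong. Counting characters avoids that.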