Best Thai Speech-to-Text API (2026): independent benchmark
Thai speech-to-text accuracy, measured: 30 Thai read-speech clips from FLEURS, with every provider routed through the same gateway and scored identically on 2026-06-03. The metric is character error rate (CER); lower is better.
ElevenLabs Scribe v2 posted the lowest Thai CER at 4.1%, ahead of Alibaba Qwen3-ASR at 4.8%. Two of the six providers on our English board do not support Thai at all, and CER among the four that do ranges from 4.1% to 8.1%, so picking a provider by its English score alone is a mistake.
Thai STT leaderboard
30 Thai clips (FLEURS), measured 2026-06-03 through the Speko gateway, loudness-normalized to -16 LUFS. CER is measured on Thai audio; latency and list price come from the same gateway setup on the English board. Lower is better in every column.
| Provider / model | CER | p50 latency | List price |
|---|---|---|---|
| ElevenLabs Scribe v2 | 4.1% | 1,353 ms | $0.0067/min |
| Alibaba Qwen3-ASR | 4.8% | 2,195 ms | - |
| xAI Grok STT | 6.6% | 996 ms | - |
| OpenAI GPT-4o Transcribe | 8.1% | 1,084 ms | $0.006/min |
| Cartesia Ink-2 | does not support | - | - |
| Gradium | does not support | - | - |
How we measured
- Dataset: 30 Thai read-speech clips from FLEURS (30 clips per language across the wedge).
- Scoring: mean character error rate (CER), lower is better. Audio is loudness-normalized to -16 LUFS before scoring so input-gain handling does not contaminate the accuracy column.
- Every provider is measured the same way: through the Speko gateway (POST /v1/transcribe, provider pinned), from a single location.
- Latency and list price columns come from the same gateway setup measured on the English board (n=50); the CER column is measured on Thai audio.
- Run date: 2026-06-03.
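To make the scoring step concrete, here is a minimal sketch of mean CER in TypeScript. It is an illustration under stated assumptions, not the benchmark's actual scoring code: it treats each Unicode code point as a character, NFC-normalizes and strips whitespace before comparing, and averages per-clip CER. The function names (`editDistance`, `toChars`, `cer`, `meanCer`) are hypothetical.

```typescript
// Levenshtein edit distance between two character sequences
// (insertions, deletions, and substitutions each cost 1).
function editDistance(ref: string[], hyp: string[]): number {
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// Assumption: NFC-normalize and drop whitespace, then split into code points.
// The real benchmark may normalize differently.
function toChars(text: string): string[] {
  return Array.from(text.normalize("NFC").replace(/\s+/g, ""));
}

// CER for one clip: edit distance divided by reference length.
function cer(reference: string, hypothesis: string): number {
  const ref = toChars(reference);
  const hyp = toChars(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;
  return editDistance(ref, hyp) / ref.length;
}

// Assumption: "mean CER" is the unweighted average of per-clip CER.
function meanCer(pairs: Array<[string, string]>): number {
  const total = pairs.reduce((sum, [ref, hyp]) => sum + cer(ref, hyp), 0);
  return total / pairs.length;
}
```

Calling `meanCer` with 30 `[reference, hypothesis]` pairs would give one number per provider, comparable to the CER column. Note that CER can exceed 100% when the hypothesis is much longer than the reference.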
Full interactive table, every territory, and the complete methodology: benchmarks.speko.ai
Use the winner without lock-in
The best Thai provider today is one benchmark run away from being second best. Speko is one API in front of every provider on this table: it routes each request to the measured-best provider for your language and fails over automatically when a provider degrades. No per-vendor integration, no migration when the leaderboard flips.
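The routing idea can be sketched in a few lines. This is an illustration only, not Speko's implementation: the `Measurement` shape, `failoverOrder`, `transcribeWithFailover`, and the rule "move to the next provider when a call throws" are all assumptions. The CER values are copied from the Thai leaderboard above.

```typescript
// One row of a leaderboard; cer is null when the provider does not support the language.
interface Measurement {
  provider: string;
  cer: number | null;
}

// CER values from the Thai leaderboard (as fractions).
const thaiBoard: Measurement[] = [
  { provider: "OpenAI GPT-4o Transcribe", cer: 0.081 },
  { provider: "ElevenLabs Scribe v2", cer: 0.041 },
  { provider: "Cartesia Ink-2", cer: null },
  { provider: "xAI Grok STT", cer: 0.066 },
  { provider: "Alibaba Qwen3-ASR", cer: 0.048 },
  { provider: "Gradium", cer: null },
];

// Drop unsupported providers, then order the rest by ascending CER.
function failoverOrder(board: Measurement[]): string[] {
  return board
    .filter((m): m is Measurement & { cer: number } => m.cer !== null)
    .sort((a, b) => a.cer - b.cer)
    .map((m) => m.provider);
}

// Hypothetical per-provider call; a real one would hit the vendor's API.
type Transcriber = (provider: string, audio: Uint8Array) => Promise<string>;

// Try providers in order; fall through to the next one whenever a call fails.
async function transcribeWithFailover(
  order: string[],
  audio: Uint8Array,
  call: Transcriber,
): Promise<{ provider: string; text: string }> {
  let lastError: unknown = new Error("no providers available");
  for (const provider of order) {
    try {
      return { provider, text: await call(provider, audio) };
    } catch (err) {
      lastError = err; // assumption: any thrown error counts as "degraded"
    }
  }
  throw lastError;
}
```

Maintaining this yourself means one integration per vendor plus a fresh benchmark whenever the ranking changes. Through the gateway, the same outcome is a single request: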
curl -X POST https://api.speko.dev/v1/transcribe \
  -H "Authorization: Bearer $SPEKO_API_KEY" \
  -H "Content-Type: audio/wav" \
  -H "x-speko-intent: {\"language\":\"th\"}" \
  --data-binary @call.wav

import { Speko } from '@spekoai/sdk';
import { readFile } from 'node:fs/promises';

const speko = new Speko({ apiKey: process.env.SPEKO_API_KEY! });
const audio = await readFile('./call.wav');
const { text, provider, confidence } = await speko.transcribe(audio, {
  language: 'th',
});

FAQ
What is the most accurate Thai speech-to-text API?
On Speko's 2026-06-03 FLEURS benchmark (30 Thai clips), ElevenLabs Scribe v2 posted the lowest CER at 4.1%, followed by Alibaba Qwen3-ASR at 4.8%.
Does ElevenLabs support Thai speech-to-text?
Yes. ElevenLabs Scribe v2 scored 4.1% CER on our Thai run, the best result on the board.
Why is Thai scored with CER instead of WER?
Thai script does not put spaces between words, so "word error rate" depends on an arbitrary segmenter. Character error rate avoids that, which is why Thai STT benchmarks (including ours) report CER.
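A small worked example shows the effect. The sketch below scores one hypothetical Thai transcript two ways: with whitespace tokenization (the usual choice for English WER) and with per-character tokenization. The sentences and helper names are made up for illustration; a real WER pipeline would plug in a Thai word segmenter, and the score would then depend on which segmenter was chosen.

```typescript
// Levenshtein distance over token arrays (tokens can be words or characters).
function levenshtein(ref: string[], hyp: string[]): number {
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[hyp.length];
}

function errorRate(ref: string[], hyp: string[]): number {
  return levenshtein(ref, hyp) / ref.length;
}

// Naive "words": split on whitespace, as is common for English WER.
const whitespaceWords = (s: string): string[] => s.split(/\s+/).filter(Boolean);
// Characters: one token per Unicode code point.
const characters = (s: string): string[] => Array.from(s);

const reference = "ฉันกินข้าว"; // "I eat rice", written without spaces
const hypothesis = "ฉันกินขาว"; // same sentence with one tone mark dropped

const naiveWer = errorRate(whitespaceWords(reference), whitespaceWords(hypothesis));
const charErrorRate = errorRate(characters(reference), characters(hypothesis));

console.log(naiveWer); // 1: the whole sentence is one "word", so one slip is a 100% error
console.log(charErrorRate); // 0.1: one missing mark out of ten code points
```

The same single-mark mistake scores 100% under whitespace WER and 10% under CER, which is why the word-level number says more about the tokenizer than about the recognizer.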
Which providers do not support Thai transcription?
Cartesia Ink-2 and Gradium are English-only on our board: on Thai input they return text in the wrong script (roughly 76-100% error), so we mark them "does not support" instead of publishing a misleading number.
How was Thai STT accuracy measured?
30 Thai read-speech clips from FLEURS, loudness-normalized to -16 LUFS, sent through the Speko gateway with the provider pinned, and scored as character error rate on 2026-06-03. Support is checked first: a provider is only benchmarked on a language it actually transcribes in the native script.
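One way to picture the native-script check is a script-ratio test: count how many characters of the returned transcript fall in the Unicode Thai block (U+0E00 to U+0E7F). This is a hedged sketch, not the check the benchmark actually runs; the function names, the ignored-punctuation set, and the 0.8 threshold are all assumptions.

```typescript
// True when the character's code point lies in the Unicode Thai block.
function isThaiCodePoint(ch: string): boolean {
  const cp = ch.codePointAt(0) ?? 0;
  return cp >= 0x0e00 && cp <= 0x0e7f;
}

// Fraction of meaningful characters that are Thai.
// Whitespace, ASCII digits, and common ASCII punctuation are ignored.
function thaiScriptRatio(text: string): number {
  const chars = Array.from(text).filter((ch) => !/[\s\d.,!?;:'"()\-]/.test(ch));
  if (chars.length === 0) return 0;
  return chars.filter(isThaiCodePoint).length / chars.length;
}

// Hypothetical threshold: below this, treat the output as wrong-script.
const MIN_THAI_RATIO = 0.8;

function looksLikeThaiTranscript(text: string): boolean {
  return thaiScriptRatio(text) >= MIN_THAI_RATIO;
}
```

A provider that answers Thai audio with Latin-script text would score near 0 here and could be marked "does not support" before any CER is computed.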