Best Thai Text-to-Speech API (2026): independent benchmark
Thai is the hardest test in our TTS wedge: five lexical tones carry word meaning, and Thai rhythm is syllable-timed where English is stress-timed. We measured 11 systems on Speko's Thai eval set: an intelligibility gate first, then rhythm (%V), tone fidelity, and an anglicization index.
10 of 11 systems produce intelligible Thai; Deepgram Aura 2 fails the gate by falling back to English output. Of the 10 that pass, 9 clear the 0.78 %V nativeness floor and Cartesia Sonic 3.5 falls below it. GPT Realtime posted both the highest %V (0.8168) and the lowest anglicization index (0.5693).
Thai TTS measurements
Sorted by %V (rhythm nativeness, floor 0.78), with gate failures last. %V is the only metric with a validated quality floor; tone fidelity and anglicization are reported as diagnostics only. For anglicization, lower is better.
| System | Type | Gate | %V (floor 0.78) | Tone fidelity | Anglicization |
|---|---|---|---|---|---|
| GPT Realtime | realtime | pass | 0.8168 | 0.5919 | 0.5693 |
| xAI / Grok TTS | tts | pass | 0.8124 | 0.5727 | 0.654 |
| ElevenLabs v3 | tts | pass | 0.8077 | 0.4975 | 0.7366 |
| MiniMax Speech 2.6 HD | tts | pass | 0.801 | 0.454 | 0.9126 |
| Inworld TTS 2 | tts | pass | 0.8007 | 0.5844 | 0.8841 |
| GPT Realtime v2 | realtime | pass | 0.7965 | 0.5536 | 0.8499 |
| Hume Octave 2 | tts | pass | 0.7947 | 0.4871 | 0.8468 |
| Qwen3 TTS Flash | tts | pass | 0.7916 | 0.5375 | 0.9165 |
| GPT-4o mini TTS | tts | pass | 0.7823 | 0.586 | 0.8949 |
| Cartesia Sonic 3.5 | tts | pass | 0.7166 (below floor) | 0.5105 | 0.8683 |
| Deepgram Aura 2 | tts | fail (English fallback) | 0.7608 | 0.5624 | 0.9583 |
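The reading order of the table (gate first, then the %V floor, with anglicization as a side diagnostic) can be reproduced in a few lines. This is a minimal sketch: the numbers are copied from the table above, while the `Row` type and variable names are illustrative and not part of any Speko API.

```typescript
// Rows copied from the table above. The field names are illustrative.
interface Row {
  system: string;
  gate: "pass" | "fail";
  pctV: number;
  tone: number;
  anglicization: number;
}

const V_FLOOR = 0.78; // %V nativeness floor stated in the article

const rows: Row[] = [
  { system: "GPT Realtime", gate: "pass", pctV: 0.8168, tone: 0.5919, anglicization: 0.5693 },
  { system: "xAI / Grok TTS", gate: "pass", pctV: 0.8124, tone: 0.5727, anglicization: 0.654 },
  { system: "ElevenLabs v3", gate: "pass", pctV: 0.8077, tone: 0.4975, anglicization: 0.7366 },
  { system: "MiniMax Speech 2.6 HD", gate: "pass", pctV: 0.801, tone: 0.454, anglicization: 0.9126 },
  { system: "Inworld TTS 2", gate: "pass", pctV: 0.8007, tone: 0.5844, anglicization: 0.8841 },
  { system: "GPT Realtime v2", gate: "pass", pctV: 0.7965, tone: 0.5536, anglicization: 0.8499 },
  { system: "Hume Octave 2", gate: "pass", pctV: 0.7947, tone: 0.4871, anglicization: 0.8468 },
  { system: "Qwen3 TTS Flash", gate: "pass", pctV: 0.7916, tone: 0.5375, anglicization: 0.9165 },
  { system: "GPT-4o mini TTS", gate: "pass", pctV: 0.7823, tone: 0.586, anglicization: 0.8949 },
  { system: "Cartesia Sonic 3.5", gate: "pass", pctV: 0.7166, tone: 0.5105, anglicization: 0.8683 },
  { system: "Deepgram Aura 2", gate: "fail", pctV: 0.7608, tone: 0.5624, anglicization: 0.9583 },
];

// Step 1: the intelligibility gate removes systems that did not produce Thai.
const gated = rows.filter((r) => r.gate === "pass");

// Step 2: split the gated systems on the %V floor, then sort by %V descending.
const clearsFloor = gated
  .filter((r) => r.pctV >= V_FLOOR)
  .sort((a, b) => b.pctV - a.pctV);
const belowFloor = gated.filter((r) => r.pctV < V_FLOOR);

// Diagnostic only: the gated system with the lowest anglicization index.
const lowestAnglicization = gated.reduce((best, r) =>
  r.anglicization < best.anglicization ? r : best,
);

console.log("Clear the floor:", clearsFloor.map((r) => r.system));
console.log("Below the floor:", belowFloor.map((r) => r.system));
console.log("Lowest anglicization:", lowestAnglicization.system);
```

Treating the gate and the floor as filters, not as terms in a weighted score, mirrors the article's position that no validated composite score exists.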
How we measured
- Eval set: Speko's Thai TTS eval set v1 (fixed Thai prompts synthesized by every system).
- Gate: an intelligibility check runs first; systems that return English audio instead of Thai are excluded from the Thai panel rather than scored.
- Rhythm: %V, the vocalic proportion of total speech time. Thai is syllable-timed (high %V); stress-timed, English-like output reads as non-native. Floor: 0.78, validated against native ratings (correlation +0.70, n=11).
- Tone fidelity: pitch-contour correlation across the five Thai lexical tones (mid, low, falling, high, rising) against native references.
- Anglicization index: a universal phoneme recognizer checks how often Thai-only phones get replaced by English-inventory neighbours. 0 is native phonology, 1 is English phonology with Thai tokens. Lower is better.
- Data is synced from the published run at benchmarks.speko.ai (snapshot 2026-06-05).
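The two simplest metrics in the list above can be sketched as follows. This is an illustration under stated assumptions, not Speko's implementation: the `Interval` shape is hypothetical, pauses are assumed to be excluded from speech time, and Pearson correlation is assumed as the form of "pitch-contour correlation", which the methodology summary does not specify.

```typescript
// One segment of a phone-level segmentation of an utterance.
// Hypothetical shape for illustration; not Speko's data format.
interface Interval {
  kind: "vocalic" | "consonantal" | "pause";
  startSec: number;
  endSec: number;
}

// %V: vocalic duration divided by total speech duration.
// Assumption: pauses do not count toward speech time.
function percentV(intervals: Interval[]): number {
  let vocalic = 0;
  let speech = 0;
  for (const iv of intervals) {
    const dur = iv.endSec - iv.startSec;
    if (iv.kind === "pause" || dur <= 0) continue; // skip silence and bad spans
    speech += dur;
    if (iv.kind === "vocalic") vocalic += dur;
  }
  return speech === 0 ? 0 : vocalic / speech;
}

// Pearson correlation between two equal-length pitch contours
// (for example, F0 samples of a synthesized tone and a native reference).
function pearson(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length < 2) {
    throw new Error("contours must have the same length (at least 2 samples)");
  }
  const n = a.length;
  const meanA = a.reduce((s, x) => s + x, 0) / n;
  const meanB = b.reduce((s, x) => s + x, 0) / n;
  let cov = 0;
  let varA = 0;
  let varB = 0;
  a.forEach((x, i) => {
    const da = x - meanA;
    const db = b[i]! - meanB;
    cov += da * db;
    varA += da * da;
    varB += db * db;
  });
  const denom = Math.sqrt(varA * varB);
  return denom === 0 ? 0 : cov / denom; // flat contours carry no correlation
}

// Toy segmentation (made-up timings, not benchmark data).
const toy: Interval[] = [
  { kind: "vocalic", startSec: 0.0, endSec: 0.2 },
  { kind: "consonantal", startSec: 0.2, endSec: 0.25 },
  { kind: "pause", startSec: 0.25, endSec: 0.5 },
  { kind: "vocalic", startSec: 0.5, endSec: 0.8 },
];
const toyPercentV = percentV(toy);
console.log("toy %V:", toyPercentV.toFixed(3), "clears 0.78 floor:", toyPercentV >= 0.78);
```

In a real pipeline the intervals would come from a forced aligner or phone recognizer and the contours from a pitch tracker; both are outside the scope of this sketch.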
Full interactive panels, audio clips, and the complete methodology: benchmarks.speko.ai
Use the best Thai voice without lock-in
Speko is one API in front of every system on this page: it routes each request to the measured-best provider for your language and fails over automatically when a provider degrades. When the next run reshuffles this table, your integration does not change.
curl -X POST https://api.speko.dev/v1/synthesize \
  -H "Authorization: Bearer $SPEKO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "สวัสดีครับ ยินดีต้อนรับ", "intent": {"language": "th"}}' \
  --output reply.audio

import { Speko } from '@spekoai/sdk';

const speko = new Speko({ apiKey: process.env.SPEKO_API_KEY! });
const { audio, provider, model } = await speko.synthesize('สวัสดีครับ ยินดีต้อนรับ', {
  language: 'th',
});

FAQ
What is the best Thai text-to-speech API?
There is no single validated Thai TTS quality score, so we publish a gated diagnostic instead: 10 of 11 measured systems produce intelligible Thai, and 9 of those clear the 0.78 %V rhythm floor (the one metric validated against native-speaker ratings, correlation +0.70, n=11). On the current run GPT Realtime has both the highest %V (0.8168) and the lowest anglicization index (0.5693).
Does Deepgram Aura 2 support Thai?
It failed our Thai intelligibility gate: on Thai input it falls back to English output. Per our language-support rule it is excluded from Thai rankings rather than ranked with a misleading number.
Does ElevenLabs support Thai text-to-speech?
Yes. ElevenLabs v3 passes the Thai intelligibility gate with %V 0.8077 (above the 0.78 floor), tone fidelity 0.4975, and anglicization 0.7366.
Why is there no single Thai TTS quality ranking?
Most acoustic metrics have no measured link to what Thai listeners actually rate as native. The exception is %V (the vocalic proportion of speech, a rhythm metric): it correlates +0.70 with native ratings (n=11), so we publish it as a floor (0.78) rather than pretend a composite score exists.