Best Thai Text-to-Speech API (2026): independent benchmark

Thai is the hardest test in our TTS wedge: five lexical tones carry word meaning, and Thai rhythm is syllable-timed where English is stress-timed. We measured 11 systems on Speko's Thai eval set: an intelligibility gate first, then rhythm (%V), tone fidelity, and an anglicization index.

10 of 11 systems produce intelligible Thai; Deepgram Aura 2 fails the gate by falling back to English output. Of the 10 that pass, 9 clear the 0.78 %V nativeness floor and Cartesia Sonic 3.5 falls below it. GPT Realtime posted both the highest %V (0.8168) and the lowest anglicization index (0.5693).

This is a diagnostic profile, not a single-score leaderboard. %V is the one axis with a measured correlation to native judgment (rho = +0.70 against native-speaker ratings, n = 11); tone fidelity and anglicization are published as diagnostics only.
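As a rough illustration of what a figure like rho = +0.70 involves, here is a minimal sketch of a rank correlation in TypeScript. It assumes rho denotes Spearman's rank correlation, which the page does not state explicitly. The function names and the toy inputs are hypothetical and are not Speko's data or tooling.

```typescript
// Assign 1-based ranks, giving tied values the average of their positions.
function averageRanks(values: number[]): number[] {
  const order = values.map((v, i) => ({ v, i })).sort((a, b) => a.v - b.v);
  const ranks = new Array<number>(values.length);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1].v === order[start].v) end++;
    const avg = (start + end) / 2 + 1; // mean of positions start+1 .. end+1
    for (let k = start; k <= end; k++) ranks[order[k].i] = avg;
    start = end + 1;
  }
  return ranks;
}

// Plain Pearson correlation between two equal-length arrays.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Spearman's rho is the Pearson correlation of the two rank vectors.
function spearman(x: number[], y: number[]): number {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("need two arrays of equal length >= 2");
  }
  return pearson(averageRanks(x), averageRanks(y));
}

// Toy inputs (made up for illustration): metric values vs. listener ratings.
const toyMetric = [0.72, 0.78, 0.8, 0.82];
const toyRatings = [2, 3, 4, 5];
const toyRho = spearman(toyMetric, toyRatings); // perfectly monotonic toy data
```

With only 11 systems, a correlation of this kind carries a wide uncertainty band, which is consistent with the page treating %V as a floor rather than a fine-grained ranking.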

Thai TTS measurements

Sorted by %V (rhythm nativeness, floor 0.78), gate failures last. Diagnostic profile: %V is the one validated quality floor; tone fidelity and anglicization are diagnostics. Anglicization: lower is better.

| System | Type | Gate | %V (floor 0.78) | Tone fidelity | Anglicization |
| --- | --- | --- | --- | --- | --- |
| GPT Realtime | realtime | pass | 0.8168 | 0.5919 | 0.5693 |
| xAI / Grok TTS | tts | pass | 0.8124 | 0.5727 | 0.654 |
| ElevenLabs v3 | tts | pass | 0.8077 | 0.4975 | 0.7366 |
| MiniMax Speech 2.6 HD | tts | pass | 0.801 | 0.454 | 0.9126 |
| Inworld TTS 2 | tts | pass | 0.8007 | 0.5844 | 0.8841 |
| GPT Realtime v2 | realtime | pass | 0.7965 | 0.5536 | 0.8499 |
| Hume Octave 2 | tts | pass | 0.7947 | 0.4871 | 0.8468 |
| Qwen3 TTS Flash | tts | pass | 0.7916 | 0.5375 | 0.9165 |
| GPT-4o mini TTS | tts | pass | 0.7823 | 0.586 | 0.8949 |
| Cartesia Sonic 3.5 | tts | pass | 0.7166 (below floor) | 0.5105 | 0.8683 |
| Deepgram Aura 2 | tts | fail (English fallback) | 0.7608 | 0.5624 | 0.9583 |
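The gate-then-floor reading of the table can be sketched in a few lines of TypeScript. The gate results and %V values below are copied from the table above; the type names, the `classify` function, and the verdict labels are illustrative assumptions, not part of any Speko API.

```typescript
type Row = { system: string; gatePass: boolean; pctV: number };
type Verdict = "gate-fail" | "below-floor" | "clears-floor";

const PCT_V_FLOOR = 0.78; // rhythm nativeness floor stated on this page

const rows: Row[] = [
  { system: "GPT Realtime", gatePass: true, pctV: 0.8168 },
  { system: "xAI / Grok TTS", gatePass: true, pctV: 0.8124 },
  { system: "ElevenLabs v3", gatePass: true, pctV: 0.8077 },
  { system: "MiniMax Speech 2.6 HD", gatePass: true, pctV: 0.801 },
  { system: "Inworld TTS 2", gatePass: true, pctV: 0.8007 },
  { system: "GPT Realtime v2", gatePass: true, pctV: 0.7965 },
  { system: "Hume Octave 2", gatePass: true, pctV: 0.7947 },
  { system: "Qwen3 TTS Flash", gatePass: true, pctV: 0.7916 },
  { system: "GPT-4o mini TTS", gatePass: true, pctV: 0.7823 },
  { system: "Cartesia Sonic 3.5", gatePass: true, pctV: 0.7166 },
  { system: "Deepgram Aura 2", gatePass: false, pctV: 0.7608 },
];

// The intelligibility gate is checked first; %V only matters for systems that pass it.
function classify(row: Row): Verdict {
  if (!row.gatePass) return "gate-fail";
  return row.pctV >= PCT_V_FLOOR ? "clears-floor" : "below-floor";
}

// Tally how many systems land in each bucket.
const counts: Record<Verdict, number> = {
  "gate-fail": 0,
  "below-floor": 0,
  "clears-floor": 0,
};
for (const row of rows) counts[classify(row)] += 1;

// Gate-passing systems ordered by %V, highest first (the table's sort order).
const ranked = rows.filter((r) => r.gatePass).sort((a, b) => b.pctV - a.pctV);
```

Checking the gate before the floor is what keeps Deepgram Aura 2 out of the ordering even though its %V of 0.7608 is higher than Cartesia Sonic 3.5's.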

How we measured

Full interactive panels, audio clips, and the complete methodology: benchmarks.speko.ai

Use the best Thai voice without lock-in

Speko is one API in front of every system on this page: it routes each request to the measured-best provider for your language and fails over automatically when a provider degrades. When the next run reshuffles this table, your integration does not change.

curl
curl -X POST https://api.speko.dev/v1/synthesize \
  -H "Authorization: Bearer $SPEKO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "สวัสดีครับ ยินดีต้อนรับ", "intent": {"language": "th"}}' \
  --output reply.audio
TypeScript
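import { Speko } from '@spekoai/sdk';

// Read the API key from the environment rather than hard-coding it.
const speko = new Speko({ apiKey: process.env.SPEKO_API_KEY! });

// Synthesize a Thai greeting; the result is destructured into audio, provider, and model.
const { audio, provider, model } = await speko.synthesize('สวัสดีครับ ยินดีต้อนรับ', {
  language: 'th',
});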

FAQ

What is the best Thai text-to-speech API?

There is no single validated Thai TTS quality score, so we publish a gated diagnostic instead: 10 of 11 measured systems produce intelligible Thai, and 9 of those clear the 0.78 %V rhythm floor (the one metric validated against native-speaker ratings, correlation +0.70, n=11). On the current run GPT Realtime has both the highest %V (0.8168) and the lowest anglicization index (0.5693).

Does Deepgram Aura 2 support Thai?

It failed our Thai intelligibility gate: on Thai input it falls back to English output. Per our language-support rule it is excluded from Thai rankings rather than ranked with a misleading number.

Does ElevenLabs support Thai text-to-speech?

Yes. ElevenLabs v3 passes the Thai intelligibility gate with %V 0.8077 (above the 0.78 floor), tone fidelity 0.4975, and anglicization 0.7366.

Why is there no single Thai TTS quality ranking?

Most acoustic metrics have no measured link to what Thai listeners actually rate as native. The exception is %V (the vocalic proportion of speech, a rhythm metric): it correlates +0.70 with native ratings (n=11), so we publish it as a floor (0.78) rather than pretend a composite score exists.
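To make the definition concrete, here is a minimal TypeScript sketch of %V as it is commonly defined in the rhythm-metrics literature: total vocalic duration divided by total duration of the segmented speech. The `Interval` type, the `percentV` function, and the sample durations are hypothetical. How Speko segments audio, and whether pauses are included, is not stated on this page.

```typescript
// One labelled stretch of speech: vocalic ("V") or consonantal ("C").
type Interval = { kind: "V" | "C"; durationMs: number };

// %V as a fraction in [0, 1]: vocalic time over total segmented time.
function percentV(intervals: Interval[]): number {
  let vocalic = 0;
  let total = 0;
  for (const seg of intervals) {
    if (seg.durationMs < 0) throw new Error("durations must be non-negative");
    total += seg.durationMs;
    if (seg.kind === "V") vocalic += seg.durationMs;
  }
  if (total === 0) throw new Error("no speech to measure");
  return vocalic / total;
}

// Made-up segmentation of a short utterance, for illustration only.
const sample: Interval[] = [
  { kind: "C", durationMs: 80 },
  { kind: "V", durationMs: 120 },
  { kind: "C", durationMs: 100 },
  { kind: "V", durationMs: 200 },
];

const sampleShare = percentV(sample); // 320 ms vocalic out of 500 ms total
const clearsFloor = sampleShare >= 0.78; // compare against the published floor
```

A value computed this way is only comparable with the 0.78 floor if the segmentation matches the benchmark's own, so treat the comparison in the last line as a shape, not a verdict.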

More language benchmarks

Best Thai STT API
Best Filipino TTS API
Best Vietnamese TTS API
STT benchmarks by language
Full interactive TTS benchmark