Best Thai Text-to-Speech API (2026): independent benchmark

Thai is the hardest test in our TTS wedge: five lexical tones carry word meaning, and Thai rhythm is syllable-timed where English is stress-timed. We measured 11 systems on Speko's Thai eval set: an intelligibility gate first, then rhythm (%V), tone fidelity, and an anglicization index.

10 of 11 systems produce intelligible Thai; Deepgram Aura 2 fails the gate by falling back to English output. Of the 10 that pass, 9 clear the 0.78 %V nativeness floor and Cartesia Sonic 3.5 falls below it. GPT Realtime posted both the highest %V (0.8168) and the lowest anglicization index (0.5693).

This is a diagnostic profile, not a single-score leaderboard. %V is the one axis with a measured correlation to native judgment (rho = +0.70 against native ratings, n = 11); tone fidelity and anglicization are published as diagnostics.

Thai TTS measurements

Sorted by %V (rhythm nativeness, floor 0.78), gate failures last. Diagnostic profile: %V is the one validated quality floor; tone fidelity and anglicization are diagnostics. Anglicization: lower is better.

System	Type	Gate	%V (floor 0.78)	Tone fidelity	Anglicization
GPT Realtime	realtime	pass	0.8168	0.5919	0.5693
xAI / Grok TTS	tts	pass	0.8124	0.5727	0.6540
ElevenLabs v3	tts	pass	0.8077	0.4975	0.7366
MiniMax Speech 2.6 HD	tts	pass	0.8010	0.4540	0.9126
Inworld TTS 2	tts	pass	0.8007	0.5844	0.8841
GPT Realtime v2	realtime	pass	0.7965	0.5536	0.8499
Hume Octave 2	tts	pass	0.7947	0.4871	0.8468
Qwen3 TTS Flash	tts	pass	0.7916	0.5375	0.9165
GPT-4o mini TTS	tts	pass	0.7823	0.5860	0.8949
Cartesia Sonic 3.5	tts	pass	0.7166 (below floor)	0.5105	0.8683
Deepgram Aura 2	tts	fail (English fallback)	0.7608	0.5624	0.9583
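
The headline counts follow mechanically from the table. Here is a minimal TypeScript sketch that re-derives them from the rows on this page. The `Row` type and variable names are illustrative only, and treating the floor as inclusive (`>=`) is an assumption; no system sits exactly at 0.78, so the choice does not affect this run.

```typescript
// Rows copied from the table on this page (tone fidelity omitted for brevity).
type Row = { system: string; gatePass: boolean; percentV: number; anglicization: number };

const FLOOR = 0.78; // %V nativeness floor stated on this page

const rows: Row[] = [
  { system: 'GPT Realtime', gatePass: true, percentV: 0.8168, anglicization: 0.5693 },
  { system: 'xAI / Grok TTS', gatePass: true, percentV: 0.8124, anglicization: 0.654 },
  { system: 'ElevenLabs v3', gatePass: true, percentV: 0.8077, anglicization: 0.7366 },
  { system: 'MiniMax Speech 2.6 HD', gatePass: true, percentV: 0.801, anglicization: 0.9126 },
  { system: 'Inworld TTS 2', gatePass: true, percentV: 0.8007, anglicization: 0.8841 },
  { system: 'GPT Realtime v2', gatePass: true, percentV: 0.7965, anglicization: 0.8499 },
  { system: 'Hume Octave 2', gatePass: true, percentV: 0.7947, anglicization: 0.8468 },
  { system: 'Qwen3 TTS Flash', gatePass: true, percentV: 0.7916, anglicization: 0.9165 },
  { system: 'GPT-4o mini TTS', gatePass: true, percentV: 0.7823, anglicization: 0.8949 },
  { system: 'Cartesia Sonic 3.5', gatePass: true, percentV: 0.7166, anglicization: 0.8683 },
  { system: 'Deepgram Aura 2', gatePass: false, percentV: 0.7608, anglicization: 0.9583 },
];

// Gate failures sort last; within each group, higher %V comes first.
const ranked = [...rows].sort((a, b) =>
  a.gatePass !== b.gatePass ? (a.gatePass ? -1 : 1) : b.percentV - a.percentV,
);

// Only systems that pass the gate are compared against the floor.
const passing = rows.filter((r) => r.gatePass);
const aboveFloor = passing.filter((r) => r.percentV >= FLOOR);

// Anglicization: lower is better, so keep the minimum among passing systems.
const lowestAnglicization = passing.reduce((best, r) =>
  r.anglicization < best.anglicization ? r : best,
);

console.log(`${passing.length} of ${rows.length} pass the gate`);
console.log(`${aboveFloor.length} of ${passing.length} clear the ${FLOOR} floor`);
console.log(`Top %V: ${ranked[0].system}`);
console.log(`Lowest anglicization: ${lowestAnglicization.system}`);
```

Note that Cartesia Sonic 3.5 sorts ahead of Deepgram Aura 2 despite the lower %V, because the gate result takes precedence over the rhythm score.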

How we measured

Eval set: Speko's Thai TTS eval set v1 (fixed Thai prompts synthesized by every system).
Gate: an intelligibility check runs first; a system that returns English audio for Thai input fails the gate and is excluded from the Thai ranking. Its raw numbers are listed last in the table rather than ranked.
Rhythm: %V, the vocalic proportion of total speech time. Thai is syllable-timed (high %V); stress-timed, English-like output reads as non-native. Floor: 0.78, validated against native ratings (correlation +0.70, n=11).
Tone fidelity: pitch-contour correlation across the five Thai lexical tones (mid, low, falling, high, rising) against native references.
Anglicization index: a universal phoneme recognizer checks how often Thai-only phones get replaced by English-inventory neighbours. 0 is native phonology, 1 is English phonology with Thai tokens. Lower is better.
Data is synced from the published run at benchmarks.speko.ai (snapshot 2026-06-05).
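
To make the rhythm and tone definitions above concrete, here is a rough TypeScript sketch of both calculations. It is an illustration under stated assumptions, not Speko's pipeline: the `Interval` shape, the millisecond units, the function names, and the choice of Pearson correlation for comparing pitch contours are all hypothetical. The page itself only specifies "vocalic proportion of total speech time" and "pitch-contour correlation".

```typescript
// One labeled stretch of speech. Assumption: pauses were removed upstream,
// and `vocalic` marks vowel intervals from a forced alignment.
interface Interval {
  startMs: number;
  endMs: number;
  vocalic: boolean;
}

// %V = summed vocalic duration / summed duration of all speech intervals.
function vocalicProportion(intervals: Interval[]): number {
  let vocalicMs = 0;
  let totalMs = 0;
  for (const { startMs, endMs, vocalic } of intervals) {
    const duration = endMs - startMs;
    if (duration <= 0) continue; // skip empty or malformed intervals
    totalMs += duration;
    if (vocalic) vocalicMs += duration;
  }
  return totalMs === 0 ? NaN : vocalicMs / totalMs;
}

// Pearson correlation between two equal-length pitch contours
// (e.g. F0 samples of a synthesized syllable vs. a native reference).
function contourCorrelation(x: number[], y: number[]): number {
  if (x.length !== y.length || x.length < 2) {
    throw new Error('contours must have the same length (>= 2)');
  }
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy); // NaN if either contour is flat
}

// Toy input (made up for illustration): consonant, vowel, consonant, vowel.
const toyIntervals: Interval[] = [
  { startMs: 0, endMs: 50, vocalic: false },
  { startMs: 50, endMs: 250, vocalic: true },
  { startMs: 250, endMs: 300, vocalic: false },
  { startMs: 300, endMs: 500, vocalic: true },
];

const toyPercentV = vocalicProportion(toyIntervals); // 400 ms vocalic / 500 ms total
const risingVsRising = contourCorrelation([1, 2, 3], [2, 4, 6]);
const risingVsFalling = contourCorrelation([1, 2, 3], [3, 2, 1]);

console.log({ toyPercentV, risingVsRising, risingVsFalling });
```

On the toy input, a rising contour matched against a rising reference correlates positively and against a falling reference negatively, which is the property a tone-fidelity score needs in order to separate, say, the Thai rising and falling tones.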

Full interactive panels, audio clips, and the complete methodology: benchmarks.speko.ai

Use the best Thai voice without lock-in

Speko is one API in front of every system on this page: it routes each request to the measured-best provider for your language and fails over automatically when a provider degrades. When the next run reshuffles this table, your integration does not change.

curl

curl -X POST https://api.speko.dev/v1/synthesize \
  -H "Authorization: Bearer $SPEKO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "สวัสดีครับ ยินดีต้อนรับ", "intent": {"language": "th"}}' \
  --output reply.audio

TypeScript

import { Speko } from '@spekoai/sdk';

const speko = new Speko({ apiKey: process.env.SPEKO_API_KEY! });

const { audio, provider, model } = await speko.synthesize('สวัสดีครับ ยินดีต้อนรับ', {
  language: 'th',
});


FAQ

What is the best Thai text-to-speech API?

There is no single validated Thai TTS quality score, so we publish a gated diagnostic instead: 10 of 11 measured systems produce intelligible Thai, and 9 of those clear the 0.78 %V rhythm floor (the one metric validated against native-speaker ratings, correlation +0.70, n=11). On the current run GPT Realtime has both the highest %V (0.8168) and the lowest anglicization index (0.5693).

Does Deepgram Aura 2 support Thai?

It failed our Thai intelligibility gate: on Thai input it falls back to English output. Per our language-support rule it is excluded from Thai rankings rather than ranked with a misleading number.

Does ElevenLabs support Thai text-to-speech?

Yes. ElevenLabs v3 passes the Thai intelligibility gate with %V 0.8077 (above the 0.78 floor), tone fidelity 0.4975, and anglicization 0.7366.

Why is there no single Thai TTS quality ranking?

Most acoustic metrics have no measured link to what Thai listeners actually rate as native. The exception is %V (the vocalic proportion of speech, a rhythm metric): it correlates +0.70 with native ratings (n=11), so we publish it as a floor (0.78) rather than pretend a composite score exists.

More language benchmarks

Best Thai STT API
Best Filipino TTS API
Best Vietnamese TTS API
STT benchmarks by language
Full interactive TTS benchmark