Best Filipino Text-to-Speech API (2026): independent benchmark

Filipino (Tagalog) breaks the usual TTS scoring playbook: it natively code-switches with English (Taglish), so English-sounding phones are expected content, not an accent tell. We measured 10 systems on Speko's Filipino eval set with the checks that stay objective: an intelligibility gate, round-trip CER, pacing, and signal hygiene.

Eight of the 10 systems produce intelligible Filipino. Polly Generative and Deepgram Aura 2 fail the gate: their "Filipino" output is detected as English. Among the systems that pass, xAI / Grok TTS posted the lowest round-trip CER at 1.5%.

No acoustic feature validly ranks Filipino quality: the rhythm metric that works for Thai inverts here, and English-phone intrusion correlates the wrong way because Taglish makes English phones legitimate content. Accent and naturalness are human-rated only.

Filipino TTS measurements

Sorted by round-trip CER, gate failures last. Objective checks only (intelligibility, pacing, hygiene): no acoustic metric validly ranks Filipino naturalness, so none is shown.

System	Type	Gate (detected)	Round-trip CER	Pacing (w/s)	True peak (dBTP)
xAI / Grok TTS	tts	pass (tl)	1.5%	2.51	-4.25
Cartesia Sonic 3.5	tts	pass (tl)	2.2%	2.91	-0.86 (hot)
ElevenLabs v3	tts	pass (tl)	2.4%	2.27	-0.37 (hot)
Inworld TTS 2	tts	pass (tl)	3.0%	2.78	-4.5
GPT Realtime	realtime	pass (tl)	3.5%	2.38	-3.51
GPT Realtime v2	realtime	pass (tl)	3.7%	2.21	-5.27
GPT-4o mini TTS	tts	pass (tl)	5.6%	2.06	-11.78
MiniMax Speech 2.6 HD	tts	pass (tl)	5.6%	2.02	-1.47
Polly Generative	tts	fail (detected English)	15.2%	2.08	-5.81
Deepgram Aura 2	tts	fail (detected English)	57.4%	1.09 (outside band)	-4.71
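
The ordering rule stated above the table (round-trip CER ascending, gate failures last) can be sketched as a two-key comparator. The `Row` shape and field names are illustrative assumptions, not Speko's schema; the values are copied from the table.

```typescript
// Hypothetical row shape for illustration; values come from the table above.
interface Row {
  system: string;
  gatePass: boolean;
  cerPct: number; // round-trip CER in percent
}

// Deliberately listed alphabetically so the sort has work to do.
const rows: Row[] = [
  { system: 'Cartesia Sonic 3.5', gatePass: true, cerPct: 2.2 },
  { system: 'Deepgram Aura 2', gatePass: false, cerPct: 57.4 },
  { system: 'ElevenLabs v3', gatePass: true, cerPct: 2.4 },
  { system: 'GPT Realtime', gatePass: true, cerPct: 3.5 },
  { system: 'GPT Realtime v2', gatePass: true, cerPct: 3.7 },
  { system: 'GPT-4o mini TTS', gatePass: true, cerPct: 5.6 },
  { system: 'Inworld TTS 2', gatePass: true, cerPct: 3.0 },
  { system: 'MiniMax Speech 2.6 HD', gatePass: true, cerPct: 5.6 },
  { system: 'Polly Generative', gatePass: false, cerPct: 15.2 },
  { system: 'xAI / Grok TTS', gatePass: true, cerPct: 1.5 },
];

// Gate failures sink to the bottom; within each group, lower CER comes first.
function byGateThenCer(a: Row, b: Row): number {
  if (a.gatePass !== b.gatePass) return a.gatePass ? -1 : 1;
  return a.cerPct - b.cerPct;
}

// Copy before sorting so the input array is left untouched.
const sorted = [...rows].sort(byGateThenCer);
console.log(sorted.map((r) => r.system));
```

Sorting on the gate first keeps a failing system from outranking a passing one, whatever its CER.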

How we measured

Eval set: Speko's Filipino TTS eval set v1 (fixed Filipino prompts synthesized by every system).
Gate: language detection plus round-trip transcription. Output that is detected as English, or that exceeds 50% round-trip CER, fails the gate (the model did not produce intelligible Filipino).
Round-trip CER: synthesized audio is transcribed back and compared to the prompt. Lower is better; this is an intelligibility check, not a naturalness ranking.
Pacing: speech rate in words/sec against a comfortable 2.0-3.2 band. A flag, not a rank.
Signal hygiene: clipping, true peak (danger line -1.0 dBTP), and DC-offset checks on the delivered audio.
Data is synced from the published run at benchmarks.speko.ai (snapshot 2026-06-05).
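
For readers who want to reproduce the round-trip CER idea, here is a minimal sketch of character error rate: the character-level edit distance between prompt and transcript, divided by the prompt length. The normalization step (lowercasing, stripping basic punctuation) is an assumption; this page does not describe the normalization or the transcription model used in the published run.

```typescript
// Assumed normalization: lowercase, drop basic punctuation, collapse whitespace.
function normalizeText(text: string): string[] {
  const cleaned = text
    .toLowerCase()
    .replace(/[.,!?;:"'()]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
  return Array.from(cleaned); // split into characters (code points)
}

// Levenshtein distance with a two-row dynamic-programming table.
function editDistance(ref: string[], hyp: string[]): number {
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, substitution);
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// CER as a fraction of the reference length (0.015 corresponds to 1.5%).
function characterErrorRate(prompt: string, transcript: string): number {
  const ref = normalizeText(prompt);
  const hyp = normalizeText(transcript);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;
  return editDistance(ref, hyp) / ref.length;
}

// One substituted character in a seven-character word.
console.log(characterErrorRate('salamat', 'salamit'));
```

Because the score depends on how well the transcriber handles Filipino as well as on the synthesizer, it works as an intelligibility check rather than a quality ranking, as noted above.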

Full interactive panels, audio clips, and the complete methodology: benchmarks.speko.ai
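
The gate, pacing band, and true-peak danger line described above reduce to simple threshold checks. The sketch below uses the thresholds stated in this section; the `Measurement` shape, the function name, and the treatment of values exactly on a boundary are assumptions for illustration.

```typescript
// Hypothetical measurement record; field names are illustrative.
interface Measurement {
  detectedLanguage: string; // language-detector output, e.g. 'tl' or 'en'
  cer: number; // round-trip CER as a fraction (0.574 = 57.4%)
  wordsPerSecond: number;
  truePeakDbtp: number;
}

const GATE_MAX_CER = 0.5; // above 50% CER fails the gate
const PACING_BAND = { min: 2.0, max: 3.2 }; // comfortable words/sec band
const TRUE_PEAK_DANGER_DBTP = -1.0; // peaks above this are flagged "hot"

function evaluate(m: Measurement) {
  return {
    // Fails if detected as English or if CER exceeds the gate threshold.
    gatePass: m.detectedLanguage !== 'en' && m.cer <= GATE_MAX_CER,
    // A flag, not a rank: is the speech rate inside the band?
    pacingInBand:
      m.wordsPerSecond >= PACING_BAND.min && m.wordsPerSecond <= PACING_BAND.max,
    // Risk of downstream clipping when the true peak sits above the line.
    hotPeak: m.truePeakDbtp > TRUE_PEAK_DANGER_DBTP,
  };
}

// Two rows from the table above.
const elevenLabs = evaluate({ detectedLanguage: 'tl', cer: 0.024, wordsPerSecond: 2.27, truePeakDbtp: -0.37 });
const deepgram = evaluate({ detectedLanguage: 'en', cer: 0.574, wordsPerSecond: 1.09, truePeakDbtp: -4.71 });
console.log(elevenLabs, deepgram);
```

Keeping the three checks as separate booleans mirrors the table: a system can pass the gate and still carry a pacing or peak flag.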

Use the best Filipino voice without lock-in

Speko is one API in front of every system on this page: it routes each request to the measured-best provider for your language and fails over automatically when a provider degrades. When the next run reshuffles this table, your integration does not change.

curl

curl -X POST https://api.speko.dev/v1/synthesize \
  -H "Authorization: Bearer $SPEKO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Kumusta! Salamat sa pagtawag.", "intent": {"language": "fil"}}' \
  --output reply.audio

TypeScript

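import { Speko } from '@spekoai/sdk';

// Read the API key from the environment rather than hard-coding it.
const speko = new Speko({ apiKey: process.env.SPEKO_API_KEY! });

// Request Filipino speech; the result also exposes `provider` and `model` fields.
const { audio, provider, model } = await speko.synthesize('Kumusta! Salamat sa pagtawag.', {
  language: 'fil',
});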


FAQ

What is the best Filipino text-to-speech API?

No acoustic metric validly ranks Filipino naturalness (Taglish code-switching inverts the usual accent signals), so we publish objective checks instead. 8 of 10 systems pass the intelligibility gate, and xAI / Grok TTS has the lowest round-trip CER at 1.5%. Accent and naturalness judgments are left to native raters.

Does AWS Polly support Filipino?

Polly Generative failed our Filipino intelligibility gate: its output was detected as English, with a 15.2% round-trip CER, so it is listed as a gate failure rather than ranked.

Does ElevenLabs support Filipino text-to-speech?

Yes. ElevenLabs v3 passes the gate with a 2.4% round-trip CER. One flag: its output peaks at -0.37 dBTP, above our -1.0 dBTP danger line for downstream clipping.

Why is there no Filipino naturalness ranking?

The rhythm metric that works for Thai inverts on Filipino (correlation -0.53), and English-phone intrusion correlates the wrong way (+0.44) because Taglish makes English phones legitimate content. No deterministic feature separates the "conyo" accent native speakers penalize from legitimate loanwords, so naturalness stays human-rated.
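
To make "correlates the wrong way" concrete, here is a minimal sketch of a correlation between an acoustic metric and human ratings. It assumes Pearson's coefficient, which this page does not confirm, and the input numbers are invented toy values, not benchmark data.

```typescript
// Pearson correlation coefficient between two equal-length samples.
// Returns NaN if either sample has zero variance.
function pearson(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('need two samples of equal length, at least 2 points');
  }
  const n = xs.length;
  const meanX = xs.reduce((sum, x) => sum + x, 0) / n;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Toy values only: a metric meant to reward natural rhythm, and human ratings
// that fall as the metric rises. A negative r means the metric is inverted.
const toyMetric = [0.2, 0.4, 0.6, 0.8];
const toyHumanRating = [4.5, 4.0, 3.0, 2.5];
const r = pearson(toyMetric, toyHumanRating);
console.log(r < 0 ? 'metric inverts against human ratings' : 'metric agrees');
```

A metric is only usable for ranking if it agrees in sign with human judgment. An inverted metric would reward the systems that raters score lower.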

More language benchmarks

Best Thai TTS API
Best Vietnamese TTS API
STT benchmarks by language
Full interactive TTS benchmark