Best Vietnamese Speech-to-Text API (2026): independent benchmark

Vietnamese speech-to-text accuracy, measured: 30 Vietnamese read-speech clips from FLEURS, with every provider routed through the same gateway and scored identically on 2026-06-03. Lower word error rate (WER) is better.

ElevenLabs Scribe v2 posted the lowest Vietnamese WER at 1.9%, ahead of Alibaba Qwen3-ASR at 2.4%. Two of the six providers on our English board do not support Vietnamese at all, and the supported field spreads from 1.9% to 4.7%, so picking a provider by its English score alone is a mistake.

Vietnamese STT leaderboard

30 Vietnamese clips (FLEURS), measured 2026-06-03 through the Speko gateway, loudness-normalized to -16 LUFS. WER measured on Vietnamese; latency and list price from the same gateway setup on the English board. Lower is better.

Provider / model	WER	p50 latency	List price
ElevenLabs Scribe v2	1.9%	1,353 ms	$0.0067/min
Alibaba Qwen3-ASR	2.4%	2,195 ms	-
OpenAI GPT-4o Transcribe	2.5%	1,084 ms	$0.006/min
xAI Grok STT	4.7%	996 ms	-
Cartesia Ink-2	does not support	-	-
Gradium	does not support	-	-
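
To make the ranking rule concrete, here is a minimal sketch that encodes the rows above and orders them the way the table does: unsupported providers are dropped first, then the rest are sorted by WER. The type and function names (BoardRow, rankByWer, bestWithinLatency) are hypothetical and are not part of any Speko SDK. The latency filter is an illustration only, since the latency column was measured on the English board.

```typescript
// Hypothetical row shape; null means "does not support" or no published value.
interface BoardRow {
  provider: string;
  werPercent: number | null;
  p50LatencyMs: number | null;
}

// Values copied from the leaderboard above.
const vietnameseBoard: BoardRow[] = [
  { provider: 'ElevenLabs Scribe v2', werPercent: 1.9, p50LatencyMs: 1353 },
  { provider: 'Alibaba Qwen3-ASR', werPercent: 2.4, p50LatencyMs: 2195 },
  { provider: 'OpenAI GPT-4o Transcribe', werPercent: 2.5, p50LatencyMs: 1084 },
  { provider: 'xAI Grok STT', werPercent: 4.7, p50LatencyMs: 996 },
  { provider: 'Cartesia Ink-2', werPercent: null, p50LatencyMs: null },
  { provider: 'Gradium', werPercent: null, p50LatencyMs: null },
];

// Support is checked first, then supported providers are ordered by WER (lower is better).
function rankByWer(rows: BoardRow[]): BoardRow[] {
  return rows
    .filter((r): r is BoardRow & { werPercent: number } => r.werPercent !== null)
    .sort((a, b) => a.werPercent - b.werPercent);
}

// Lowest-WER supported provider whose p50 latency fits a budget, if any.
function bestWithinLatency(rows: BoardRow[], maxP50Ms: number): BoardRow | undefined {
  return rankByWer(rows).find(
    (r) => r.p50LatencyMs !== null && r.p50LatencyMs <= maxP50Ms,
  );
}
```

With these rows, rankByWer(vietnameseBoard)[0] is ElevenLabs Scribe v2, while a 1,100 ms p50 budget selects OpenAI GPT-4o Transcribe, which shows how the accuracy winner and the latency-constrained pick can differ.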

How we measured

Dataset: 30 Vietnamese read-speech clips from FLEURS (30 clips per language across the wedge).
Scoring: mean word error rate (WER), lower is better. Audio is loudness-normalized to -16 LUFS before scoring so input-gain handling does not contaminate the accuracy column.
Every provider is measured the same way: through the Speko gateway (POST /v1/transcribe, provider pinned), from a single location.
Latency and list price columns come from the same gateway setup measured on the English board (n=50); the WER column is measured on Vietnamese audio.
Run date: 2026-06-03.
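
The scoring step can be sketched as a word-level edit distance divided by the number of reference words, averaged over clips. This is a minimal illustration, not the benchmark's actual scorer: the text normalization (NFC, lowercasing, punctuation stripped, whitespace tokenization) is an assumption, because the methodology above does not specify it, and the function names are hypothetical.

```typescript
// Assumed normalization: NFC, lowercase, drop punctuation, split on whitespace.
// Combining marks (\p{M}) are kept so Vietnamese tone marks survive.
function tokenize(text: string): string[] {
  return text
    .normalize('NFC')
    .toLowerCase()
    .replace(/[^\p{L}\p{M}\p{N}\s]/gu, ' ')
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed with a word-level Levenshtein distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;

  // prev[j] holds the distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Mean of per-clip WER, matching the "mean word error rate" wording above.
function meanWer(pairs: Array<{ reference: string; hypothesis: string }>): number {
  const total = pairs.reduce(
    (sum, p) => sum + wordErrorRate(p.reference, p.hypothesis),
    0,
  );
  return total / pairs.length;
}
```

For example, wordErrorRate('xin chào các bạn', 'xin chào bạn') is 0.25: one deleted word out of four reference words. Averaging per-clip WER, as here, can give a different figure from pooling all errors over all reference words, so it matters which of the two a published number uses.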

Full interactive table, every territory, and the complete methodology: benchmarks.speko.ai

Use the winner without lock-in

The best Vietnamese provider today is one benchmark run away from being second best. Speko is one API in front of every provider on this table: it routes each request to the measured-best provider for your language and fails over automatically when a provider degrades. No per-vendor integration, no migration when the leaderboard flips.

curl

curl -X POST https://api.speko.dev/v1/transcribe \
  -H "Authorization: Bearer $SPEKO_API_KEY" \
  -H "Content-Type: audio/wav" \
  -H "x-speko-intent: {\"language\":\"vi\"}" \
  --data-binary @call.wav

TypeScript

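import { Speko } from '@spekoai/sdk';
import { readFile } from 'node:fs/promises';

// The API key is read from the environment, as in the curl example.
const speko = new Speko({ apiKey: process.env.SPEKO_API_KEY! });

// Load the recording into memory as a Buffer.
const audio = await readFile('./call.wav');

// Request a Vietnamese transcription; the result reports which provider served it.
const { text, provider, confidence } = await speko.transcribe(audio, {
  language: 'vi',
});

console.log({ text, provider, confidence });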


FAQ

What is the most accurate Vietnamese speech-to-text API?

On Speko's 2026-06-03 FLEURS benchmark (30 Vietnamese clips), ElevenLabs Scribe v2 posted the lowest WER at 1.9%, followed by Alibaba Qwen3-ASR at 2.4%.

Does ElevenLabs support Vietnamese speech-to-text?

Yes. ElevenLabs Scribe v2 scored 1.9% WER on our Vietnamese run, the best result on the board.

Which providers do not support Vietnamese transcription?

Cartesia Ink-2 and Gradium are English-only on our board: on Vietnamese input they return text in the wrong script (roughly 76-100% error), so we mark them "does not support" instead of publishing a misleading number.

How was Vietnamese STT accuracy measured?

30 Vietnamese read-speech clips from FLEURS, loudness-normalized to -16 LUFS, sent through the Speko gateway with the provider pinned, and scored as word error rate on 2026-06-03. Support is checked first: a provider is only benchmarked on a language it actually transcribes in the native script.

More language benchmarks

Best Vietnamese TTS API
Best English STT API
Best Thai STT API
Best Indonesian STT API
TTS benchmarks by language
Full interactive STT benchmark