Best Indonesian Speech-to-Text API (2026): independent benchmark

Indonesian speech-to-text accuracy, measured. We ran 30 Indonesian read-speech clips from FLEURS through every provider via the same gateway and scored them identically on 2026-06-03. Lower WER is better.

OpenAI GPT-4o Transcribe posted the lowest Indonesian WER at 2.4%, ahead of xAI Grok STT at 2.9%. Two of the six providers on our English board do not support Indonesian at all, and the supported field spreads from 2.4% to 4.6%, so picking a provider by its English score alone is a mistake.

Indonesian STT leaderboard

30 Indonesian clips (FLEURS), measured 2026-06-03 through the Speko gateway, loudness-normalized to -16 LUFS. WER measured on Indonesian; latency and list price from the same gateway setup on the English board. Lower is better.

Provider / model	WER (Indonesian)	p50 latency (English board)	List price
OpenAI GPT-4o Transcribe	2.4%	1,084 ms	$0.006/min
xAI Grok STT	2.9%	996 ms	-
ElevenLabs Scribe v2	3%	1,353 ms	$0.0067/min
Alibaba Qwen3-ASR	4.6%	2,195 ms	-
Cartesia Ink-2	does not support	-	-
Gradium	does not support	-	-

How we measured

Dataset: 30 Indonesian read-speech clips from FLEURS (30 clips per language across the wedge).
Scoring: mean word error rate (WER), lower is better. Audio is loudness-normalized to -16 LUFS before scoring so input-gain handling does not contaminate the accuracy column.
Every provider is measured the same way: through the Speko gateway (POST /v1/transcribe, provider pinned), from a single location.
Latency and list price columns come from the same gateway setup measured on the English board (n=50); the WER column is measured on Indonesian audio.
Run date: 2026-06-03.
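The scoring step above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual scorer: it assumes "mean WER" means averaging per-clip WER, and the `tokenize` normalization (lowercasing, stripping common punctuation) is a hypothetical choice that the methodology above does not specify. WER itself is the standard word-level edit distance divided by the number of reference words.

```typescript
// Word-level edit distance: minimum substitutions + insertions + deletions
// needed to turn the hypothesis into the reference.
function wordEditDistance(ref: string[], hyp: string[]): number {
  // prev[j] = distance between the reference prefix so far and the first j hypothesis words
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// Hypothetical text normalization (an assumption, not the benchmark's rule):
// lowercase, replace common punctuation with spaces, split on whitespace.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"()]/g, ' ')
    .split(/\s+/)
    .filter((word) => word.length > 0);
}

// WER for one clip: edit distance divided by reference length.
function wer(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;
  return wordEditDistance(ref, hyp) / ref.length;
}

// Mean of per-clip WER across the test set.
function meanWer(pairs: Array<{ reference: string; hypothesis: string }>): number {
  if (pairs.length === 0) throw new Error('no clips to score');
  let total = 0;
  for (const pair of pairs) total += wer(pair.reference, pair.hypothesis);
  return total / pairs.length;
}
```

Averaging per clip weights every clip equally; pooling errors over all reference words would instead weight longer clips more heavily, and the two can differ slightly on a 30-clip set.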

Full interactive table, every territory, and the complete methodology: benchmarks.speko.ai

Use the winner without lock-in

The best Indonesian provider today is one benchmark run away from being second best. Speko is one API in front of every provider on this table: it routes each request to the measured-best provider for your language and fails over automatically when a provider degrades. No per-vendor integration, no migration when the leaderboard flips.

curl

curl -X POST https://api.speko.dev/v1/transcribe \
  -H "Authorization: Bearer $SPEKO_API_KEY" \
  -H "Content-Type: audio/wav" \
  -H 'x-speko-intent: {"language":"id"}' \
  --data-binary @call.wav

TypeScript

import { Speko } from '@spekoai/sdk';
import { readFile } from 'node:fs/promises';

const speko = new Speko({ apiKey: process.env.SPEKO_API_KEY! });
const audio = await readFile('./call.wav');

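// Request Indonesian ('id'); the gateway picks the provider for this request.
const { text, provider, confidence } = await speko.transcribe(audio, {
  language: 'id',
});

// Print the transcript with the provider that served it.
console.log({ text, provider, confidence });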
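For intuition about what "route to the measured-best provider and fail over" means, here is a conceptual sketch. It is not Speko's implementation: `rankProviders`, `transcribeWithFailover`, and the `ProviderCall` callback are hypothetical names introduced for illustration, and the sketch treats only a thrown error as "degraded", which is a simplification. The WER values are the ones from the Indonesian leaderboard on this page.

```typescript
// One leaderboard row; wer is null when the provider does not support the language.
type BoardRow = { provider: string; wer: number | null };
type SupportedRow = { provider: string; wer: number };

// WER values (in percent) copied from the Indonesian leaderboard on this page.
const indonesianBoard: BoardRow[] = [
  { provider: 'Alibaba Qwen3-ASR', wer: 4.6 },
  { provider: 'Cartesia Ink-2', wer: null },
  { provider: 'OpenAI GPT-4o Transcribe', wer: 2.4 },
  { provider: 'ElevenLabs Scribe v2', wer: 3 },
  { provider: 'Gradium', wer: null },
  { provider: 'xAI Grok STT', wer: 2.9 },
];

function isSupported(row: BoardRow): row is SupportedRow {
  return row.wer !== null;
}

// Supported providers only, lowest WER first.
function rankProviders(board: BoardRow[]): string[] {
  return board
    .filter(isSupported)
    .sort((a, b) => a.wer - b.wer)
    .map((row) => row.provider);
}

// Hypothetical per-provider call; a real client would hit that vendor's API here.
type ProviderCall = (provider: string, audio: Uint8Array) => Promise<string>;

// Try providers in ranked order and return the first successful transcript.
async function transcribeWithFailover(
  order: string[],
  audio: Uint8Array,
  call: ProviderCall,
): Promise<{ provider: string; text: string }> {
  const failures: string[] = [];
  for (const provider of order) {
    try {
      const text = await call(provider, audio);
      return { provider, text };
    } catch (err) {
      // Record the failure and fall through to the next-best provider.
      failures.push(`${provider}: ${String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${failures.join('; ')}`);
}
```

Doing this yourself means maintaining one integration per vendor plus a fresh ranking for every language, which is the work the gateway call above is meant to replace.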


FAQ

What is the most accurate Indonesian speech-to-text API?

On Speko's 2026-06-03 FLEURS benchmark (30 Indonesian clips), OpenAI GPT-4o Transcribe posted the lowest WER at 2.4%, followed by xAI Grok STT at 2.9%.

Does ElevenLabs support Indonesian speech-to-text?

Yes. ElevenLabs Scribe v2 scored 3% WER on our Indonesian run.

Which providers do not support Indonesian transcription?

Cartesia Ink-2 and Gradium are English-only on our board: on Indonesian input they return text in the wrong script (roughly 76-100% error), so we mark them "does not support" instead of publishing a misleading number.

How was Indonesian STT accuracy measured?

We took 30 Indonesian read-speech clips from FLEURS, loudness-normalized them to -16 LUFS, sent them through the Speko gateway with the provider pinned, and scored word error rate on 2026-06-03. Support is checked first: a provider is only benchmarked on a language it actually transcribes in the native script.

More language benchmarks

Best English STT API
Best Thai STT API
Best Vietnamese STT API
TTS benchmarks by language
Full interactive STT benchmark