Speech-to-Text Benchmark
12 providers · independently tested · April 2026
| Provider | Clean WER | Noisy WER | Latency | Cost/min | Languages | Source |
|---|---|---|---|---|---|---|
| Deepgram Nova-3 | 7.8% | 10.4% | 2.4s | $0.0043 | 36 | Deepgram |
| ElevenLabs Scribe v2 Realtime | 3.5% | 5.1% | 1.0s | $0.0067 | 99 | ElevenLabs API |
| ElevenLabs Scribe v1 | 5.4% | 6.9% | 1.1s | $0.0067 | 99 | ElevenLabs API |
| Alibaba qwen3-asr-flash | 3.5% | 5.4% | 0.6s | $0.0021 | 90 | Alibaba DashScope Qwen3-ASR-Flash |
| OpenAI gpt-4o-transcribe | 19.4% | 33.7% | 1.9s | $0.0060 | 50 | OpenAI API pricing |
| OpenAI gpt-4o-mini-transcribe | 8.2% | 17.1% | 2.0s | $0.0030 | 50 | OpenAI API pricing |
| OpenAI whisper-1 | 11.6% | 12.3% | 2.6s | $0.0060 | 57 | OpenAI API pricing |
| xAI Grok STT | 16.75% | 16.1% | 0.9s | $0.0017 | 25 | xAI Grok STT and TTS APIs announcement |
| AssemblyAI Universal-2 | 6.22% | 9.5% | 3.7s | $0.0062 | 99 | AssemblyAI |
| AssemblyAI Universal-3 Pro | 5.06% | 7.9% | 4.2s | $0.0067 | 6 | AssemblyAI |
| Google Cloud Chirp 2 | 5.37% | 17.9% | 4.5s | $0.0240 | 125 | Google Cloud Speech-to-Text v2 |
| Google Gemini 2.5 Flash (STT) | 6.03% | 18.9% | 2.5s | $0.0002 | 100 | Gemini API |
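The WER figures above can be reproduced in principle with the standard word error rate formula: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch of that computation, not the benchmark's exact scorer (which likely also normalizes casing and punctuation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
```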
Noise Robustness
How accuracy holds up under pressure.
Real-world audio is noisy. We tested each provider across five noise conditions to measure how far accuracy degrades from the clean baseline.
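A common way to construct such noisy conditions is to mix background noise into clean speech at a fixed signal-to-noise ratio. The exact conditions used by this benchmark are not specified here; the following NumPy sketch shows the usual SNR-controlled mixing:

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    noise = np.resize(noise, speech.shape)  # loop/trim noise to the speech length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve for the noise power that yields snr_db = 10*log10(p_speech / p_target).
    p_target = p_speech / (10 ** (snr_db / 10))
    return speech + noise * np.sqrt(p_target / p_noise)

# Example: clean tone + white noise at 10 dB SNR.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 100.0, 16000))
noisy = mix_at_snr(clean, rng.standard_normal(8000), snr_db=10.0)
```

Sweeping `snr_db` from high to low values then yields progressively harder test conditions.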
Methodology
- 165 API calls
- 58 minutes tested
- $0.89 total cost
- 3 providers benchmarked
STT accuracy, latency, noise robustness, and multi-language performance were measured with the Speko Bench CLI (speko-bench-cli v0.1.0) on the LibriSpeech test-clean and Google FLEURS datasets. Pricing is taken from official provider pages.
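The total-cost figure follows from the per-minute rates in the table times the audio minutes sent to each provider. A back-of-envelope sketch, under the simplifying (and hypothetical) assumption that a provider transcribed all 58 tested minutes:

```python
# Per-minute rates copied from the benchmark table above.
minutes_tested = 58
cost_per_min = {
    "Deepgram Nova-3": 0.0043,
    "Alibaba qwen3-asr-flash": 0.0021,
    "Google Gemini 2.5 Flash (STT)": 0.0002,
}

# Estimated spend if one provider handled every tested minute.
for provider, rate in cost_per_min.items():
    print(f"{provider}: ${rate * minutes_tested:.2f}")
```

In practice the minutes are split across providers and conditions, which is why the whole run stayed under a dollar.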