Speko + Deepgram
Deepgram Nova-3 sets the bar for real-time STT accuracy and speed. Speko benchmarks it against every major competitor so you know exactly when it's your best option — and when it isn't.
Last updated: March 2026
What Deepgram Does
Deepgram is a speech AI platform built from the ground up for real-time applications. Its Nova-3 model leads the industry on clean-audio WER, and its Aura-2 TTS rounds out a full speech pipeline within a single vendor.
Leading Real-Time STT
Deepgram Nova-3 achieves 4.1% WER on clean English audio — lower than Google, AssemblyAI, and OpenAI Whisper. It's purpose-built for streaming, with a model architecture that prioritizes low-latency transcription over batch accuracy.
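For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the provider's output, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not Speko's or Deepgram's scoring code; the function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # At the start of iteration i, prev[j] is the edit distance
    # between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Real scoring pipelines usually also normalize punctuation, numerals, and contractions before comparing, so figures from this sketch would not be directly comparable to published numbers.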
Sub-300ms Streaming Latency
Under typical conditions, Nova-3's streaming endpoint returns first-word transcriptions in less than 300ms. For voice agents, where every millisecond of perceived latency matters, this is a meaningful advantage over higher-latency alternatives.
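A latency claim like this is easiest to check against a high percentile of measured samples rather than an average, since a few slow responses are what users notice. The sketch below shows one way to do that with a nearest-rank percentile; the helper names `percentile` and `meets_latency_target` are hypothetical, and the 300ms and 95th-percentile defaults simply mirror the figures quoted on this page.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_target(samples_ms: Sequence[float],
                         target_ms: float = 300.0,
                         pct: float = 95.0) -> bool:
    """True when the chosen percentile of first-word latencies is below the target."""
    return percentile(samples_ms, pct) < target_ms
```

Nearest-rank is only one of several percentile definitions; interpolating methods give slightly different values on small sample sets.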
Full Speech Pipeline
Beyond STT, Deepgram also offers Aura-2 TTS, so teams can run both sides of a voice pipeline through a single API. This simplifies integration, cuts the number of vendor relationships, and can reduce latency in cascaded architectures.
How Speko Works with Deepgram
Speko connects to the Deepgram API to run standardized accuracy, latency, and cost benchmarks — then places those results next to AssemblyAI, Whisper, and Google so you can make an evidence-based provider decision.
Nova-3 vs the Field
Run Speko to benchmark Deepgram Nova-3 against AssemblyAI Universal-3 Pro, OpenAI Whisper Large v3, and Google Chirp 2. Every provider is scored on the same audio corpus with the same evaluation methodology, with no vendor-favorable test conditions.
WER and Latency Side by Side
Speko measures word error rate and streaming latency for Deepgram under real-world conditions, including background noise, accented speakers, and telephony audio. See where Nova-3's 4.1% clean-audio WER holds up, and where its lead over other providers narrows.
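To make a per-condition breakdown concrete, the sketch below pools error counts by audio condition and reports one WER per condition. It illustrates the aggregation idea only and is not Speko's implementation; the `ClipScore` record, the `wer_by_condition` function, and the condition labels are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class ClipScore(NamedTuple):
    condition: str        # hypothetical label, e.g. "clean", "noisy", "telephony"
    errors: int           # substitutions + deletions + insertions for the clip
    reference_words: int  # number of words in the clip's reference transcript


def wer_by_condition(scores: Iterable[ClipScore]) -> Dict[str, float]:
    """Pool errors and reference words per condition, then divide."""
    totals = defaultdict(lambda: [0, 0])  # condition -> [errors, reference words]
    for score in scores:
        totals[score.condition][0] += score.errors
        totals[score.condition][1] += score.reference_words
    # Skip conditions with no reference words to avoid dividing by zero.
    return {cond: err / words for cond, (err, words) in totals.items() if words > 0}
```

Pooling errors over total reference words, rather than averaging per-clip WERs, keeps very short clips from dominating the result for a condition.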
When Deepgram Is the Top Pick
Speko's benchmark outputs include a provider recommendation layer. When Deepgram Nova-3 is the best fit for your audio profile and cost budget, Speko tells you. When AssemblyAI or another provider wins, Speko tells you that too.
Deepgram Features Benchmarked by Speko
- Nova-3 word error rate: 4.1% on clean English audio (industry-leading)
- Nova-3 pay-as-you-go pricing: $0.0043/min (tracked as rates change)
- Streaming transcription latency under 300ms at the 95th percentile
- 30+ language support with per-language accuracy breakdowns
- Speaker diarization accuracy on multi-speaker audio benchmarks
Frequently Asked Questions
Is Deepgram the best STT provider for real-time voice agents?
Deepgram Nova-3 holds the best published word error rate on clean English audio at 4.1% and achieves streaming latency under 300ms — making it a top choice for real-time voice agents. However, AssemblyAI Universal-3 Pro outperforms Nova-3 on diverse and noisy audio conditions across 26 evaluation datasets. Speko runs both side by side so you can see which provider wins for your actual audio inputs rather than relying on cherry-picked vendor benchmarks.
What is the difference between Deepgram Nova-3 and Nova-2?
Nova-3 is Deepgram's latest generation, offering a significant WER improvement over Nova-2 — from approximately 6.2% to 4.1% on clean English audio. Nova-3 also adds improved support for accented speech, better handling of domain-specific vocabulary, and enhanced speaker diarization. For most new production deployments, Nova-3 is the right choice unless your cost constraints require Nova-2 pricing.
How accurate is Deepgram for accented or non-native English speech?
Deepgram Nova-3 performs well on standard American and British English accents. Performance on heavy accents or non-native speakers varies and can be meaningfully worse than the headline 4.1% WER figure. AssemblyAI Universal-3 Pro tends to score better on diverse accent conditions. Speko's benchmark suite includes accent-diverse test sets so you can quantify the gap for your specific user base before committing to a provider.
How much does Deepgram cost per hour of audio?
Deepgram Nova-3 is priced at $0.0043 per minute on the pay-as-you-go plan, which translates to $0.258 per hour. Nova-2 is slightly less expensive. Volume discounts are available for higher usage tiers. Speko tracks current pricing across all major STT providers and factors it into cost-per-hour comparisons so your financial model stays accurate as vendors update their rates.
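The per-hour figure follows directly from the per-minute rate: $0.0043 × 60 = $0.258. A small sketch of the same arithmetic, extended to a monthly estimate, is shown below. The function names are hypothetical, the calculation ignores volume discounts, and `Decimal` is used so that currency values are not distorted by binary floating point.

```python
from decimal import Decimal


def cost_per_hour(rate_per_minute: Decimal) -> Decimal:
    """Convert a per-minute price into a per-hour price."""
    return rate_per_minute * 60


def monthly_cost(rate_per_minute: Decimal, audio_hours: int) -> Decimal:
    """Estimated spend for a given number of audio hours at a flat rate (no discounts)."""
    return cost_per_hour(rate_per_minute) * audio_hours


# Pay-as-you-go Nova-3 rate quoted on this page.
NOVA_3_RATE = Decimal("0.0043")
```

Calling `cost_per_hour(NOVA_3_RATE)` reproduces the $0.258 per hour quoted above; pass the current rate when pricing changes.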
See How Deepgram Performs in Your Stack
Don't rely on vendor marketing for your STT decision. Run a Speko benchmark to see exactly how Deepgram Nova-3 compares on accuracy, latency, and cost for your audio conditions.