# modelping

⚡ Benchmark LLM, STT, TTS, and full voice pipeline latency across every major AI provider.
One tool. Every provider. The metrics that actually matter.
## What It Measures
| Category | Metrics |
|---|---|
| LLM | Time to First Token (TTFT) P50/P95/P99, tokens/sec, cost |
| STT | Transcription latency, time to first partial transcript |
| TTS | Time to First Audio Byte (TTFB), realtime factor |
| Pipeline | Full STT→LLM→TTS end-to-end latency (the hero metric) |
All measurements use real streaming requests — latency is captured at the byte level.
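For intuition, here is a minimal sketch of what byte-level TTFT measurement looks like, using the OpenAI Python SDK as an example. This illustrates the technique, not modelping's actual implementation:

```python
import time

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def measure_ttft(model: str, prompt: str) -> float:
    """Time from dispatching a streaming request to the first content chunk."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # The first chunk carrying actual text marks the TTFT boundary.
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    raise RuntimeError("stream ended before any content arrived")


print(f"TTFT: {measure_ttft('gpt-4o-mini', 'Say hi.') * 1000:.0f}ms")
```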
## Installation

```bash
pip install modelping
```

Or install from source:

```bash
git clone https://github.com/LatencyGrid/modelping
cd modelping
pip install -e .
```
## Quick Start

```bash
# Copy and configure API keys
cp .env.example .env
# Edit .env with your keys

# Run the full voice pipeline benchmark (hero feature)
modelping pipeline --stt groq-whisper-large-v3 --llm llama-3.3-70b-versatile --tts cartesia-sonic-2

# Benchmark LLMs head-to-head
modelping run gpt-4o claude-3-5-sonnet-20241022 gemini-2.0-flash

# Benchmark STT providers
modelping stt

# Benchmark TTS providers
modelping tts
```
## Sample Output

### Pipeline (STT→LLM→TTS)
```text
$ modelping pipeline --stt groq-whisper-large-v3 --llm llama-3.3-70b-versatile --tts cartesia-sonic-2

╭──────────────────────────────────────────────────────────────────────────────╮
│ modelping pipeline • 3 runs                                                  │
╰──────────────────────────────────────────────────────────────────────────────╯

STT                      LLM                   TTS                  STT     LLM    TTS    Total
───────────────────────────────────────────────────────────────────────────────────────────────
groq/whisper-large-v3    groq/llama-3.3-70b    cartesia/sonic-2     182ms   44ms   91ms   317ms

✓ Pipeline tested • fastest total: 317ms
```
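Total is the sum of the three stage latencies (182 + 44 + 91 = 317ms): the delay between the end of the user's audio and the first byte of synthesized speech.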
### LLM
```text
$ modelping run gpt-4o claude-3-5-sonnet-20241022 gemini-2.0-flash llama-3.3-70b-versatile

╭─────────────────────────────────────────────────────────────────────────────────╮
│ modelping • 5 runs • prompt: 64 tokens                                          │
╰─────────────────────────────────────────────────────────────────────────────────╯

Model                         Provider     TTFT P50   TTFT P95   Tok/s   Cost/1M
─────────────────────────────────────────────────────────────────────────────────
llama-3.3-70b-versatile       groq         42ms       67ms       312.4   $0.79
gemini-2.0-flash              google       89ms       134ms      143.2   $0.40
claude-3-5-sonnet-20241022    anthropic    198ms      234ms      71.1    $15.00
gpt-4o                        openai       312ms      489ms      82.3    $10.00

✓ 4 models tested • 12.3s total
```
Colors: 🟢 green = fastest, 🟡 yellow = mid, 🔴 red = slowest (relative to the tested set).
### STT
```text
$ modelping stt --runs 3

Model                     Provider    Latency P50   Latency P95   Words
────────────────────────────────────────────────────────────────────────
whisper-large-v3-turbo    groq        180ms         210ms         9
nova-2                    deepgram    240ms         290ms         9
whisper-1                 openai      890ms         1100ms        9

✓ 3 providers tested
```
### TTS
```text
$ modelping tts --runs 3

Model                 Provider      TTFB P50   TTFB P95   Realtime
──────────────────────────────────────────────────────────────────────
sonic-2               cartesia      89ms       112ms      14.2x
aura-asteria-en       deepgram      198ms      245ms      9.3x
eleven_flash_v2_5     elevenlabs    210ms      267ms      8.1x
tts-1                 openai        312ms      398ms      6.4x

✓ 4 providers tested
```
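Realtime factor is how much faster than playback the audio is generated, assuming the conventional definition (audio duration divided by wall-clock synthesis time), so 14.2x means roughly one second of speech is synthesized every 70ms:

```python
# Realtime factor = audio duration / wall-clock synthesis time.
audio_seconds = 3.0       # length of the generated clip
synthesis_seconds = 0.21  # time the provider took to produce it
print(f"{audio_seconds / synthesis_seconds:.1f}x realtime")  # 14.3x
```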
## Full CLI Reference

```bash
# LLM benchmarks
modelping run gpt-4o claude-3-5-sonnet-20241022 gemini-2.0-flash
modelping run --all
modelping run --provider groq
modelping run gpt-4o --runs 10
modelping run gpt-4o --prompt "custom prompt"
modelping run gpt-4o --json
modelping run gpt-4o --csv
modelping run gpt-4o --fail-above-ttft 500

# STT benchmarks
modelping stt
modelping stt groq-whisper-large-v3 deepgram-nova-2
modelping stt --runs 5

# TTS benchmarks
modelping tts
modelping tts cartesia-sonic-2 elevenlabs-flash
modelping tts --runs 5
modelping tts --text "Custom text to synthesize"

# Pipeline benchmark (full STT→LLM→TTS)
modelping pipeline
modelping pipeline --stt groq-whisper-large-v3 --llm gpt-4o-mini --tts cartesia-sonic-2
modelping pipeline --stt all --llm all --tts all
modelping pipeline --runs 3

# List available models
modelping models
modelping models --provider anthropic
```
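The `--json` flag makes results scriptable. Here is a minimal sketch of consuming them from Python; the output schema isn't documented here, so the example only parses and pretty-prints whatever comes back:

```python
import json
import subprocess

# Run a benchmark and capture machine-readable output via --json.
proc = subprocess.run(
    ["modelping", "run", "gpt-4o", "--json"],
    capture_output=True,
    text=True,
    check=True,
)
results = json.loads(proc.stdout)

# Inspect the structure before keying into specific fields.
print(json.dumps(results, indent=2))
```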
## Supported Providers

### LLM
| Model | Provider | Input $/1M | Output $/1M |
|---|---|---|---|
| gpt-4o | openai | $2.50 | $10.00 |
| gpt-4o-mini | openai | $0.15 | $0.60 |
| o3-mini | openai | $1.10 | $4.40 |
| claude-3-5-sonnet-20241022 | anthropic | $3.00 | $15.00 |
| claude-3-haiku-20240307 | anthropic | $0.25 | $1.25 |
| gemini-2.0-flash | google | $0.10 | $0.40 |
| gemini-1.5-pro | google | $1.25 | $5.00 |
| llama-3.3-70b-versatile | groq | $0.59 | $0.79 |
| mixtral-8x7b-32768 | groq | $0.24 | $0.24 |
| accounts/fireworks/models/llama-v3p1-70b-instruct | fireworks | $0.90 | $0.90 |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | together | $0.88 | $0.88 |
| mistral-large-latest | mistral | $2.00 | $6.00 |
| mistral-small-latest | mistral | $0.10 | $0.30 |
| command-r-plus | cohere | $2.50 | $10.00 |
| command-r | cohere | $0.15 | $0.60 |
### STT
| Model | Provider |
|---|---|
| whisper-large-v3 | groq |
| whisper-large-v3-turbo | groq |
| distil-whisper-large-v3-en | groq |
| whisper-1 | openai |
| gpt-4o-transcribe | openai |
| nova-2 | deepgram |
| nova-3 | deepgram |
| best | assemblyai |
| nano | assemblyai |
| (default) | gladia |
### TTS
| Model | Provider |
|---|---|
| eleven_flash_v2_5 | elevenlabs |
| eleven_multilingual_v2 | elevenlabs |
| sonic-2 | cartesia |
| sonic-english | cartesia |
| tts-1 | openai |
| tts-1-hd | openai |
| (streaming) | fish-audio |
| PlayDialog | playht |
| Play3.0-mini | playht |
| aura-asteria-en | deepgram |
| aura-luna-en | deepgram |
| blizzard | lmnt |
| aurora | lmnt |
## Configuration

Set API keys in a `.env` file in your working directory:

```bash
# LLM providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
GROQ_API_KEY=gsk_...
FIREWORKS_API_KEY=fw_...
TOGETHER_API_KEY=...
MISTRAL_API_KEY=...
COHERE_API_KEY=...

# STT providers
DEEPGRAM_API_KEY=...
ASSEMBLYAI_API_KEY=...
GLADIA_API_KEY=...

# TTS providers
ELEVENLABS_API_KEY=...
CARTESIA_API_KEY=...
FISH_AUDIO_API_KEY=...
PLAYHT_API_KEY=...
PLAYHT_USER_ID=...
LMNT_API_KEY=...
```
modelping auto-detects configured providers and skips (with a warning) any without keys set.
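A sketch of how such auto-detection can work; the use of python-dotenv and the exact provider-to-variable mapping here are assumptions for illustration, not modelping's actual code:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv (assumed mechanism)

load_dotenv()  # pull keys from ./.env into the process environment

# Subset of the provider → env var mapping documented above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
    "cartesia": "CARTESIA_API_KEY",
}

configured = [name for name, var in PROVIDER_KEYS.items() if os.environ.get(var)]
skipped = sorted(set(PROVIDER_KEYS) - set(configured))

print(f"benchmarking: {', '.join(configured) or 'nothing'}")
if skipped:
    print(f"warning: skipping providers without keys: {', '.join(skipped)}")
```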
## CI/CD Example

### GitHub Actions
```yaml
name: AI Latency Check

on:
  schedule:
    - cron: '0 */6 * * *'  # every 6 hours
  workflow_dispatch:

jobs:
  latency-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install modelping
        run: pip install modelping

      - name: Run LLM latency benchmark
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
        run: |
          modelping run gpt-4o claude-3-5-sonnet-20241022 --runs 3 --fail-above-ttft 1000

      - name: Check voice pipeline latency
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          CARTESIA_API_KEY: ${{ secrets.CARTESIA_API_KEY }}
        run: modelping pipeline --stt groq-whisper-large-v3 --llm gpt-4o-mini --tts cartesia-sonic-2 --fail-above-ttft 500

      - name: Export JSON results
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          modelping run --provider openai --json > results.json

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: latency-results
          path: results.json
```
## Roadmap
- [x] LLM benchmarking (TTFT, throughput, cost)
- [x] STT benchmarking (transcription latency)
- [x] TTS benchmarking (time to first audio byte)
- [x] Full STT→LLM→TTS pipeline benchmark
- [ ] Community leaderboard — submit anonymous results, see global rankings
- [ ] Web UI — run benchmarks from your browser (bring your own keys)
- [ ] Self-hosted / open source model endpoints
- [ ] Historical tracking and latency alerts
## Contributing

See `CONTRIBUTING.md` for instructions on adding a new provider.
PRs welcome for:
- New providers
- New models / updated pricing
- Output improvements
- Bug fixes
```bash
git clone https://github.com/LatencyGrid/modelping
cd modelping
pip install -e .
```
## License
MIT