
modelping

⚡ Benchmark LLM, STT, TTS, and full voice pipeline latency across every major AI provider.

One tool. Every provider. The metrics that actually matter.


Python 3.10+ · MIT License


What It Measures

 Category   Metrics
 ─────────────────────────────────────────────────────────────────
 LLM        Time to First Token (TTFT) P50/P95/P99, tokens/sec, cost
 STT        Transcription latency, time to first partial transcript
 TTS        Time to First Audio Byte (TTFB), realtime factor
 Pipeline   Full STT→LLM→TTS end-to-end latency (the hero metric)

All measurements use real streaming requests — latency is captured at the byte level.
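The P50/P95/P99 figures are latency percentiles over repeated runs. As an illustration of what those numbers mean (this is not modelping's internal code), a nearest-rank percentile over raw latency samples looks like this:

```python
# Illustrative sketch: nearest-rank percentile over latency samples (ms).
import math

def percentile(samples, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies = [42, 44, 41, 67, 45, 43, 48, 50, 46, 44]
p50 = percentile(latencies, 50)  # median of the run
p95 = percentile(latencies, 95)  # tail latency
```

P95 and P99 are what matter for interactive voice apps: a fast median is little comfort if one request in twenty stalls.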


Installation

pip install modelping

Or install from source:

git clone https://github.com/LatencyGrid/modelping
cd modelping
pip install -e .

Quick Start

# Copy and configure API keys
cp .env.example .env
# Edit .env with your keys

# Run the full voice pipeline benchmark (hero feature)
modelping pipeline --stt groq-whisper-large-v3 --llm llama-3.3-70b-versatile --tts cartesia-sonic-2

# Benchmark LLMs head-to-head
modelping run gpt-4o claude-3-5-sonnet-20241022 gemini-2.0-flash

# Benchmark STT providers
modelping stt

# Benchmark TTS providers
modelping tts

Sample Output

Pipeline (STT→LLM→TTS)

$ modelping pipeline --stt groq-whisper-large-v3 --llm llama-3.3-70b-versatile --tts cartesia-sonic-2

╭──────────────────────────────────────────────────────────────────────────────╮
│  modelping pipeline  •  3 runs                                               │
╰──────────────────────────────────────────────────────────────────────────────╯

 STT                     LLM                    TTS                STT    LLM    TTS    Total
 ────────────────────────────────────────────────────────────────────────────────────────────
 groq/whisper-large-v3   groq/llama-3.3-70b     cartesia/sonic-2   182ms  44ms   91ms   317ms

✓ Pipeline tested  •  fastest total: 317ms

LLM

$ modelping run gpt-4o claude-3-5-sonnet-20241022 gemini-2.0-flash llama-3.3-70b-versatile

╭─────────────────────────────────────────────────────────────────────────────────╮
│  modelping  •  5 runs  •  prompt: 64 tokens                                     │
╰─────────────────────────────────────────────────────────────────────────────────╯

 Model                          Provider     TTFT P50   TTFT P95   Tok/s   Cost/1M
 ─────────────────────────────────────────────────────────────────────────────────
 llama-3.3-70b-versatile        groq           42ms       67ms    312.4     $0.79
 gemini-2.0-flash               google         89ms      134ms    143.2     $0.40
 claude-3-5-sonnet-20241022     anthropic     198ms      234ms     71.1    $15.00
 gpt-4o                         openai        312ms      489ms     82.3    $10.00

✓ 4 models tested  •  12.3s total

Colors: 🟢 green = fastest, 🟡 yellow = mid, 🔴 red = slowest (relative to the tested set).

STT

$ modelping stt --runs 3

 Model                    Provider     Latency P50   Latency P95   Words
 ────────────────────────────────────────────────────────────────────────
 whisper-large-v3-turbo   groq            180ms         210ms        9
 nova-2                   deepgram        240ms         290ms        9
 whisper-1                openai          890ms        1100ms        9

✓ 3 providers tested

TTS

$ modelping tts --runs 3

 Model                    Provider      TTFB P50   TTFB P95   Realtime
 ──────────────────────────────────────────────────────────────────────
 sonic-2                  cartesia         89ms      112ms      14.2x
 eleven_flash_v2_5        elevenlabs      210ms      267ms       8.1x
 aura-asteria-en          deepgram        198ms      245ms       9.3x
 tts-1                    openai          312ms      398ms       6.4x

✓ 4 providers tested
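The Realtime column is the realtime factor: seconds of audio produced per second of wall-clock synthesis time, so higher means faster than realtime playback. The arithmetic is simply (sketch, not modelping source):

```python
# Realtime factor: audio duration divided by wall-clock synthesis time.
def realtime_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    return audio_seconds / synthesis_seconds

# e.g. 5.0 s of speech synthesized in 0.5 s of wall time is 10x realtime
```

Anything at or below 1.0x cannot keep up with live playback; voice agents generally want comfortable headroom above that.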

Full CLI Reference

# LLM benchmarks
modelping run gpt-4o claude-3-5-sonnet-20241022 gemini-2.0-flash
modelping run --all
modelping run --provider groq
modelping run gpt-4o --runs 10
modelping run gpt-4o --prompt "custom prompt"
modelping run gpt-4o --json
modelping run gpt-4o --csv
modelping run gpt-4o --fail-above-ttft 500

# STT benchmarks
modelping stt
modelping stt groq-whisper-large-v3 deepgram-nova-2
modelping stt --runs 5

# TTS benchmarks
modelping tts
modelping tts cartesia-sonic-2 elevenlabs-flash
modelping tts --runs 5
modelping tts --text "Custom text to synthesize"

# Pipeline benchmark (full STT→LLM→TTS)
modelping pipeline
modelping pipeline --stt groq-whisper-large-v3 --llm gpt-4o-mini --tts cartesia-sonic-2
modelping pipeline --stt all --llm all --tts all
modelping pipeline --runs 3

# List available models
modelping models
modelping models --provider anthropic
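The `--fail-above-ttft` flag is intended for CI gating: the run exits non-zero when measured TTFT exceeds the threshold, failing the pipeline step. A minimal sketch of that contract (illustrative; the function name is an assumption, not modelping source):

```python
# Sketch of a CI latency gate: map a measured TTFT to a process exit code.
def ttft_exit_code(ttft_ms: float, threshold_ms: float) -> int:
    """Return 0 when within budget, 1 when the threshold is exceeded."""
    return 0 if ttft_ms <= threshold_ms else 1
```

Because CI systems treat any non-zero exit code as failure, this is all that is needed to turn a latency regression into a red build.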

Supported Providers

LLM

 Model                                               Provider    Input $/1M   Output $/1M
 ─────────────────────────────────────────────────────────────────────────────────────────
 gpt-4o                                              openai      $2.50        $10.00
 gpt-4o-mini                                         openai      $0.15        $0.60
 o3-mini                                             openai      $1.10        $4.40
 claude-3-5-sonnet-20241022                          anthropic   $3.00        $15.00
 claude-3-haiku-20240307                             anthropic   $0.25        $1.25
 gemini-2.0-flash                                    google      $0.10        $0.40
 gemini-1.5-pro                                      google      $1.25        $5.00
 llama-3.3-70b-versatile                             groq        $0.59        $0.79
 mixtral-8x7b-32768                                  groq        $0.24        $0.24
 accounts/fireworks/models/llama-v3p1-70b-instruct   fireworks   $0.90        $0.90
 meta-llama/Llama-3.3-70B-Instruct-Turbo             together    $0.88        $0.88
 mistral-large-latest                                mistral     $2.00        $6.00
 mistral-small-latest                                mistral     $0.10        $0.30
 command-r-plus                                      cohere      $2.50        $10.00
 command-r                                           cohere      $0.15        $0.60
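The Cost/1M column in the LLM output above comes from prices like these. Combining per-direction prices into a per-request cost is straightforward (sketch for illustration; the prices are copied from the table, the function is not modelping source):

```python
# Sketch: estimate request cost from $/1M-token prices.
PRICES = {  # model -> (input $/1M tokens, output $/1M tokens)
    "gpt-4o": (2.50, 10.00),
    "gemini-2.0-flash": (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. 1,000 prompt tokens + 500 completion tokens on gpt-4o:
# 1000 * 2.50/1e6 + 500 * 10.00/1e6 = $0.0075
```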

STT

 Model                        Provider
 ─────────────────────────────────────
 whisper-large-v3             groq
 whisper-large-v3-turbo       groq
 distil-whisper-large-v3-en   groq
 whisper-1                    openai
 gpt-4o-transcribe            openai
 nova-2                       deepgram
 nova-3                       deepgram
 best                         assemblyai
 nano                         assemblyai
 (default)                    gladia

TTS

 Model                    Provider
 ──────────────────────────────────
 eleven_flash_v2_5        elevenlabs
 eleven_multilingual_v2   elevenlabs
 sonic-2                  cartesia
 sonic-english            cartesia
 tts-1                    openai
 tts-1-hd                 openai
 (streaming)              fish-audio
 PlayDialog               playht
 Play3.0-mini             playht
 aura-asteria-en          deepgram
 aura-luna-en             deepgram
 blizzard                 lmnt
 aurora                   lmnt

Configuration

Set API keys in a .env file in your working directory:

# LLM providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
GROQ_API_KEY=gsk_...
FIREWORKS_API_KEY=fw_...
TOGETHER_API_KEY=...
MISTRAL_API_KEY=...
COHERE_API_KEY=...

# STT providers
DEEPGRAM_API_KEY=...
ASSEMBLYAI_API_KEY=...
GLADIA_API_KEY=...

# TTS providers
ELEVENLABS_API_KEY=...
CARTESIA_API_KEY=...
FISH_AUDIO_API_KEY=...
PLAYHT_API_KEY=...
PLAYHT_USER_ID=...
LMNT_API_KEY=...

modelping auto-detects which providers are configured and skips any provider whose API key is not set, printing a warning.
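Conceptually, detection amounts to checking which key environment variables are present and non-empty. A sketch of how that might look (illustrative only; the mapping below is a subset and not modelping's actual source):

```python
# Sketch: detect configured providers from environment variables.
import os

PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
}

def configured_providers(env=os.environ):
    """Return providers whose API key is set and non-empty."""
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]
```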


CI/CD Example

GitHub Actions

name: AI Latency Check

on:
  schedule:
    - cron: '0 */6 * * *'   # every 6 hours
  workflow_dispatch:

jobs:
  latency-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install modelping
        run: pip install modelping

      - name: Run LLM latency benchmark
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
        run: |
          modelping run gpt-4o claude-3-5-sonnet-20241022 --runs 3 --fail-above-ttft 1000

      - name: Check voice pipeline latency
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          CARTESIA_API_KEY: ${{ secrets.CARTESIA_API_KEY }}
        run: modelping pipeline --stt groq-whisper-large-v3 --llm gpt-4o-mini --tts cartesia-sonic-2 --fail-above-ttft 500

      - name: Export JSON results
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          modelping run --provider openai --json > results.json

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: latency-results
          path: results.json

Roadmap

  • LLM benchmarking (TTFT, throughput, cost)
  • STT benchmarking (transcription latency)
  • TTS benchmarking (time to first audio byte)
  • Full STT→LLM→TTS pipeline benchmark
  • Community leaderboard — submit anonymous results, see global rankings
  • Web UI — run benchmarks from your browser (bring your own keys)
  • Self-hosted / open source model endpoints
  • Historical tracking and latency alerts

Contributing

See CONTRIBUTING.md for instructions on adding a new provider.

PRs welcome for:

  • New providers
  • New models / updated pricing
  • Output improvements
  • Bug fixes

To set up a development install:

git clone https://github.com/LatencyGrid/modelping
cd modelping
pip install -e .

License

MIT

