Build and maintain a rich voice catalog across TTS providers
voice-audition
The casting director for your AI voice agent. Semantically search 697 voices across 9 TTS providers, run use-case auditions, and compare API and self-hosted costs.
Install
pip install voice-audition
pip install "voice-audition[enrich]" # adds Qwen2-Audio enrichment
pip install "voice-audition[mcp]" # adds MCP server for Claude
Setup
cp .env.example .env
# Required: Moss credentials for semantic search
# Get them at https://platform.inferedge.dev
MOSS_PROJECT_ID=...
MOSS_PROJECT_KEY=...
# Optional: Provider API keys for sync + enrichment
ELEVENLABS_API_KEY=...
OPENAI_API_KEY=...
DEEPGRAM_API_KEY=...
RIME_API_KEY=...
CARTESIA_API_KEY=...
PLAYHT_API_KEY=...
PLAYHT_USER_ID=...
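Before running a sync, it can help to confirm which credentials are actually filled in. The following is a minimal sketch, not part of voice-audition: the helper names (`parse_env_file`, `check_env`) are hypothetical, the variable names come from the list above, and it assumes that an empty value or the literal `...` placeholder means "not set".

```python
from pathlib import Path

# Variable names are taken from the Setup section above.
REQUIRED = ("MOSS_PROJECT_ID", "MOSS_PROJECT_KEY")
OPTIONAL = (
    "ELEVENLABS_API_KEY", "OPENAI_API_KEY", "DEEPGRAM_API_KEY",
    "RIME_API_KEY", "CARTESIA_API_KEY", "PLAYHT_API_KEY", "PLAYHT_USER_ID",
)


def _is_set(value):
    # Assumption: empty strings and the "..." placeholder count as unset.
    return bool(value) and value.strip() not in ("", "...")


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values):
    """Return (missing_required, missing_optional) lists of variable names."""
    missing_required = [n for n in REQUIRED if not _is_set(values.get(n))]
    missing_optional = [n for n in OPTIONAL if not _is_set(values.get(n))]
    return missing_required, missing_optional


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    required, optional = check_env(parse_env_file(text))
    if required:
        print("Missing required (semantic search will not work):", ", ".join(required))
    if optional:
        print("Missing optional (those providers may be skipped):", ", ".join(optional))
```

The parser handles only plain `KEY=VALUE` lines; quoted values or `export` prefixes would need a fuller `.env` parser.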
Quick start
# Sync voices from all providers
voice-audition sync
# Build search index
voice-audition index
# Search
voice-audition search "warm female voice for healthcare"
# Run a full audition
voice-audition audition "fertility clinic for anxious IVF patients" --gender female
# Compare costs at 100k minutes/month
voice-audition costs 100000
Commands
| Command | What it does |
|---|---|
| `voice-audition sync [providers...]` | Sync voices from TTS providers with diff-based lifecycle tracking |
| `voice-audition index` | Build or rebuild the Moss semantic search index |
| `voice-audition search <query>` | Semantic search the voice catalog (`--top-k` for result count) |
| `voice-audition enrich [providers...]` | Enrich voices with Qwen2-Audio descriptions and traits (`--model`) |
| `voice-audition audition <brief>` | Run a use-case audition with ranked scorecard (`--candidates`, `--gender`, `--provider`) |
| `voice-audition costs <minutes>` | Compare API vs self-hosted costs at a given monthly volume |
| `voice-audition monitor` | Check provider reliability via status pages |
| `voice-audition stats` | Show catalog statistics |
| `voice-audition mcp` | Start the MCP server for Claude integration |
Cost calculator
$ voice-audition costs 100000
# Compares per-provider API pricing against self-hosted open source:
# ElevenLabs 100k min = $3,000/mo
# Cartesia 100k min = $1,500/mo
# Kokoro 100k min = $0 (self-hosted, GPU cost only)
# Piper 100k min = $0 (self-hosted, CPU-capable)
At high volume, self-hosted open source voices (Kokoro, Piper, Orpheus) can cut API fees to zero, leaving only hardware cost. The calculator shows the breakeven volume.
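The breakeven arithmetic can be sketched in a few lines. This is an illustration, not the calculator's actual implementation: the per-minute rates are derived from the example output above ($3,000 and $1,500 per 100k minutes), pricing is assumed to be linear with no volume tiers, and the $600/month self-hosted GPU cost is a hypothetical figure chosen only for the example.

```python
# Monthly API cost at 100k minutes, taken from the example output above.
API_MONTHLY_AT_100K = {"ElevenLabs": 3000.0, "Cartesia": 1500.0}

# Assumption: pricing scales linearly, so a flat per-minute rate applies.
RATE_PER_MINUTE = {name: cost / 100_000 for name, cost in API_MONTHLY_AT_100K.items()}


def api_cost(provider, minutes):
    """Monthly API cost in dollars for the given volume."""
    return RATE_PER_MINUTE[provider] * minutes


def breakeven_minutes(provider, self_hosted_monthly_cost):
    """Monthly volume above which a flat self-hosted cost beats the API."""
    return self_hosted_monthly_cost / RATE_PER_MINUTE[provider]


if __name__ == "__main__":
    gpu_monthly_cost = 600.0  # hypothetical GPU rental cost, not from the catalog
    for provider in RATE_PER_MINUTE:
        minutes = breakeven_minutes(provider, gpu_monthly_cost)
        print(f"{provider}: self-hosting wins above {minutes:,.0f} min/month")
```

Real self-hosting costs also grow with volume once one GPU is saturated, so a flat monthly figure is only a first approximation.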
MCP Server
Add to Claude Desktop config:
{
"mcpServers": {
"voice-audition": {
"command": "voice-audition",
"args": ["mcp"],
"env": {
"MOSS_PROJECT_ID": "...",
"MOSS_PROJECT_KEY": "..."
}
}
}
}
Exposes 7 tools:
| Tool | What it does |
|---|---|
| `search_voices` | Semantic search across the full catalog |
| `get_voice` | Get detailed info for a specific voice |
| `filter_voices` | Filter by gender, provider, cost, latency |
| `get_providers` | List available TTS providers and status |
| `get_catalog_stats` | Voice counts, coverage, freshness |
| `run_voice_audition` | Run a full use-case audition with scorecard |
| `calculate_voice_costs` | Compare API vs self-hosted costs at volume |
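If the Claude Desktop config already lists other MCP servers, the voice-audition entry needs to be merged in rather than pasted over them. A minimal sketch of that merge follows; the function name is hypothetical, and it works on an in-memory dict so no config file location is assumed.

```python
import json


def add_voice_audition_server(config, project_id, project_key):
    """Return a copy of a Claude Desktop config dict with the voice-audition entry added.

    Existing servers under "mcpServers" are preserved; the input is not mutated.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["voice-audition"] = {
        "command": "voice-audition",
        "args": ["mcp"],
        "env": {
            "MOSS_PROJECT_ID": project_id,
            "MOSS_PROJECT_KEY": project_key,
        },
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # "other-server" stands in for whatever is already configured.
    existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
    updated = add_voice_audition_server(existing, "...", "...")
    print(json.dumps(updated, indent=2))
```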
How it works
voice-audition search "warm female for healthcare"
|
1. Query embedded (Moss, moss-minilm model)
|
2. Hybrid search: semantic similarity + keyword matching (alpha=0.7)
Name-based vibes fill in for unenriched voices
|
3. Metadata filters applied (gender, provider, cost, latency)
|
4. Top-k results ranked by relevance score
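Steps 2 to 4 above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: it assumes `alpha` linearly blends the two scores (`alpha * semantic + (1 - alpha) * keyword`), uses simple word overlap as the keyword score, and takes precomputed semantic similarities as input because the Moss embedding call is out of scope here. All function and field names are hypothetical.

```python
def keyword_overlap(query, text):
    """Fraction of query words that also appear in the text (a stand-in keyword score)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_score(semantic, keyword, alpha=0.7):
    """Assumed blend: alpha weights semantic similarity, the rest goes to keywords."""
    return alpha * semantic + (1 - alpha) * keyword


def search(query, voices, semantic_scores, filters=None, top_k=5, alpha=0.7):
    """Score every voice, apply metadata filters, and return the top-k (voice, score) pairs."""
    filters = filters or {}
    scored = []
    for voice in voices:
        # Step 2: hybrid score from semantic similarity and keyword match.
        score = hybrid_score(
            semantic_scores.get(voice["id"], 0.0),
            keyword_overlap(query, voice.get("description", "")),
            alpha,
        )
        # Step 3: drop voices that fail any metadata filter.
        if all(voice.get(key) == value for key, value in filters.items()):
            scored.append((voice, score))
    # Step 4: rank by relevance and keep the top k.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    voices = [
        {"id": "a", "description": "warm female voice", "gender": "female"},
        {"id": "b", "description": "deep male narrator", "gender": "male"},
    ]
    for voice, score in search("warm female", voices, {"a": 0.9, "b": 0.8},
                               filters={"gender": "female"}):
        print(voice["id"], round(score, 3))
```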
Voice catalog
697 voices across 9 providers:
| Type | Providers |
|---|---|
| Commercial | ElevenLabs, OpenAI, Deepgram, Rime, Cartesia, PlayHT |
| Open source | Kokoro, Piper, Orpheus, Chatterbox, Fish Speech |
- Diff-based sync with lifecycle tracking (new/deprecated/changed detection)
- Weekly pricing change detection via page hash diff
- Research-backed schema: 8 perceptual traits, texture, pitch, emotional range
- Synced every 6 hours via GitHub Actions
- Provider reliability monitoring via status pages
- Open source registry includes hosting platform data (GPU requirements, inference speed)
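The diff-based sync and the pricing-page hash check listed above can be sketched like this. The catalog shape (a dict of voice id to metadata) and the function names are assumptions made for illustration; the actual sync code may track more states.

```python
import hashlib


def diff_catalog(previous, current):
    """Compare two snapshots (voice_id -> metadata dict) and classify the differences."""
    previous_ids, current_ids = set(previous), set(current)
    return {
        "new": sorted(current_ids - previous_ids),
        "deprecated": sorted(previous_ids - current_ids),
        "changed": sorted(
            vid for vid in previous_ids & current_ids if previous[vid] != current[vid]
        ),
    }


def page_hash(html):
    """Stable fingerprint of a pricing page's content."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def pricing_changed(stored_hash, html):
    """True when the page content no longer matches the stored fingerprint."""
    return page_hash(html) != stored_hash


if __name__ == "__main__":
    last_week = {"v1": {"name": "Ava"}, "v2": {"name": "Ben"}}
    today = {"v2": {"name": "Benjamin"}, "v3": {"name": "Cleo"}}
    print(diff_catalog(last_week, today))
```

Hashing the raw page flags any change, including markup-only edits, so a flagged page still needs a human or a parser to confirm that prices actually moved.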
Enrichment pipeline
Most providers ship voice metadata. Rime does not: its 610 voices have no descriptions. The enrichment pipeline classifies audio samples with Qwen2-Audio to fill in traits and descriptions. It has been tested on 10 voices so far.
pip install "voice-audition[enrich]"
voice-audition enrich rime --model qwen2-audio
Audition profiles
5 built-in use-case profiles with role-specific scoring criteria:
- Healthcare: patient comfort, trust, empathy, clarity, pacing, sensitivity
- Sales: energy, rapport, persuasiveness, confidence, resilience, likability
- Support: patience, clarity, helpfulness, professionalism, warmth, resolution focus
- Finance: authority, precision, trustworthiness, calm, professionalism, compliance
- Meditation: calm, spaciousness, grounding, non-intrusive, breath quality, presence
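One way to picture a ranked scorecard: rate each candidate on the profile's criteria, then sort by the average. The sketch below uses the criteria listed above, but the equal weighting, the 0 to 10 rating scale, and the function names are assumptions; the built-in profiles may weight criteria differently.

```python
# Criteria copied from the profile list above.
PROFILES = {
    "healthcare": ["patient comfort", "trust", "empathy", "clarity", "pacing", "sensitivity"],
    "sales": ["energy", "rapport", "persuasiveness", "confidence", "resilience", "likability"],
    "support": ["patience", "clarity", "helpfulness", "professionalism", "warmth", "resolution focus"],
    "finance": ["authority", "precision", "trustworthiness", "calm", "professionalism", "compliance"],
    "meditation": ["calm", "spaciousness", "grounding", "non-intrusive", "breath quality", "presence"],
}


def score_candidate(profile, ratings):
    """Average the ratings over the profile's criteria (assumed equal weights)."""
    criteria = PROFILES[profile]
    missing = [c for c in criteria if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    return sum(ratings[c] for c in criteria) / len(criteria)


def rank_candidates(profile, candidates):
    """Return (voice_id, score) pairs sorted from best to worst."""
    scored = [(vid, score_candidate(profile, r)) for vid, r in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical voice ids and ratings on a 0-10 scale.
    candidates = {
        "voice_a": dict.fromkeys(PROFILES["healthcare"], 8),
        "voice_b": dict.fromkeys(PROFILES["healthcare"], 6),
    }
    for voice_id, score in rank_candidates("healthcare", candidates):
        print(voice_id, score)
```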
Self-hosting
Open source voices (Kokoro, Piper, Orpheus) run locally with no API costs. The catalog tracks GPU requirements and inference speed for each. Use voice-audition costs <minutes> to see when self-hosting beats API pricing for your volume.
Development
git clone https://github.com/mnvsk97/voice-audition.git
cd voice-audition
pip install -e ".[mcp]"
License
MIT