Build and maintain a rich voice catalog across TTS providers
voice-audition
The casting director for your AI voice agent. Run semantic search over 670+ voices from 6 TTS providers.
Install
pip install voice-audition
Setup
# Copy env template and fill in keys
cp .env.example .env
# Required: Moss credentials for semantic search
# Get them at https://platform.inferedge.dev
MOSS_PROJECT_ID=...
MOSS_PROJECT_KEY=...
# Optional: Provider API keys for sync + enrichment
ELEVENLABS_API_KEY=...
OPENAI_API_KEY=...
DEEPGRAM_API_KEY=...
RIME_API_KEY=...
CARTESIA_API_KEY=...
PLAYHT_API_KEY=...
PLAYHT_USER_ID=...
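Before running the CLI, it can help to confirm which keys are actually set. The following is a minimal sketch, not part of voice-audition: the variable names come from the template above, while `missing_vars` and `summarize` are hypothetical helpers. It inspects the process environment only and does not parse `.env`, so it assumes the variables have already been exported.

```python
import os

# Names taken from the .env template above.
REQUIRED = ("MOSS_PROJECT_ID", "MOSS_PROJECT_KEY")
OPTIONAL = (
    "ELEVENLABS_API_KEY",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "RIME_API_KEY",
    "CARTESIA_API_KEY",
    "PLAYHT_API_KEY",
    "PLAYHT_USER_ID",
)


def missing_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def summarize(env=None):
    """Return a one-line status string (hypothetical helper, not a CLI feature)."""
    required = missing_vars(REQUIRED, env)
    if required:
        return "missing required: " + ", ".join(required)
    optional = missing_vars(OPTIONAL, env)
    if optional:
        return "ok (optional keys not set: " + ", ".join(optional) + ")"
    return "ok"


print(summarize())
```

Missing optional keys are reported rather than treated as errors, since the template marks them as needed only for sync and enrichment.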
Quick start
# Sync voices from all providers
voice-audition sync
# Build search index
voice-audition index
# Search
voice-audition search "warm female voice for healthcare"
voice-audition search "authoritative male British accent"
# Check catalog
voice-audition stats
# Monitor provider reliability
voice-audition monitor
Commands
| Command | What it does |
|---|---|
| `voice-audition sync [providers...]` | Sync voices from TTS providers (all if none specified) |
| `voice-audition index` | Build or rebuild the Moss semantic search index |
| `voice-audition search <query>` | Semantic search the voice catalog (`--top-k` for result count) |
| `voice-audition enrich [providers...]` | Enrich unenriched voices with descriptions and traits (`--model` to pick classifier) |
| `voice-audition monitor` | Check provider reliability via status pages |
| `voice-audition stats` | Show catalog statistics |
| `voice-audition mcp` | Start the MCP server for Claude integration |
MCP Server
Add to Claude Desktop config:
{
"mcpServers": {
"voice-audition": {
"command": "voice-audition",
"args": ["mcp"],
"env": {
"MOSS_PROJECT_ID": "...",
"MOSS_PROJECT_KEY": "..."
}
}
}
}
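If the Claude Desktop config already lists other MCP servers, the entry above has to be merged in rather than pasted over the file. A minimal sketch of that merge follows, assuming the config has already been loaded into a dict; `add_voice_audition_server` is a hypothetical helper, and the location of the config file is platform-specific and deliberately left out.

```python
import json


def add_voice_audition_server(config, project_id, project_key):
    """Return a copy of a Claude Desktop config dict with the voice-audition entry added.

    Existing top-level keys and other MCP servers are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["voice-audition"] = {
        "command": "voice-audition",
        "args": ["mcp"],
        "env": {
            "MOSS_PROJECT_ID": project_id,
            "MOSS_PROJECT_KEY": project_key,
        },
    }
    merged["mcpServers"] = servers
    return merged


# Example with a config that already contains another (made-up) server.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
updated = add_voice_audition_server(existing, "...", "...")
print(json.dumps(updated, indent=2))
```

The function returns a new dict instead of mutating its input, so the original config can be kept as a backup before anything is written to disk.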
The MCP server exposes 5 tools:
| Tool | What it does |
|---|---|
| `search_voices` | Semantic search across the full catalog |
| `get_voice` | Get detailed info for a specific voice |
| `filter_voices` | Filter by gender, provider, cost, latency |
| `get_providers` | List available TTS providers and status |
| `get_catalog_stats` | Voice counts, coverage, freshness |
How it works
voice-audition search "warm female for healthcare"
|
1. Query embedded (Moss, moss-minilm model)
|
2. Hybrid search: semantic similarity + keyword matching (alpha=0.7)
|
3. Metadata filters applied (gender, provider, cost, latency)
|
4. Top-k results ranked by relevance score
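Steps 2 to 4 can be sketched in a few lines. This is an illustration under stated assumptions, not the Moss implementation: it assumes both scores are already normalized to [0, 1], that `alpha` weights the semantic score (so the keyword score gets `1 - alpha`), and that metadata filters are exact matches. `Candidate`, `hybrid_score`, and `search` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    voice_id: str
    semantic: float  # embedding similarity, assumed to lie in [0, 1]
    keyword: float   # keyword-match score, assumed to lie in [0, 1]
    metadata: dict = field(default_factory=dict)


def hybrid_score(candidate, alpha=0.7):
    # Assumption: alpha is the weight on the semantic score.
    return alpha * candidate.semantic + (1.0 - alpha) * candidate.keyword


def search(candidates, filters=None, top_k=5, alpha=0.7):
    """Score every candidate, apply exact-match metadata filters, return the top k."""
    filters = filters or {}
    # Step 2: hybrid scoring, highest score first.
    scored = sorted(candidates, key=lambda c: hybrid_score(c, alpha), reverse=True)
    # Step 3: keep only candidates whose metadata matches every filter.
    kept = [
        c for c in scored
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    # Step 4: truncate to the requested number of results.
    return kept[:top_k]


voices = [
    Candidate("a", semantic=0.9, keyword=0.1, metadata={"gender": "female"}),
    Candidate("b", semantic=0.5, keyword=1.0, metadata={"gender": "female"}),
    Candidate("c", semantic=1.0, keyword=1.0, metadata={"gender": "male"}),
]
print([v.voice_id for v in search(voices, filters={"gender": "female"})])
```

With `alpha=0.7`, a strong semantic match outranks a strong keyword match when the two disagree, which is why voice `a` is listed before voice `b` in the example.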
Voice catalog
- 670+ voices across 6 providers: ElevenLabs, OpenAI, Deepgram, Rime, Cartesia, PlayHT
- Research-backed schema: 8 perceptual traits, texture, pitch, emotional range
- Synced every 6 hours via GitHub Actions
- Provider reliability monitoring via status pages
Enrichment pipeline
Most providers ship voice metadata. Rime does not: its 610 voices have no descriptions. The enrichment pipeline generates audio samples and classifies them with a local model (Qwen2-Audio via mlx-audio) to fill in traits and descriptions.
pip install voice-audition[enrich]
voice-audition enrich rime --model qwen2-audio
The classifier is currently stubbed and needs implementation.
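Since the classifier is still a stub, the shape of the enrichment step is easiest to show with a pluggable callable. The sketch below is hypothetical throughout: the record fields (`id`, `description`, `traits`) and the `enrich` function are assumptions for illustration and are not taken from the package, and the stand-in classifier returns fixed values instead of analyzing audio.

```python
def enrich(voices, classify):
    """Return voice records with description and traits filled in where missing.

    `classify` is any callable that takes a voice record and returns a dict
    of fields to merge in. Records that already have a description are
    returned unchanged.
    """
    enriched = []
    for voice in voices:
        if voice.get("description"):
            enriched.append(voice)  # already has metadata, leave untouched
            continue
        enriched.append({**voice, **classify(voice)})
    return enriched


def fake_classifier(voice):
    # Stand-in for the audio classifier; the values are placeholders.
    return {"description": "placeholder description", "traits": {"warmth": 0.5}}


catalog = [
    {"id": "r1", "provider": "rime"},
    {"id": "e1", "provider": "elevenlabs", "description": "warm narrator"},
]
for record in enrich(catalog, fake_classifier):
    print(record["id"], "->", record["description"])
```

Passing the classifier in as an argument keeps the selection logic testable before a real model is wired up.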
Research
See the research/ directory. Key findings:
- Voice perception collapses to 2 axes: warmth vs authority (McAleer et al.)
- Speech rate is the #1 trust predictor, not pitch
- Emotional voices: +50% CSAT but -20% accuracy (Deepgram)
Development
git clone https://github.com/mnvsk97/voice-audition.git
cd voice-audition
pip install -e ".[mcp]"
License
MIT