Build and maintain a rich voice catalog across TTS providers

These details have not been verified by PyPI

Project description

voice-audition

The casting director for your AI voice agent. Search 697 voices across 9 TTS providers with semantic search, run use-case auditions, and compare API vs self-hosted costs.

Install

pip install voice-audition
pip install "voice-audition[enrich]"   # adds Qwen2-Audio enrichment
pip install "voice-audition[mcp]"      # adds MCP server for Claude

Setup

# Copy the template, then fill in the values below in .env
cp .env.example .env

# Required: Moss credentials for semantic search
# Get them at https://platform.inferedge.dev
MOSS_PROJECT_ID=...
MOSS_PROJECT_KEY=...

# Optional: Provider API keys for sync + enrichment
ELEVENLABS_API_KEY=...
OPENAI_API_KEY=...
DEEPGRAM_API_KEY=...
RIME_API_KEY=...
CARTESIA_API_KEY=...
PLAYHT_API_KEY=...
PLAYHT_USER_ID=...
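
Before running a sync, it can help to confirm which of these variables are actually set. The snippet below is a minimal sketch, not part of voice-audition: it only inspects the process environment (it does not parse `.env`, and whether the CLI loads `.env` on its own is not stated here). The helper name `missing_vars` is hypothetical; the variable names come from the listing above.

```python
import os

# Variable names copied from the .env listing above.
REQUIRED_VARS = ("MOSS_PROJECT_ID", "MOSS_PROJECT_KEY")
OPTIONAL_VARS = (
    "ELEVENLABS_API_KEY",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "RIME_API_KEY",
    "CARTESIA_API_KEY",
    "PLAYHT_API_KEY",
    "PLAYHT_USER_ID",
)


def missing_vars(names, env=None):
    """Return the names that are unset or blank in the given mapping.

    `env` defaults to the current process environment (hypothetical helper,
    not a voice-audition API).
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    required = missing_vars(REQUIRED_VARS)
    optional = missing_vars(OPTIONAL_VARS)
    if required:
        print("Missing required variables:", ", ".join(required))
    if optional:
        print("Unset optional provider keys:", ", ".join(optional))
```

Required and optional names are kept separate so that a missing Moss credential can be treated as an error, while a missing provider key only means that provider is skipped.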

Quick start

# Sync voices from all providers
voice-audition sync

# Build search index
voice-audition index

# Search
voice-audition search "warm female voice for healthcare"

# Run a full audition
voice-audition audition "fertility clinic for anxious IVF patients" --gender female

# Compare costs at 100k minutes/month
voice-audition costs 100000

Commands

Command	What it does
`voice-audition sync [providers...]`	Sync voices from TTS providers with diff-based lifecycle tracking
`voice-audition index`	Build or rebuild the Moss semantic search index
`voice-audition search <query>`	Semantic search the voice catalog (`--top-k` for result count)
`voice-audition enrich [providers...]`	Enrich voices with Qwen2-Audio descriptions and traits (`--model`)
`voice-audition audition <brief>`	Run a use-case audition with ranked scorecard (`--candidates`, `--gender`, `--provider`)
`voice-audition costs <minutes>`	Compare API vs self-hosted costs at a given monthly volume
`voice-audition monitor`	Check provider reliability via status pages
`voice-audition stats`	Show catalog statistics
`voice-audition mcp`	Start the MCP server for Claude integration

Cost calculator

$ voice-audition costs 100000

# Compares per-provider API pricing against self-hosted open source:
#   ElevenLabs  100k min = $3,000/mo
#   Cartesia    100k min = $1,500/mo
#   Kokoro      100k min = $0 (self-hosted, GPU cost only)
#   Piper       100k min = $0 (self-hosted, CPU-capable)

At high volume, self-hosted open source voices (Kokoro, Piper, Orpheus) remove per-minute API fees and leave only hardware costs -- the calculator shows the breakeven point.
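
The breakeven idea can be illustrated with simple arithmetic. This is a sketch under stated assumptions, not the calculator's actual logic: the per-minute rates are derived from the sample output above, while the $600/month GPU figure and the function names are hypothetical placeholders.

```python
def api_cost(minutes: float, rate_per_minute: float) -> float:
    """Monthly API bill at a flat per-minute rate."""
    return minutes * rate_per_minute


def breakeven_minutes(fixed_monthly_cost: float, rate_per_minute: float) -> float:
    """Monthly volume at which a fixed self-hosting cost equals the API bill."""
    if rate_per_minute <= 0:
        raise ValueError("rate_per_minute must be positive")
    return fixed_monthly_cost / rate_per_minute


# Rates implied by the sample output above (dollars per minute).
ELEVENLABS_RATE = 3000 / 100_000
CARTESIA_RATE = 1500 / 100_000

# ASSUMPTION: a single GPU server costing $600/month. Replace with your own figure.
ASSUMED_GPU_MONTHLY = 600.0

if __name__ == "__main__":
    for name, rate in (("ElevenLabs", ELEVENLABS_RATE), ("Cartesia", CARTESIA_RATE)):
        minutes = breakeven_minutes(ASSUMED_GPU_MONTHLY, rate)
        print(f"{name}: self-hosting breaks even near {minutes:,.0f} min/month")
```

Below the breakeven volume the API is cheaper; above it, the fixed hardware cost is spread over enough minutes that self-hosting wins. The sketch ignores engineering time and scaling beyond one server.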

MCP Server

Add to Claude Desktop config:

{
  "mcpServers": {
    "voice-audition": {
      "command": "voice-audition",
      "args": ["mcp"],
      "env": {
        "MOSS_PROJECT_ID": "...",
        "MOSS_PROJECT_KEY": "..."
      }
    }
  }
}

Exposes 7 tools:

Tool	What it does
`search_voices`	Semantic search across the full catalog
`get_voice`	Get detailed info for a specific voice
`filter_voices`	Filter by gender, provider, cost, latency
`get_providers`	List available TTS providers and status
`get_catalog_stats`	Voice counts, coverage, freshness
`run_voice_audition`	Run a full use-case audition with scorecard
`calculate_voice_costs`	Compare API vs self-hosted costs at volume

How it works

voice-audition search "warm female for healthcare"
    |
1. Query embedded (Moss, moss-minilm model)
    |
2. Hybrid search: semantic similarity + keyword matching (alpha=0.7)
   Name-based vibes fill in for unenriched voices
    |
3. Metadata filters applied (gender, provider, cost, latency)
    |
4. Top-k results ranked by relevance score
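
The four steps above can be sketched in a few lines. Everything here is illustrative: the text does not say how Moss combines the two signals, so the blend `alpha * semantic + (1 - alpha) * keyword` is an assumption, the embeddings are toy vectors rather than moss-minilm output, and all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


def hybrid_score(semantic, keyword, alpha=0.7):
    """ASSUMED blend: alpha weights the semantic signal, the rest goes to keywords."""
    return alpha * semantic + (1 - alpha) * keyword


def search(query, query_vec, voices, top_k=5, alpha=0.7, gender=None, provider=None):
    """Score every voice, apply metadata filters, and return the top_k (name, score)."""
    scored = []
    for voice in voices:
        semantic = cosine_similarity(query_vec, voice["embedding"])
        keyword = keyword_score(query, voice["description"])
        scored.append((hybrid_score(semantic, keyword, alpha), voice))
    # Metadata filters: None means "do not filter on this field".
    kept = [
        (score, voice)
        for score, voice in scored
        if (gender is None or voice["gender"] == gender)
        and (provider is None or voice["provider"] == provider)
    ]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [(voice["name"], score) for score, voice in kept[:top_k]]
```

With `alpha=0.7`, a voice whose description matches the query's meaning outranks one that merely shares a word with it, while exact keyword hits still break ties between semantically similar voices.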

Voice catalog

697 voices across 9 providers:

Type	Providers
Commercial	ElevenLabs, OpenAI, Deepgram, Rime, Cartesia, PlayHT
Open source	Kokoro, Piper, Orpheus, Chatterbox, Fish Speech

- Diff-based sync with lifecycle tracking (new/deprecated/changed detection)
- Weekly pricing change detection via page hash diff
- Research-backed schema: 8 perceptual traits, texture, pitch, emotional range
- Synced every 6 hours via GitHub Actions
- Provider reliability monitoring via status pages
- Open source registry includes hosting platform data (GPU requirements, inference speed)
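
Diff-based lifecycle tracking and page-hash change detection can both be expressed compactly. The following is a minimal sketch of the general technique, assuming each snapshot is a dict keyed by voice ID; the function names and data shape are hypothetical and do not describe the package's internal code.

```python
import hashlib


def diff_catalog(previous, current):
    """Compare two snapshots (voice_id -> metadata dict) and classify each voice.

    Returns sorted ID lists under the keys "new", "deprecated" and "changed".
    """
    previous_ids = set(previous)
    current_ids = set(current)
    return {
        "new": sorted(current_ids - previous_ids),
        "deprecated": sorted(previous_ids - current_ids),
        "changed": sorted(
            voice_id
            for voice_id in previous_ids & current_ids
            if previous[voice_id] != current[voice_id]
        ),
    }


def page_fingerprint(page_text: str) -> str:
    """SHA-256 hex digest of a page's text, used as a cheap change detector."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def pricing_changed(stored_fingerprint: str, page_text: str) -> bool:
    """True when the page no longer hashes to the stored fingerprint."""
    return page_fingerprint(page_text) != stored_fingerprint
```

A hash diff only signals that something on the page changed; it cannot tell a price change from a cosmetic edit, so a flagged page still needs a manual or parsed follow-up.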

Enrichment pipeline

Most providers ship voice metadata. Rime does not -- its 610 voices have no descriptions. The enrichment pipeline classifies audio samples with Qwen2-Audio to fill in traits and descriptions. So far it has been tested on 10 voices.

pip install "voice-audition[enrich]"
voice-audition enrich rime --model qwen2-audio

Audition profiles

5 built-in use-case profiles with role-specific scoring criteria:

- Healthcare: patient comfort, trust, empathy, clarity, pacing, sensitivity
- Sales: energy, rapport, persuasiveness, confidence, resilience, likability
- Support: patience, clarity, helpfulness, professionalism, warmth, resolution focus
- Finance: authority, precision, trustworthiness, calm, professionalism, compliance
- Meditation: calm, spaciousness, grounding, non-intrusive, breath quality, presence
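
To make the scorecard idea concrete, here is a hedged sketch of how per-criterion ratings could be turned into a ranking. The criteria lists come from the profiles above; the equal weighting, the 0-10 rating scale, and the function names are assumptions for illustration, not the package's actual scoring method.

```python
# Criteria copied from the profile list above.
PROFILES = {
    "healthcare": ["patient comfort", "trust", "empathy", "clarity", "pacing", "sensitivity"],
    "sales": ["energy", "rapport", "persuasiveness", "confidence", "resilience", "likability"],
    "support": ["patience", "clarity", "helpfulness", "professionalism", "warmth", "resolution focus"],
    "finance": ["authority", "precision", "trustworthiness", "calm", "professionalism", "compliance"],
    "meditation": ["calm", "spaciousness", "grounding", "non-intrusive", "breath quality", "presence"],
}


def score_voice(profile, ratings):
    """Average the ratings over the profile's criteria.

    ASSUMPTION: equal weights and a 0-10 scale; a missing criterion counts as 0.
    """
    criteria = PROFILES[profile]
    return sum(ratings.get(criterion, 0.0) for criterion in criteria) / len(criteria)


def rank_candidates(profile, candidates):
    """Return (voice_name, score) pairs sorted from best to worst.

    `candidates` maps a voice name to its per-criterion ratings.
    """
    scored = [(name, score_voice(profile, ratings)) for name, ratings in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Scoring against the profile's own criteria means the same voice can rank first for meditation and last for sales, which is the point of role-specific auditions.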

Self-hosting

Open source voices (Kokoro, Piper, Orpheus) run locally with no API costs. The catalog tracks GPU requirements and inference speed for each. Use `voice-audition costs <minutes>` to see when self-hosting beats API pricing for your volume.

Development

git clone https://github.com/mnvsk97/voice-audition.git
cd voice-audition
pip install -e ".[mcp]"

License

MIT

Project details


Release history

Version	Released
0.4.1 (this version)	Apr 5, 2026
0.4.0	Apr 4, 2026
0.3.0	Apr 4, 2026
0.2.0	Apr 4, 2026
0.1.0	Apr 3, 2026

Download files


Source Distribution

voice_audition-0.4.1.tar.gz (68.1 kB)

Uploaded Apr 5, 2026 Source

Built Distribution


voice_audition-0.4.1-py3-none-any.whl (35.2 kB)

Uploaded Apr 5, 2026 Python 3

File details

Details for the file voice_audition-0.4.1.tar.gz.

File metadata

Download URL: voice_audition-0.4.1.tar.gz
Upload date: Apr 5, 2026
Size: 68.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_audition-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`942c1e7850111ff921922575df8ca021d147097b16c6d92b0b2373504552dc70`
MD5	`a0760711f0e56a0978cb5ec136caca29`
BLAKE2b-256	`3c4e290f23b34123e8a5b2c9f1daed5c248fae3b3f809268421ae606b58dbf20`


Provenance

The following attestation bundles were made for voice_audition-0.4.1.tar.gz:

Publisher: publish.yml on mnvsk97/voice-audition

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voice_audition-0.4.1.tar.gz
- Subject digest: 942c1e7850111ff921922575df8ca021d147097b16c6d92b0b2373504552dc70
- Sigstore transparency entry: 1237961895
- Sigstore integration time: Apr 5, 2026
Source repository:
- Permalink: mnvsk97/voice-audition@66265a2fd577daba269b6935552d2a4b130c3653
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/mnvsk97
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@66265a2fd577daba269b6935552d2a4b130c3653
- Trigger Event: push

File details

Details for the file voice_audition-0.4.1-py3-none-any.whl.

File metadata

Download URL: voice_audition-0.4.1-py3-none-any.whl
Upload date: Apr 5, 2026
Size: 35.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_audition-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`076201227cd37dddc7c5afedeacee9bba7f1d7a2e2ff8aa389a2de60d9f44e2d`
MD5	`be3c2db8e6200369ae69b1bd5c1fc711`
BLAKE2b-256	`8dba3fbe88bdad336a48c950eaafce9ed6bc4aa9c7ef1e93c78947830795e701`


Provenance

The following attestation bundles were made for voice_audition-0.4.1-py3-none-any.whl:

Publisher: publish.yml on mnvsk97/voice-audition

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voice_audition-0.4.1-py3-none-any.whl
- Subject digest: 076201227cd37dddc7c5afedeacee9bba7f1d7a2e2ff8aa389a2de60d9f44e2d
- Sigstore transparency entry: 1237961901
- Sigstore integration time: Apr 5, 2026
Source repository:
- Permalink: mnvsk97/voice-audition@66265a2fd577daba269b6935552d2a4b130c3653
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/mnvsk97
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@66265a2fd577daba269b6935552d2a4b130c3653
- Trigger Event: push
