Build and maintain a rich voice catalog across TTS providers

voice-audition

The casting director for your AI voice agent. Search 670+ voices across 6 TTS providers with semantic search.

Install

pip install voice-audition

Setup

# Copy env template and fill in keys
cp .env.example .env
# Required: Moss credentials for semantic search
# Get them at https://platform.inferedge.dev
MOSS_PROJECT_ID=...
MOSS_PROJECT_KEY=...

# Optional: Provider API keys for sync + enrichment
ELEVENLABS_API_KEY=...
OPENAI_API_KEY=...
DEEPGRAM_API_KEY=...
RIME_API_KEY=...
CARTESIA_API_KEY=...
PLAYHT_API_KEY=...
PLAYHT_USER_ID=...
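Before running a sync or a search, it can help to confirm which keys are actually set. The sketch below is not part of voice-audition: `missing_keys` is a hypothetical helper that inspects a mapping such as `os.environ`, using only the variable names from the template above. It assumes the variables have already been loaded into the process environment.

```python
import os
from typing import Mapping, Sequence

# Names taken from the env template above.
REQUIRED = ("MOSS_PROJECT_ID", "MOSS_PROJECT_KEY")
OPTIONAL = (
    "ELEVENLABS_API_KEY", "OPENAI_API_KEY", "DEEPGRAM_API_KEY",
    "RIME_API_KEY", "CARTESIA_API_KEY", "PLAYHT_API_KEY", "PLAYHT_USER_ID",
)


def missing_keys(env: Mapping[str, str], names: Sequence[str] = REQUIRED) -> list:
    """Return the names that are unset or blank in env (hypothetical helper)."""
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in missing_keys(os.environ):
        print(f"missing required key: {name}")
    for name in missing_keys(os.environ, OPTIONAL):
        print(f"optional key not set: {name}")
```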

Quick start

# Sync voices from all providers
voice-audition sync

# Build search index
voice-audition index

# Search
voice-audition search "warm female voice for healthcare"
voice-audition search "authoritative male British accent"

# Check catalog
voice-audition stats

# Monitor provider reliability
voice-audition monitor

Commands

Command                               What it does
voice-audition sync [providers...]    Sync voices from TTS providers (all if none specified)
voice-audition index                  Build or rebuild the Moss semantic search index
voice-audition search <query>         Semantic search the voice catalog (--top-k for result count)
voice-audition enrich [providers...]  Enrich unenriched voices with descriptions and traits (--model to pick classifier)
voice-audition monitor                Check provider reliability via status pages
voice-audition stats                  Show catalog statistics
voice-audition mcp                    Start the MCP server for Claude integration

MCP Server

Add to Claude Desktop config:

{
  "mcpServers": {
    "voice-audition": {
      "command": "voice-audition",
      "args": ["mcp"],
      "env": {
        "MOSS_PROJECT_ID": "...",
        "MOSS_PROJECT_KEY": "..."
      }
    }
  }
}
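If the config file already lists other MCP servers, the entry above has to be merged in rather than pasted over them. A minimal sketch of that merge follows; `add_mcp_server` is a hypothetical helper, and the location of the Claude Desktop config file is left to the caller because it varies by platform.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, project_id: str, project_key: str) -> dict:
    """Merge a voice-audition entry into an MCP config file, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["voice-audition"] = {
        "command": "voice-audition",
        "args": ["mcp"],
        "env": {
            "MOSS_PROJECT_ID": project_id,
            "MOSS_PROJECT_KEY": project_key,
        },
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Calling it twice with the same arguments leaves the file unchanged, so it is safe to rerun after rotating keys.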

Exposes 5 tools:

Tool               What it does
search_voices      Semantic search across the full catalog
get_voice          Get detailed info for a specific voice
filter_voices      Filter by gender, provider, cost, latency
get_providers      List available TTS providers and status
get_catalog_stats  Voice counts, coverage, freshness

How it works

voice-audition search "warm female for healthcare"
    |
1. Query embedded (Moss, moss-minilm model)
    |
2. Hybrid search: semantic similarity + keyword matching (alpha=0.7)
    |
3. Metadata filters applied (gender, provider, cost, latency)
    |
4. Top-k results ranked by relevance score
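Steps 2 to 4 can be illustrated with a small, self-contained sketch. It assumes that alpha weights the semantic score and (1 - alpha) weights the keyword score, which is a common convention for hybrid search; the exact formula Moss uses is not stated here. The `Candidate` type, its fields, and the precomputed scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    voice_id: str
    gender: str
    provider: str
    semantic: float  # semantic similarity, assumed to lie in [0, 1]
    keyword: float   # keyword match score, assumed to lie in [0, 1]


def hybrid_score(semantic: float, keyword: float, alpha: float = 0.7) -> float:
    """Blend the two scores; alpha=0.7 favours semantic similarity (assumed formula)."""
    return alpha * semantic + (1 - alpha) * keyword


def search(candidates, top_k: int = 5, alpha: float = 0.7, **filters):
    """Score every candidate, apply exact-match metadata filters, return the top-k."""
    scored = [(hybrid_score(c.semantic, c.keyword, alpha), c) for c in candidates]
    kept = [
        (score, c) for score, c in scored
        if all(getattr(c, name) == value for name, value in filters.items())
    ]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:top_k]]
```

For example, `search(candidates, top_k=3, gender="female", provider="rime")` would return at most three matching candidates, best score first.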

Voice catalog

  • 670+ voices across 6 providers: ElevenLabs, OpenAI, Deepgram, Rime, Cartesia, PlayHT
  • Research-backed schema: 8 perceptual traits, texture, pitch, emotional range
  • Synced every 6 hours via GitHub Actions
  • Provider reliability monitoring via status pages
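As a rough illustration of what a catalog entry and a freshness check could look like, here is a sketch under stated assumptions. The `VoiceRecord` fields are hypothetical and only mirror the categories listed above (perceptual traits, texture, pitch, emotional range); the project's actual schema and trait names may differ. The 6-hour window comes from the sync schedule above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

SYNC_INTERVAL = timedelta(hours=6)  # sync cadence stated in the list above


@dataclass
class VoiceRecord:
    # Hypothetical record shape, not the project's real schema.
    voice_id: str
    provider: str
    traits: Dict[str, float] = field(default_factory=dict)  # trait name -> score
    texture: str = ""
    pitch: str = ""
    emotional_range: str = ""
    synced_at: Optional[datetime] = None


def is_stale(record: VoiceRecord, now: datetime,
             max_age: timedelta = SYNC_INTERVAL) -> bool:
    """A record is stale if it was never synced or its last sync is older than max_age."""
    if record.synced_at is None:
        return True
    return now - record.synced_at > max_age


if __name__ == "__main__":
    record = VoiceRecord("example-voice", "rime")
    print(is_stale(record, datetime.now(timezone.utc)))
```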

Enrichment pipeline

Most providers ship voice metadata; Rime does not, so its 610 voices have no descriptions. The enrichment pipeline is meant to generate audio samples and classify them with a local model (Qwen2-Audio via mlx-audio) to fill in traits and descriptions.

pip install "voice-audition[enrich]"
voice-audition enrich rime --model qwen2-audio

The classifier is currently stubbed and needs implementation.
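To show where a classifier would plug in, here is a minimal sketch of an enrichment loop. It does not reflect the package's internals: voices are modelled as plain dicts with hypothetical `description` and `traits` keys, and `classify` is a caller-supplied callable standing in for the real model.

```python
from typing import Callable, Iterable


def needs_enrichment(voice: dict) -> bool:
    """A voice counts as unenriched if it lacks a description or traits (assumed rule)."""
    return not voice.get("description") or not voice.get("traits")


def enrich(voices: Iterable[dict], classify: Callable[[dict], dict]) -> int:
    """Fill in missing fields using classify(voice); return how many voices were touched."""
    touched = 0
    for voice in voices:
        if needs_enrichment(voice):
            # classify is expected to return a dict such as
            # {"description": "...", "traits": {...}}.
            voice.update(classify(voice))
            touched += 1
    return touched
```

Voices that already carry both fields are skipped, so rerunning the loop only processes what is still missing.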

Research

See the research/ directory. Key findings:

  • Voice perception collapses to 2 axes: warmth vs authority (McAleer et al.)
  • Speech rate is the #1 trust predictor, not pitch
  • Emotional voices: +50% CSAT but -20% accuracy (Deepgram)

Development

git clone https://github.com/mnvsk97/voice-audition.git
cd voice-audition
pip install -e ".[mcp]"

License

MIT

Download files

Source Distribution

voice_audition-0.3.0.tar.gz (104.5 kB)

Uploaded Source

Built Distribution

voice_audition-0.3.0-py3-none-any.whl (37.0 kB)

Uploaded Python 3

File details

Details for the file voice_audition-0.3.0.tar.gz.

File metadata

  • Download URL: voice_audition-0.3.0.tar.gz
  • Size: 104.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_audition-0.3.0.tar.gz
Algorithm    Hash digest
SHA256       ae46c2e7b322afd49069d35c1e430b25af100f28ca31a3795f87515ef6b28762
MD5          ce6e5cbc38469095d432d5c7c0b9a374
BLAKE2b-256  ddd9e4fd74d66fa1cd88ef30c57d4015651b1679c649be4871fa5ed30aaa7ab3
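A downloaded file can be checked against the SHA256 digest listed above with the standard library. The sketch below assumes the archive has been saved locally under its published name; `file_sha256` is an illustrative helper, not part of the package.

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digest listed above for voice_audition-0.3.0.tar.gz.
EXPECTED_SHA256 = "ae46c2e7b322afd49069d35c1e430b25af100f28ca31a3795f87515ef6b28762"


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed local path; pass a different one as the first argument.
    target = Path(sys.argv[1] if len(sys.argv) > 1 else "voice_audition-0.3.0.tar.gz")
    actual = file_sha256(target)
    print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
```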

Provenance

The following attestation bundles were made for voice_audition-0.3.0.tar.gz:

Publisher: publish.yml on mnvsk97/voice-audition

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file voice_audition-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: voice_audition-0.3.0-py3-none-any.whl
  • Size: 37.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_audition-0.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       8cb1771e360134e06048fba0c17e9da018eeea98feb40775f17b9c04c2218f60
MD5          5763f9b83084002d388361944aae6b15
BLAKE2b-256  06a485a4ee1f5a0c002023e4672c86a27d8eba773a34935f27906582240d51b2

Provenance

The following attestation bundles were made for voice_audition-0.3.0-py3-none-any.whl:

Publisher: publish.yml on mnvsk97/voice-audition
