Build and maintain a rich voice catalog across TTS providers

voice-audition

The casting director for your AI voice agent. Search 670+ voices across 6 TTS providers with semantic search.

Install

pip install voice-audition

Setup

# Copy the env template, then fill in the keys below in .env
cp .env.example .env

# Required: Moss credentials for semantic search
# Get them at https://platform.inferedge.dev
MOSS_PROJECT_ID=...
MOSS_PROJECT_KEY=...

# Optional: Provider API keys for sync + enrichment
ELEVENLABS_API_KEY=...
OPENAI_API_KEY=...
DEEPGRAM_API_KEY=...
RIME_API_KEY=...
CARTESIA_API_KEY=...
PLAYHT_API_KEY=...
PLAYHT_USER_ID=...
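
Before running any command, it can help to confirm that the two required Moss variables are actually set. The snippet below is a minimal sketch and not part of voice-audition itself: the helper name `missing_required` is hypothetical, and it assumes the values have already been exported into the process environment (a `.env` file is not loaded automatically by plain Python).

```python
import os

# Required variable names, taken from the env template above.
REQUIRED = ("MOSS_PROJECT_ID", "MOSS_PROJECT_KEY")


def missing_required(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not (env.get(name) or "").strip()]
```

Calling `missing_required()` with no arguments checks the current environment; an empty list means both Moss credentials are present.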

Quick start

# Sync voices from all providers
voice-audition sync

# Build search index
voice-audition index

# Search
voice-audition search "warm female voice for healthcare"
voice-audition search "authoritative male British accent"

# Check catalog
voice-audition stats

# Monitor provider reliability
voice-audition monitor

Commands

| Command | What it does |
| --- | --- |
| voice-audition sync [providers...] | Sync voices from TTS providers (all if none specified) |
| voice-audition index | Build or rebuild the Moss semantic search index |
| voice-audition search <query> | Semantic search the voice catalog (--top-k for result count) |
| voice-audition enrich [providers...] | Enrich unenriched voices with descriptions and traits (--model to pick classifier) |
| voice-audition monitor | Check provider reliability via status pages |
| voice-audition stats | Show catalog statistics |
| voice-audition mcp | Start the MCP server for Claude integration |
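
If you want to drive the CLI from a script, one option is to build the argument lists in Python and hand them to `subprocess`. This is a sketch under assumptions: the helper names are hypothetical, and it assumes `--top-k` takes its integer value as a separate argument, which the command list above does not spell out.

```python
import shlex


def search_command(query, top_k=None):
    """Build the argv list for `voice-audition search`."""
    argv = ["voice-audition", "search", query]
    if top_k is not None:
        # Assumption: the flag is passed as `--top-k <int>`.
        argv += ["--top-k", str(top_k)]
    return argv


def sync_command(*providers):
    """Build the argv list for `voice-audition sync`.

    With no providers the CLI syncs all of them, per the command list.
    """
    return ["voice-audition", "sync", *providers]


def as_shell(argv):
    """Render an argv list as a copy-pasteable, safely quoted shell line."""
    return shlex.join(argv)
```

Passing a list such as `search_command("warm female voice for healthcare", top_k=5)` to `subprocess.run(..., check=True)` avoids shell quoting problems with multi-word queries.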

MCP Server

Add the following to your Claude Desktop config (claude_desktop_config.json):

{
  "mcpServers": {
    "voice-audition": {
      "command": "voice-audition",
      "args": ["mcp"],
      "env": {
        "MOSS_PROJECT_ID": "...",
        "MOSS_PROJECT_KEY": "..."
      }
    }
  }
}
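
If the config file already lists other MCP servers, the entry above has to be merged in rather than pasted over them. The following sketch shows one way to do that on an already-parsed config; the function name `add_voice_audition` is hypothetical, and reading and writing the file (for example with `json.load` and `json.dump`) is left to the caller.

```python
def add_voice_audition(config, project_id, project_key):
    """Return a copy of a Claude Desktop config dict with the
    voice-audition server added, leaving other servers in place.
    """
    merged = dict(config)
    # Copy the server map so the caller's dict is not mutated.
    servers = dict(merged.get("mcpServers", {}))
    servers["voice-audition"] = {
        "command": "voice-audition",
        "args": ["mcp"],
        "env": {
            "MOSS_PROJECT_ID": project_id,
            "MOSS_PROJECT_KEY": project_key,
        },
    }
    merged["mcpServers"] = servers
    return merged
```

Working on a copy keeps the original config intact, which makes it straightforward to diff or back up before writing the file.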

Exposes 5 tools:

| Tool | What it does |
| --- | --- |
| search_voices | Semantic search across the full catalog |
| get_voice | Get detailed info for a specific voice |
| filter_voices | Filter by gender, provider, cost, latency |
| get_providers | List available TTS providers and status |
| get_catalog_stats | Voice counts, coverage, freshness |

How it works

voice-audition search "warm female for healthcare"
    |
1. Query embedded (Moss, moss-minilm model)
    |
2. Hybrid search: semantic similarity + keyword matching (alpha=0.7)
    |
3. Metadata filters applied (gender, provider, cost, latency)
    |
4. Top-k results ranked by relevance score
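
The pipeline above can be illustrated with a toy version in plain Python. This is a sketch, not Moss's implementation: it assumes the hybrid score is a weighted sum, `alpha * semantic + (1 - alpha) * keyword`, which is a common reading of an alpha parameter but is not confirmed by this README. The `Candidate` class and the precomputed scores are hypothetical stand-ins for what the embedding and keyword steps would produce.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    voice_id: str
    gender: str
    provider: str
    semantic: float  # semantic similarity to the query, assumed in [0, 1]
    keyword: float   # keyword match score, assumed in [0, 1]


def hybrid_score(semantic, keyword, alpha=0.7):
    """Weighted blend of the two signals (assumed form of the hybrid score)."""
    return alpha * semantic + (1 - alpha) * keyword


def search(candidates, top_k=3, alpha=0.7, **filters):
    """Score every candidate, apply exact-match metadata filters,
    then return the top_k (voice_id, score) pairs, best first.
    """
    scored = [(hybrid_score(c.semantic, c.keyword, alpha), c) for c in candidates]
    kept = [
        (score, c)
        for score, c in scored
        if all(getattr(c, name) == value for name, value in filters.items())
    ]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [(c.voice_id, round(score, 3)) for score, c in kept[:top_k]]
```

For example, `search(candidates, top_k=2, gender="female")` keeps only female voices and returns the two with the highest blended score. With alpha=0.7 the semantic signal dominates, but a strong keyword match can still lift a voice past one with slightly higher semantic similarity.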

Voice catalog

  • 670+ voices across 6 providers: ElevenLabs, OpenAI, Deepgram, Rime, Cartesia, PlayHT
  • Research-backed schema: 8 perceptual traits, texture, pitch, emotional range
  • Synced every 6 hours via GitHub Actions
  • Provider reliability monitoring via status pages
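
Given the 6-hour sync cadence, a consumer of the catalog may want to flag records that have fallen behind. The sketch below is illustrative only: `VoiceRecord` and its field names are hypothetical and do not reflect the project's actual schema, and the trait names are deliberately left open because this README does not list them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

# Sync cadence stated above.
SYNC_INTERVAL = timedelta(hours=6)


@dataclass
class VoiceRecord:
    voice_id: str
    provider: str
    description: str = ""
    # Hypothetical container: trait name -> score.
    traits: dict = field(default_factory=dict)
    last_synced: Optional[datetime] = None  # timezone-aware UTC timestamp


def is_stale(record, now=None, max_age=SYNC_INTERVAL):
    """True if the record was never synced or is older than max_age."""
    if record.last_synced is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - record.last_synced > max_age
```

Using timezone-aware timestamps throughout avoids the error Python raises when subtracting a naive datetime from an aware one.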

Enrichment pipeline

Most providers ship voice metadata. Rime does not: its 610 voices have no descriptions. The enrichment pipeline is meant to generate audio samples and classify them with a local model (Qwen2-Audio via mlx-audio) to fill in traits and descriptions.

pip install "voice-audition[enrich]"
voice-audition enrich rime --model qwen2-audio

Note: the classifier is currently a stub and still needs to be implemented.
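
The overall shape of the enrichment step can be sketched independently of the classifier. Everything here is an assumption made for illustration: voices are modeled as plain dicts with hypothetical `description` and `traits` keys, and `fake_classifier` is a placeholder standing in for the audio model, which this sketch does not implement.

```python
def needs_enrichment(voice):
    """A voice is unenriched if it lacks a description or traits."""
    return not voice.get("description") or not voice.get("traits")


def enrich(voices, classify):
    """Run `classify` on unenriched voices only and merge its output in.

    Already-enriched voices are passed through unchanged, and the input
    dicts are not mutated.
    """
    enriched = []
    for voice in voices:
        if needs_enrichment(voice):
            voice = {**voice, **classify(voice)}
        enriched.append(voice)
    return enriched


def fake_classifier(voice):
    """Placeholder for the real model: returns fixed dummy values."""
    return {"description": "placeholder description", "traits": {"warmth": 0.5}}
```

Passing the classifier in as a function keeps the selection logic testable while the real model is still missing.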

Research

See the research/ directory. Key findings:

  • Voice perception largely collapses to two axes, warmth and authority (McAleer et al.)
  • Speech rate, not pitch, is the strongest predictor of trust
  • Emotional voices: +50% CSAT but -20% accuracy (Deepgram)

Development

git clone https://github.com/mnvsk97/voice-audition.git
cd voice-audition
pip install -e ".[mcp]"

License

MIT
