Build and maintain a rich voice catalog across TTS providers

These details have not been verified by PyPI

Project description

voice-audition

The casting director for your AI voice agent. Search 670+ voices across 6 TTS providers with semantic search.

Install

pip install voice-audition

Setup

# Copy env template and fill in keys
cp .env.example .env

# Required: Moss credentials for semantic search
# Get them at https://platform.inferedge.dev
MOSS_PROJECT_ID=...
MOSS_PROJECT_KEY=...

# Optional: Provider API keys for sync + enrichment
ELEVENLABS_API_KEY=...
OPENAI_API_KEY=...
DEEPGRAM_API_KEY=...
RIME_API_KEY=...
CARTESIA_API_KEY=...
PLAYHT_API_KEY=...
PLAYHT_USER_ID=...
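
Before running sync or search, it can help to confirm which of the variables above are actually set. The following is a minimal sketch and not part of voice-audition itself: the helper name `check_env` is hypothetical, and it assumes the values from .env have already been exported into the process environment (for example by your shell or a dotenv loader).

```python
import os

# Variable names are taken from the .env template above.
REQUIRED = ["MOSS_PROJECT_ID", "MOSS_PROJECT_KEY"]
OPTIONAL = [
    "ELEVENLABS_API_KEY", "OPENAI_API_KEY", "DEEPGRAM_API_KEY",
    "RIME_API_KEY", "CARTESIA_API_KEY", "PLAYHT_API_KEY", "PLAYHT_USER_ID",
]


def check_env(env=None):
    """Return (missing_required, missing_optional) for a mapping of env vars.

    Hypothetical helper: an unset or empty value counts as missing.
    """
    env = os.environ if env is None else env
    missing_required = [name for name in REQUIRED if not env.get(name)]
    missing_optional = [name for name in OPTIONAL if not env.get(name)]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_env()
    if required:
        print("Missing required variables:", ", ".join(required))
    if optional:
        print("Optional provider keys not set:", ", ".join(optional))
```

Passing a plain dict instead of relying on `os.environ` makes the check easy to exercise in tests.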

Quick start

# Sync voices from all providers
voice-audition sync

# Build search index
voice-audition index

# Search
voice-audition search "warm female voice for healthcare"
voice-audition search "authoritative male British accent"

# Check catalog
voice-audition stats

# Monitor provider reliability
voice-audition monitor

Commands

Command	What it does
`voice-audition sync [providers...]`	Sync voices from TTS providers (all if none specified)
`voice-audition index`	Build or rebuild the Moss semantic search index
`voice-audition search <query>`	Semantic search the voice catalog (`--top-k` for result count)
`voice-audition enrich [providers...]`	Enrich unenriched voices with descriptions and traits (`--model` to pick classifier)
`voice-audition monitor`	Check provider reliability via status pages
`voice-audition stats`	Show catalog statistics
`voice-audition mcp`	Start the MCP server for Claude integration
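
To call the CLI from a script, one option is to build the argument list in Python and hand it to `subprocess`. This is a rough sketch under two assumptions: that `--top-k` takes an integer value after the flag, and that results are printed to stdout in some text form (the output format is not documented here, so it is returned unparsed). The function names are hypothetical.

```python
import subprocess


def build_search_command(query, top_k=None):
    """Build the argv list for `voice-audition search`.

    Assumption: `--top-k` is followed by an integer result count.
    """
    argv = ["voice-audition", "search", query]
    if top_k is not None:
        argv += ["--top-k", str(top_k)]
    return argv


def run_search(query, top_k=None):
    """Run the search and return whatever the CLI writes to stdout."""
    result = subprocess.run(
        build_search_command(query, top_k),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the query as a single list element avoids shell quoting problems with multi-word queries.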

MCP Server

Add to Claude Desktop config:

{
  "mcpServers": {
    "voice-audition": {
      "command": "voice-audition",
      "args": ["mcp"],
      "env": {
        "MOSS_PROJECT_ID": "...",
        "MOSS_PROJECT_KEY": "..."
      }
    }
  }
}
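
If you would rather generate this entry than paste credentials by hand, a small script can build the same structure from environment variables. This is a sketch only: `build_mcp_config` is a hypothetical helper, and merging the result into your existing Claude Desktop config file is left to you.

```python
import json
import os


def build_mcp_config(project_id, project_key):
    """Return the config structure shown above as a Python dict."""
    return {
        "mcpServers": {
            "voice-audition": {
                "command": "voice-audition",
                "args": ["mcp"],
                "env": {
                    "MOSS_PROJECT_ID": project_id,
                    "MOSS_PROJECT_KEY": project_key,
                },
            }
        }
    }


if __name__ == "__main__":
    # Falls back to the "..." placeholder when a variable is not set.
    config = build_mcp_config(
        os.environ.get("MOSS_PROJECT_ID", "..."),
        os.environ.get("MOSS_PROJECT_KEY", "..."),
    )
    print(json.dumps(config, indent=2))
```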

Exposes 5 tools:

Tool	What it does
`search_voices`	Semantic search across the full catalog
`get_voice`	Get detailed info for a specific voice
`filter_voices`	Filter by gender, provider, cost, latency
`get_providers`	List available TTS providers and status
`get_catalog_stats`	Voice counts, coverage, freshness
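
To illustrate what filtering by gender, provider, cost, and latency might look like, here is a minimal sketch over in-memory records. The `Voice` fields, their units, and the parameter names are all hypothetical; the actual tool's schema and arguments may differ.

```python
from dataclasses import dataclass


@dataclass
class Voice:
    # Field names and units are hypothetical; the real catalog schema may differ.
    voice_id: str
    provider: str
    gender: str
    cost_per_min: float  # assumed unit: USD per minute
    latency_ms: int      # assumed unit: milliseconds


def filter_voices(voices, gender=None, provider=None, max_cost=None, max_latency_ms=None):
    """Keep voices that satisfy every filter that was supplied.

    A filter left as None is ignored.
    """
    kept = []
    for v in voices:
        if gender is not None and v.gender != gender:
            continue
        if provider is not None and v.provider != provider:
            continue
        if max_cost is not None and v.cost_per_min > max_cost:
            continue
        if max_latency_ms is not None and v.latency_ms > max_latency_ms:
            continue
        kept.append(v)
    return kept
```

Treating each filter as optional keeps the function usable both for broad browsing and for narrow, multi-constraint lookups.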

How it works

voice-audition search "warm female for healthcare"
    |
1. Query embedded (Moss, moss-minilm model)
    |
2. Hybrid search: semantic similarity + keyword matching (alpha=0.7)
    |
3. Metadata filters applied (gender, provider, cost, latency)
    |
4. Top-k results ranked by relevance score
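
Steps 2 and 4 can be sketched in a few lines. This is an illustration of one common way to do hybrid ranking, not Moss's actual implementation. It assumes that alpha weights the semantic score and (1 - alpha) weights the keyword score, that semantic similarity is cosine similarity between embeddings, and that keyword matching is the fraction of query terms found in the text. All function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query, text):
    """Fraction of distinct query terms that appear as whole words in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_score(semantic, keyword, alpha=0.7):
    """Assumed blend: alpha weights the semantic score, (1 - alpha) the keyword score."""
    return alpha * semantic + (1 - alpha) * keyword


def rank(query, query_vec, docs, top_k=3, alpha=0.7):
    """docs is a list of (doc_id, text, vector); returns the top_k (doc_id, score) pairs."""
    scored = [
        (doc_id, hybrid_score(cosine(query_vec, vec), keyword_overlap(query, text), alpha))
        for doc_id, text, vec in docs
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Under this assumed formula, alpha=0.7 means a voice whose description is semantically close to the query can outrank one that merely shares words with it.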

Voice catalog

670+ voices across 6 providers: ElevenLabs, OpenAI, Deepgram, Rime, Cartesia, PlayHT
Research-backed schema: 8 perceptual traits, texture, pitch, emotional range
Synced every 6 hours via GitHub Actions
Provider reliability monitoring via status pages
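
As an illustration of status-page monitoring, the sketch below classifies a Statuspage-style JSON summary, in which a `status.indicator` field takes values such as `none`, `minor`, `major`, or `critical`. It is an assumption that any given provider's status page exposes this format, and voice-audition's own monitor may work differently. The function name and the `up` / `degraded` / `unknown` labels are hypothetical.

```python
import json

# Indicator values used by Statuspage-style status summaries.
HEALTHY = {"none"}
DEGRADED = {"minor", "major", "critical"}


def classify_status(payload_text):
    """Map a Statuspage-style JSON document to 'up', 'degraded', or 'unknown'."""
    try:
        indicator = json.loads(payload_text)["status"]["indicator"]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing fields, or an unexpected top-level type.
        return "unknown"
    if indicator in HEALTHY:
        return "up"
    if indicator in DEGRADED:
        return "degraded"
    return "unknown"
```

Returning `unknown` rather than raising keeps a single unreachable or malformed status page from aborting a check across all providers.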

Enrichment pipeline

Most providers ship voice metadata. Rime does not: its 610 voices have no descriptions. The enrichment pipeline is designed to generate audio samples and classify them with a local model (Qwen2-Audio via mlx-audio) to fill in traits and descriptions.

# Quote the extra so shells such as zsh do not expand the brackets
pip install "voice-audition[enrich]"
voice-audition enrich rime --model qwen2-audio

The classifier is currently stubbed and needs implementation.
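
The selection step of such a pipeline can be sketched independently of the classifier. The code below is a hypothetical outline, not the package's implementation: it assumes voices are dicts with `provider`, `description`, and `traits` keys (names chosen for illustration), and it leaves the classifier as an explicit placeholder.

```python
def needs_enrichment(voice):
    """A voice needs enrichment when its description or its traits are missing or empty."""
    description = (voice.get("description") or "").strip()
    traits = voice.get("traits") or {}
    return not description or not traits


def select_unenriched(voices, providers=None):
    """Return voices to enrich, optionally limited to the given provider names."""
    return [
        v for v in voices
        if needs_enrichment(v) and (providers is None or v.get("provider") in providers)
    ]


def classify_audio(sample_path):
    """Placeholder for the audio classifier; intentionally unimplemented in this sketch."""
    raise NotImplementedError("classifier not implemented")
```

Separating selection from classification means the set of voices to process can be inspected and tested before any model is wired in.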

Research

See the research/ directory. Key findings:

Voice perception collapses to 2 axes: warmth vs authority (McAleer et al.)
Speech rate is the #1 trust predictor, not pitch
Emotional voices: +50% CSAT but -20% accuracy (Deepgram)

Development

git clone https://github.com/mnvsk97/voice-audition.git
cd voice-audition
pip install -e ".[mcp]"

License

MIT

Project details

Release history

0.4.1: Apr 5, 2026
0.4.0: Apr 4, 2026
0.3.0: Apr 4, 2026
0.2.0: Apr 4, 2026 (this version)
0.1.0: Apr 3, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

voice_audition-0.2.0.tar.gz (92.5 kB)

Uploaded Apr 4, 2026 Source

Built Distribution

voice_audition-0.2.0-py3-none-any.whl (23.5 kB)

Uploaded Apr 4, 2026 Python 3

File details

Details for the file voice_audition-0.2.0.tar.gz.

File metadata

Download URL: voice_audition-0.2.0.tar.gz
Upload date: Apr 4, 2026
Size: 92.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_audition-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`f150bf61279b6fc90651d6def5de9482a930bf10dc01a56c5aefe41e3553f361`
MD5	`504dee84499b2659b15a4da60827d729`
BLAKE2b-256	`de51a419f0fdc2f89ffa1c667cf175ea96f36e0333c829be8a39c00dc4afeb60`

Provenance

The following attestation bundles were made for voice_audition-0.2.0.tar.gz:

Publisher: publish.yml on mnvsk97/voice-audition

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voice_audition-0.2.0.tar.gz
- Subject digest: f150bf61279b6fc90651d6def5de9482a930bf10dc01a56c5aefe41e3553f361
- Sigstore transparency entry: 1230012026
- Sigstore integration time: Apr 4, 2026
Source repository:
- Permalink: mnvsk97/voice-audition@b1382979014b53c5888fed50b668e93b4ff0ed04
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/mnvsk97
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b1382979014b53c5888fed50b668e93b4ff0ed04
- Trigger Event: push

File details

Details for the file voice_audition-0.2.0-py3-none-any.whl.

File metadata

Download URL: voice_audition-0.2.0-py3-none-any.whl
Upload date: Apr 4, 2026
Size: 23.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_audition-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`089ef4509b8739f79be01da0523f54f058c7ea2303c5ef011b7a6eff8c22fb59`
MD5	`fd1dee81e4d0d4ca012b4978ab621af4`
BLAKE2b-256	`fc9e3fe1ffc47dbf1aad690b396339cbb3e92764349052938901eb10ff23f283`

Provenance

The following attestation bundles were made for voice_audition-0.2.0-py3-none-any.whl:

Publisher: publish.yml on mnvsk97/voice-audition

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voice_audition-0.2.0-py3-none-any.whl
- Subject digest: 089ef4509b8739f79be01da0523f54f058c7ea2303c5ef011b7a6eff8c22fb59
- Sigstore transparency entry: 1230012079
- Sigstore integration time: Apr 4, 2026
Source repository:
- Permalink: mnvsk97/voice-audition@b1382979014b53c5888fed50b668e93b4ff0ed04
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/mnvsk97
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b1382979014b53c5888fed50b668e93b4ff0ed04
- Trigger Event: push
