OmniVoice multilingual zero-shot TTS toolkit for Strands Agents — voice cloning, voice design, and 600+ language synthesis as agent tools

strands-omnivoice

Multilingual zero-shot TTS toolkit for Strands Agents — 600+ languages, voice cloning, and voice design as agent tools.

Wraps k2-fsa/OmniVoice, a diffusion-language-model TTS system that supports 600+ languages with a real-time factor (RTF) as low as 0.025, as a set of @tool functions that any Strands Agent can call.
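For readers new to the metric: real-time factor is synthesis time divided by the duration of the audio produced, so lower is faster. The sketch below only works through that arithmetic, taking the quoted 0.025 figure at face value; real speed will depend on hardware and settings, and the helper names are illustrative rather than part of this package.

```python
def synthesis_seconds(audio_seconds: float, rtf: float = 0.025) -> float:
    """Estimated wall-clock time needed to synthesize `audio_seconds` of speech."""
    return audio_seconds * rtf


def speedup_over_real_time(rtf: float) -> float:
    """How many times faster than playback speed a given RTF implies."""
    return 1.0 / rtf


if __name__ == "__main__":
    # One minute of speech at the quoted best-case RTF.
    print(f"60 s of audio: about {synthesis_seconds(60):.2f} s to generate")
    print(f"Speed-up: about {speedup_over_real_time(0.025):.0f}x real time")
```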

✨ Features

600+ languages — broadest zero-shot TTS coverage available
Voice cloning — clone any speaker from 3–10s of reference audio
Voice design — describe the speaker via attributes (female, british accent, whisper)
Auto voice — let the model pick a voice
Built-in ASR — transcribe reference audio with the bundled Whisper model
Batch synthesis — generate many WAVs in one call, sharing a loaded model
Inline tags — [laughter], [sigh], pinyin (ZHE2), CMU phonemes ([B EY1 S])
Apple Silicon + CUDA + CPU — auto-device with STRANDS_OMNIVOICE_DEVICE override
Singleton loader — every tool shares one cached checkpoint, no reloads

📦 Install

pip install strands-omnivoice

This installs strands-omnivoice along with its omnivoice>=0.1.5 runtime dependency. Then install the PyTorch build that matches your hardware:

# NVIDIA CUDA (Linux/Windows)
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128

# Apple Silicon (MPS)
pip install torch==2.8.0 torchaudio==2.8.0

Developer setup

git clone https://github.com/cagataycali/strands-omnivoice && cd strands-omnivoice
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest -q

🚀 Quick Start

from strands import Agent
from strands_omnivoice import (
    omnivoice_tts, omnivoice_clone, omnivoice_design,
    omnivoice_sysinfo, audio_play,
)

agent = Agent(tools=[
    omnivoice_tts, omnivoice_clone, omnivoice_design,
    omnivoice_sysinfo, audio_play,
])

# Auto voice
agent("Synthesize 'Hello world' to /tmp/hello.wav and play it.")

# Voice cloning
agent("Clone the speaker in /tmp/ref.wav and say 'Bonjour le monde' to /tmp/fr.wav.")

# Voice design
agent("Use an elderly British female whisper voice to say 'Once upon a time' and save it to /tmp/story.wav.")
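Since cloning expects roughly 3–10 s of reference audio, it can help to check a clip before handing it to the agent. Inside an agent the audio_probe tool reports duration; the standalone sketch below does a similar check with only the standard library. It assumes an uncompressed PCM WAV file, and the function names are hypothetical.

```python
import wave

MIN_SECONDS = 3.0   # lower bound quoted for reference clips
MAX_SECONDS = 10.0  # upper bound quoted for reference clips


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """True if the clip length falls inside the documented 3-10 s window."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

A clip outside the window is not necessarily rejected by the model; this is only a pre-flight check against the documented range.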

🧰 Tools

Tool	Purpose
`omnivoice_tts`	Auto-voice synthesis — text → WAV
`omnivoice_clone`	Voice cloning from a 3–10 s reference clip
`omnivoice_design`	Voice design via attributes (gender, age, pitch, accent, dialect)
`omnivoice_batch`	Multi-item synthesis sharing a single loaded model
`omnivoice_transcribe`	ASR via OmniVoice's bundled Whisper model
`omnivoice_load_model`	Pre-warm / reload the model
`omnivoice_unload_model`	Drop cached weights and free GPU memory
`omnivoice_download_model`	Snapshot-download the checkpoint without loading
`omnivoice_sysinfo`	Device, dtype, OmniVoice version, loaded-state diagnostics
`omnivoice_list_languages`	Browse the 600+ supported languages
`audio_probe`	Inspect any audio file (duration / SR / channels / format)
`audio_play`	Play a WAV via the host's default player (afplay/aplay/paplay/ffplay)
`omnivoice_demo_serve`	Launch the upstream Gradio web UI as a background process

All tools return the standard Strands tool result shape — they compose freely inside Agent(tools=[...]).
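As far as the Strands SDK documents it, a tool result is a dict with a status of "success" or "error" and a content list of blocks such as {"text": ...} or {"json": ...}. The sketch below shows what builders for that shape could look like. The ok and err functions here are hypothetical stand-ins, not the package's actual code.

```python
from typing import Any, Dict, List


def ok(text: str, **extra: Any) -> Dict[str, Any]:
    """Build a success result with one text block (hypothetical helper)."""
    content: List[Dict[str, Any]] = [{"text": text}]
    if extra:
        # Structured details travel as a separate JSON block.
        content.append({"json": extra})
    return {"status": "success", "content": content}


def err(message: str) -> Dict[str, Any]:
    """Build an error result carrying a human-readable message."""
    return {"status": "error", "content": [{"text": message}]}
```

Returning one uniform shape from every tool is what lets an agent chain them: it can read the status field the same way whether the call synthesized audio or only probed a file.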

🎛️ Voice Design — Attribute Reference

instruct= accepts a comma-separated list of attributes. Values within a category (one table row) are mutually exclusive, so pick at most one per row; values from different rows can be combined freely.

Category	Values
Gender	`male`, `female`
Age	`child`, `teenager`, `young adult`, `middle-aged`, `elderly`
Pitch	`very low pitch`, `low pitch`, `moderate pitch`, `high pitch`, `very high pitch`
Style	`whisper`
English accent (EN text only)	`american accent`, `british accent`, `australian accent`, `canadian accent`, `indian accent`, `chinese accent`, `korean accent`, `portuguese accent`, `russian accent`, `japanese accent`
Chinese dialect (ZH text only)	`四川话`, `陕西话`, `东北话`, `云南话`, `河南话`, ...

Examples:

"female, young adult, high pitch, british accent"
"male, elderly, low pitch, whisper"
"女, 青年, 四川话"

See the upstream voice-design docs for the full table.
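The one-value-per-category rule can be enforced before a string is ever passed as instruct=. The helper below is a minimal sketch under that reading of the table. build_instruct is a hypothetical name, not part of the package, and the accent and dialect rows are left out for brevity.

```python
from typing import Dict, List, Optional, Set

# Values copied from the attribute table above (accent and dialect rows omitted).
CATEGORIES: Dict[str, Set[str]] = {
    "gender": {"male", "female"},
    "age": {"child", "teenager", "young adult", "middle-aged", "elderly"},
    "pitch": {"very low pitch", "low pitch", "moderate pitch",
              "high pitch", "very high pitch"},
    "style": {"whisper"},
}


def build_instruct(**choices: Optional[str]) -> str:
    """Join at most one value per category into a comma-separated string."""
    parts: List[str] = []
    for category, allowed in CATEGORIES.items():  # iterate in table order
        value = choices.pop(category, None)
        if value is None:
            continue
        if value not in allowed:
            raise ValueError(f"{value!r} is not a known {category} value")
        parts.append(value)
    if choices:
        raise ValueError(f"unknown categories: {sorted(choices)}")
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_instruct(gender="male", age="elderly",
                         pitch="low pitch", style="whisper"))
    # -> male, elderly, low pitch, whisper
```

Using keyword arguments means a category cannot be given twice, which is the mutual-exclusion rule expressed in the function signature.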

🔊 Inline Tags & Pronunciation Control

# Non-verbal tag
agent("""omnivoice_tts text="[laughter] You really got me." output="/tmp/laugh.wav" """)

# Chinese — pinyin pronunciation override
agent("""omnivoice_tts text="这批货物打ZHE2出售。" output="/tmp/pinyin.wav" """)

# English — CMU phoneme override
agent("""omnivoice_tts text="He plays the [B EY1 S] guitar." output="/tmp/cmu.wav" """)

Supported tags: [laughter], [sigh], [confirmation-en], [question-en], [question-ah], [question-oh], [question-ei], [question-yi], [surprise-ah], [surprise-oh], [surprise-wa], [surprise-yo], [dissatisfaction-hnn].
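A typo in a tag may be easier to catch before synthesis than by listening afterwards. The sketch below scans text for bracketed spans and reports any that are neither a listed tag nor a CMU-style phoneme override. It assumes the slash shorthand in the tag list stands for one tag per suffix, and unknown_tags is a hypothetical helper, not a function from the package.

```python
import re
from typing import List

KNOWN_TAGS = {
    "laughter", "sigh", "confirmation-en", "question-en",
    "question-ah", "question-oh", "question-ei", "question-yi",
    "surprise-ah", "surprise-oh", "surprise-wa", "surprise-yo",
    "dissatisfaction-hnn",
}

# Anything between square brackets, without nesting.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")
# CMU-style overrides such as [B EY1 S]: uppercase phones with optional stress digits.
CMU_PHONES = re.compile(r"[A-Z]+[0-2]?(?: [A-Z]+[0-2]?)*")


def unknown_tags(text: str) -> List[str]:
    """Return bracketed spans that are neither known tags nor phoneme overrides."""
    return [
        body
        for body in BRACKETED.findall(text)
        if body not in KNOWN_TAGS and not CMU_PHONES.fullmatch(body)
    ]


if __name__ == "__main__":
    print(unknown_tags("[giggle] That was close. [sigh]"))  # -> ['giggle']
```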

⚙️ Configuration

Environment variables override defaults:

Var	Default	Description
`STRANDS_OMNIVOICE_MODEL`	`k2-fsa/OmniVoice`	HF repo or local checkpoint path
`STRANDS_OMNIVOICE_DEVICE`	auto (cuda → mps → cpu)	Force a device instead of auto-detecting
`STRANDS_OMNIVOICE_DTYPE`	auto	Force a dtype: `float16`, `float32`, or `bfloat16`

Alternatively, pass model_id= or device= to any tool to set these per call.
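The device fallback described in the table can be sketched as a small resolver. This is an illustration, not the package's loader: resolve_device is a hypothetical name, and letting a per-call argument win over the environment variable is an assumption the text above does not spell out. Availability is passed in as flags so the sketch runs without PyTorch installed.

```python
import os
from typing import Mapping, Optional


def resolve_device(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    cuda_available: bool = False,
    mps_available: bool = False,
) -> str:
    """Pick a device: per-call argument, then env var, then cuda -> mps -> cpu."""
    if explicit:
        return explicit  # assumed precedence: per-call argument first
    forced = env.get("STRANDS_OMNIVOICE_DEVICE", "").strip()
    if forced:
        return forced
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

With PyTorch installed, the two flags would typically come from torch.cuda.is_available() and torch.backends.mps.is_available().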

🧪 Testing the Agent

python agent.py "Show sysinfo, then synthesize 'Привет мир' to /tmp/ru.wav and play it."

Run without arguments, agent.py lists every registered tool.

🏗️ Architecture

strands_omnivoice/
├── __init__.py           # exports: 13 tools + loader API
├── _common.py            # ToolResult builders (ok/err) + path helpers
├── _loader.py            # singleton OmniVoice loader (thread-safe)
└── tools/
    ├── tts.py            # auto-voice synthesis
    ├── clone.py          # voice cloning
    ├── design.py         # voice design (attributes)
    ├── batch.py          # multi-item generation
    ├── transcribe.py     # ASR
    ├── model_lifecycle.py  # load / unload / download
    ├── info.py           # sysinfo + list_languages
    ├── audio_utils.py    # probe + play
    └── demo_server.py    # Gradio UI launcher

The loader caches one model per (model_id, device) key — every tool gets the same instance, so a workflow that calls omnivoice_clone then omnivoice_design only loads weights once.
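The caching behaviour described above can be sketched as a lock-protected dictionary keyed by (model_id, device). This is a simplified illustration rather than the contents of _loader.py: get_model, unload_all, and the factory parameter are hypothetical names, and the factory stands in for whatever actually loads the checkpoint.

```python
import threading
from typing import Any, Callable, Dict, Tuple

_cache: Dict[Tuple[str, str], Any] = {}
_lock = threading.Lock()


def get_model(model_id: str, device: str,
              factory: Callable[[str, str], Any]) -> Any:
    """Return the cached model for (model_id, device), loading it at most once."""
    key = (model_id, device)
    with _lock:  # serialize loads so concurrent tool calls never load twice
        if key not in _cache:
            _cache[key] = factory(model_id, device)
        return _cache[key]


def unload_all() -> None:
    """Drop every cached model so the next call has to load again."""
    with _lock:
        _cache.clear()
```

Holding the lock for the whole load is the simplest way to guarantee a single load per key; the cost is that a request for a different key waits while another model is loading.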

🤝 Acknowledgments

k2-fsa/OmniVoice — the upstream model. Massive credit to Han Zhu and the k2-fsa team.
Strands Agents — the agent framework.
strands-cosmos — sister project that inspired this scaffold.

📄 License

Apache 2.0 — same as upstream OmniVoice. See LICENSE.

Disclaimer: as with the upstream model, you are strictly prohibited from using this for unauthorized voice cloning, impersonation, fraud, or any illegal/unethical activity. Use responsibly.

Version 0.1.0, released May 16, 2026

File details

Details for the file strands_omnivoice-0.1.0.tar.gz.

File metadata

Download URL: strands_omnivoice-0.1.0.tar.gz
Upload date: May 16, 2026
Size: 27.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for strands_omnivoice-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`da0c835c840e3011ba1081a328380debb2e508531c8fbd267550c8e36879b6e0`
MD5	`31f6e5a19bc6e9393d479002ef9879ea`
BLAKE2b-256	`eb7896470ea47ef2a07bd963c43ef5f7f86cf1f353b6cc499f50e0fea8f2698a`

File details

Details for the file strands_omnivoice-0.1.0-py3-none-any.whl.

File metadata

Download URL: strands_omnivoice-0.1.0-py3-none-any.whl
Upload date: May 16, 2026
Size: 27.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for strands_omnivoice-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0b8cfe9a2eed74f8bcad6f3b744f795826466330b078d3ed13dfc49773727667`
MD5	`23abac66111610cb1c6c051bfcae7a91`
BLAKE2b-256	`2f7ffb87d1d6c444cad8d3e4e2d0f8b0ed17f91701d0aa6294efd840ca4e5503`
