VoiceStudio: A unified toolkit for text-style prompted speech synthesis, voice adaptation, and editing
VoiceStudio
Your Complete Voice Adaptation Research Workspace
🎯 Overview
VoiceStudio is a unified toolkit for text-style prompted speech synthesis. It enables instant voice adaptation and editing from natural language descriptions, and builds on research in voice style prompting, LoRA adaptation, and language-audio models.
Key Features:
- Text-Conditional Generation: Generate voice characteristics using natural language descriptions like "young female voice with warm tone"
- Multimodal Input: Accepts both text descriptions and audio feature vectors
- Voice Editing: Modify existing voices with simple instructions (planned, not yet available)
- Instant Adaptation: Generate LoRA weights in a single forward pass without fine-tuning
- Architecture Agnostic: Works with multiple TTS architectures
- Zero-shot Generalization: Adapt to unseen voice characteristics not present in training data
- Parameter Efficiency: Minimal computational overhead compared to full model fine-tuning
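To make the "Instant Adaptation" and "Parameter Efficiency" points concrete, here is a minimal, dependency-free sketch of the general idea: a hypernetwork maps a style embedding to LoRA factors in one forward pass, and the low-rank update has far fewer parameters than the weight matrix it modifies. This is not VoiceStudio's API. `ToyHyperNetwork`, `lora_delta`, and `lora_param_count` are hypothetical names, the weights are random and untrained, and the embedding is a hand-written stand-in for the output of a text encoder.

```python
import random


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def reshape(flat, rows, cols):
    """Split a flat list into `rows` lists of length `cols`."""
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


class ToyHyperNetwork:
    """Hypothetical hypernetwork: one linear head per LoRA factor."""

    def __init__(self, embed_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Untrained random heads; a real system would learn these.
        self.head_a = [[rng.gauss(0, 0.02) for _ in range(embed_dim)]
                       for _ in range(rank * d_in)]
        self.head_b = [[rng.gauss(0, 0.02) for _ in range(embed_dim)]
                       for _ in range(d_out * rank)]

    def generate(self, style_embedding):
        """Single forward pass: embedding -> LoRA factors A (r x d_in), B (d_out x r)."""
        a = reshape(linear(self.head_a, style_embedding), self.rank, self.d_in)
        b = reshape(linear(self.head_b, style_embedding), self.d_out, self.rank)
        return a, b


def lora_delta(a, b, alpha=1.0):
    """Weight update delta_W = (alpha / r) * B @ A, with shape d_out x d_in."""
    scale = alpha / len(a)
    return [[scale * v for v in row] for row in matmul(b, a)]


def lora_param_count(d_in, d_out, rank):
    """Parameters in the two LoRA factors for one weight matrix."""
    return rank * (d_in + d_out)


hyper = ToyHyperNetwork(embed_dim=4, d_in=6, d_out=5, rank=2)
style_embedding = [0.1, -0.3, 0.7, 0.2]  # stand-in for an encoded description
a, b = hyper.generate(style_embedding)
delta = lora_delta(a, b)

# For a 1024 x 1024 layer at rank 8: 16384 LoRA parameters vs 1048576 full.
print(lora_param_count(1024, 1024, 8), 1024 * 1024)
```

No gradient step is involved: changing the description only changes the embedding, and the factors are regenerated by the same forward pass.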
🛠️ Installation
From PyPI (Recommended)
uv add "voicestudio[all]" # Install with all available base TTS models (quoted so the shell does not expand the brackets)
From Source
git clone https://github.com/LatentForge/voicestudio.git
cd voicestudio
uv pip install -e ".[all]"
Development Installation
git clone https://github.com/LatentForge/voicestudio.git
cd voicestudio
uv pip install -e ".[all,web]"
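After any of the installation routes above, a quick way to confirm that the package is visible to the active environment is to query its distribution metadata. The sketch below uses only the standard library and assumes the distribution name is `voicestudio`, as in the install commands; `installed_version` is a helper name introduced here for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("voicestudio")
if version is None:
    print("voicestudio is not installed in this environment")
else:
    print(f"voicestudio {version} is installed")
```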
Building and Publishing
# Build package
uv build
# Upload to PyPI
uv publish
📊 Supported Models
VoiceStudio works with various TTS architectures:
| Model | Status | Notes |
|---|---|---|
| Parler-TTS | ✅ Supported | Requires further testing |
| Higgs-Audio | ✅ Supported | Requires further testing |
| Qwen3-TTS | ✅ Supported | Requires further testing |
| Chroma | ✅ Supported | Requires further testing |
| Spark | 🔄 Experimental | Coming soon |
| Dia | ✅ Supported | Fully tested (by HF) |
| CosyVoice | 🔄 Experimental | Coming soon |
| F5-TTS | 🔄 Experimental | Coming soon |
Add your own model: See our Integration Guide
🤝 Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas we need help with:
- 🔧 Additional TTS model adapters
- 📚 Documentation improvements
- 🐛 Bug fixes and testing
- 🌍 Multi-language support
- 🎨 New voice editing techniques
📝 License
This project is licensed under the MIT License; see the LICENSE file for details.
The base TTS models supported by this project are subject to their own respective licenses. Users are responsible for reviewing and complying with each model’s license before use.
🙏 Acknowledgments
- Sakana AI for the original Text-to-LoRA concept
- HyperTTS authors for hypernetwork applications in TTS
- The open-source community for tools and datasets
- Microsoft and LAION-AI for the CLAP models
- Microsoft for the LoRA technique
- Hugging Face for the transformers library and model hub
📚 Citation
If you use VoiceStudio in your research, please cite:
@software{voicestudio2026,
title={VoiceStudio: A Unified Toolkit for Voice Style Adaptation},
author={Your Name},
year={2026},
url={https://github.com/LatentForge/voicestudio}
}
@article{t2a-lora-2025,
title={T2A-LoRA: Text-to-Audio LoRA Generation via Hypernetworks for Real-time Voice Adaptation},
author={LatentForge},
journal={arXiv preprint arXiv:2501.XXXXX},
year={2025}
}
🔗 Links
- Paper: arXiv:2501.XXXXX
- Demo: https://latentforge.github.io/VoiceStudio
- Documentation: https://latentforge.github.io/VoiceStudio
- Models: HuggingFace Hub
📞 Contact
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: contact@latentforge.org
Made with ❤️ by the LatentForge Team
File details
Details for the file voicestudio-1.0.1.tar.gz.
File metadata
- Download URL: voicestudio-1.0.1.tar.gz
- Upload date:
- Size: 5.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 42e50ed9b414dbd968b8cfdba0820ed01345e1b0e997d01d15da8b3f0fa05e8e |
| MD5 | 86bdd06b90465750a892f827b372f4e5 |
| BLAKE2b-256 | e3183cb61cd4258bb89b2024b6c33634af1e1bb87bcd7e6b9330e3a02ae222c7 |
File details
Details for the file voicestudio-1.0.1-py3-none-any.whl.
File metadata
- Download URL: voicestudio-1.0.1-py3-none-any.whl
- Upload date:
- Size: 1.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6267550cf5f238facf89c28a0f4a2f1d5488e2b5a6f5691bd58325d2ec2f2d56 |
| MD5 | 87fcf7509eda1df4dab4e28b7d3dfccc |
| BLAKE2b-256 | 5ddc10a264143a4eab72a9db1d42e417525792fd77c5d5806e4bcc578fa37088 |
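The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a small sketch using only the standard library; it assumes the files were downloaded into the current working directory under their published names, and `sha256_of` and `verify` are helper names introduced here.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks and return its SHA256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True when the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digests copied from the file hash tables for release 1.0.1.
EXPECTED = {
    "voicestudio-1.0.1.tar.gz":
        "42e50ed9b414dbd968b8cfdba0820ed01345e1b0e997d01d15da8b3f0fa05e8e",
    "voicestudio-1.0.1-py3-none-any.whl":
        "6267550cf5f238facf89c28a0f4a2f1d5488e2b5a6f5691bd58325d2ec2f2d56",
}

for name, expected in EXPECTED.items():
    path = Path(name)  # assumption: file sits in the current directory
    if not path.exists():
        print(f"{name}: not found, skipped")
    elif verify(path, expected):
        print(f"{name}: OK")
    else:
        print(f"{name}: MISMATCH")
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for a 5 MB archive but costs nothing.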