VoiceStudio: A unified toolkit for text-style prompted speech synthesis, voice adaptation, and editing
VoiceStudio
🎯 Overview
VoiceStudio is a unified toolkit for text-style prompted speech synthesis. It enables instant voice adaptation and editing through natural language descriptions, and it builds on research in voice style prompting, LoRA adaptation, and language-audio models.
Key Features:
- 🎨 Text-Style Prompting: Control voice characteristics with natural language
- ⚡ Instant Adaptation: Real-time LoRA generation for any TTS model
- ✂️ Voice Editing: Modify existing voices with simple instructions
- 🔧 Architecture Agnostic: Works with multiple TTS architectures
- 🚀 Production Ready: Optimized for both research and deployment
🆕 What's New
v0.1.0 (2025)
- 🔍 Speaker consistency analysis tools
- 🎨 BOS token P-tuning
- 📊 Attention visualization
🚀 Installation
From PyPI (Recommended)
uv add "voicestudio[all]"
From Source
uv add git+https://github.com/LatentForge/voicestudio.git
Requirements
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.8+ (for GPU acceleration)
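The requirements above can be checked before installing. The following is a minimal sketch that uses only the standard library. The helper name `check_environment` is hypothetical and not part of VoiceStudio, and it only tests whether a `torch` package can be located, not its version or CUDA support.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # taken from the Requirements list above


def check_environment(min_python=MIN_PYTHON):
    """Return a dict describing whether the basic requirements look satisfied.

    Hypothetical helper for illustration; not part of VoiceStudio.
    """
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when the package cannot be located.
    torch_found = importlib.util.find_spec("torch") is not None
    return {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "python_ok": python_ok,
        "torch_found": torch_found,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch_found` is false, install PyTorch first; the PyTorch and CUDA versions still need to be verified separately.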
📚 Advanced Usage
Custom TTS Model Integration
VoiceStudio supports any TTS model through a simple adapter interface:
from voicestudio import TTSAdapter, LoRAGenerator

# Wrap your TTS model
class MyTTSAdapter(TTSAdapter):
    def __init__(self, model):
        self.model = model

    def get_lora_target_modules(self):
        # Names of the submodules that receive LoRA weights
        return ["attention.q_proj", "attention.v_proj"]

    def forward(self, text, lora_weights=None):
        # Compare against None so that an empty or tensor-valued payload is not misread
        if lora_weights is not None:
            self.apply_lora(lora_weights)
        return self.model(text)

# Use with VoiceStudio (my_tts_model is your own, already loaded TTS model)
adapter = MyTTSAdapter(my_tts_model)
generator = LoRAGenerator.from_pretrained("voicestudio/t2a-lora-base")
lora = generator("professional news anchor voice")
audio = adapter(text="Breaking news tonight...", lora_weights=lora)
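For readers new to LoRA, it may help to see what applying LoRA weights to a target module typically amounts to. The sketch below follows the standard LoRA formulation, W' = W + (alpha / r) * B A, in plain Python. It is not VoiceStudio's `apply_lora` implementation, whose internals are not shown in this README, and the names `matmul` and `merge_lora` are illustrative only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for row in a
    ]


def merge_lora(weight, lora_a, lora_b, alpha):
    """Return weight + (alpha / r) * (lora_b @ lora_a).

    weight: d_out x d_in, lora_b: d_out x r, lora_a: r x d_in.
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# A 2x2 projection with a rank-1 update (toy values for illustration).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # d_out x r
lora_a = [[0.5, 0.25]]    # r x d_in
merged = merge_lora(weight, lora_a, lora_b, alpha=2.0)
print(merged)  # [[2.0, 0.5], [2.0, 2.0]]
```

A rank-r update stores only r * (d_in + d_out) values per target module instead of d_in * d_out, which keeps the adapter small relative to the base model.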
Multi-Speaker Voice Blending
from voicestudio import VoiceBlender

blender = VoiceBlender()

# Blend multiple voice characteristics as (description, weight) pairs
blended_lora = blender.blend([
    ("warm and friendly", 0.6),
    ("professional and clear", 0.4),
])

# tts_model and text are your own TTS model and input string
audio = tts_model.synthesize(text, lora=blended_lora)
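One plausible reading of the weighted blend above is a normalized weighted average of per-parameter LoRA values. The sketch below illustrates that idea on plain dictionaries of floats. `blend_weighted` is a hypothetical helper, and how `VoiceBlender` actually combines descriptions (for example, in text-embedding space rather than weight space) is not specified in this README.

```python
def blend_weighted(items):
    """Blend parameter dicts by a normalized weighted average.

    items: list of (params, weight) pairs, where params maps a parameter
    name to a list of floats. All dicts must share the same keys and shapes.
    """
    if not items:
        raise ValueError("nothing to blend")
    total = sum(weight for _, weight in items)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    first_params = items[0][0]
    blended = {}
    for name, values in first_params.items():
        acc = [0.0] * len(values)
        for params, weight in items:
            for i, value in enumerate(params[name]):
                acc[i] += (weight / total) * value
        blended[name] = acc
    return blended


# Toy parameter sets standing in for two generated LoRAs.
warm = {"q_proj": [1.0, 0.0]}
clear = {"q_proj": [0.0, 1.0]}
blended = blend_weighted([(warm, 3.0), (clear, 1.0)])
print(blended)  # {'q_proj': [0.75, 0.25]}
```

Note that averaging the low-rank factors A and B separately is not equivalent to averaging the merged updates B A, so an implementation has to choose which quantity to blend.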
Fine-tuning on Custom Data
from voicestudio import LoRAGenerator
from voicestudio.training import Trainer

# Load pre-trained generator
generator = LoRAGenerator.from_pretrained("voicestudio/t2a-lora-base")

# Fine-tune on your data (your_dataset is a dataset you supply)
trainer = Trainer(
    model=generator,
    train_dataset=your_dataset,
    output_dir="./checkpoints",
)
trainer.train()
📊 Supported Models
VoiceStudio works with various TTS architectures:
| Model | Status | Notes |
|---|---|---|
| VITS | ✅ Supported | Fully tested |
| FastSpeech2 | ✅ Supported | Fully tested |
| Tacotron2 | ✅ Supported | Requires adapter |
| VALL-E | 🔄 Experimental | Work in progress |
| Bark | 🔄 Experimental | Coming soon |
| YourTTS | ✅ Supported | Community contributed |
Add your own model: See our Integration Guide
@inproceedings{voicestudio2027lam,
title={T2A-LoRA2: Text-Guided Voice Editing with Language-Audio Models},
author={Your Name},
booktitle={ICML},
year={2027}
}
🤝 Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas we need help with:
- 🔧 Additional TTS model adapters
- 📚 Documentation improvements
- 🐛 Bug fixes and testing
- 🌍 Multi-language support
- 🎨 New voice editing techniques
📝 License
This project is licensed under the MIT License; see the LICENSE file for details.
🙏 Acknowledgments
- CLAP: Microsoft & LAION-AI for the CLAP model
- LoRA: Microsoft for the LoRA technique
- HuggingFace: For the transformers library and model hub
- LatentForge Team: For research support and infrastructure
🌟 Citation
If you use VoiceStudio in your research, please cite:
@software{voicestudio2026,
title={VoiceStudio: A Unified Toolkit for Voice Style Adaptation},
author={Your Name},
year={2026},
url={https://github.com/LatentForge/voicestudio}
}
Made with ❤️ by the LatentForge Team