
VoiceStudio: A unified toolkit for text-style prompted speech synthesis, voice adaptation, and editing


VoiceStudio

Your Complete Voice Adaptation Workspace

PyPI version License: MIT Python 3.11+ Documentation



🎯 Overview

VoiceStudio is a unified toolkit for text-style prompted speech synthesis. It lets you adapt and edit voices through natural language descriptions, and it builds on research in voice style prompting, LoRA adaptation, and language-audio models.

Key Features:

  • 🎨 Text-Style Prompting: Control voice characteristics with natural language
  • Instant Adaptation: Real-time LoRA generation for any TTS model
  • ✂️ Voice Editing: Modify existing voices with simple instructions
  • 🔧 Architecture Agnostic: Works with multiple TTS architectures
  • 🚀 Production Ready: Optimized for both research and deployment

🆕 What's New

v0.1.0 (2025)

  • 🔍 Speaker consistency analysis tools
  • 🎨 BOS token P-tuning
  • 📊 Attention visualization

🚀 Installation

From PyPI (Recommended)

uv add "voicestudio[all]"

From Source

uv add git+https://github.com/LatentForge/voicestudio.git

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • CUDA 11.8+ (for GPU acceleration)
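
The requirements above can be checked before installing. The following is a minimal sketch that uses only the Python standard library and imports PyTorch only if it is already present. The function name `check_environment` is hypothetical and not part of VoiceStudio, and the minimum Python version is a parameter that defaults to the 3.8 listed here; raise it if your release needs a newer interpreter.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8)):
    """Report whether the interpreter and an optional PyTorch install look usable."""
    report = {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

This sketch reports the PyTorch version and whether a CUDA device is visible, but it does not compare them against the 2.0 and 11.8 minimums; that comparison is left to the reader.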

📚 Advanced Usage

Custom TTS Model Integration

VoiceStudio supports any TTS model through a simple adapter interface:

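from voicestudio import TTSAdapter, LoRAGenerator

# Wrap your TTS model so VoiceStudio can apply LoRA weights to it
class MyTTSAdapter(TTSAdapter):
    def __init__(self, model):
        self.model = model

    def get_lora_target_modules(self):
        # Names of the submodules that receive LoRA updates
        return ["attention.q_proj", "attention.v_proj"]

    def forward(self, text, lora_weights=None):
        # Compare against None: truthiness is ambiguous for tensors
        # and False for empty containers
        if lora_weights is not None:
            self.apply_lora(lora_weights)
        return self.model(text)

# Use with VoiceStudio (my_tts_model is your own, already loaded TTS model)
adapter = MyTTSAdapter(my_tts_model)
generator = LoRAGenerator.from_pretrained("voicestudio/t2a-lora-base")

# Generate LoRA weights from a style description, then synthesize with them
lora = generator("professional news anchor voice")
audio = adapter(text="Breaking news tonight...", lora_weights=lora)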
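
The adapter example relies on `apply_lora`, whose implementation is not shown in this README. As background, the standard LoRA formulation adds a scaled low-rank product to a frozen weight matrix: W' = W + (alpha / r) * B A. The sketch below illustrates that arithmetic in plain Python; `matmul` and `merge_lora` are illustrative names, not VoiceStudio functions, and VoiceStudio may apply the update differently (for example, without merging it into the weights).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha):
    """Return weight + (alpha / r) * (lora_b @ lora_a) without mutating the input.

    Shapes: weight is d_out x d_in, lora_b is d_out x r, lora_a is r x d_in.
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# 2x2 identity weight with a rank-1 update
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # r x d_in  = 1 x 2
lora_b = [[0.5], [0.0]]  # d_out x r = 2 x 1
merged = merge_lora(weight, lora_a, lora_b, alpha=2.0)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```

Only the first output row changes here because the second row of `lora_b` is zero, which shows how a rank-1 update touches a restricted subspace of the weight matrix.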

Multi-Speaker Voice Blending

from voicestudio import VoiceBlender

blender = VoiceBlender()

# Blend multiple voice characteristics
blended_lora = blender.blend([
    ("warm and friendly", 0.6),
    ("professional and clear", 0.4)
])

# tts_model and text stand for your own TTS model and input string
audio = tts_model.synthesize(text, lora=blended_lora)
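
How `VoiceBlender` combines styles internally is not documented here. One plausible reading of the (description, weight) pairs is a linear interpolation of LoRA weight deltas, sketched below in plain Python. The function `blend_deltas` and the dictionary layout are assumptions made for illustration. The sketch blends full deltas (the product B A) rather than the factors, because averaging A and B separately does not give the same result.

```python
def blend_deltas(weighted_deltas):
    """Linearly combine LoRA weight deltas.

    weighted_deltas: list of (delta, weight) pairs, where each delta maps a
    module name to a flat list of floats. Weights are normalized to sum to 1.
    """
    total = sum(weight for _, weight in weighted_deltas)
    if total <= 0:
        raise ValueError("blend weights must sum to a positive value")

    blended = {}
    for delta, weight in weighted_deltas:
        share = weight / total
        for name, values in delta.items():
            acc = blended.setdefault(name, [0.0] * len(values))
            for i, value in enumerate(values):
                acc[i] += share * value
    return blended


# Toy deltas standing in for two generated voice styles
warm = {"attention.q_proj": [1.0, 0.0]}
clear = {"attention.q_proj": [0.0, 1.0]}
blended = blend_deltas([(warm, 0.6), (clear, 0.4)])
print(blended)
```

Normalizing the weights means that (0.6, 0.4) and (3, 2) produce the same blend, which keeps the scale of the resulting update comparable to a single style.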

Fine-tuning on Custom Data

from voicestudio import LoRAGenerator
from voicestudio.training import Trainer

# Load pre-trained generator
generator = LoRAGenerator.from_pretrained("voicestudio/t2a-lora-base")

# Fine-tune on your data
trainer = Trainer(
    model=generator,
    train_dataset=your_dataset,
    output_dir="./checkpoints"
)

trainer.train()
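
The example above does not say what `your_dataset` must contain. As a sketch only, assuming the trainer accepts a map-style dataset (an object with `__len__` and `__getitem__`) of style descriptions paired with reference audio, a container could look like this. The class names, field names, and file paths are all hypothetical; check the VoiceStudio documentation for the format the trainer actually expects.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StyleExample:
    description: str  # natural-language voice style prompt
    audio_path: Path  # reference recording matching the description


class StyleDataset:
    """Map-style dataset: supports len() and integer indexing."""

    def __init__(self, examples):
        self.examples = list(examples)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, index):
        example = self.examples[index]
        return {
            "description": example.description,
            "audio_path": example.audio_path,
        }


# Hypothetical clips; replace with your own recordings
your_dataset = StyleDataset([
    StyleExample("warm and friendly", Path("clips/warm_001.wav")),
    StyleExample("professional news anchor voice", Path("clips/anchor_001.wav")),
])
```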

📊 Supported Models

VoiceStudio works with various TTS architectures:

Model | Status | Notes
VITS | ✅ Supported | Fully tested
FastSpeech2 | ✅ Supported | Fully tested
Tacotron2 | ✅ Supported | Requires adapter
VALL-E | 🔄 Experimental | Work in progress
Bark | 🔄 Experimental | Coming soon
YourTTS | ✅ Supported | Community contributed

Add your own model: See our Integration Guide


@inproceedings{voicestudio2027lam,
  title={T2A-LoRA2: Text-Guided Voice Editing with Language-Audio Models},
  author={Your Name},
  booktitle={ICML},
  year={2027}
}


🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Areas we need help with:

  • 🔧 Additional TTS model adapters
  • 📚 Documentation improvements
  • 🐛 Bug fixes and testing
  • 🌍 Multi-language support
  • 🎨 New voice editing techniques

📝 License

This project is licensed under the MIT License; see the LICENSE file for details.


🙏 Acknowledgments

  • CLAP: Microsoft & LAION-AI for the CLAP model
  • LoRA: Microsoft for the LoRA technique
  • HuggingFace: For the Transformers library and model hub
  • LatentForge Team: For research support and infrastructure

🌟 Citation

If you use VoiceStudio in your research, please cite:

@software{voicestudio2026,
  title={VoiceStudio: A Unified Toolkit for Voice Style Adaptation},
  author={Your Name},
  year={2026},
  url={https://github.com/LatentForge/voicestudio}
}

Made with ❤️ by the LatentForge Team

⭐ Star us on GitHub | 📖 Read the Docs | 🤗 HuggingFace
