VoiceStudio: A unified toolkit for text-style prompted speech synthesis, voice adaptation, and editing

VoiceStudio

Your Complete Voice Adaptation Workspace

Installation | Quick Start | Documentation | Papers

🎯 Overview

VoiceStudio is a unified toolkit for text-style prompted speech synthesis. It lets you adapt and edit voices through natural language descriptions, and it builds on research in voice style prompting, LoRA adaptation, and language-audio models.

Key Features:

🎨 Text-Style Prompting: Control voice characteristics with natural language
⚡ Instant Adaptation: Real-time LoRA generation for any TTS model
✂️ Voice Editing: Modify existing voices with simple instructions
🔧 Architecture Agnostic: Works with multiple TTS architectures
🚀 Production Ready: Optimized for both research and deployment

🆕 What's New

v0.1.0 (2025)

🔍 Speaker consistency analysis tools
🎨 BOS token P-tuning
📊 Attention visualization

🚀 Installation

From PyPI (Recommended)

uv add "voicestudio[all]"

From Source

uv add git+https://github.com/LatentForge/voicestudio.git

Requirements

Python 3.8+
PyTorch 2.0+
CUDA 11.8+ (for GPU acceleration)
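
Before installing, it can help to check the local environment against the list above. The following is a minimal sketch using only the standard library plus an optional `torch` import; the helper name `check_environment` is hypothetical and not part of VoiceStudio. It reports the installed PyTorch version rather than enforcing the 2.0+ or CUDA 11.8+ thresholds.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # taken from the Requirements list


def check_environment(min_python=MIN_PYTHON):
    """Return a dict describing whether the basic requirements look satisfied."""
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        # find_spec only locates the package; it does not import it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": None,  # stays None when torch is not installed
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```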

📚 Advanced Usage

Custom TTS Model Integration

VoiceStudio supports any TTS model through a simple adapter interface:

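from voicestudio import TTSAdapter, LoRAGenerator

# Wrap your TTS model so VoiceStudio can attach LoRA weights to it
class MyTTSAdapter(TTSAdapter):
    def __init__(self, model):
        self.model = model

    def get_lora_target_modules(self):
        # Names of the submodules that should receive LoRA updates
        return ["attention.q_proj", "attention.v_proj"]

    def forward(self, text, lora_weights=None):
        # Compare against None: truth-testing a tensor-like object can raise
        if lora_weights is not None:
            self.apply_lora(lora_weights)
        return self.model(text)

# Use with VoiceStudio (my_tts_model is a placeholder for your loaded model)
adapter = MyTTSAdapter(my_tts_model)
generator = LoRAGenerator.from_pretrained("voicestudio/t2a-lora-base")

# Generate LoRA weights from a style description, then synthesize with them
lora = generator("professional news anchor voice")
audio = adapter(text="Breaking news tonight...", lora_weights=lora)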
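
The example above calls `apply_lora` without showing what it does. As background, here is a minimal sketch of the standard LoRA update, `W' = W + (alpha / r) * B @ A`, written in pure Python. It illustrates the general technique only; the function names `matmul` and `merge_lora` are hypothetical, and VoiceStudio's own implementation may differ.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha):
    """Return weight + (alpha / r) * (lora_b @ lora_a).

    weight: d_out x d_in, lora_b: d_out x r, lora_a: r x d_in.
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 identity weight with a rank-1 update.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # r x d_in  = 1 x 2
lora_b = [[0.5], [0.0]]  # d_out x r = 2 x 1
merged = merge_lora(weight, lora_a, lora_b, alpha=2.0)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```

Because the update is a low-rank product, only `r * (d_in + d_out)` numbers are needed per target module instead of a full `d_out x d_in` matrix.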

Multi-Speaker Voice Blending

from voicestudio import VoiceBlender

blender = VoiceBlender()

# Blend multiple voice characteristics; each pair is (description, weight)
blended_lora = blender.blend([
    ("warm and friendly", 0.6),
    ("professional and clear", 0.4)
])

# tts_model and text are placeholders for your own model and input string
audio = tts_model.synthesize(text, lora=blended_lora)
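
One plausible way to read the weights in the example above is as a weighted average of per-module weight deltas. The sketch below shows that idea in pure Python. It is an assumption about the technique, not a description of how `VoiceBlender` works internally, and `blend_deltas` and the flat-list delta format are hypothetical.

```python
def blend_deltas(weighted_deltas):
    """Weighted average of per-module weight deltas.

    weighted_deltas: list of (delta_dict, weight) pairs, where each delta_dict
    maps a module name to a flat list of floats. Weights are normalized so
    that they sum to 1.
    """
    total = sum(weight for _, weight in weighted_deltas)
    if total <= 0:
        raise ValueError("blend weights must sum to a positive number")

    blended = {}
    for deltas, weight in weighted_deltas:
        share = weight / total
        for name, values in deltas.items():
            acc = blended.setdefault(name, [0.0] * len(values))
            if len(acc) != len(values):
                raise ValueError(f"shape mismatch for module {name!r}")
            for i, value in enumerate(values):
                acc[i] += share * value
    return blended


# Toy deltas for a single target module.
warm = {"attention.q_proj": [1.0, 0.0]}
clear = {"attention.q_proj": [0.0, 1.0]}
blended = blend_deltas([(warm, 0.6), (clear, 0.4)])
print(blended)
```

The sketch averages merged deltas (the product `B @ A`) rather than the `A` and `B` factors separately, since averaging the factors does not in general give the same result.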

Fine-tuning on Custom Data

from voicestudio import LoRAGenerator
from voicestudio.training import Trainer

# Load pre-trained generator
generator = LoRAGenerator.from_pretrained("voicestudio/t2a-lora-base")

# Fine-tune on your data (your_dataset is a placeholder for your own dataset)
trainer = Trainer(
    model=generator,
    train_dataset=your_dataset,
    output_dir="./checkpoints"
)

# Run training; checkpoints are written to output_dir
trainer.train()
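
The snippet above does not say what `your_dataset` should look like. The following is a sketch of one possible shape, assuming the trainer accepts a map-style dataset (an object with `__len__` and `__getitem__`, the protocol used by PyTorch datasets). The class names, field names, and file paths are all hypothetical; check the documentation for the format the `Trainer` actually expects.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class StyleExample:
    description: str  # natural-language style prompt (assumed field)
    audio_path: str   # reference recording matching the prompt (assumed field)


class StyleDataset:
    """Minimal map-style dataset: supports len() and integer indexing."""

    def __init__(self, examples: Sequence[StyleExample]):
        self._examples: List[StyleExample] = list(examples)

    def __len__(self) -> int:
        return len(self._examples)

    def __getitem__(self, index: int) -> StyleExample:
        return self._examples[index]


# Hypothetical paths, for illustration only.
your_dataset = StyleDataset([
    StyleExample("warm and friendly", "clips/warm_001.wav"),
    StyleExample("professional and clear", "clips/clear_001.wav"),
])
print(len(your_dataset), your_dataset[0].description)
```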

📊 Supported Models

VoiceStudio works with various TTS architectures:

Model	Status	Notes
VITS	✅ Supported	Fully tested
FastSpeech2	✅ Supported	Fully tested
Tacotron2	✅ Supported	Requires adapter
VALL-E	🔄 Experimental	Work in progress
Bark	🔄 Experimental	Coming soon
YourTTS	✅ Supported	Community contributed

Add your own model: See our Integration Guide

📄 Papers

@inproceedings{voicestudio2027lam,
  title={T2A-LoRA2: Text-Guided Voice Editing with Language-Audio Models},
  author={Your Name},
  booktitle={ICML},
  year={2027}
}

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Areas we need help with:

🔧 Additional TTS model adapters
📚 Documentation improvements
🐛 Bug fixes and testing
🌍 Multi-language support
🎨 New voice editing techniques

📝 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments

CLAP: Microsoft and LAION-AI for the CLAP model
LoRA: Microsoft for the LoRA technique
Hugging Face: for the transformers library and model hub
LatentForge Team: for research support and infrastructure

🌟 Citation

If you use VoiceStudio in your research, please cite:

@software{voicestudio2026,
  title={VoiceStudio: A Unified Toolkit for Voice Style Adaptation},
  author={Your Name},
  year={2026},
  url={https://github.com/LatentForge/voicestudio}
}

Made with ❤️ by the LatentForge Team

⭐ Star us on GitHub | 📖 Read the Docs | 🤗 HuggingFace

Release history

1.0.1 (Feb 15, 2026)
1.0.0 (Dec 21, 2025), this version

Download files

Download the file for your platform.

Source Distribution

voicestudio-1.0.0.tar.gz (632.4 kB)

Uploaded Dec 21, 2025 Source

Built Distribution

voicestudio-1.0.0-py3-none-any.whl (62.1 kB)

Uploaded Dec 21, 2025 Python 3

File details

Details for the file voicestudio-1.0.0.tar.gz.

File metadata

Download URL: voicestudio-1.0.0.tar.gz
Upload date: Dec 21, 2025
Size: 632.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.13

File hashes

Hashes for voicestudio-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`3add0e908c166052769b9f705b0635b328935804cf8b335f9670d678b5aa0ca9`
MD5	`7802229930e86750c4068613fb733a67`
BLAKE2b-256	`54852124fccd3aa3a5a44abc6f77c6eeaa75d21dc4cc941ebbec684b3bb01794`

File details

Details for the file voicestudio-1.0.0-py3-none-any.whl.

File metadata

Download URL: voicestudio-1.0.0-py3-none-any.whl
Upload date: Dec 21, 2025
Size: 62.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.13

File hashes

Hashes for voicestudio-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e3f3975baf091f9722b37d56843449a988ec823fcea849af50184327a228dacc`
MD5	`6f3ba3a290d30f222e5345541836ef23`
BLAKE2b-256	`0a96b1b13fba7e251b2018b71f77cee0ab28cd1ae1ddb261debe92cb788af3c9`
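
The SHA256 digests listed above can be used to check a downloaded file before installing it. Below is a minimal sketch using Python's standard `hashlib` and `hmac` modules; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the script assumes you pass the file path and the expected digest on the command line.

```python
import hashlib
import hmac
import sys


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python verify_hash.py <downloaded-file> <expected-sha256>
    ok = matches_published_hash(sys.argv[1], sys.argv[2])
    print("hash matches" if ok else "hash MISMATCH")
```

For example, the source distribution would be checked by passing `voicestudio-1.0.0.tar.gz` and the SHA256 value from its table as the two arguments.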
