VoiceStudio: A unified toolkit for text-style prompted speech synthesis, voice adaptation, and editing

VoiceStudio

Your Complete Voice Adaptation Research Workspace


🎯 Overview

VoiceStudio is a unified toolkit for text-style prompted speech synthesis that enables instant voice adaptation and editing through natural-language descriptions. It builds on research in voice style prompting, LoRA adaptation, and language-audio models.

Key Features:

  • Text-Conditional Generation: Generate voice characteristics using natural language descriptions like "young female voice with warm tone"
  • Multimodal Input: Support both text descriptions and audio feature vectors
  • Voice Editing: Modify existing voices with simple instructions (Future Work)
  • Instant Adaptation: Generate LoRA weights in a single forward pass without fine-tuning
  • Architecture Agnostic: Works with multiple TTS architectures
  • Zero-shot Generalization: Adapt to unseen voice characteristics not present in training data
  • Parameter Efficiency: Minimal computational overhead compared to full model fine-tuning
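
To make the "Instant Adaptation" and "Parameter Efficiency" items concrete, here is a minimal sketch of the general idea: a hypernetwork maps a style embedding to low-rank LoRA factors in one forward pass, and the factors are added to a frozen weight as W + (alpha / r) * B * A. This is not VoiceStudio's API. `ToyLoRAHypernetwork`, `apply_lora`, the single linear layer, and every dimension below are hypothetical, and the example uses plain Python lists so that it runs without PyTorch.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class ToyLoRAHypernetwork:
    """Hypothetical stand-in: one linear map from a style embedding to LoRA factors."""

    def __init__(self, embed_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Output holds flattened A (rank x d_in) followed by flattened B (d_out x rank).
        n_out = rank * (d_in + d_out)
        self.weight = [
            [rng.gauss(0.0, 0.02) for _ in range(embed_dim)] for _ in range(n_out)
        ]

    def forward(self, embedding):
        """Produce (A, B) from one embedding with a single matrix-vector product."""
        flat = [sum(w * e for w, e in zip(row, embedding)) for row in self.weight]
        split = self.rank * self.d_in
        a = [flat[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        b_flat = flat[split:]
        b = [b_flat[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return a, b


def apply_lora(base_weight, a, b, alpha=1.0):
    """Return base_weight + (alpha / rank) * B @ A without modifying base_weight."""
    scale = alpha / len(a)
    delta = matmul(b, a)  # shape: d_out x d_in
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base_weight, delta)
    ]


# Assumed toy sizes; a real TTS layer would be far larger.
embed_dim, d_in, d_out, rank = 8, 16, 12, 2
hyper = ToyLoRAHypernetwork(embed_dim, d_in, d_out, rank)
style_embedding = [0.1] * embed_dim  # placeholder for a text or audio embedding
a, b = hyper.forward(style_embedding)

base = [[0.0] * d_in for _ in range(d_out)]  # frozen layer weight (zeros for the demo)
adapted = apply_lora(base, a, b, alpha=4.0)

lora_params = rank * (d_in + d_out)  # values generated per adapted layer
full_params = d_in * d_out           # values updated by full fine-tuning
print(f"LoRA values: {lora_params}, full-layer values: {full_params}")
```

The parameter gap grows with layer size: the LoRA factors scale with `rank * (d_in + d_out)`, while a full update scales with `d_in * d_out`.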

🛠️ Installation

From PyPI (Recommended)

uv add "voicestudio[all]"  # Install with all available base TTS models (quotes keep shells such as zsh from expanding the brackets)

From Source

git clone https://github.com/LatentForge/voicestudio.git
cd voicestudio
uv pip install -e ".[all]"

Development Installation

git clone https://github.com/LatentForge/voicestudio.git
cd voicestudio
uv pip install -e ".[all,web]"
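
After any of the installs above, one way to confirm that the package is visible to the active environment is to query its distribution metadata. The sketch below uses only the standard library and assumes nothing about VoiceStudio's import-time API. `installed_version` is a hypothetical helper name, and `voicestudio` is the distribution name used in the install commands.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("voicestudio")
    print(f"voicestudio: {found or 'not installed'}")
```

Run it with the same interpreter that uv manages for the project (for example through `uv run`), otherwise the check may inspect a different environment.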

Building and Publishing

# Build package
uv build

# Upload to PyPI
uv publish

📊 Supported Models

VoiceStudio works with various TTS architectures:

Model       | Status          | Notes
Parler-TTS  | ✅ Supported    | Requires further testing
Higgs-Audio | ✅ Supported    | Requires further testing
Qwen3-TTS   | ✅ Supported    | Requires further testing
Chroma      | ✅ Supported    | Requires further testing
Spark       | 🔄 Experimental | Coming soon
Dia         | ✅ Supported    | Fully tested (by HF)
CosyVoice   | 🔄 Experimental | Coming soon
F5-TTS      | 🔄 Experimental | Coming soon

Add your own model: See our Integration Guide


🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Areas we need help with:

  • 🔧 Additional TTS model adapters
  • 📚 Documentation improvements
  • 🐛 Bug fixes and testing
  • 🌍 Multi-language support
  • 🎨 New voice editing techniques

📝 License

This project is licensed under the MIT License; see the LICENSE file for details.

The base TTS models supported by this project are subject to their own respective licenses. Users are responsible for reviewing and complying with each model’s license before use.


🙏 Acknowledgments

  • Sakana AI for the original Text-to-LoRA concept
  • HyperTTS authors for hypernetwork applications in TTS
  • The open-source community for tools and datasets
  • Microsoft and LAION-AI for the CLAP models
  • Microsoft for the LoRA technique
  • Hugging Face for the transformers library and model hub

📚 Citation

If you use VoiceStudio in your research, please cite:

@software{voicestudio2026,
  title={VoiceStudio: A Unified Toolkit for Voice Style Adaptation},
  author={Your Name},
  year={2026},
  url={https://github.com/LatentForge/voicestudio}
}
@article{t2a-lora-2025,
  title={T2A-LoRA: Text-to-Audio LoRA Generation via Hypernetworks for Real-time Voice Adaptation},
  author={LatentForge},
  journal={arXiv preprint arXiv:2501.XXXXX},
  year={2025}
}

Made with ❤️ by LatentForge Team
