
Facade for voice cloning and speech synthesis

Project description

voxy


To install: pip install voxy

Voxy is a flexible Python module for speech synthesis and voice cloning, with initial support for the Sesame CSM-1B model. It provides a plugin architecture that can be extended to support other models in the future.
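The plugin architecture could take many forms. As a rough sketch, assuming a simple name-to-class registry (not voxy's actual internals), a factory in the spirit of `create_speech_model(model_type="csm")` might dispatch like this. `SpeechModel`, `register_speech_model`, `make_speech_model` and the toy `EchoModel` are all hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Type


class SpeechModel:
    """Hypothetical base class that every engine plugin would implement."""

    def generate_speech(self, text: str, **kwargs):
        raise NotImplementedError

    def clone_voice(self, audio_input, transcript=None, **kwargs):
        raise NotImplementedError


# Maps a model_type string (e.g. "csm") to the class implementing it
_REGISTRY: Dict[str, Type[SpeechModel]] = {}


def register_speech_model(name: str) -> Callable[[Type[SpeechModel]], Type[SpeechModel]]:
    """Class decorator that records an engine under `name`."""
    def decorator(cls: Type[SpeechModel]) -> Type[SpeechModel]:
        if name in _REGISTRY:
            raise ValueError(f"model_type {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls
    return decorator


def make_speech_model(model_type: str, **kwargs) -> SpeechModel:
    """Instantiate the engine registered under `model_type`."""
    try:
        cls = _REGISTRY[model_type]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY)) or "none"
        raise ValueError(f"unknown model_type {model_type!r} (registered: {known})") from None
    return cls(**kwargs)


@register_speech_model("echo")
class EchoModel(SpeechModel):
    """Toy engine used only to exercise the registry."""

    def generate_speech(self, text: str, **kwargs):
        return text.upper()
```

A real engine would load its weights in `__init__` and return audio rather than a string. The decorator keeps each engine's registration next to its definition, so adding a model does not require editing the factory.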

Features

  • Voice cloning from audio samples
  • High-quality speech synthesis
  • Flexible input formats (file paths, bytes, streams, tensors)
  • Audio cleanup utilities
  • Automatic audio transcription (using Whisper)
  • Plugin architecture for different speech models

Installation

Prerequisites

  • Python 3.10+
  • PyTorch and TorchAudio
  • CUDA-compatible GPU (recommended)
  • FFmpeg for audio processing
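To confirm these prerequisites before installing CSM, a small standard-library check along these lines may help. `check_prerequisites` is a hypothetical helper, not part of voxy; it only tests that the packages are importable and that `ffmpeg` is on `PATH`, not that their versions suit CSM.

```python
import importlib.util
import shutil
import sys


def check_prerequisites() -> dict:
    """Report which of the listed prerequisites are present on this machine."""
    report = {
        "python>=3.10": sys.version_info >= (3, 10),
        "torch": importlib.util.find_spec("torch") is not None,
        "torchaudio": importlib.util.find_spec("torchaudio") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }
    # A CUDA GPU is only recommended; report it when torch can be imported
    if report["torch"]:
        import torch
        report["cuda (recommended)"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, present in check_prerequisites().items():
        print(f"{'yes' if present else 'no':4} {name}")
```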

Install the CSM Model

The intention is to make voxy a plugin-enabled facade, where you can choose your own engine (for voice cloning, voice synthesis, etc.). For now, though, we only support what seems to be the best open-source model out there (at the time of writing): Sesame AI Lab's CSM model. They did an amazing job on the model, but (so far) a terrible one on the Python interface, which is what inspired me to develop voxy in the first place.

Follow the instructions in the CSM repository to install the CSM model and its dependencies.

Quick Start

Basic Usage

from voxy import create_speech_model

# Create a speech model
model = create_speech_model(model_type="csm")

# Generate speech with default voice
audio = model.generate_speech(
    text="Hello, this is a test of the CSM speech model.", 
    output_path="output.wav"
)
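The `output_path` argument suggests that `generate_speech` also writes the result to a WAV file. This README does not say how; as a sketch of what that step involves, the stdlib-only helpers below encode mono float samples as 16-bit PCM WAV. `encode_wav` and `save_wav` are hypothetical, and voxy itself more likely relies on `torchaudio.save` (an assumption).

```python
import io
import struct
import wave
from typing import Sequence


def encode_wav(samples: Sequence[float], sample_rate: int) -> bytes:
    """Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file in memory."""
    # Clip to [-1, 1], then scale to the signed 16-bit range
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()


def save_wav(samples: Sequence[float], sample_rate: int, output_path: str) -> None:
    """Write the encoded WAV bytes to output_path."""
    with open(output_path, "wb") as f:
        f.write(encode_wav(samples, sample_rate))
```

For example, `save_wav(samples, 24000, "output.wav")`. The sample rate must match the rate the audio was generated at, or playback speed and pitch will be off.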

Voice Cloning

from voxy import create_speech_model

# Create a speech model
model = create_speech_model(model_type="csm")

# Clone a voice from an audio file
voice_profile = model.clone_voice(
    audio_input="sample_voice.wav",
    transcript="This is a sample of my voice for cloning purposes."
)

# Generate speech with the cloned voice
audio = model.generate_speech(
    text="This is my cloned voice speaking. Isn't it amazing?",
    voice_profile=voice_profile,
    output_path="cloned_voice.wav"
)
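The README treats `voice_profile` as an opaque object. CSM conditions generation on context segments that pair a speaker id, a transcript and the matching audio, so a voice profile plausibly bundles the same three things. The dataclass below is a hypothetical illustration of that idea, not voxy's actual type; the plain list of floats stands in for an audio tensor.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical container for the reference clip a cloned voice is conditioned on."""

    transcript: str           # words spoken in the reference clip
    samples: Sequence[float]  # mono reference audio, floats in [-1, 1]
    sample_rate: int          # Hz
    speaker_id: int = 0       # CSM identifies speakers by an integer id

    def __post_init__(self):
        if not self.transcript.strip():
            raise ValueError("a voice profile needs a non-empty transcript")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")

    @property
    def duration_seconds(self) -> float:
        """Length of the reference clip in seconds."""
        return len(self.samples) / self.sample_rate
```

Keeping the transcript next to the audio is why `clone_voice` asks for one, or produces one with Whisper when it is missing.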

Automatic Transcription

from voxy import create_speech_model

# Create a speech model
model = create_speech_model(model_type="csm")

# Clone a voice with automatic transcription
voice_profile = model.clone_voice(
    audio_input="sample_voice.wav",
    # No transcript provided, will use automatic transcription
)

# Generate speech with the cloned voice
audio = model.generate_speech(
    text="This voice was cloned using automatic transcription.",
    voice_profile=voice_profile,
    output_path="auto_transcribed_voice.wav"
)
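When `transcript` is omitted, voxy transcribes the sample with Whisper. The fallback can be sketched as below. `resolve_transcript` is a hypothetical helper, and the speech-to-text backend is passed in as a plain function so the sketch runs without Whisper installed.

```python
from typing import Callable, Optional


def resolve_transcript(
    audio_path: str,
    transcript: Optional[str],
    transcribe: Callable[[str], str],
) -> str:
    """Return the caller's transcript, or fall back to automatic transcription."""
    if transcript is not None and transcript.strip():
        return transcript.strip()
    # No usable transcript was given: ask the speech-to-text backend for one
    text = transcribe(audio_path).strip()
    if not text:
        raise ValueError(f"automatic transcription of {audio_path!r} returned no text")
    return text
```

With the `openai-whisper` package, a backend could be `lambda p: asr.transcribe(p)["text"]` where `asr = whisper.load_model("base")` is loaded once up front. Which Whisper package and model size voxy uses is not stated here.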

Flexible Input Formats

The audio_input argument of clone_voice accepts several formats:

# From file path
voice_profile1 = model.clone_voice(
    audio_input="sample_voice.wav",
    transcript="Text transcript."
)

# From bytes
with open("sample_voice.wav", "rb") as f:
    audio_bytes = f.read()
voice_profile2 = model.clone_voice(
    audio_input=audio_bytes,
    transcript="Text transcript."
)

# From file object
with open("sample_voice.wav", "rb") as f:
    voice_profile3 = model.clone_voice(
        audio_input=f,
        transcript="Text transcript."
    )

# From tensor
import torch
import torchaudio
audio_tensor, sample_rate = torchaudio.load("sample_voice.wav")
voice_profile4 = model.clone_voice(
    audio_input=audio_tensor,
    transcript="Text transcript."
)
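Accepting paths, bytes and file objects usually comes down to normalizing everything into one readable source before decoding. The sketch below shows that dispatch for WAV data using only the standard-library `wave` module. `read_wav` is hypothetical, tensors (which need no decoding) are left out, and voxy's actual loader presumably uses torchaudio instead.

```python
import io
import os
import wave
from typing import Tuple


def read_wav(audio_input) -> Tuple[bytes, int, int]:
    """Return (PCM frames, sample_rate, n_channels) from a path, bytes, or binary file object."""
    if isinstance(audio_input, (bytes, bytearray)):
        # Raw bytes: wrap them so wave can treat them like a file
        source = io.BytesIO(audio_input)
    elif isinstance(audio_input, (str, os.PathLike)):
        # Filesystem path (str or pathlib.Path)
        source = os.fspath(audio_input)
    elif hasattr(audio_input, "read"):
        # Already a readable binary stream
        source = audio_input
    else:
        raise TypeError(f"unsupported audio_input type: {type(audio_input).__name__}")
    with wave.open(source, "rb") as wf:
        return wf.readframes(wf.getnframes()), wf.getframerate(), wf.getnchannels()
```

Note the ordering: `bytes` are always treated as audio content, never as an encoded file path, so paths must be given as `str` or `pathlib.Path`.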

Configuration

You can configure the default device by setting the DFLT_VOXY_DEVICE environment variable:

# Use CUDA
export DFLT_VOXY_DEVICE=cuda

# Use CPU
export DFLT_VOXY_DEVICE=cpu

# Use MPS (Apple Silicon)
export DFLT_VOXY_DEVICE=mps
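A sketch of how a `DFLT_VOXY_DEVICE` lookup could work, assuming a fallback to `cpu` when the variable is unset and PyTorch-style device strings such as `cuda:1`. Voxy's real default and validation rules are not documented here, so treat `resolve_device` as hypothetical.

```python
import os
import re

# Accepted values (an assumption): cpu, mps, cuda, or cuda:<index>
_DEVICE_PATTERN = re.compile(r"cpu|mps|cuda(?::\d+)?")


def resolve_device(default: str = "cpu") -> str:
    """Read DFLT_VOXY_DEVICE, falling back to `default` when unset or blank."""
    device = os.environ.get("DFLT_VOXY_DEVICE", "").strip().lower() or default
    if not _DEVICE_PATTERN.fullmatch(device):
        raise ValueError(f"DFLT_VOXY_DEVICE={device!r} is not one of cpu, mps, cuda, cuda:N")
    return device
```

From Python, `os.environ["DFLT_VOXY_DEVICE"] = "cpu"` has the same effect as the shell `export`, provided it runs before voxy reads the variable (when that happens is not specified).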

Advanced Usage

Audio Cleanup

The module includes an audio cleanup function that normalizes volume and removes silence:

from voxy import cleanup_audio
import torchaudio

# Load audio
audio, sample_rate = torchaudio.load("noisy_audio.wav")

# Clean up audio
cleaned_audio = cleanup_audio(
    audio=audio,
    sample_rate=sample_rate,
    normalize=True,
    remove_silence=True,
    silence_threshold=0.02,
    min_silence_duration=0.2
)

# Save cleaned audio
torchaudio.save("cleaned_audio.wav", cleaned_audio, sample_rate)
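To make the `normalize`, `silence_threshold` and `min_silence_duration` parameters concrete, here is a pure-Python sketch of one plausible interpretation: peak-normalize, then drop runs of near-silent samples lasting at least `min_silence_duration` seconds while keeping shorter pauses. The semantics are an assumption, and `cleanup_samples` is a hypothetical stand-in for `cleanup_audio`, which operates on tensors.

```python
from typing import List, Sequence


def cleanup_samples(
    samples: Sequence[float],
    sample_rate: int,
    normalize: bool = True,
    remove_silence: bool = True,
    silence_threshold: float = 0.02,
    min_silence_duration: float = 0.2,
) -> List[float]:
    """Peak-normalize, then drop long stretches of near-silence."""
    out = list(samples)
    if normalize:
        peak = max((abs(s) for s in out), default=0.0)
        if peak > 0:
            out = [s / peak for s in out]
    if remove_silence:
        # Quiet runs at least this many samples long are removed
        min_run = max(1, int(min_silence_duration * sample_rate))
        kept: List[float] = []
        run: List[float] = []  # current stretch of quiet samples
        for s in out:
            if abs(s) < silence_threshold:
                run.append(s)
                continue
            if len(run) < min_run:  # short pause: keep it
                kept.extend(run)
            run = []
            kept.append(s)
        if len(run) < min_run:  # trailing pause
            kept.extend(run)
        out = kept
    return out
```

Applying the threshold after normalization makes it relative to the loudest sample. A production version would vectorize this and would usually judge silence on short frames (for example by RMS) rather than on individual samples.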

Disabling Audio Cleanup

You can disable audio cleanup when cloning a voice:

voice_profile = model.clone_voice(
    audio_input="sample_voice.wav",
    transcript="This is a sample of my voice.",
    cleanup_audio_fn=None  # Disable audio cleanup
)
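Passing `cleanup_audio_fn=None` switches cleanup off, and a function with the `(audio, sample_rate, **kwargs)` signature used by `my_custom_cleanup` can take its place. Internally that amounts to a small dispatch like the one below. `maybe_cleanup` and `halve_volume` are hypothetical, and how voxy passes extra keyword arguments through is an assumption.

```python
from typing import Any, Callable, Optional


def maybe_cleanup(
    audio: Any,
    sample_rate: int,
    cleanup_audio_fn: Optional[Callable[..., Any]],
    **kwargs: Any,
) -> Any:
    """Apply cleanup_audio_fn(audio, sample_rate, **kwargs), or return audio untouched if it is None."""
    if cleanup_audio_fn is None:
        return audio
    return cleanup_audio_fn(audio, sample_rate, **kwargs)


def halve_volume(audio, sample_rate, **kwargs):
    """Example custom cleanup: scale every sample by 0.5."""
    return [s * 0.5 for s in audio]
```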

Custom Audio Cleanup

You can also provide your own audio cleanup function:

def my_custom_cleanup(audio, sample_rate, **kwargs):
    # Custom cleanup logic goes here; this placeholder returns the audio unchanged
    processed_audio = audio
    return processed_audio

voice_profile = model.


Download files


Source Distribution

voxy-0.0.3.tar.gz (10.5 kB)

Uploaded Source

Built Distribution


voxy-0.0.3-py3-none-any.whl (9.5 kB)

Uploaded Python 3

File details

Details for the file voxy-0.0.3.tar.gz.

File metadata

  • Download URL: voxy-0.0.3.tar.gz
  • Upload date:
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.13

File hashes

Hashes for voxy-0.0.3.tar.gz
  • SHA256: 22d8af9e4140ccad2ab98069f2f838d00f0f383f292dcbc30bdf4cc77d25bac0
  • MD5: 880eff435dbf983e799cc87be3ed4702
  • BLAKE2b-256: 66d977e70dac673469fa351c73e4edc2518cf7afb7f7044dcbf04161f91fa767


File details

Details for the file voxy-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: voxy-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 9.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.13

File hashes

Hashes for voxy-0.0.3-py3-none-any.whl
  • SHA256: ab8dfb35435b31f249cfe6e2b5f5c897fa7bbe1d01ff89795021536c71ccf8ad
  • MD5: 455c8ad58a8ceef33fbd831e68818db9
  • BLAKE2b-256: 1758e40bc36b10bbf2448bd8369106358aa53eb58b86ec2e0e4a7ba52aa7ee22

