A fast inference implementation for TTS models, designed for highly asynchronous environments.

Auralis 🌌 (/auˈralis/)

Transform text into natural speech (with voice cloning) at warp speed. Process an entire novel in minutes, not hours.

What is Auralis? 🚀

Auralis is a text-to-speech engine that makes voice generation practical for real-world use:

  • Convert the entire first Harry Potter book to speech in about 10 minutes (a realtime factor of ≈ 0.02x, i.e. generation runs roughly 50× faster than playback)
  • Automatically enhance reference audio quality, so you can record speaker samples even with a low-quality mic
  • Configure a small memory footprint via the scheduler_max_concurrency setting (see the sketch after this list)
  • Process multiple requests simultaneously
  • Stream long texts piece by piece
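
A minimal sketch of the memory-footprint setting mentioned above (assuming scheduler_max_concurrency is accepted by the TTS constructor, mirroring the server's --max_concurrency flag):

from auralis import TTS

# Lower concurrency reduces VRAM usage at the cost of throughput
# (see Performance Details below for indicative numbers).
tts = TTS(scheduler_max_concurrency=4).from_pretrained(
    "AstraMindAI/xttsv2", gpt_model='AstraMindAI/xtts2-gpt'
)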

Quick Start ⭐

  1. Create a new Conda environment:

    conda create -n auralis_env python=3.10 -y
    
  2. Activate the environment:

    conda activate auralis_env
    
  3. Install Auralis:

    pip install auralis
    

Then you can try it out via Python:

from auralis import TTS, TTSRequest

# Initialize
tts = TTS().from_pretrained("AstraMindAI/xttsv2", gpt_model='AstraMindAI/xtts2-gpt')

# Generate speech
request = TTSRequest(
    text="Hello Earth! This is Auralis speaking.",
    speaker_files=['reference.wav']
)

output = tts.generate_speech(request)
output.save('hello.wav')

Or via the CLI, using the OpenAI-compatible server:

auralis.openai --host 127.0.0.1 --port 8000 --model AstraMindAI/xttsv2 --gpt_model AstraMindAI/xtts2-gpt --max_concurrency 8 --vllm_logging_level warn  
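
Once the server is running, you can call it from any HTTP client. A minimal sketch (assuming the server exposes an OpenAI-compatible /v1/audio/speech route; the route and payload fields here are assumptions, so check the server docs for the exact shapes):

import requests

# Hypothetical payload; field names follow the OpenAI audio API shape.
resp = requests.post(
    "http://127.0.0.1:8000/v1/audio/speech",
    json={
        "model": "AstraMindAI/xttsv2",
        "input": "Hello Earth! This is Auralis speaking.",
        "voice": "reference.wav",  # assumption: mapped to a speaker file server-side
    },
)
resp.raise_for_status()
with open("hello.wav", "wb") as f:
    f.write(resp.content)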

See the documentation for a more in-depth explanation, or try it out with the provided example.

Key Features 🛸

Speed & Efficiency

  • Processes long texts rapidly using smart batching
  • Runs on consumer GPUs without memory issues
  • Handles multiple requests in parallel

Easy Integration

  • Simple Python API
  • Streaming support for long texts
  • Built-in audio enhancement
  • Automatic language detection

Audio Quality

  • Voice cloning from short samples
  • Background noise reduction
  • Speech clarity enhancement
  • Volume normalization

XTTSv2 Finetunes

You can use your own XTTSv2 finetunes by simply converting them from the standard Coqui checkpoint format to our safetensors format. Use this script:

python checkpoint_converter.py path/to/checkpoint.pth --output_dir path/to/output

This will create two folders: one with the core XTTSv2 checkpoint and one with the GPT-2 component. Then create a TTS instance with:

tts = TTS().from_pretrained("some/core-xttsv2_model", gpt_model='some/xttsv2-gpt_model')

Examples & Usage 🚀

Basic Examples ⭐

Simple Text Generation
from auralis import TTS, TTSRequest

# Initialize
tts = TTS().from_pretrained("AstraMindAI/xttsv2", gpt_model='AstraMindAI/xtts2-gpt')
# Basic generation
request = TTSRequest(
    text="Hello Earth! This is Auralis speaking.",
    speaker_files=["speaker.wav"]
)
output = tts.generate_speech(request)
output.save("hello.wav")
Working with TTSRequest 🎤
from auralis import TTSRequest, AudioPreprocessingConfig

# Basic request
request = TTSRequest(
    text="Hello world!",
    speaker_files=["speaker.wav"]
)

# Enhanced audio processing
request = TTSRequest(
    text="Pristine audio quality",
    speaker_files=["speaker.wav"],
    audio_config=AudioPreprocessingConfig(
        normalize=True,
        trim_silence=True,
        enhance_speech=True,
        enhance_amount=1.5
    )
)

# Language-specific request
request = TTSRequest(
    text="Bonjour le monde!",
    speaker_files=["speaker.wav"],
    language="fr"
)

# Streaming configuration
request = TTSRequest(
    text="Very long text...",
    speaker_files=["speaker.wav"],
    stream=True,
)

# Generation parameters
request = TTSRequest(
    text="Creative variations",
    speaker_files=["speaker.wav"],
    temperature=0.8,
    top_p=0.9,
    top_k=50
)
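
With stream=True, generation yields audio chunks instead of a single output. A minimal sketch of consuming such a stream via the async API (the async streaming example further below uses the same pattern):

import asyncio
from auralis import TTS, TTSRequest, TTSOutput

async def consume_stream():
    tts = TTS().from_pretrained("AstraMindAI/xttsv2", gpt_model='AstraMindAI/xtts2-gpt')

    request = TTSRequest(
        text="Very long text...",
        speaker_files=["speaker.wav"],
        stream=True,
    )

    # With stream=True, generate_speech_async returns an async iterator of chunks
    stream = await tts.generate_speech_async(request)
    chunks = []
    async for chunk in stream:
        chunks.append(chunk)  # each chunk is a partial audio segment

    TTSOutput.combine_outputs(chunks).save("streamed.wav")

asyncio.run(consume_stream())
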
Working with TTSOutput 🎧
from auralis import TTSOutput

# Load audio file
output = TTSOutput.from_file("input.wav")

# Format conversion
output.bit_depth = 32
output.channel = 2
tensor_audio = output.to_tensor()
audio_bytes = output.to_bytes()

# Audio processing
resampled = output.resample(target_sr=44100)
faster = output.change_speed(1.5)
num_samples, sample_rate, duration = output.get_info()

# Combine multiple outputs
combined = TTSOutput.combine_outputs([output1, output2, output3])

# Playback and saving
output.play()  # Play audio
output.preview()  # Smart playback (Jupyter/system)
output.save("processed.wav", sample_rate=44100)

Synchronous Advanced Examples 🌟

Batch Text Processing
# Process multiple texts with same voice
texts = ["First paragraph.", "Second paragraph.", "Third paragraph."]
requests = [
    TTSRequest(
        text=text,
        speaker_files=["speaker.wav"]
    ) for text in texts
]

# Sequential processing with progress
outputs = []
for i, req in enumerate(requests, 1):
    print(f"Processing text {i}/{len(requests)}")
    outputs.append(tts.generate_speech(req))

# Combine all outputs
combined = TTSOutput.combine_outputs(outputs)
combined.save("combined_output.wav")
Book Chapter Processing
def process_book(chapter_file: str, speaker_file: str):
    # Read chapter
    with open(chapter_file, 'r') as f:
        chapter = f.read()

    # You can pass the whole book; Auralis takes care of splitting it
    request = TTSRequest(
        text=chapter,
        speaker_files=[speaker_file],
        audio_config=AudioPreprocessingConfig(
            enhance_speech=True,
            normalize=True
        )
    )

    output = tts.generate_speech(request)

    output.play()
    output.save("chapter_output.wav")
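
For example, with a hypothetical chapter file:

process_book("chapter_01.txt", "speaker.wav")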

Asynchronous Examples 🛸

Basic Async Generation
import asyncio
from auralis import TTS, TTSRequest

async def generate_speech():
    tts = TTS().from_pretrained("AstraMindAI/xttsv2", gpt_model='AstraMindAI/xtts2-gpt')
    
    request = TTSRequest(
        text="Async generation example",
        speaker_files=["speaker.wav"]
    )
    
    output = await tts.generate_speech_async(request)
    output.save("async_output.wav")

asyncio.run(generate_speech())
Parallel Processing
async def generate_parallel():
    tts = TTS().from_pretrained("AstraMindAI/xttsv2", gpt_model='AstraMindAI/xtts2-gpt')
    
    # Create multiple requests
    requests = [
        TTSRequest(
            text=f"This is voice {i}",
            speaker_files=[f"speaker_{i}.wav"]
        ) for i in range(3)
    ]
    
    # Process in parallel
    coroutines = [tts.generate_speech_async(req) for req in requests]
    outputs = await asyncio.gather(*coroutines, return_exceptions=True)
    
    # Handle results
    valid_outputs = [
        out for out in outputs 
        if not isinstance(out, Exception)
    ]
    
    combined = TTSOutput.combine_outputs(valid_outputs)
    combined.save("parallel_output.wav")

asyncio.run(generate_parallel())
Async Streaming with Multiple Requests
async def stream_multiple_texts():
    tts = TTS().from_pretrained("AstraMindAI/xttsv2", gpt_model='AstraMindAI/xtts2-gpt')
    
    # Prepare streaming requests
    texts = [
        "First long text...",
        "Second long text...",
        "Third long text..."
    ]
    
    requests = [
        TTSRequest(
            text=text,
            speaker_files=["speaker.wav"],
            stream=True,
        ) for text in texts
    ]
    
    # Process streams in parallel
    coroutines = [tts.generate_speech_async(req) for req in requests]
    streams = await asyncio.gather(*coroutines)
    
    # Collect outputs
    output_container = {i: [] for i in range(len(requests))}
    
    async def process_stream(idx, stream):
        async for chunk in stream:
            output_container[idx].append(chunk)
            print(f"Processed chunk for text {idx+1}")
            
    # Process all streams
    await asyncio.gather(
        *(process_stream(i, stream) 
          for i, stream in enumerate(streams))
    )
    
    # Save results
    for idx, chunks in output_container.items():
        TTSOutput.combine_outputs(chunks).save(
            f"text_{idx}_output.wav"
        )

asyncio.run(stream_multiple_texts())

Core Classes 🌟

TTSRequest - Unified request container with audio enhancement 🎤
@dataclass
class TTSRequest:
    """Container for TTS inference request data"""
    # Request metadata
    text: Union[AsyncGenerator[str, None], str, List[str]]

    speaker_files: Union[List[str], bytes]  # Paths to speaker audio files, or raw audio bytes

    enhance_speech: bool = True
    audio_config: AudioPreprocessingConfig = field(default_factory=AudioPreprocessingConfig)
    language: SupportedLanguages = "auto"
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    load_sample_rate: int = 22050
    sound_norm_refs: bool = False

    # Voice conditioning parameters
    max_ref_length: int = 60
    gpt_cond_len: int = 30
    gpt_cond_chunk_len: int = 4

    # Generation parameters
    stream: bool = False
    temperature: float = 0.75
    top_p: float = 0.85
    top_k: int = 50
    repetition_penalty: float = 5.0
    length_penalty: float = 1.0
    do_sample: bool = True

Examples

# Basic usage
request = TTSRequest(
    text="Hello world!",
    speaker_files=["reference.wav"]
)

# With custom audio enhancement
request = TTSRequest(
    text="Hello world!",
    speaker_files=["reference.wav"],
    audio_config=AudioPreprocessingConfig(
        normalize=True,
        trim_silence=True,
        enhance_speech=True,
        enhance_amount=1.5
    )
)

# Streaming long text
request = TTSRequest(
    text="Very long text...",
    speaker_files=["reference.wav"],
    stream=True,
)

Features

  • Automatic language detection
  • Audio preprocessing & enhancement
  • Flexible input handling (strings, lists, generators; see the sketch below)
  • Configurable generation parameters
  • Caching for efficient processing
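
Per the text type hint above, a request can take a plain string, a list of strings, or an async generator. A minimal sketch of the non-string variants (the exact chunking behavior is up to the engine):

from auralis import TTSRequest

# A list of strings
request = TTSRequest(
    text=["Chapter one.", "Chapter two."],
    speaker_files=["reference.wav"]
)

# An async generator, e.g. text arriving from an LLM stream
async def text_stream():
    for piece in ("Hello ", "streaming ", "world!"):
        yield piece

request = TTSRequest(
    text=text_stream(),
    speaker_files=["reference.wav"]
)
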
TTSOutput - Unified output container for audio processing 🎧
@dataclass
class TTSOutput:
    array: np.ndarray
    sample_rate: int

Methods

Format Conversion

output.to_tensor()      # → torch.Tensor
output.to_bytes()       # → bytes (wav/raw)
output.from_tensor()    # → TTSOutput
output.from_file()      # → TTSOutput

Audio Processing

output.combine_outputs()  # Combine multiple outputs
output.resample()        # Change sample rate
output.get_info()        # Get audio properties
output.change_speed()    # Modify playback speed

File & Playback

output.save()           # Save to file
output.play()          # Play audio
output.display()       # Show in Jupyter
output.preview()       # Smart playback

Examples

# Load and process
output = TTSOutput.from_file("input.wav")
output = output.resample(target_sr=44100)
output.save("output.wav")

# Combine multiple outputs
combined = TTSOutput.combine_outputs([output1, output2, output3])

# Change playback speed
faster = output.change_speed(1.5)

Languages 🌍

XTTSv2 supports: English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi

Performance Details 📊

Processing speeds on an NVIDIA RTX 3090:

  • Short phrases (< 100 chars): ~1 second
  • Medium texts (< 1000 chars): ~5-10 seconds
  • Full books (~500K chars @ concurrency 36): ~10 minutes

Memory usage:

  • Base: ~2.5 GB VRAM at concurrency = 1
  • ~5.3 GB VRAM at concurrency = 20

Contributions

Join Our Community!

We welcome and appreciate contributions to the project! To ensure a smooth and efficient process, please take a moment to review our Contribution Guidelines; following them helps us review and accept your contribution quickly. Thank you for your support!

License

The codebase is released under the Apache 2.0 license; feel free to use it in your projects.

The XTTSv2 model (and the files under auralis/models/xttsv2/components/tts) are licensed under the Coqui AI License.
