Arthemis TTS

A simple and efficient transformer-based text-to-speech library for Python.

Overview

Arthemis TTS is a PyPI package that provides an easy-to-use interface for running the Arthemis-TTS model.

Features

  • Simple API: Easy-to-use functions for text-to-speech conversion
  • Pretrained Models: Use ready-to-go pretrained models
  • GPU Support: Automatic GPU acceleration when available
  • Multiple Output Formats: Support for WAV and MP3 output
  • Lightweight: Minimal dependencies and an efficient implementation that also runs on CPU

Installation

From PyPI (Recommended)

pip install arthemis-tts
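
To confirm that the install worked before running the examples, a small check like the one below can help. This is a minimal sketch using only the standard library; the helper name is_installed is hypothetical, and the only fact taken from this README is that the import name is arthemis_tts.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "arthemis_tts" is the import name used throughout this README.
    if is_installed("arthemis_tts"):
        print("arthemis_tts is available")
    else:
        print("arthemis_tts not found; try: pip install arthemis-tts")
```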

Using Pretrained Models

Basic Usage

import arthemis_tts

# Simple text-to-speech with pretrained model (tested example)
model_path = "your_model.pt"
audio = arthemis_tts.text_to_speech("Hello, world!", model_path=model_path)

# Save to file
arthemis_tts.text_to_speech("Hello, world!", 
                           model_path=model_path,
                           output_path="hello_world.wav")

Advanced Usage

from arthemis_tts import load_model
from arthemis_tts.audio_processing import write_audio_to_file

# Load a pretrained model (replace with the path to your own checkpoint)
model_path = "your_model.pt"
model = load_model(model_path)

# Generate speech
audio = model.synthesize("This is a test of the synthesize function.")

# Save the audio
write_audio_to_file(audio, "synthesized_speech.wav")

Step-by-Step Usage (Complete Example)

import torch
from arthemis_tts import ArthemisTTS
from arthemis_tts.text_processing import text_to_sequence
from arthemis_tts.audio_processing import inverse_mel_spec_to_wav, write_audio_to_file
from arthemis_tts.utils import get_device

# 1. Load your pretrained model (tested example)
model_path = "your_model.pt"
device = get_device()  # Automatically detects best device
print(f"Using device: {device}")

model = ArthemisTTS(device=device)
state_dict = torch.load(model_path, map_location=device)

# Handle different state dict formats
if isinstance(state_dict, dict) and "model" in state_dict:
    model.load_state_dict(state_dict["model"])
else:
    model.load_state_dict(state_dict)
model.eval()

# 2. Convert text to sequence
text = "Hello, world!"
text_seq = text_to_sequence(text).unsqueeze(0).to(device)

# 3. Generate mel spectrogram
with torch.no_grad():
    mel_postnet, gate_outputs = model.inference(
        text_seq,
        max_length=100,  # Shorter for faster generation
        stop_token_threshold=0.5,
        with_tqdm=True  # Show progress bar
    )

# 4. Convert to audio
audio = inverse_mel_spec_to_wav(mel_postnet.detach()[0].T)

# 5. Save audio file
write_audio_to_file(audio, "step_by_step_output.wav")
print(f"Generated audio shape: {audio.shape}")
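
The "Handle different state dict formats" step above can be factored into a small helper so it can be reused and tested without a model. This is a sketch that mirrors the check in the example; the function name extract_state_dict is hypothetical and not part of the library.

```python
from typing import Any


def extract_state_dict(checkpoint: Any) -> Any:
    """Return the model weights from a loaded checkpoint.

    Some training scripts save {"model": state_dict, ...} while others
    save the state dict directly; both layouts are handled here.
    """
    if isinstance(checkpoint, dict) and "model" in checkpoint:
        return checkpoint["model"]
    return checkpoint


# Usage, assuming `checkpoint = torch.load(model_path, map_location=device)`:
#     model.load_state_dict(extract_state_dict(checkpoint))
```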

API Reference

Main Functions

text_to_speech(text, model_path=None, output_path=None, max_length=800, gate_threshold=0.5)

Convert text to speech using a pretrained model.

Parameters:

  • text (str): Input text to synthesize
  • model_path (str): Path to pretrained model file. Required, even though the signature defaults it to None
  • output_path (str, optional): Path to save audio file
  • max_length (int): Maximum generation length (default: 800)
  • gate_threshold (float): Stop token threshold (default: 0.5)

Returns:

  • torch.Tensor or None: The audio tensor when output_path is not given; None when the audio is saved to output_path
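
Because model_path is required and the return value depends on output_path, it can be convenient to validate inputs before calling the library. The wrapper below is a sketch under assumptions: run_tts is a hypothetical helper, and the synthesis function is passed in as an argument so the wrapper itself does not depend on the package being installed.

```python
import os
from typing import Any, Callable, Optional


def run_tts(
    tts_fn: Callable[..., Any],
    text: str,
    model_path: str,
    output_path: Optional[str] = None,
) -> Any:
    """Validate the inputs, then delegate to `tts_fn`.

    Returns whatever `tts_fn` returns: per the documentation above, an
    audio tensor when `output_path` is None, otherwise None.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if not os.path.isfile(model_path):
        raise FileNotFoundError(f"model file not found: {model_path}")
    return tts_fn(text, model_path=model_path, output_path=output_path)


# Usage with the library function documented above:
#     import arthemis_tts
#     audio = run_tts(arthemis_tts.text_to_speech, "Hello, world!", "your_model.pt")
```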

load_model(model_path)

Load a pretrained model.

Parameters:

  • model_path (str): Path to pretrained model file

Returns:

  • ArthemisTTS: Loaded model instance

Classes

ArthemisTTS

Main TTS model class for using pretrained models.

Methods:

  • inference(text_tensor, max_length=800, stop_token_threshold=0.5, with_tqdm=True): Generate mel spectrogram
  • synthesize(text, max_length=800, stop_token_threshold=0.5): High-level synthesis function

Supported Audio Formats

  • WAV (recommended)
  • MP3

Requirements

  • Python >= 3.7
  • PyTorch >= 1.9.0
  • torchaudio >= 0.9.0
  • NumPy >= 1.19.0
  • pandas >= 1.2.0
  • tqdm >= 4.60.0
  • pydub >= 0.25.0 (for MP3 support)
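
To see which of these dependencies are present in the current environment, the installed versions can be listed with the standard library. This is a sketch that assumes Python 3.8 or newer (importlib.metadata is not available on 3.7); it only reports versions and does not compare them against the minimums above.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# PyPI distribution names for the requirements listed above.
REQUIREMENTS = ["torch", "torchaudio", "numpy", "pandas", "tqdm", "pydub"]


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIREMENTS).items():
        print(f"{name}: {version or 'not installed'}")
```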

Performance Notes

  • GPU Acceleration: The model will automatically use CUDA if available
  • Memory Usage: Lower the max_length parameter if generation runs out of memory
  • Generation Speed: Depends on text length and hardware capabilities
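
For readers who want to see what automatic device selection typically looks like, here is a minimal sketch. It is not the library's get_device implementation; pick_device is a hypothetical name, and the fallback to CPU when PyTorch is missing is an assumption made so the snippet runs anywhere.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate; report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```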

Model Requirements

  • Models should be saved as PyTorch state dictionaries (.pt files)
  • Models must be compatible with the transformer architecture used in this library
  • Models trained on the LJ Speech dataset work best for English text

Examples

Batch Processing with Pretrained Model

import arthemis_tts

# Path to your pretrained model (replace with your own checkpoint)
model_path = "your_model.pt"

texts = [
    "Hello, world!",
    "This is Arthemis TTS.",
    "Text-to-speech synthesis."
]

for i, text in enumerate(texts):
    arthemis_tts.text_to_speech(
        text, 
        model_path=model_path,
        output_path=f"batch_output_{i+1}.wav"
    )
    print(f"Generated audio {i+1}: {text}")

Efficient Multiple Generation (Load Once, Use Many Times)

from arthemis_tts import load_model
from arthemis_tts.audio_processing import write_audio_to_file

# Load model once (replace with the path to your own checkpoint)
model_path = "your_model.pt"
model = load_model(model_path)

texts = [
    "Hello, this is the first sentence.",
    "This is the second sentence.",
    "And this is the third sentence."
]

# Generate multiple times without reloading model
for i, text in enumerate(texts):
    audio = model.synthesize(text)
    write_audio_to_file(audio, f"efficient_output_{i+1}.wav")
    print(f"Generated efficient audio {i+1}")
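
When generating many files, numbered names such as efficient_output_1.wav can be hard to match back to their text. One option is to derive the file name from the sentence itself. The helper below is a sketch; output_name and its parameters are hypothetical and not part of the library.

```python
import re


def output_name(index: int, text: str, prefix: str = "tts",
                ext: str = ".wav", max_words: int = 4) -> str:
    """Build a file name from a zero-padded index and the first words of `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())[:max_words]
    slug = "_".join(words) or "untitled"
    return f"{prefix}_{index:03d}_{slug}{ext}"


# Usage inside the loop above:
#     write_audio_to_file(audio, output_name(i + 1, text))
```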

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For questions and support, please open an issue on GitHub or Hugging Face.

