A simple transformer-based text-to-speech library

These details have not been verified by PyPI

Project links

Project description

Arthemis TTS

A simple and efficient transformer-based text-to-speech library for Python.

Overview

Arthemis TTS is a PyPI package that provides an easy-to-use interface to the Arthemis TTS model.

Features

Simple API: Easy-to-use functions for text-to-speech conversion
Pretrained Models: Use ready-to-go pretrained models
GPU Support: Automatic GPU acceleration when available
Multiple Output Formats: Support for WAV and MP3 output
Lightweight: Minimal dependencies and an efficient implementation that also runs on CPU

Installation

From PyPI (Recommended)

pip install arthemis-tts
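
To confirm that the install succeeded, one option is to query the installed version from Python. This is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is made up for illustration and is not part of arthemis-tts.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("arthemis-tts")
    print(version if version else "arthemis-tts is not installed")
```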

Using Pretrained Models

Basic Usage

import arthemis_tts

# Simple text-to-speech with pretrained model (tested example)
model_path = "your_model.pt"
audio = arthemis_tts.text_to_speech("Hello, world!", model_path=model_path)

# Save to file
arthemis_tts.text_to_speech("Hello, world!", 
                           model_path=model_path,
                           output_path="hello_world.wav")

Advanced Usage

from arthemis_tts import load_model
from arthemis_tts.audio_processing import write_audio_to_file

# Load a pretrained model (tested with actual model)
model_path = "your_model.pt"
model = load_model(model_path)

# Generate speech
audio = model.synthesize("This is a test of the synthesize function.")

# Save the audio
write_audio_to_file(audio, "synthesized_speech.wav")

Step-by-Step Usage (Complete Example)

import torch
from arthemis_tts import ArthemisTTS
from arthemis_tts.text_processing import text_to_sequence
from arthemis_tts.audio_processing import inverse_mel_spec_to_wav, write_audio_to_file
from arthemis_tts.utils import get_device

# 1. Load your pretrained model (tested example)
model_path = "your_model.pt"
device = get_device()  # Automatically detects best device
print(f"Using device: {device}")

model = ArthemisTTS(device=device)
state_dict = torch.load(model_path, map_location=device)

# Handle different state dict formats
if isinstance(state_dict, dict) and "model" in state_dict:
    model.load_state_dict(state_dict["model"])
else:
    model.load_state_dict(state_dict)
model.eval()

# 2. Convert text to sequence
text = "Hello, world!"
text_seq = text_to_sequence(text).unsqueeze(0).to(device)

# 3. Generate mel spectrogram
with torch.no_grad():
    mel_postnet, gate_outputs = model.inference(
        text_seq,
        max_length=100,  # Shorter for faster generation
        stop_token_threshold=0.5,
        with_tqdm=True  # Show progress bar
    )

# 4. Convert to audio
audio = inverse_mel_spec_to_wav(mel_postnet.detach()[0].T)

# 5. Save audio file
write_audio_to_file(audio, "step_by_step_output.wav")
print(f"Generated audio shape: {audio.shape}")

API Reference

Main Functions

`text_to_speech(text, model_path=None, output_path=None, max_length=800, gate_threshold=0.5)`

Convert text to speech using a pretrained model.

Parameters:

text (str): Input text to synthesize
model_path (str): Path to pretrained model file (required, even though the signature defaults to None)
output_path (str, optional): Path to save audio file
max_length (int): Maximum generation length (default: 800)
gate_threshold (float): Stop token threshold (default: 0.5)

Returns:

torch.Tensor or None: Audio tensor if output_path is not given; None if the audio was saved to a file
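
Because `model_path` defaults to None but is required in practice, it can help to check arguments before starting a synthesis run. The sketch below is one way to do that with the standard library only. `check_tts_args` is a hypothetical helper, not a function of this package, and the assumption that `gate_threshold` is a probability strictly between 0 and 1 is ours.

```python
from pathlib import Path


def check_tts_args(text, model_path, max_length=800, gate_threshold=0.5):
    """Raise early for arguments that would likely fail later during synthesis."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if model_path is None:
        raise ValueError("model_path is required")
    if not Path(model_path).is_file():
        raise FileNotFoundError(f"model file not found: {model_path}")
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    # Assumption: the threshold is compared against a probability.
    if not 0.0 < gate_threshold < 1.0:
        raise ValueError("gate_threshold is assumed to lie in (0, 1)")
```

Calling `check_tts_args(text, model_path)` right before `arthemis_tts.text_to_speech(...)` surfaces a missing or mistyped path before any model loading starts.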

`load_model(model_path)`

Load a pretrained model.

Parameters:

model_path (str): Path to pretrained model file

Returns:

ArthemisTTS: Loaded model instance

Classes

`ArthemisTTS`

Main TTS model class for using pretrained models.

Methods:

inference(text_tensor, max_length=800, stop_token_threshold=0.5, with_tqdm=True): Generate mel spectrogram
synthesize(text, max_length=800, stop_token_threshold=0.5): High-level synthesis function
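
The `stop_token_threshold` argument presumably decides when autoregressive decoding ends: generation stops at the first step whose predicted stop probability exceeds the threshold, or at `max_length` if that never happens. The sketch below illustrates the idea on a plain list of per-step probabilities. It is an assumption about how the threshold is applied, not code from the library, and `first_stop_step` is a hypothetical name.

```python
from typing import Optional, Sequence


def first_stop_step(stop_probs: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the first step whose stop probability exceeds
    the threshold, or None if no step does."""
    for step, prob in enumerate(stop_probs):
        if prob > threshold:
            return step
    return None


probs = [0.01, 0.03, 0.20, 0.65, 0.90]
print(first_stop_step(probs, threshold=0.5))   # 3
print(first_stop_step(probs, threshold=0.95))  # None
```

Under this reading, a lower threshold ends generation earlier (risking clipped audio), while a higher one lets it run closer to `max_length`.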

Supported Audio Formats

WAV (recommended)
MP3
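
For readers curious what writing a WAV file involves, here is a minimal standard-library sketch that stores mono float samples as 16-bit PCM. It is not the library's `write_audio_to_file`; the name `write_pcm16_wav` is hypothetical, and the 22050 Hz default is an assumption taken from the LJ Speech sample rate rather than from this package.

```python
import math
import struct
import wave


def write_pcm16_wav(samples, path, sample_rate=22050):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # keep values inside int16 range
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


if __name__ == "__main__":
    # One second of a 440 Hz sine tone as stand-in audio.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 22050) for n in range(22050)]
    write_pcm16_wav(tone, "tone.wav")
```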

Requirements

Python >= 3.7
PyTorch >= 1.9.0
torchaudio >= 0.9.0
NumPy >= 1.19.0
pandas >= 1.2.0
tqdm >= 4.60.0
pydub >= 0.25.0 (for MP3 support)

Performance Notes

GPU Acceleration: The model will automatically use CUDA if available
Memory Usage: Adjust max_length parameter based on available memory
Generation Speed: Depends on text length and hardware capabilities
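
When picking a `max_length`, it can help to translate it into an upper bound on audio duration. The arithmetic below assumes one mel frame per decoding step, a hop length of 256 samples, and a 22050 Hz sample rate. These values are common for LJ Speech setups but are not confirmed for this library, so treat the numbers as a rough guide and substitute your model's actual settings.

```python
def max_audio_seconds(max_length: int, hop_length: int = 256, sample_rate: int = 22050) -> float:
    """Upper bound on audio duration, assuming one mel frame per decoding step."""
    return max_length * hop_length / sample_rate


for frames in (100, 400, 800):
    print(f"max_length={frames}: up to {max_audio_seconds(frames):.2f} s")
```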

Model Requirements

Models should be saved as PyTorch state dictionaries (.pt files)
Models must be compatible with the transformer architecture used in this library
Models trained on the LJ Speech dataset work best for English text
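
The step-by-step example accepts two checkpoint layouts: a bare state dictionary, or a dictionary that stores it under a "model" key. A small helper can normalize both before calling `load_state_dict`. The sketch below works on plain dictionaries so it runs without PyTorch; `extract_state_dict` is a hypothetical name, not part of the package.

```python
from collections.abc import Mapping
from typing import Any


def extract_state_dict(checkpoint: Any, key: str = "model") -> Mapping:
    """Return the state dict from either supported checkpoint layout."""
    if not isinstance(checkpoint, Mapping):
        raise TypeError("expected a dict-like checkpoint, got " + type(checkpoint).__name__)
    # Wrapped layout: {"model": {...}, ...other training metadata...}
    if key in checkpoint and isinstance(checkpoint[key], Mapping):
        return checkpoint[key]
    # Bare layout: the checkpoint already is the state dict.
    return checkpoint


wrapped = {"model": {"encoder.weight": [0.1, 0.2]}, "epoch": 12}
bare = {"encoder.weight": [0.1, 0.2]}
assert extract_state_dict(wrapped) == extract_state_dict(bare)
```

With a real checkpoint, the result of `torch.load(model_path, map_location=device)` would be passed through this helper first.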

Examples

Batch Processing with Pretrained Model

import arthemis_tts

# Path to your pretrained model (tested example)
model_path = "your_model.pt"

texts = [
    "Hello, world!",
    "This is Arthemis TTS.",
    "Text-to-speech synthesis."
]

for i, text in enumerate(texts):
    arthemis_tts.text_to_speech(
        text, 
        model_path=model_path,
        output_path=f"batch_output_{i+1}.wav"
    )
    print(f"Generated audio {i+1}: {text}")

Efficient Multiple Generation (Load Once, Use Many Times)

from arthemis_tts import load_model
from arthemis_tts.audio_processing import write_audio_to_file

# Load model once (tested with actual model)
model_path = "your_model.pt"
model = load_model(model_path)

texts = [
    "Hello, this is the first sentence.",
    "This is the second sentence.",
    "And this is the third sentence."
]

# Generate multiple times without reloading model
for i, text in enumerate(texts):
    audio = model.synthesize(text)
    write_audio_to_file(audio, f"efficient_output_{i+1}.wav")
    print(f"Generated efficient audio {i+1}")
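
In longer-running programs, the same load-once idea can be enforced with a cache, so repeated requests for the same path reuse the loaded model. This is a sketch under the assumption that the loader takes a single path argument, as `load_model` does; `make_cached_loader` and `fake_loader` are hypothetical names, and the stand-in loader exists only so the example runs without a model file.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_loader(loader: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap a loader so each distinct model path is loaded at most once."""

    @lru_cache(maxsize=None)
    def cached(model_path: str) -> Any:
        return loader(model_path)

    return cached


# Stand-in loader that records how often it is called.
calls = []


def fake_loader(path: str) -> dict:
    calls.append(path)
    return {"path": path}


load_cached = make_cached_loader(fake_loader)
first = load_cached("your_model.pt")
second = load_cached("your_model.pt")
print(first is second, len(calls))  # True 1
```

In real use, `make_cached_loader(load_model)` would take the place of the stand-in loader.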

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Based on the paper "Neural Speech Synthesis with Transformer Network"
Inspired by the original SimpleTransformerTTS implementation
Uses PyTorch and torchaudio for audio processing

Support

For questions and support, please open an issue on GitHub or Hugging Face.

Release history

0.1.2 (this version): Sep 9, 2025
0.1.1: Sep 9, 2025
0.1.0: Sep 9, 2025

Download files

Source Distribution

arthemis_tts-0.1.2.tar.gz (15.0 kB view details)

Uploaded Sep 9, 2025 Source

Built Distribution

arthemis_tts-0.1.2-py3-none-any.whl (13.9 kB view details)

Uploaded Sep 9, 2025 Python 3

File details

Details for the file arthemis_tts-0.1.2.tar.gz.

File metadata

Download URL: arthemis_tts-0.1.2.tar.gz
Upload date: Sep 9, 2025
Size: 15.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for arthemis_tts-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`ae676f14eaed9a6288c8266e0e4d1e1a6a419e8482b3da4db00f1a7cf63d0860`
MD5	`f839fc282ee798e1b14185e97b2c57ef`
BLAKE2b-256	`aa7a4906688fce955e15141ffdbba3a7cc137f7731842f778ea256fac11516fe`

File details

Details for the file arthemis_tts-0.1.2-py3-none-any.whl.

File metadata

Download URL: arthemis_tts-0.1.2-py3-none-any.whl
Upload date: Sep 9, 2025
Size: 13.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for arthemis_tts-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ab3aef97bd40cd16dc8c3e2b5f835666a1a756f14d9e960d0d82be3c53072092`
MD5	`cbe65c712fef4130569b66dc28a84c1f`
BLAKE2b-256	`0df0e80367aeb048e23a7fee10495bd5953f8035a0b48ae6dfc411b9e0a88108`
