RevoS

Requires Python 3.11+. License: MIT.

A unified Python library for speech AI — ASR and TTS using open models.

Installation

# Core (ASR support)
pip install revospeech

# With TTS support (RevoVoice — requires PyTorch)
pip install "revospeech[tts]"

# With GPU support
pip install "revospeech[gpu]"

# Everything (GPU + TTS)
pip install "revospeech[all]"

# Or with uv
uv add revospeech

HuggingFace Login (Required for TTS)

Note: The RevoVoice TTS model is hosted on a private HuggingFace repository. You must log in before using TTS.

pip install huggingface-hub
huggingface-cli login

Get your token at https://huggingface.co/settings/tokens

Important Notes

  • revospeech[gpu] and revospeech[all] install onnxruntime-gpu, which conflicts with onnxruntime. If you already have revospeech installed, uninstall it before installing the GPU variant.
  • Audio formats supported: WAV, FLAC, OGG, and any format supported by libsndfile.
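To see whether both ONNX Runtime builds ended up in the same environment, a small check with the standard library's importlib.metadata may help. This is a minimal sketch, not part of RevoS; the helper names installed_ort_variants and has_ort_conflict are made up for illustration.

```python
from importlib import metadata

# Distribution names that both provide the `onnxruntime` import package.
ORT_VARIANTS = ("onnxruntime", "onnxruntime-gpu")


def installed_ort_variants(names=ORT_VARIANTS):
    """Return {distribution name: version} for each variant that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # this variant is not installed
    return found


def has_ort_conflict(found):
    """True when more than one onnxruntime variant is installed side by side."""
    return len(found) > 1


if __name__ == "__main__":
    found = installed_ort_variants()
    print(found)
    if has_ort_conflict(found):
        print("Both CPU and GPU builds are installed; uninstall one of them.")
```

If the check reports a conflict, removing one of the two distributions with pip uninstall before reinstalling is the usual way out.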

Quick Start

ASR (Automatic Speech Recognition)

from revospeech.asr import ASR

asr = ASR('zipformer-v2')
result = asr.transcribe('meeting.wav')

print(result.text)        # Full transcript
print(result.language)    # Detected language
for seg in result.segments:
    print(f"[{seg.start:.1f}s - {seg.end:.1f}s] {seg.text}")
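Since each segment carries start, end, and text, the segments can also be turned into subtitles by hand. The sketch below shows one way to render them as SRT; it is an illustration only and not how the CLI's --srt option is implemented. The Segment dataclass is a hypothetical stand-in for whatever objects result.segments actually holds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in for a transcription segment (hypothetical, not the library class)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}"
        )
    return "\n\n".join(cues) + ("\n" if cues else "")
```

Any object with start, end, and text attributes works with to_srt, so passing result.segments directly should be possible if the segments expose those three attributes as in the example above.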

TTS (Text-to-Speech)

from revospeech.tts import TTS

# Basic synthesis
tts = TTS('revovoice')
audio = tts.synthesize('Hello, how are you?')
audio.save('greeting.wav')

# Voice cloning (with reference audio)
audio = tts.synthesize(
    'This will sound like the reference speaker.',
    ref_audio='speaker.wav',
    ref_text='Sample of the speaker talking.',
)
audio.save('cloned.wav')

Model Discovery

import revospeech

# List all models with status
revospeech.list_models()

# Filter by task, mode, status
revospeech.list_models(task="asr", status="ready")
revospeech.list_models(mode="api")

# Fuzzy search
revospeech.search_models("english fast")

# Check a specific model
status = revospeech.check_model("zipformer-v2")
print(status.is_ready)

CLI

# Transcribe audio
revospeech transcribe -m zipformer-v2 audio.wav

# JSON output
revospeech transcribe -m zipformer-v2 --json audio.wav

# SRT subtitles
revospeech transcribe -m zipformer-v2 --srt audio.wav

# Synthesize speech
revospeech synthesize -m revovoice -t "Hello, world!" -o output.wav

# From text file
revospeech synthesize -m revovoice -f script.txt -o audiobook.wav

# List available models (with status icons)
revospeech models
revospeech models --ready           # Only ready-to-use models
revospeech models --mode api        # Only API models
revospeech models --task asr        # Filter by task

# Detailed model info
revospeech models-info zipformer-v2

# Fuzzy search
revospeech search "english fast"

# Browse remote catalog
revospeech catalog list

# Pull a model from the catalog
revospeech catalog pull revovoice

# API key management
revospeech config set-api-key

# Show environment info
revospeech info

Available Models

Model | Task | Backend | Mode | Languages | Access | Description
zipformer-v2 | ASR | sherpa-onnx | local | English | Open | Zipformer small transducer model
revovoice | TTS | RevoVoice | local | 600+ | Gated | Zero-shot multilingual TTS with voice cloning

Model Directory

revospeech/models/
├── asr/
│   └── zipformer_v2.yaml    # Open — downloads from GitHub releases
└── tts/
    └── revovoice.yaml       # Gated — requires HF login + approval

Gated Model Access

Some models (like revovoice) are hosted on private HuggingFace repositories and require approval before use.

  1. Log in to HuggingFace:

    pip install huggingface-hub
    huggingface-cli login
    

    Get your token at https://huggingface.co/settings/tokens

  2. Request access: Visit the model's HuggingFace page and submit an access request. The repo owner will review and approve.

  3. Use the model: Once approved, the model will download automatically on first use:

    from revospeech.tts import TTS
    tts = TTS('revovoice')  # Will prompt for HF login if not authenticated
    

For team members adding models: If your model is gated, set hf_private: true in the YAML manifest. This tells RevoS to check HF authentication before downloading.
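Before loading a gated model, it can be handy to confirm that a HuggingFace token is visible at all. The following is a rough standard-library sketch, not RevoS's own authentication check. It assumes the usual huggingface_hub conventions: the HF_TOKEN environment variable, or a token file at HF_TOKEN_PATH, defaulting to a file named token under HF_HOME (itself defaulting to ~/.cache/huggingface). It ignores XDG_CACHE_HOME for simplicity, and find_hf_token is a hypothetical helper name.

```python
import os
from pathlib import Path


def find_hf_token(env=None):
    """Return a HuggingFace token if one can be located, else None.

    Assumption: follows the common huggingface_hub conventions
    (HF_TOKEN, HF_TOKEN_PATH, HF_HOME); it does not validate the token.
    """
    env = os.environ if env is None else env

    # 1. An explicit token in the environment wins.
    token = env.get("HF_TOKEN")
    if token:
        return token.strip()

    # 2. Otherwise look for the token file written by `huggingface-cli login`.
    hf_home = Path(env.get("HF_HOME") or Path.home() / ".cache" / "huggingface")
    token_path = Path(env.get("HF_TOKEN_PATH") or hf_home / "token")
    if token_path.is_file():
        return token_path.read_text(encoding="utf-8").strip() or None
    return None


if __name__ == "__main__":
    print("token found" if find_hf_token() else "no token found; run huggingface-cli login")
```

Finding a token only shows that a login happened; access to a gated repository still depends on the owner approving the request.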

Configuration

API Keys

For cloud API backends, set your API key:

# Option 1: Environment variable
export REVOLAB_API_KEY=rv-your-key-here

# Option 2: CLI command (saves to ~/.config/revospeech/config.yaml)
revospeech config set-api-key

Resolution order: constructor arg > REVOLAB_API_KEY env var > ~/.config/revospeech/config.yaml
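The resolution order can be pictured as a simple first-match lookup. This is a minimal sketch of that precedence, not the library's actual code. The function name resolve_api_key and the "api_key" field name inside config.yaml are assumptions made for illustration, and the config is passed in as an already-loaded dict to keep the example free of YAML dependencies.

```python
import os


def resolve_api_key(explicit=None, env=None, config=None):
    """Return the first API key found, following the documented precedence.

    explicit: value passed to a constructor (highest priority)
    env:      mapping of environment variables (defaults to os.environ)
    config:   dict already loaded from config.yaml; the "api_key" field
              name is an assumption made for this sketch
    """
    env = os.environ if env is None else env
    config = config or {}
    candidates = (explicit, env.get("REVOLAB_API_KEY"), config.get("api_key"))
    for candidate in candidates:
        if candidate:  # skip None and empty strings
            return candidate
    return None
```

For example, resolve_api_key(None, {"REVOLAB_API_KEY": "rv-env"}, {"api_key": "rv-file"}) returns "rv-env", because the environment variable outranks the config file.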

Catalog Source

Override the catalog source with:

export REVOS_CATALOG_REPO="myorg/revospeech"    # env var

Or in ~/.config/revospeech/config.yaml:

catalog_repo: "myorg/revospeech"

Adding Custom Models

Add a YAML manifest to ~/.config/revospeech/models/:

# ~/.config/revospeech/models/asr/my-model.yaml
name: my-custom-model
task: asr
mode: local
backend: sherpa-onnx
model_type: transducer
model_url: "https://example.com/models/my-model.tar.bz2"
sample_rate: 16000
language: en
description: "My custom ASR model"
capabilities: ["word-timestamps"]
languages: ["en"]
files:
  encoder: "encoder.onnx"
  decoder: "decoder.onnx"
  joiner: "joiner.onnx"
  tokens: "tokens.txt"

Then use it:

from revospeech.asr import ASR
asr = ASR('my-custom-model')
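A quick sanity check on a manifest before dropping it into the models directory can catch typos early. The sketch below works on a manifest that has already been loaded into a dict. The set of required keys and the transducer file list are inferred only from the example manifest above, so treat them as assumptions rather than the schema RevoS enforces; manifest_problems is a hypothetical helper.

```python
# Keys assumed to be required, based only on the example manifest above.
REQUIRED_KEYS = {"name", "task", "mode", "backend"}
# File entries the example lists for a transducer model.
TRANSDUCER_FILES = {"encoder", "decoder", "joiner", "tokens"}


def manifest_problems(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = [
        f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(manifest))
    ]
    if manifest.get("model_type") == "transducer":
        missing = TRANSDUCER_FILES - set(manifest.get("files") or {})
        problems += [f"missing file entry: {name}" for name in sorted(missing)]
    return problems


if __name__ == "__main__":
    example = {
        "name": "my-custom-model",
        "task": "asr",
        "mode": "local",
        "backend": "sherpa-onnx",
        "model_type": "transducer",
        "files": {"encoder": "encoder.onnx", "decoder": "decoder.onnx"},
    }
    for problem in manifest_problems(example):
        print(problem)
```

Returning a list of problems instead of raising on the first one lets all issues in a manifest be reported in a single pass.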

API Models

# ~/.config/revospeech/models/asr/my-api-model.yaml
name: my-api-model
task: asr
mode: api
backend: my-api
api_endpoint: "https://api.example.com/v1"
description: "Cloud ASR"
capabilities: ["streaming"]
languages: ["en"]

Pinning Model Versions

For HuggingFace-hosted models, pin to a specific commit hash or tag using the revision field:

revision: "a1b2c3d"       # Pin to specific commit hash
# revision: "v1.0.0"      # Or use a git tag

Without revision, the latest version from the default branch is used.

Remote Catalog

The catalog fetches available models directly from this repository on GitHub. Team members add YAML manifests to revospeech/models/ and users discover them without upgrading.

# Browse all available models from the repo
revospeech catalog list

# Install a model locally
revospeech catalog pull revovoice

Project Structure

revospeech/
├── revospeech/
│   ├── asr/           # ASR engines (sherpa-onnx)
│   ├── tts/           # TTS engines (RevoVoice)
│   ├── registry/      # Model manifests, registry, downloader, status
│   ├── cli/           # Click CLI
│   ├── config.py      # API key & configuration management
│   ├── exceptions.py  # Custom exception hierarchy
│   ├── catalog.py     # Remote model catalog (GitHub, cached)
│   ├── device.py      # GPU/CPU auto-detection
│   └── models/        # Bundled YAML manifests
├── tests/
├── pyproject.toml
├── AGENTS.md
└── CONTRIBUTING.md

License

MIT

