Text-to-Speech with multiple backends for scientific workflows

SciTeX Audio (scitex-audio)

Full Documentation · pip install scitex-audio


Problem

Scientific workflows increasingly rely on AI agents that run on remote servers or in headless environments. These agents need to report results audibly (experiment-completion notifications, error alerts, accessibility), yet they have no direct access to audio hardware.

Solution

SciTeX Audio provides a unified TTS interface with automatic backend fallback and smart local/remote routing. It works on local machines, remote servers (via relay), and WSL environments with automatic audio path detection.

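| Backend    | Quality | Cost | Internet       | Offline | Default Speed |
|------------|---------|------|----------------|---------|---------------|
| ElevenLabs | High    | Paid | Required       | No      | 1.2x          |
| LuxTTS     | High    | Free | First download | Yes     | 2.0x          |
| Google TTS | Good    | Free | Required       | No      | 1.5x          |
| System TTS | Basic   | Free | No             | Yes     | 150 wpm       |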

Table 1. Supported TTS backends. Backends are tried in the fallback order elevenlabs → luxtts → gtts → pyttsx3, so the highest-quality backend that is available is the one used.
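
The fallback order in Table 1 can be pictured as a first-match search. The sketch below is an illustration only, not the package's implementation: `FALLBACK_ORDER` and `pick_backend` are hypothetical names, and the input list stands in for whatever `available_backends()` reports on your machine.

```python
from typing import Iterable, Optional

# Order taken from Table 1: highest quality first.
FALLBACK_ORDER = ("elevenlabs", "luxtts", "gtts", "pyttsx3")


def pick_backend(available: Iterable[str]) -> Optional[str]:
    """Return the first backend in FALLBACK_ORDER that is available, else None."""
    available_set = set(available)
    for name in FALLBACK_ORDER:
        if name in available_set:
            return name
    return None
```

With only `gtts` and `pyttsx3` installed, `pick_backend(["pyttsx3", "gtts"])` returns `"gtts"`, since it precedes `pyttsx3` in the order.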

Installation

Requires Python >= 3.10.

pip install scitex-audio

Install with specific backends:

pip install scitex-audio[gtts]         # Google TTS
pip install scitex-audio[pyttsx3]      # System TTS (+ apt install espeak-ng)
pip install scitex-audio[elevenlabs]   # ElevenLabs
pip install scitex-audio[luxtts]       # LuxTTS (voice cloning, offline)
pip install scitex-audio[all]          # Everything

Quick Start

from scitex_audio import speak, available_backends

# Check what's available
print(available_backends())  # e.g., ['gtts', 'pyttsx3']

# Speak with auto-selected backend
speak("Hello from SciTeX Audio!")

# Choose a specific backend
speak("Bonjour", backend="gtts", voice="fr")

# Save without playing
speak("Save this", output_path="output.mp3", play=False)

Three Interfaces

Python API

import scitex_audio

scitex_audio.speak("Hello!")                         # auto backend
scitex_audio.speak("Fast", backend="gtts", speed=1.5)
scitex_audio.available_backends()                    # list backends
scitex_audio.check_wsl_audio()                       # WSL audio status
scitex_audio.generate_bytes("As bytes")              # raw MP3 bytes
scitex_audio.stop_speech()                           # kill playback

tts = scitex_audio.get_tts("gtts")                   # get engine
tts.speak("With engine", voice="fr")

Full API reference

CLI Commands

scitex-audio --help-recursive             # Show all commands
scitex-audio speak "Hello world"          # Speak text
scitex-audio speak "Bonjour" -b gtts -v fr
scitex-audio backends                     # List backends
scitex-audio check                        # Audio status (WSL)
scitex-audio stop                         # Stop playback
scitex-audio relay --port 31293           # Start relay server
scitex-audio list-python-apis             # List Python API tree
scitex-audio mcp list-tools               # List MCP tools

Full CLI reference

MCP Server — for AI Agents

AI agents can speak through the MCP protocol for notifications and accessibility.

| Tool               | Description                                  |
|--------------------|----------------------------------------------|
| audio_speak        | Convert text to speech with backend fallback |
| list_backends      | List available TTS backends and status       |
| check_audio_status | Check WSL audio connectivity                 |
| announce_context   | Announce current directory and git branch    |

Table 2. Four MCP tools available for AI-assisted audio. All tools accept JSON parameters and return JSON results.

scitex-audio mcp start
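
Because the tools take and return JSON, a client call can be pictured as a JSON-RPC 2.0 `tools/call` message, which is how MCP clients invoke tools. The sketch below only builds the message and does not talk to a server. `build_tool_call` is a hypothetical helper, and the argument names `text` and `backend` are assumptions borrowed from the Python API rather than confirmed tool parameters.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Assumed argument names; check the MCP specification of the package.
request = build_tool_call(1, "audio_speak", {"text": "Run finished", "backend": "gtts"})
```

In practice an MCP client library builds and sends this message for you; the sketch is meant only to show the shape of the exchange.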

Full MCP specification

Remote Audio Relay

When agents run on remote servers (NAS, cloud, HPC), they have no speakers. The relay server solves this: the local machine (with speakers) runs a lightweight HTTP server, and the remote agent sends speech requests over an SSH tunnel.

Architecture

Remote Server (NAS/Cloud)          Local Machine (has speakers)
┌─────────────────────┐            ┌──────────────────────┐
│ AI Agent            │            │ Relay Server         │
│   speak("Hello")    │            │   scitex-audio relay │
│     ↓               │            │     ↓                │
│ POST /speak ────────┼── SSH ─────┼→ TTS engine          │
│   (port 31293)      │  tunnel    │     ↓                │
│                     │            │   🔊 Speakers        │
└─────────────────────┘            └──────────────────────┘

Setup

Step 1: Start relay on local machine (has speakers)

scitex-audio relay --port 31293

Step 2: Open a reverse SSH tunnel from the local machine to the remote server

Add to your ~/.ssh/config:

Host my-server
  HostName 192.168.0.69
  User myuser
  # Audio relay: forward remote port 31293 to the relay on this machine
  RemoteForward 31293 127.0.0.1:31293

Then connect: ssh my-server. The tunnel maps remote port 31293 back to your local relay.
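
To confirm from the remote side that the forwarded port is listening, a plain TCP probe is enough. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper and is not part of scitex-audio.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 31293, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Note that a successful probe only shows that the SSH daemon is listening on the forwarded port. It does not prove that the relay is running on the local machine, which is what the /health endpoint is for.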

Step 3: Configure remote environment

On the remote server, set:

export SCITEX_AUDIO_MODE=remote
export SCITEX_AUDIO_RELAY_PORT=31293
# URL is auto-detected from the tunnel (localhost:31293)

Now speak("Hello") on the remote server plays audio on your local speakers.

Relay Endpoints

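| Endpoint       | Method | Description                                   |
|----------------|--------|-----------------------------------------------|
| /speak         | POST   | Play TTS ({"text": "...", "backend": "gtts"}) |
| /health        | GET    | Health check (returns {"status": "ok"})       |
| /list_backends | GET    | List available backends                       |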
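
The endpoints above can be exercised without the scitex-audio client. The sketch below uses only the standard library and assumes the relay accepts a JSON body with a `Content-Type: application/json` header, which the source does not state explicitly. `build_speak_request` and `relay_is_healthy` are hypothetical helper names.

```python
import json
import urllib.request

DEFAULT_RELAY_URL = "http://localhost:31293"


def build_speak_request(text, backend=None, base_url=DEFAULT_RELAY_URL):
    """Build (but do not send) a POST /speak request with a JSON body."""
    payload = {"text": text}
    if backend is not None:
        payload["backend"] = backend
    return urllib.request.Request(
        url=f"{base_url}/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def relay_is_healthy(base_url=DEFAULT_RELAY_URL, timeout=2.0):
    """Return True if GET /health answers with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp).get("status") == "ok"
    except (OSError, ValueError, AttributeError):
        # Connection errors, HTTP errors, invalid JSON, or a non-object body.
        return False
```

To send the request, pass it to `urllib.request.urlopen(build_speak_request("Hello", backend="gtts"))` once `relay_is_healthy()` returns True.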

Auto-Start Relay (Shell Profile)

# ~/.bashrc or ~/.bash_profile (local machine with speakers)
_start_audio_relay() {
    local port="${SCITEX_AUDIO_RELAY_PORT:-31293}"
    # Skip if already running
    curl -sf "http://localhost:$port/health" >/dev/null 2>&1 && return
    # Start in background
    scitex-audio relay --port "$port" --force &>/dev/null &
    disown
}
_start_audio_relay

Environment Variables

| Variable                        | Default   | Description                                     |
|---------------------------------|-----------|-------------------------------------------------|
| SCITEX_AUDIO_MODE               | auto      | Audio mode: local, remote, or auto              |
| SCITEX_AUDIO_RELAY_URL          | (auto)    | Full relay URL (e.g., http://localhost:31293)   |
| SCITEX_AUDIO_RELAY_HOST         | (none)    | Relay host (combined with port to build URL)    |
| SCITEX_AUDIO_RELAY_PORT         | 31293     | Relay server port                               |
| SCITEX_AUDIO_HOST               | 0.0.0.0   | Relay server bind host                          |
| SCITEX_AUDIO_ELEVENLABS_API_KEY | (none)    | ElevenLabs API key                              |
| SCITEX_DIR                      | ~/.scitex | Base directory for audio cache files            |
| SCITEX_CLOUD                    | (none)    | Set to true for browser relay mode (OSC escape) |

Table 3. Environment variables. Port 31293 encodes "sa-i-te-ku-su" (サイテクス) in Japanese phone keypad mapping.

Mode Resolution

  • auto (default): Checks whether local audio is available. If local audio is suspended or unavailable and a relay URL is detected, speech is routed to the relay; otherwise the local engine is used.
  • local: Always uses the local TTS engine and speakers.
  • remote: Always sends speech to the relay server. Fails if the relay is unreachable.
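
The three modes, together with the relay variables in Table 3, can be sketched as two small pure functions. This is an illustration under stated assumptions, not the package's code: `relay_url_from_env` and `resolve_target` are hypothetical names, the precedence of SCITEX_AUDIO_RELAY_URL over host plus port is assumed, and the automatic detection of a tunnel on localhost is left out.

```python
from typing import Mapping, Optional

DEFAULT_PORT = "31293"


def relay_url_from_env(env: Mapping[str, str]) -> Optional[str]:
    """Build a relay URL from the environment, or return None if none is set."""
    url = env.get("SCITEX_AUDIO_RELAY_URL")
    if url:
        return url  # assumed to take precedence over host + port
    host = env.get("SCITEX_AUDIO_RELAY_HOST")
    if host:
        port = env.get("SCITEX_AUDIO_RELAY_PORT", DEFAULT_PORT)
        return f"http://{host}:{port}"
    return None


def resolve_target(env: Mapping[str, str], local_audio_ok: bool) -> str:
    """Return "local" or "remote" following the mode rules listed above."""
    mode = env.get("SCITEX_AUDIO_MODE", "auto")
    if mode in ("local", "remote"):
        return mode
    if mode != "auto":
        raise ValueError(f"unknown SCITEX_AUDIO_MODE: {mode!r}")
    # auto: use the relay only when local audio is unusable and a relay is known.
    if not local_audio_ok and relay_url_from_env(env) is not None:
        return "remote"
    return "local"
```

Passing the environment as a mapping (for example `os.environ`) keeps the functions easy to test: `resolve_target({"SCITEX_AUDIO_RELAY_HOST": "localhost"}, local_audio_ok=False)` returns `"remote"`.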

Part of SciTeX

SciTeX Audio is part of SciTeX. When used inside the orchestrator package scitex, audio integrates with the session system for automatic experiment notifications:

import scitex

@scitex.session
def main(CONFIG=scitex.INJECTED):
    data = scitex.io.load("input.csv")
    result = process(data)  # process() is a placeholder for your own analysis function
    scitex.io.save(result, "output.csv")
    scitex.audio.speak("Experiment complete")  # notify via TTS
    return 0

The SciTeX ecosystem follows the Four Freedoms for researchers:

Four Freedoms for Research

  1. The freedom to run your research anywhere — your machine, your terms.
  2. The freedom to study how every step works — from raw data to final manuscript.
  3. The freedom to redistribute your workflows, not just your papers.
  4. The freedom to modify any module and share improvements with the community.

AGPL-3.0 — because research infrastructure deserves the same freedoms as the software it runs on.

