Remote-compatible and local TTS/STT, streaming voice output, and optional voice cloning for AI applications
AbstractVoice
Lightweight voice I/O for AI applications: remote/OpenAI-compatible audio adapters by default, plus local TTS, STT, microphone control, streaming speech output, and optional voice cloning behind explicit extras.
AbstractVoice is useful on its own, and it is also the voice capability package
for the AbstractFramework ecosystem. It does not force you to run a daemon:
embed VoiceManager directly when you want an in-process library; install it
beside AbstractCore when you want OpenAI-compatible HTTP audio endpoints.
- Remote audio (base install): OpenAI/OpenAI-compatible TTS, STT, profile listing, and compatible clone endpoints
- Platform local stacks (abstractvoice[apple], abstractvoice[gpu]): Piper, Supertonic 3, faster-whisper, microphone/playback, AEC, and local cloning/TTS engines
- Hardware profile aliases: abstractvoice[apple] and abstractvoice[gpu] install the local stack; abstractvoice[all-apple] and abstractvoice[all-gpu] add the lightweight web example dependencies.
- Granular local extras: abstractvoice[piper], abstractvoice[supertonic], abstractvoice[stt], abstractvoice[audio-io], abstractvoice[cloning], abstractvoice[audiodit], abstractvoice[omnivoice], abstractvoice[chroma]
- Headless/server-friendly: speak_to_bytes(), speak_to_file(), transcribe_*
- Streaming TTS: speak_to_audio_chunks() and open_tts_text_stream()
- Voice cloning / heavier TTS (optional): OmniVoice is the recommended/default local cloning backend; OpenF5, Chroma, and AudioDiT remain explicit alternatives. Supertonic is fixed-profile TTS, not cloning.
- Local web example (optional): abstractvoice web
- AbstractCore plugin: discovered through abstractcore.capabilities_plugins
Status: alpha (0.10.x). The base install and library constructor are
remote-first: VoiceManager() and the library's auto engine setting select hosted
OpenAI audio and require OPENAI_API_KEY (or remote_api_key=...). Local/offline stacks are available
through abstractvoice[apple] / abstractvoice[gpu] or granular engine
composition such as abstractvoice[supertonic,stt,audio-io].
The shipped CLI and web examples use an install-aware auto resolver instead:
installed Supertonic first, installed Piper second, then OpenAI remote as a
fallback. This keeps plain abstractvoice remote/OpenAI by default, while
abstractvoice[all-apple] and abstractvoice[all-gpu] start on Supertonic.
Use an explicit local engine such as --tts-engine supertonic when you require
no remote TTS.
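The resolution order described above can be sketched as a small pure function. This is an illustration only: resolve_auto_tts_engine and its installed argument are hypothetical names, not part of the AbstractVoice API. The identifiers supertonic and openai are the values this README passes to --tts-engine; piper is assumed by analogy.

```python
from typing import Iterable

# Preference order of the shipped CLI/web examples, as described in the text:
# installed Supertonic first, installed Piper second, OpenAI remote last.
LOCAL_PREFERENCE = ("supertonic", "piper")
REMOTE_FALLBACK = "openai"


def resolve_auto_tts_engine(installed: Iterable[str]) -> str:
    """Return the first installed local engine, else the remote fallback.

    Hypothetical helper for illustration; not the library's own resolver.
    """
    available = set(installed)
    for engine in LOCAL_PREFERENCE:
        if engine in available:
            return engine
    return REMOTE_FALLBACK


if __name__ == "__main__":
    for installed in ([], ["piper"], ["piper", "supertonic"]):
        print(installed, "->", resolve_auto_tts_engine(installed))
```

Note that the fallback is a remote engine, which is why an explicit --tts-engine value is the safer choice when remote TTS must never be used.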
For new local voice clones, the default cloning backend is OmniVoice; install
abstractvoice[omnivoice] or a platform/full profile before using clone
commands without --engine. Optional cloning and torch-based engines are
heavier and should be validated on your target hardware. The supported integrator surface is documented in
docs/api.md, and current engine caveats are tracked in
docs/known-issues.md.
Next: docs/getting-started.md (recommended setup + first smoke tests).
Published documentation: https://www.lpalbou.info/AbstractVoice/.
Positioning: Library First, Server Through AbstractCore
AbstractVoice has four intended usage modes:
- Standalone Python library: call VoiceManager directly from a desktop app, local assistant, batch job, or your own backend.
- Local examples: use the REPL (abstractvoice) or the optional FastAPI web example (abstractvoice web) to validate VoiceManager from a browser.
- AbstractCore capability plugin: install it next to AbstractCore and let AbstractCore expose voice/audio capabilities to agents and OpenAI-compatible clients.
- AbstractFramework component: use it as the voice layer inside the wider AbstractFramework stack (https://github.com/lpalbou/abstractframework).
Key links:
- AbstractCore (agents/capabilities): https://abstractcore.ai and https://github.com/lpalbou/abstractcore
- AbstractFramework (umbrella): https://github.com/lpalbou/abstractframework
Integration points:
- AbstractCore capability plugin entry point: pyproject.toml → [project.entry-points."abstractcore.capabilities_plugins"]
  Implementation: abstractvoice/integrations/abstractcore_plugin.py
- AbstractRuntime ArtifactStore adapter (optional, duck-typed): abstractvoice/artifacts.py
Important: AbstractVoice is a voice I/O library (TTS/STT + optional cloning), not an agent framework and not a standalone LLM server. That boundary is intentional: in the AbstractFramework stack, AbstractCore owns agents, provider routing, and OpenAI-compatible HTTP endpoints; AbstractVoice supplies the concrete voice implementation.
flowchart LR
App["Your app / REPL"] --> VM["abstractvoice.VoiceManager"]
VM --> Remote["OpenAI-compatible audio"]
VM --> TTS["Piper TTS (local extra)"]
VM --> Supertonic["Supertonic TTS (local extra)"]
VM --> STT["faster-whisper STT (local extra)"]
VM --> IO["sounddevice / PortAudio (local extra)"]
subgraph AbstractFramework
AC["AbstractCore"] -. "capability plugin" .-> VM
AR["AbstractRuntime"] -. "optional ArtifactStore" .-> VM
end
The shipped AbstractCore integration is via the capability plugin above. The abstractvoice REPL is a demonstrator/smoke-test harness (see docs/repl_guide.md) and includes a minimal OpenAI-compatible LLM HTTP client (abstractvoice/examples/llm_provider.py) for convenience.
Use with AbstractCore
Install AbstractVoice into the same environment as AbstractCore:
pip install "abstractcore[server]" abstractvoice
AbstractCore discovers AbstractVoice through the
abstractcore.capabilities_plugins entry point and can use it as:
- core.voice.tts(...) / llm.voice.tts(...) for TTS
- voice catalog discovery through the backend methods list_profiles(...), list_tts_models(), list_stt_models(), and voice_catalog()
- core.audio.transcribe(...) / llm.audio.transcribe(...) for STT
- OpenAI-compatible server endpoints when AbstractCore Server is running:
  - POST /v1/audio/speech
  - POST /v1/audio/transcriptions
  - GET /v1/audio/voices, /v1/audio/speech/models, and /v1/audio/transcriptions/models for UI catalog discovery
For a remote-first Gateway/Core deployment, the AbstractCore plugin defaults to
OpenAI remote TTS/STT and reads OPENAI_API_KEY. Configure
voice_tts_engine=openai-compatible, voice_stt_engine=openai-compatible, and
voice_remote_base_url=... for a compatible audio endpoint. For local
Supertonic/Piper/faster-whisper inside the same environment, install
abstractvoice[apple] or abstractvoice[gpu], or compose granular extras such
as abstractvoice[supertonic,stt], then select the local engines explicitly.
Do not point voice_remote_base_url back at the same AbstractCore Server
instance that is resolving the plugin fallback; that loops through
/v1/audio/* recursively. Use an upstream provider/gateway URL, or install the
local extra and select local engines.
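A deployment script could guard against this loop before starting the server. The sketch below is a heuristic under stated assumptions: points_back_at_server is a hypothetical helper, not an AbstractVoice or AbstractCore function, and it only compares host and port, so it will not catch DNS aliases or reverse proxies that route back to the same instance.

```python
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}
_LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def _endpoint(url: str):
    """Reduce a URL to a comparable (host, port) pair."""
    parts = urlsplit(url)
    scheme = (parts.scheme or "http").lower()
    host = (parts.hostname or "").lower()
    if host in _LOOPBACK_HOSTS:
        host = "localhost"  # treat all loopback spellings as the same host
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    return host, port


def points_back_at_server(remote_base_url: str, server_url: str) -> bool:
    """True when the remote audio base URL targets the server's own host and port."""
    return _endpoint(remote_base_url) == _endpoint(server_url)


if __name__ == "__main__":
    server = "http://127.0.0.1:8000"
    for candidate in ("http://localhost:8000/v1", "https://api.example.com/v1"):
        print(candidate, "loops:", points_back_at_server(candidate, server))
```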
Minimal server smoke test:
OPENAI_API_KEY=... python -m abstractcore.server.app
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input":"Hello from AbstractVoice through AbstractCore.","format":"wav"}' \
--output hello.wav
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-F "file=@hello.wav" \
-F "language=en"
For the current AbstractCore surface, see https://abstractcore.ai and
https://github.com/lpalbou/abstractcore.
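The same speech request can be issued from Python with only the standard library. This is a minimal sketch that mirrors the curl call above: the URL, the JSON fields input and format, and the output file name come from that example, while build_speech_request and save_speech are hypothetical helper names. save_speech needs a running AbstractCore Server to do anything useful.

```python
import json
import urllib.request


def build_speech_request(text: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /v1/audio/speech request without sending it."""
    payload = json.dumps({"input": text, "format": "wav"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text: str, out_path: str = "hello.wav") -> None:
    """Send the request and write the response body to out_path."""
    with urllib.request.urlopen(build_speech_request(text), timeout=60) as resp:
        with open(out_path, "wb") as fh:
            fh.write(resp.read())


if __name__ == "__main__":
    req = build_speech_request("Hello from AbstractVoice through AbstractCore.")
    print(req.get_method(), req.full_url)
```

Splitting request construction from sending keeps the payload easy to inspect in tests without a live server.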
Use with AbstractFramework
If you’re using the full AbstractFramework stack, install and run via the umbrella project and gateway tooling. Start here: https://github.com/lpalbou/abstractframework.
Install
Requires Python >=3.9 (see pyproject.toml).
pip install abstractvoice
This is the lightweight remote/plugin base. It uses OpenAI audio by default:
export OPENAI_API_KEY=...
For local desktop/REPL voice and local cloning engines, use the platform profile for your machine:
pip install "abstractvoice[apple]"
pip install "abstractvoice[gpu]"
Common extras:
pip install "abstractvoice[openai]" # hosted OpenAI intent extra (no extra deps today)
pip install "abstractvoice[openai-compatible]" # generic compatible provider intent extra
pip install "abstractvoice[web]" # local FastAPI web example
pip install "abstractvoice[piper]" # local Piper TTS only
pip install "abstractvoice[supertonic]" # local Supertonic 3 ONNX TTS only
pip install "abstractvoice[stt]" # local faster-whisper STT only
pip install "abstractvoice[omnivoice]" # recommended/default local cloning engine
pip install "abstractvoice[cloning]" # explicit OpenF5 cloning engine
Notes:
- abstractvoice[apple] and abstractvoice[gpu] are platform install profiles for the local voice stack: Piper, Supertonic 3, faster-whisper, audio I/O, AEC where supported, and local cloning/TTS engines gated by Python-version markers.
- abstractvoice[all-apple] and abstractvoice[all-gpu] install the full platform stack plus the web example dependencies.
- Local base TTS should prefer Supertonic (--tts-engine supertonic), while local cloning defaults to OmniVoice (--cloning-engine omnivoice).
- Python 3.9 supports the lightweight base, web UI, local Piper/Supertonic/faster-whisper, and AudioDiT TTS/prompt-audio cloning. OpenF5/F5-TTS, Chroma, and OmniVoice require Python 3.10+ because their upstream runtimes do; AEC requires Python 3.11+ because aec-audio-processing does.
- For the full list of extras (and platform troubleshooting), see docs/installation.md.
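The Python-version floors in the notes above can be written down as a lookup table, which is handy for a preflight check in an installer script. This is a sketch only: the table restates the versions given in this README, the keys are informal labels (aec in particular is a label here, not a documented extra name), and usable_features is a hypothetical helper.

```python
import sys
from typing import Dict, List, Optional, Sequence, Tuple

# Minimum Python version per feature, as stated in the notes above.
MIN_PYTHON: Dict[str, Tuple[int, int]] = {
    "base": (3, 9),
    "web": (3, 9),
    "piper": (3, 9),
    "supertonic": (3, 9),
    "stt": (3, 9),        # faster-whisper
    "audiodit": (3, 9),
    "cloning": (3, 10),   # OpenF5 / F5-TTS
    "chroma": (3, 10),
    "omnivoice": (3, 10),
    "aec": (3, 11),       # aec-audio-processing
}


def usable_features(version: Optional[Sequence[int]] = None) -> List[str]:
    """List the features whose version floor the given interpreter meets."""
    major_minor = tuple((version or sys.version_info)[:2])
    return sorted(name for name, floor in MIN_PYTHON.items() if major_minor >= floor)


if __name__ == "__main__":
    print("This interpreter supports:", ", ".join(usable_features()))
```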
Explicit model downloads (recommended; never implicit in the REPL)
Some features rely on large model weights/artifacts. AbstractVoice will not download these implicitly inside the REPL (offline-first).
After installing, prefetch explicitly (cross-platform).
Recommended (most users):
abstractvoice-prefetch --supertonic
abstractvoice-prefetch --piper en
abstractvoice-prefetch --stt small
Optional (voice cloning artifacts):
pip install "abstractvoice[cloning]"
abstractvoice-prefetch --openf5
# Heavy (torch/transformers):
pip install "abstractvoice[audiodit]"
abstractvoice-prefetch --audiodit
pip install "abstractvoice[omnivoice]"
abstractvoice-prefetch --omnivoice
# GPU-heavy:
pip install "abstractvoice[chroma]"
abstractvoice-prefetch --chroma
OmniVoice is the default local cloning backend for new clones. Supertonic is not a cloning engine; use it for fixed-profile base TTS.
Equivalent python -m form:
python -m abstractvoice download --supertonic
python -m abstractvoice download --piper en
python -m abstractvoice download --stt small
python -m abstractvoice download --openf5 # optional; requires abstractvoice[cloning]
python -m abstractvoice download --chroma # optional; requires abstractvoice[chroma] (GPU-heavy)
python -m abstractvoice download --audiodit # optional; requires abstractvoice[audiodit]
python -m abstractvoice download --omnivoice # optional; requires abstractvoice[omnivoice]
Notes:
- --piper <lang> downloads the Piper ONNX voice for that language into ~/.piper/models.
- --supertonic downloads Supertonic 3 ONNX weights and built-in voice styles into ~/.cache/abstractvoice/supertonic-3.
- --openf5 is ~5.4GB.
- --chroma is very large (GPU-heavy).
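Before going offline, a script can check whether the download locations named above contain anything. This is a rough sketch under stated assumptions: prefetched is a hypothetical helper, the two paths are the ones listed in the notes, and a non-empty directory only suggests that a prefetch ran; it does not prove the artifacts are complete or valid.

```python
from pathlib import Path
from typing import Dict, Optional

# Download locations named in the notes above, relative to the home directory.
ARTIFACT_DIRS = {
    "piper": Path(".piper") / "models",
    "supertonic": Path(".cache") / "abstractvoice" / "supertonic-3",
}


def prefetched(home: Optional[Path] = None) -> Dict[str, bool]:
    """Report which artifact directories exist and are non-empty."""
    root = Path(home) if home is not None else Path.home()
    report = {}
    for name, relative in ARTIFACT_DIRS.items():
        directory = root / relative
        report[name] = directory.is_dir() and any(directory.iterdir())
    return report


if __name__ == "__main__":
    for name, present in prefetched().items():
        print(f"{name}: {'found' if present else 'missing, run abstractvoice-prefetch'}")
```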
Quick smoke tests
REPL (fastest end-to-end)
abstractvoice --verbose
# or (from a source checkout):
python -m abstractvoice cli --verbose
# Force hosted OpenAI audio:
OPENAI_API_KEY=... abstractvoice --tts-engine openai --stt-engine openai --verbose
Notes:
- Mic voice input is off by default for fast startup. Enable with --voice-mode stop (or in-session: /voice stop).
- The REPL is offline-first: no implicit model downloads. Use the explicit download commands above.
- Interactive auto prefers installed Supertonic, then installed Piper, then OpenAI remote. A plain lightweight install therefore starts on OpenAI, while abstractvoice[all-apple] / abstractvoice[all-gpu] start on Supertonic. If the status line says openai (remote), /speak is making a remote TTS request.
- For guaranteed local REPL TTS, install abstractvoice[supertonic] or a platform profile, prefetch explicitly, and start with abstractvoice --tts-engine supertonic --stt-engine faster_whisper.
- Local providers never download during REPL synthesis; missing artifacts fail with a prefetch hint instead of silently falling back to remote TTS.
- REPL voice selection is centered on /voices; older commands such as /profile, /tts_voice, and /setvoice remain as compatibility/direct forms.
- Switching base TTS with /tts engine ... resets the base voice/profile to the default for that engine and language; for example Supertonic starts on M1.
- The REPL is primarily a demonstrator. For production agent/server use in the AbstractFramework ecosystem, run AbstractCore and use AbstractVoice via its capability plugin (see docs/api.md → “Integrations”).
See docs/repl_guide.md.
Local web example
pip install "abstractvoice[web]"
abstractvoice web --port 5000
# Hosted OpenAI audio in the same web UI
OPENAI_API_KEY=... abstractvoice web --tts-engine openai --stt-engine openai
# Guaranteed local web TTS after explicit prefetch
abstractvoice web --tts-engine supertonic --stt-engine faster_whisper
# OpenAI-compatible remote audio
abstractvoice web --tts-engine openai-compatible --stt-engine openai-compatible --remote-base-url http://localhost:8000/v1
Use pip install "abstractvoice[web,supertonic]" for the browser UI plus
Supertonic, pip install "abstractvoice[web,omnivoice]" for OmniVoice, or
pip install "abstractvoice[all-apple]" / abstractvoice[all-gpu] for the
browser UI plus the platform local stack.
Open http://127.0.0.1:5000. The browser example has message/conversation
playback, chat clearing, assistant/user voice selectors, browser voice cloning
from uploaded or recorded reference audio, text-to-WAV, file transcription, and
a tiny optional LLM dialogue panel for OpenAI-compatible local providers such as
Ollama or LM Studio. It exposes small local /api/* routes plus /v1/audio/*
smoke-test aliases. The /v1/audio/voices and /v1/voice/clone extension
routes let another AbstractVoice client discover profiles/cloned voices and
request compatible remote cloning. The supported production HTTP path remains
AbstractCore Server. Treat the browser example as a local/dev surface: it does
not inherit AbstractCore/Gateway bearer-token or browser-origin policy.
The browser clone action validates the new voice by synthesizing a short sample before it reports success. If the selected optional engine cannot load, the unusable clone is removed and the UI shows the backend error.
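The validate-before-reporting-success behavior described above follows a common register, verify, roll back pattern. The sketch below illustrates that pattern in isolation; it is not the web example's code. register_clone, the dict-based store, and the synthesize callback are all hypothetical stand-ins.

```python
from typing import Callable, Dict


def register_clone(
    store: Dict[str, bytes],
    voice_id: str,
    reference_audio: bytes,
    synthesize: Callable[[str, str], bytes],
) -> None:
    """Store a cloned voice, then keep it only if a short test synthesis works.

    synthesize(voice_id, text) stands in for whatever engine call produces
    audio; any exception or empty result removes the unusable clone.
    """
    store[voice_id] = reference_audio
    try:
        sample = synthesize(voice_id, "Validation sample.")
        if not sample:
            raise RuntimeError("engine returned no audio")
    except Exception:
        store.pop(voice_id, None)  # roll back so no broken voice is listed
        raise  # surface the backend error to the caller/UI


if __name__ == "__main__":
    voices: Dict[str, bytes] = {}
    register_clone(voices, "demo", b"reference", lambda vid, text: b"fake-wav")
    print(sorted(voices))
```

Re-raising after the rollback is what lets a caller show the backend error rather than a generic failure.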
Local Python
from abstractvoice import VoiceManager
vm = VoiceManager()
vm.speak("Hello! This is AbstractVoice.")
VoiceManager() is remote-first and reads OPENAI_API_KEY from the
environment. For offline/local inference:
from abstractvoice import VoiceManager
vm = VoiceManager(
tts_engine="supertonic",
stt_engine="faster_whisper",
cloning_engine="omnivoice",
)
vm.speak("Hello from the local stack.")
Install local support first with pip install "abstractvoice[supertonic]"
and pip install "abstractvoice[omnivoice]", or a platform profile such as
abstractvoice[apple] / abstractvoice[gpu].
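An application that should run both with and without an API key can choose the constructor arguments at startup. This is a sketch, assuming the keyword names shown in the example above (tts_engine, stt_engine, cloning_engine); voice_manager_kwargs is a hypothetical helper, and selecting the local engines still requires the matching extras and prefetched models.

```python
import os
from typing import Dict, Mapping, Optional

# Keyword arguments taken from the local example above.
LOCAL_KWARGS = {
    "tts_engine": "supertonic",
    "stt_engine": "faster_whisper",
    "cloning_engine": "omnivoice",
}


def voice_manager_kwargs(
    env: Optional[Mapping[str, str]] = None,
    prefer_local: bool = False,
) -> Dict[str, str]:
    """Return kwargs for VoiceManager: empty for remote-first defaults,
    the local engine selection when no key is set or local is preferred."""
    env = os.environ if env is None else env
    if prefer_local or not env.get("OPENAI_API_KEY"):
        return dict(LOCAL_KWARGS)
    return {}


if __name__ == "__main__":
    # With abstractvoice installed: VoiceManager(**voice_manager_kwargs())
    print(voice_manager_kwargs())
```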
Public API (stable surface)
See docs/api.md for the supported integrator contract.
At a glance:
- TTS: speak(), set_tts_engine(), stop_speaking(), pause_speaking(), resume_speaking(), speak_to_bytes(), speak_to_file()
- STT: transcribe_file(), transcribe_from_bytes()
- Mic: listen(), stop_listening(), pause_listening(), resume_listening()
Documentation
- Published site: https://www.lpalbou.info/AbstractVoice/
- Getting started: docs/getting-started.md
- Public API: docs/api.md
- Architecture: docs/architecture.md
- FAQ: docs/faq.md
- REPL guide: docs/repl_guide.md
- Known issues: docs/known-issues.md
- Docs index: docs/README.md
- Install troubleshooting: docs/installation.md
- Multilingual support: docs/multilingual.md
- Design decisions: docs/adr/
- Acronyms: docs/acronyms.md
- Model management: docs/model-management.md
- Licensing notes: docs/voices-and-licenses.md
Project
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Known issues: docs/known-issues.md
- Bug reports: .github/ISSUE_TEMPLATE/bug_report.yml
- Security: SECURITY.md
- Acknowledgments: ACKNOWLEDGMENTS.md
License
MIT. See LICENSE.