Local-first TTS/STT, streaming voice output, and optional voice cloning for AI applications

AbstractVoice

Local-first voice I/O for AI applications: TTS, STT, microphone control, streaming speech output, and optional voice cloning behind a small Python API.

AbstractVoice is useful on its own, and it is also the voice capability package for the AbstractFramework ecosystem. It does not force you to run a daemon: embed VoiceManager directly when you want an in-process library; install it beside AbstractCore when you want OpenAI-compatible HTTP audio endpoints.

  • TTS (default): Piper (cross-platform, no system deps)
  • STT (default): faster-whisper
  • Local assistant: listen() + speak() with playback/listening control
  • Headless/server-friendly: speak_to_bytes(), speak_to_file(), transcribe_*
  • Streaming TTS: speak_to_audio_chunks() and open_tts_text_stream()
  • Voice cloning / heavier TTS (optional): OpenF5, Chroma, AudioDiT, OmniVoice
  • Local web example (optional): abstractvoice web
  • AbstractCore plugin: discovered through abstractcore.capabilities_plugins

Status: alpha (0.8.x). The default Piper/faster-whisper path is usable today; optional cloning and torch-based engines are heavier and should be validated on your target hardware. The supported integrator surface is documented in docs/api.md, and current engine caveats are tracked in docs/known-issues.md.

Next: docs/getting-started.md (recommended setup + first smoke tests).

Positioning: Library First, Server Through AbstractCore

AbstractVoice has four intended usage modes:

  1. Standalone Python library: call VoiceManager directly from a desktop app, local assistant, batch job, or your own backend.
  2. Local examples: use the REPL (abstractvoice) to validate VoiceManager from a terminal, or the optional FastAPI web example (abstractvoice web) to validate it from a browser.
  3. AbstractCore capability plugin: install it next to AbstractCore and let AbstractCore expose voice/audio capabilities to agents and OpenAI-compatible clients.
  4. AbstractFramework component: use it as the voice layer inside the wider AbstractFramework stack (https://github.com/lpalbou/abstractframework).

Key links:

  • AbstractCore (agents/capabilities): https://abstractcore.ai and https://github.com/lpalbou/abstractcore
  • AbstractFramework (umbrella): https://github.com/lpalbou/abstractframework

Integration points:

  • AbstractCore capability plugin entry point: pyproject.toml[project.entry-points."abstractcore.capabilities_plugins"]
    Implementation: abstractvoice/integrations/abstractcore_plugin.py
  • AbstractRuntime ArtifactStore adapter (optional, duck-typed): abstractvoice/artifacts.py
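
To confirm that the plugin entry point is visible in the current environment, the group can be queried with the standard library. This is a minimal sketch: the group name comes from the text above, while find_capability_plugins is a hypothetical helper name and not part of AbstractVoice or AbstractCore. It only lists what is registered; it does not load or validate the plugin.

```python
from importlib.metadata import entry_points

# Group name taken from the integration point described above.
GROUP = "abstractcore.capabilities_plugins"


def find_capability_plugins(group: str = GROUP) -> list:
    """Return sorted (name, target) pairs for installed entry points in `group`.

    Hypothetical helper for illustration only.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the returned object supports select(group=...).
        selected = eps.select(group=group)
    else:
        # Python 3.9: entry_points() returns a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    for name, target in find_capability_plugins():
        print(f"{name} -> {target}")
```

An empty result suggests AbstractVoice is not installed in the same environment as the interpreter running the check.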

Important: AbstractVoice is a voice I/O library (TTS/STT + optional cloning), not an agent framework and not a standalone LLM server. That boundary is intentional: in the AbstractFramework stack, AbstractCore owns agents, provider routing, and OpenAI-compatible HTTP endpoints; AbstractVoice supplies the concrete voice implementation.

flowchart LR
  App["Your app / REPL"] --> VM["abstractvoice.VoiceManager"]
  VM --> TTS["Piper TTS"]
  VM --> STT["faster-whisper STT"]
  VM --> IO["sounddevice / PortAudio"]

  subgraph AbstractFramework
    AC["AbstractCore"] -. "capability plugin" .-> VM
    AR["AbstractRuntime"] -. "optional ArtifactStore" .-> VM
  end

The shipped AbstractCore integration is via the capability plugin above. The abstractvoice REPL is a demonstrator/smoke-test harness (see docs/repl_guide.md) and includes a minimal OpenAI-compatible LLM HTTP client (abstractvoice/examples/llm_provider.py) for convenience.

Use with AbstractCore

Install AbstractVoice into the same environment as AbstractCore:

pip install "abstractcore[server]" abstractvoice

AbstractCore discovers AbstractVoice through the abstractcore.capabilities_plugins entry point and can use it as:

  • core.voice.tts(...) / llm.voice.tts(...) for TTS
  • core.audio.transcribe(...) / llm.audio.transcribe(...) for STT
  • OpenAI-compatible server endpoints when AbstractCore Server is running:
    • POST /v1/audio/speech
    • POST /v1/audio/transcriptions

Minimal server smoke test:

python -m abstractcore.server.app

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello from AbstractVoice through AbstractCore.","format":"wav"}' \
  --output hello.wav

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@hello.wav" \
  -F "language=en"

For the current AbstractCore surface, see https://abstractcore.ai and https://github.com/lpalbou/abstractcore.
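
The first curl call above can also be issued from Python with only the standard library. The sketch below mirrors that request (same path, same JSON fields, same default port); build_speech_request and synthesize_to_file are hypothetical helper names, and the response is assumed to be raw audio bytes, as the --output flag in the curl example implies.

```python
import json
import urllib.request

DEFAULT_BASE_URL = "http://localhost:8000"  # same host/port as the curl example


def build_speech_request(text: str, base_url: str = DEFAULT_BASE_URL,
                         fmt: str = "wav") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/audio/speech request."""
    payload = json.dumps({"input": text, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, out_path: str,
                       base_url: str = DEFAULT_BASE_URL,
                       timeout: float = 60.0) -> int:
    """Send the request and write the response body to out_path.

    Returns the number of bytes written. Requires a running AbstractCore Server.
    """
    request = build_speech_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With the server running, synthesize_to_file("Hello from AbstractVoice through AbstractCore.", "hello.wav") should produce the same file as the curl command.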

Use with AbstractFramework

If you’re using the full AbstractFramework stack, install and run via the umbrella project and gateway tooling. Start here: https://github.com/lpalbou/abstractframework.


Install

Requires Python >=3.9 (see pyproject.toml).

pip install abstractvoice

Optional extras (feature flags):

pip install "abstractvoice[all]"
pip install "abstractvoice[web]"   # local FastAPI web example

Notes:

  • abstractvoice[all] enables most optional features (incl. cloning + AEC + audio-fx), but does not include the GPU-heavy Chroma runtime, AudioDiT, or OmniVoice.
  • Python 3.9 supports the core stack, web UI, and AudioDiT TTS/prompt-audio cloning. OpenF5/F5-TTS, Chroma, and OmniVoice require Python 3.10+ because their upstream runtimes do; AEC requires Python 3.11+ because aec-audio-processing does.
  • For the full list of extras (and platform troubleshooting), see docs/installation.md.
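
The version constraints in the notes above can be checked before choosing extras. The sketch below transcribes those minimums into a table; the feature labels and the unavailable_features helper are illustrative and are not AbstractVoice identifiers.

```python
import sys

# Minimum Python version per feature, transcribed from the notes above.
MIN_PYTHON = {
    "core": (3, 9),
    "web UI": (3, 9),
    "AudioDiT": (3, 9),
    "OpenF5/F5-TTS": (3, 10),
    "Chroma": (3, 10),
    "OmniVoice": (3, 10),
    "AEC": (3, 11),
}


def unavailable_features(version=None) -> list:
    """Return the features whose minimum Python version exceeds `version`.

    `version` is a (major, minor) tuple; defaults to the running interpreter.
    """
    current = tuple((version or sys.version_info)[:2])
    return sorted(name for name, minimum in MIN_PYTHON.items() if current < minimum)


if __name__ == "__main__":
    blocked = unavailable_features()
    print("Unavailable on this interpreter:", ", ".join(blocked) or "none")
```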

Explicit model downloads (recommended; never implicit in the REPL)

Some features rely on large model weights/artifacts. AbstractVoice will not download these implicitly inside the REPL (offline-first).

After installing, prefetch explicitly (cross-platform).

Recommended (most users):

abstractvoice-prefetch --piper en
abstractvoice-prefetch --stt small

Optional (voice cloning artifacts):

pip install "abstractvoice[cloning]"
abstractvoice-prefetch --openf5

# Heavy (torch/transformers):
pip install "abstractvoice[audiodit]"
abstractvoice-prefetch --audiodit

pip install "abstractvoice[omnivoice]"
abstractvoice-prefetch --omnivoice

# GPU-heavy:
pip install "abstractvoice[chroma]"
abstractvoice-prefetch --chroma

Equivalent python -m form:

python -m abstractvoice download --piper en
python -m abstractvoice download --stt small
python -m abstractvoice download --openf5      # optional; requires abstractvoice[cloning]
python -m abstractvoice download --chroma      # optional; requires abstractvoice[chroma] (GPU-heavy)
python -m abstractvoice download --audiodit    # optional; requires abstractvoice[audiodit]
python -m abstractvoice download --omnivoice   # optional; requires abstractvoice[omnivoice]

Notes:

  • --piper <lang> downloads the Piper ONNX voice for that language into ~/.piper/models.
  • --openf5 is ~5.4GB. --chroma is very large (GPU-heavy).
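
Because nothing is downloaded implicitly, it can help to verify that a prefetch actually landed before starting the REPL. The sketch below lists ONNX files under ~/.piper/models, the directory named above. The .onnx file extension and the list_piper_voices helper are assumptions made for illustration; the on-disk layout is not specified here.

```python
from pathlib import Path

# Directory named in the note above; the "*.onnx" pattern is an assumption.
DEFAULT_PIPER_DIR = Path.home() / ".piper" / "models"


def list_piper_voices(models_dir: Path = DEFAULT_PIPER_DIR) -> list:
    """Return the sorted file names of ONNX voices found under models_dir.

    Returns an empty list when the directory does not exist yet.
    """
    if not models_dir.is_dir():
        return []
    return sorted(path.name for path in models_dir.rglob("*.onnx"))


if __name__ == "__main__":
    voices = list_piper_voices()
    if voices:
        print("Prefetched Piper voices:", ", ".join(voices))
    else:
        print("No Piper voices found; run: abstractvoice-prefetch --piper en")
```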

Quick smoke tests

REPL (fastest end-to-end)

abstractvoice --verbose
# or (from a source checkout):
python -m abstractvoice cli --verbose

Notes:

  • Mic voice input is off by default for fast startup. Enable with --voice-mode stop (or in-session: /voice stop).
  • The REPL is offline-first: no implicit model downloads. Use the explicit download commands above.
  • REPL voice selection is centered on /voices; older commands such as /profile, /tts_voice, and /setvoice remain as compatibility/direct forms.
  • The REPL is primarily a demonstrator. For production agent/server use in the AbstractFramework ecosystem, run AbstractCore and use AbstractVoice via its capability plugin (see docs/api.md → “Integrations”).

See docs/repl_guide.md.

Local web example

pip install "abstractvoice[web]"
abstractvoice web --port 5000

Use pip install "abstractvoice[web-omnivoice]" for the browser UI plus OmniVoice, or pip install "abstractvoice[web-full]" for the browser UI plus the optional local voice/cloning engine dependencies.

Open http://127.0.0.1:5000. The browser example has message/conversation playback, chat clearing, assistant/user voice selectors, browser voice cloning from uploaded or recorded reference audio, text-to-WAV, file transcription, and a tiny optional LLM dialogue panel for OpenAI-compatible local providers such as Ollama or LM Studio. It exposes small local /api/* routes plus /v1/audio/* smoke-test aliases, but the supported production HTTP path remains AbstractCore Server.

The browser clone action validates the new voice by synthesizing a short sample before it reports success. If the selected optional engine cannot load, the unusable clone is removed and the UI shows the backend error.

Minimal Python

from abstractvoice import VoiceManager

vm = VoiceManager()
vm.speak("Hello! This is AbstractVoice.")
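
speak() plays through the audio device, which is hard to assert on in CI or on a server. One option is to write a file instead (for example with speak_to_file(), or with the /v1/audio/speech call shown earlier) and inspect it. The sketch below reads basic properties of a WAV file with the standard library. describe_wav is a hypothetical helper, and it assumes the file is PCM WAV, which is what Python's wave module reads.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return channel count, sample width, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(describe_wav(sys.argv[1]))
```

A duration of zero, or an error while opening the file, is a quick signal that synthesis did not produce usable audio.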

Public API (stable surface)

See docs/api.md for the supported integrator contract.

At a glance:

  • TTS: speak(), stop_speaking(), pause_speaking(), resume_speaking(), speak_to_bytes(), speak_to_file()
  • STT: transcribe_file(), transcribe_from_bytes()
  • Mic: listen(), stop_listening(), pause_listening(), resume_listening()
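
While the package is in alpha, an integrator may want a cheap guard that the methods listed above are still present after an upgrade. The sketch below checks names only; it makes no claim about signatures or behavior, which are defined in docs/api.md. EXPECTED_METHODS is transcribed from the list above, and missing_methods is a hypothetical helper.

```python
# Method names transcribed from the "At a glance" list above.
EXPECTED_METHODS = {
    "tts": ["speak", "stop_speaking", "pause_speaking", "resume_speaking",
            "speak_to_bytes", "speak_to_file"],
    "stt": ["transcribe_file", "transcribe_from_bytes"],
    "mic": ["listen", "stop_listening", "pause_listening", "resume_listening"],
}


def missing_methods(obj) -> dict:
    """Return {area: [names]} for expected methods that obj does not expose as callables."""
    missing = {}
    for area, names in EXPECTED_METHODS.items():
        absent = [name for name in names if not callable(getattr(obj, name, None))]
        if absent:
            missing[area] = absent
    return missing
```

Passing the class itself, as in missing_methods(VoiceManager), avoids constructing an instance just to inspect its attributes.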

Documentation

  • Getting started: docs/getting-started.md
  • Public API: docs/api.md
  • Architecture: docs/architecture.md
  • FAQ: docs/faq.md
  • REPL guide: docs/repl_guide.md
  • Known issues: docs/known-issues.md
  • Docs index: docs/README.md
  • Install troubleshooting: docs/installation.md
  • Multilingual support: docs/multilingual.md
  • Design decisions: docs/adr/
  • Acronyms: docs/acronyms.md
  • Model management (Piper-first): docs/model-management.md
  • Licensing notes: docs/voices-and-licenses.md

Project

  • Changelog: CHANGELOG.md
  • Contributing: CONTRIBUTING.md
  • Known issues: docs/known-issues.md
  • Bug reports: .github/ISSUE_TEMPLATE/bug_report.yml
  • Security: SECURITY.md
  • Acknowledgments: ACKNOWLEDGMENTS.md

License

MIT. See LICENSE.
