On-device voice processing pipeline for fast, private voice interaction
llama-voice
Llama Voice (llama-voice) is a toolkit for integrating voice interaction into the LlamaSearch AI ecosystem. It provides functionality for processing voice input, potentially including speech-to-text (STT) and text-to-speech (TTS), using interchangeable models.
Key Features
- Voice Processing: Core components for handling voice data (processor/).
- Model Support: Designed to work with different voice models (models/), allowing flexibility in STT/TTS engines.
- Core Orchestration: A central module (core.py) likely manages the voice processing flow.
- Utilities: Includes helper functions for voice-related tasks (utils/).
- Configurable: Allows customization through configuration files (config.py).
Installation
pip install llama-voice
# Or install directly from GitHub for the latest version:
# pip install git+https://github.com/llamasearchai/llama-voice.git
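To confirm that the install succeeded, one option is to query the installed distribution metadata. The sketch below uses only the Python standard library; the helper name `installed_version` is illustrative and is not part of llama-voice.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "llama-voice") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found:
        print(f"llama-voice version: {found}")
    else:
        print("llama-voice is not installed in this environment")
```

Returning `None` rather than raising keeps the check usable in setup scripts that want to decide whether to run `pip install`.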
Usage
(Usage examples demonstrating voice input processing, STT, or TTS will be added here.)
# Placeholder for Python client usage
# from llama_voice import VoiceClient, VoiceConfig
# config = VoiceConfig.load("path/to/config.yaml")
# client = VoiceClient(config)
# # Example: Speech-to-Text
# text_result = client.transcribe(audio_file="path/to/audio.wav")
# print(f"Transcription: {text_result}")
# # Example: Text-to-Speech
# audio_output = client.synthesize("Hello from Llama Voice!")
# with open("output.wav", "wb") as f:
#     f.write(audio_output)
Architecture Overview
graph TD
A[Audio Input / Text Input] --> B{"Core Orchestrator (core.py)"};
B -- STT Request --> C{"Voice Processor (processor/)"};
C -- Uses Model --> D["STT Model (models/)"];
C --> E[Text Output];
B -- TTS Request --> F{"Voice Processor (processor/)"};
F -- Uses Model --> G["TTS Model (models/)"];
F --> H[Audio Output];
I["Configuration (config.py)"] -- Configures --> B;
I -- Configures --> C;
I -- Configures --> F;
J["Utilities (utils/)"] -- Used by --> C;
J -- Used by --> F;
style B fill:#f9f,stroke:#333,stroke-width:2px
- Input: Receives either audio for transcription or text for synthesis.
- Core Orchestrator: Manages the request and directs it to the processor.
- Voice Processor: Handles the specific STT or TTS task, interacting with the selected model.
- Models: Contains implementations or interfaces for different STT/TTS engines.
- Output: Produces either transcribed text or synthesized audio.
- Config/Utils: Configuration (config.py) controls behavior; Utilities (utils/) provide support functions.
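The flow described above can be sketched in a few lines of Python. This is a minimal illustration of the routing idea only, not the contents of core.py or processor/: the class names `Orchestrator` and `VoiceProcessor`, the `handle` method, and the stand-in models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical model signatures: STT maps audio bytes to text,
# TTS maps text to audio bytes.
SttModel = Callable[[bytes], str]
TtsModel = Callable[[str], bytes]


@dataclass
class VoiceProcessor:
    """Runs one STT or TTS task against the model it was given."""

    stt_model: SttModel
    tts_model: TtsModel

    def transcribe(self, audio: bytes) -> str:
        return self.stt_model(audio)

    def synthesize(self, text: str) -> bytes:
        return self.tts_model(text)


class Orchestrator:
    """Routes each request to the processor based on the input type."""

    def __init__(self, processor: VoiceProcessor) -> None:
        self.processor = processor

    def handle(self, payload: Union[bytes, str]) -> Union[str, bytes]:
        if isinstance(payload, bytes):  # audio in, text out (STT)
            return self.processor.transcribe(payload)
        if isinstance(payload, str):  # text in, audio out (TTS)
            return self.processor.synthesize(payload)
        raise TypeError(f"unsupported payload type: {type(payload).__name__}")


# Stand-in models so the sketch runs without any real engine.
def fake_stt(audio: bytes) -> str:
    return f"<{len(audio)} bytes of audio>"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


orchestrator = Orchestrator(VoiceProcessor(stt_model=fake_stt, tts_model=fake_tts))
```

Passing the models in as plain callables mirrors the "Uses Model" edges in the diagram: swapping an STT or TTS engine means supplying a different callable, with no change to the orchestrator.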
Configuration
(Details on configuring STT/TTS models, language settings, audio formats, etc., will be added here.)
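Until those details are documented, the kind of settings mentioned here (model choice, language, audio format) might be modeled roughly as follows. Everything in this sketch is an assumption made for illustration: the class `ExampleVoiceSettings`, its field names, its defaults, and the supported-format set are hypothetical and do not describe the actual schema in config.py.

```python
import json
from dataclasses import dataclass, fields

# Illustrative only; not llama-voice's actual list of formats.
SUPPORTED_FORMATS = {"wav", "flac"}


@dataclass(frozen=True)
class ExampleVoiceSettings:
    """Hypothetical settings object with basic validation."""

    stt_model: str = "example-stt"
    tts_model: str = "example-tts"
    language: str = "en"
    sample_rate_hz: int = 16000
    audio_format: str = "wav"

    def __post_init__(self) -> None:
        if self.sample_rate_hz <= 0:
            raise ValueError("sample_rate_hz must be positive")
        if self.audio_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported audio_format: {self.audio_format!r}")

    @classmethod
    def from_json(cls, text: str) -> "ExampleVoiceSettings":
        """Build settings from a JSON object, rejecting unknown keys."""
        raw = json.loads(text)
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown settings: {sorted(unknown)}")
        return cls(**raw)


settings = ExampleVoiceSettings.from_json('{"language": "de", "sample_rate_hz": 22050}')
```

Rejecting unknown keys at load time surfaces typos in a configuration file early instead of silently falling back to defaults.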
Development
Setup
# Clone the repository
git clone https://github.com/llamasearchai/llama-voice.git
cd llama-voice
# Install in editable mode with development dependencies
pip install -e ".[dev]"
Testing
pytest tests/
Contributing
Contributions are welcome! Please refer to CONTRIBUTING.md and submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file llama_voice-0.1.0.tar.gz.
File metadata
- Download URL: llama_voice-0.1.0.tar.gz
- Upload date:
- Size: 36.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa24a6d4bb189ab3fdcdffe0b265dcbd9615beff9ae78fcfe0153acbe8978710 |
| MD5 | 011f3c84b2a3e52fbf8dc12b551981ac |
| BLAKE2b-256 | 41c28b4249188129e25b2601a721102493645bc8c3d8179c6dbff6add364eb4d |
File details
Details for the file llama_voice-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llama_voice-0.1.0-py3-none-any.whl
- Upload date:
- Size: 41.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b77c04cbed469dcb366138ff5aef7e1f586c882e11955db94b8da07357c5c1d9 |
| MD5 | 4d877ba3682f195b9914fb1cbd79b813 |
| BLAKE2b-256 | 41de1ba810d1385723f072e4beba237a35cb3077023bbb439113f60ca90693a5 |
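A downloaded file can be checked against the SHA256 digests listed above with the standard library alone. The helper names `sha256_of` and `matches_published` below are illustrative, and the sketch assumes the file has already been downloaded to a local path.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published(Path("llama_voice-0.1.0.tar.gz"), "fa24a6d4bb189ab3fdcdffe0b265dcbd9615beff9ae78fcfe0153acbe8978710")` should return `True` for an intact copy of the source distribution.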