
On-device voice processing pipeline for fast, private voice interaction


llama-voice


Llama Voice (llama-voice) is a toolkit for adding voice interaction to the LlamaSearch AI ecosystem. It provides components for processing voice input and is intended to support speech-to-text (STT) and text-to-speech (TTS) through interchangeable models.

Key Features

  • Voice Processing: Core components for handling voice data (processor/).
  • Model Support: Designed to work with different voice models (models/), so that different STT/TTS engines can be used.
  • Core Orchestration: A central module (core.py) manages the voice processing flow.
  • Utilities: Includes helper functions for voice-related tasks (utils/).
  • Configurable: Behavior can be customized through the configuration module (config.py).
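One common way to make STT/TTS engines interchangeable is a name-based registry that the configuration selects from. The sketch below makes no claim about the actual contents of models/: `SpeechToText`, `register_stt`, `create_stt`, and `DummySTT` are hypothetical names used only to illustrate the pattern.

```python
from typing import Callable, Dict, Protocol


class SpeechToText(Protocol):
    """Hypothetical interface an STT engine would implement."""

    def transcribe(self, pcm: bytes, sample_rate: int) -> str: ...


Factory = Callable[[], SpeechToText]

# Maps an engine name (as it might appear in a config file) to a factory.
_STT_REGISTRY: Dict[str, Factory] = {}


def register_stt(name: str) -> Callable[[Factory], Factory]:
    """Decorator that records an engine factory under `name`."""
    def decorator(factory: Factory) -> Factory:
        if name in _STT_REGISTRY:
            raise ValueError(f"STT engine {name!r} is already registered")
        _STT_REGISTRY[name] = factory
        return factory
    return decorator


def create_stt(name: str) -> SpeechToText:
    """Instantiate the engine selected by name, failing loudly on typos."""
    try:
        factory = _STT_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_STT_REGISTRY)) or "none"
        raise ValueError(f"unknown STT engine {name!r} (registered: {known})") from None
    return factory()


@register_stt("dummy")
class DummySTT:
    """Placeholder engine: reports the audio duration instead of recognizing speech."""

    def transcribe(self, pcm: bytes, sample_rate: int) -> str:
        seconds = len(pcm) / (2 * sample_rate)  # 16-bit mono = 2 bytes per sample
        return f"<{seconds:.2f}s of audio>"


engine = create_stt("dummy")
print(engine.transcribe(b"\x00\x00" * 16000, sample_rate=16000))  # <1.00s of audio>
```

Keeping engine construction behind a factory means a configuration value only has to name the engine; nothing is loaded until `create_stt` is called.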

Installation

pip install llama-voice
# Or install directly from GitHub for the latest version:
# pip install git+https://github.com/llamasearchai/llama-voice.git

Usage

(Usage examples demonstrating voice input processing, STT, or TTS will be added here.)

# Placeholder for Python client usage
# from llama_voice import VoiceClient, VoiceConfig

# config = VoiceConfig.load("path/to/config.yaml")
# client = VoiceClient(config)

# # Example: Speech-to-Text
# text_result = client.transcribe(audio_file="path/to/audio.wav")
# print(f"Transcription: {text_result}")

# # Example: Text-to-Speech
# audio_output = client.synthesize("Hello from Llama Voice!")
# with open("output.wav", "wb") as f:
#     f.write(audio_output)
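The placeholder above writes `audio_output` straight to output.wav, which only yields a playable file if the returned bytes already include a WAV header. If a synthesizer instead returns raw PCM samples (an assumption; the return format of `synthesize` is not documented yet), the standard-library `wave` module can add the header. The helper `write_wav` and the 16 kHz, 16-bit mono defaults are illustrative, not part of llama-voice.

```python
import io
import math
import wave
from array import array


def write_wav(target, pcm: bytes, sample_rate: int = 16000,
              channels: int = 1, sample_width: int = 2) -> None:
    """Wrap raw PCM samples in a WAV container.

    `target` may be a filename or a writable binary file object. Multi-byte
    samples must be in the machine's native byte order, which is what the
    wave module expects.
    """
    frame_size = channels * sample_width
    if len(pcm) % frame_size:
        raise ValueError("PCM length is not a whole number of frames")
    with wave.open(target, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)


# Stand-in for synthesizer output: 0.1 s of a 440 Hz tone as 16-bit mono PCM.
rate = 16000
tone = array("h", (int(8000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate // 10)))

buffer = io.BytesIO()
write_wav(buffer, tone.tobytes(), sample_rate=rate)
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    print(wav.getnframes(), wav.getframerate())  # 1600 16000
```

Passing a filename such as "output.wav" instead of a `BytesIO` writes the file to disk.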

Architecture Overview

graph TD
    A["Audio Input / Text Input"] --> B{"Core Orchestrator (core.py)"};
    B -- STT Request --> C{"Voice Processor (processor/)"};
    C -- Uses Model --> D["STT Model (models/)"];
    C --> E[Text Output];

    B -- TTS Request --> F{"Voice Processor (processor/)"};
    F -- Uses Model --> G["TTS Model (models/)"];
    F --> H[Audio Output];

    I["Configuration (config.py)"] -- Configures --> B;
    I -- Configures --> C;
    I -- Configures --> F;
    J["Utilities (utils/)"] -- Used by --> C;
    J -- Used by --> F;

    style B fill:#f9f,stroke:#333,stroke-width:2px

  1. Input: Receives either audio for transcription or text for synthesis.
  2. Core Orchestrator: Manages the request and directs it to the processor.
  3. Voice Processor: Handles the specific STT or TTS task, interacting with the selected model.
  4. Models: Contains implementations or interfaces for different STT/TTS engines.
  5. Output: Produces either transcribed text or synthesized audio.
  6. Config/Utils: Configuration (config.py) controls behavior; Utilities (utils/) provide support functions.
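The request flow in steps 1 to 5 can be sketched as a small dispatcher that inspects the input type and hands it to the matching processor. This illustrates the described flow only; it is not the contents of core.py, and `Orchestrator`, `SpeechRequest`, `SynthesisRequest`, and the stub processors are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class SpeechRequest:
    """Audio in, text out (STT)."""
    pcm: bytes
    sample_rate: int


@dataclass
class SynthesisRequest:
    """Text in, audio out (TTS)."""
    text: str


Request = Union[SpeechRequest, SynthesisRequest]


class Orchestrator:
    """Routes each request to the processor responsible for it."""

    def __init__(self, stt: Callable[[bytes, int], str], tts: Callable[[str], bytes]):
        self._stt = stt
        self._tts = tts

    def handle(self, request: Request) -> Union[str, bytes]:
        if isinstance(request, SpeechRequest):
            return self._stt(request.pcm, request.sample_rate)  # text output
        if isinstance(request, SynthesisRequest):
            if not request.text.strip():
                raise ValueError("nothing to synthesize")
            return self._tts(request.text)  # audio output
        raise TypeError(f"unsupported request type: {type(request).__name__}")


# Stub processors standing in for real STT/TTS models.
core = Orchestrator(
    stt=lambda pcm, rate: f"{len(pcm) // 2} samples at {rate} Hz",
    tts=lambda text: b"\x00\x00" * len(text),
)
print(core.handle(SpeechRequest(pcm=b"\x00\x00" * 480, sample_rate=16000)))  # 480 samples at 16000 Hz
print(len(core.handle(SynthesisRequest(text="Hello"))))  # 10
```

Injecting the processors as plain callables keeps the orchestrator independent of any particular model, which matches the separation between the orchestrator, processor, and models shown in the diagram.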

Configuration

(Details on configuring STT/TTS models, language settings, audio formats, etc., will be added here.)
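As a rough illustration of the kinds of settings listed there (STT/TTS model, language, audio format), the sketch below represents them as a validated dataclass. It is not the schema of config.py: the field names, defaults, allowed sample rates, and the JSON input are all assumptions. The usage example refers to a YAML file, which would need a third-party parser such as PyYAML.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical settings for one STT/TTS pipeline."""
    stt_model: str = "default-stt"
    tts_model: str = "default-tts"
    language: str = "en"
    sample_rate: int = 16000  # Hz
    channels: int = 1  # 1 = mono, 2 = stereo

    def __post_init__(self):
        # Illustrative allow-list; real engines define their own supported rates.
        if self.sample_rate not in (8000, 16000, 22050, 44100, 48000):
            raise ValueError(f"unsupported sample_rate: {self.sample_rate}")
        if self.channels not in (1, 2):
            raise ValueError("channels must be 1 (mono) or 2 (stereo)")

    @classmethod
    def from_dict(cls, raw: dict) -> "VoiceSettings":
        """Build settings from a parsed config, rejecting unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        return cls(**raw)


settings = VoiceSettings.from_dict(json.loads('{"language": "de", "sample_rate": 22050}'))
print(settings.language, settings.sample_rate, settings.stt_model)  # de 22050 default-stt
```

Rejecting unknown keys catches misspelled options at load time instead of silently falling back to defaults.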

Development

Setup

# Clone the repository
git clone https://github.com/llamasearchai/llama-voice.git
cd llama-voice

# Install in editable mode with development dependencies
pip install -e ".[dev]"

Testing

pytest tests/

Contributing

Contributions are welcome! Please read CONTRIBUTING.md, then submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files

Download the file for your platform.

Source Distribution

llama_voice-0.1.0.tar.gz (36.3 kB)

Built Distribution

llama_voice-0.1.0-py3-none-any.whl (41.3 kB)

File details

Details for the file llama_voice-0.1.0.tar.gz.

File metadata

  • Download URL: llama_voice-0.1.0.tar.gz
  • Size: 36.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for llama_voice-0.1.0.tar.gz (algorithm: hash digest)

  • SHA256: fa24a6d4bb189ab3fdcdffe0b265dcbd9615beff9ae78fcfe0153acbe8978710
  • MD5: 011f3c84b2a3e52fbf8dc12b551981ac
  • BLAKE2b-256: 41c28b4249188129e25b2601a721102493645bc8c3d8179c6dbff6add364eb4d

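To check a downloaded archive against the published SHA256 digest for the source distribution, hash the file in chunks with the standard-library `hashlib`. The helper names `sha256_of` and `verify` are illustrative. pip can also enforce digests itself through hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path

# SHA256 digest published for llama_voice-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "fa24a6d4bb189ab3fdcdffe0b265dcbd9615beff9ae78fcfe0153acbe8978710"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare case-insensitively, since digests are sometimes printed in upper case."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    archive = Path("llama_voice-0.1.0.tar.gz")
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SDIST_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found in the current directory")
```

Reading in chunks keeps memory use constant regardless of file size.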

File details

Details for the file llama_voice-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: llama_voice-0.1.0-py3-none-any.whl
  • Size: 41.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for llama_voice-0.1.0-py3-none-any.whl (algorithm: hash digest)

  • SHA256: b77c04cbed469dcb366138ff5aef7e1f586c882e11955db94b8da07357c5c1d9
  • MD5: 4d877ba3682f195b9914fb1cbd79b813
  • BLAKE2b-256: 41de1ba810d1385723f072e4beba237a35cb3077023bbb439113f60ca90693a5

