RAG over audio files with a provider-agnostic pipeline

AudioRAG

Provider-agnostic RAG pipeline for audio content. Download, transcribe, chunk, embed, and search audio from YouTube and other sources.

Features

  • Multi-provider support: OpenAI, Deepgram, AssemblyAI, Groq (STT); OpenAI, Voyage, Cohere (embeddings); OpenAI, Anthropic, Gemini (generation); ChromaDB, Pinecone, Weaviate, Supabase (vector stores)
  • Resumable processing: SQLite state tracking with hash-based IDs
  • Automatic chunking: Time-based segmentation with configurable duration
  • Audio splitting: Handles large files by splitting before transcription
  • Structured logging: Context-aware logging with operation timing
  • Type-safe: Python 3.12+ with full type annotations

Quick Start

import asyncio
from audiorag import AudioRAGPipeline, AudioRAGConfig

async def main():
    # Configure with your chosen providers
    config = AudioRAGConfig(
        stt_provider="openai",
        stt_model="whisper-1",
        embedding_provider="openai",
        embedding_model="text-embedding-3-small",
        vector_store_provider="chromadb",
        generation_provider="openai",
        generation_model="gpt-4o-mini",
        # API keys can also be set via environment variables
        openai_api_key="sk-...",
    )
    
    # Initialize pipeline
    pipeline = AudioRAGPipeline(config)
    
    # Index audio from YouTube
    await pipeline.index("https://youtube.com/watch?v=...")
    
    # Query the indexed content
    result = await pipeline.query("What are the main points discussed?")
    print(result.answer)
    
    # Access sources with timestamps
    for source in result.sources:
        print(f"{source.video_title} at {source.start_time}s")
        print(f"URL: {source.youtube_timestamp_url}")

asyncio.run(main())
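The Quick Start prints `source.youtube_timestamp_url` without saying how that link is formed. One plausible way to build such a link is sketched below, using the `t=<seconds>s` query parameter that YouTube watch URLs accept. The helper name `with_timestamp` and the video ID `abc123` are hypothetical, and the library may construct its URLs differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_timestamp(url: str, start_seconds: float) -> str:
    """Return url with t=<whole seconds>s set, replacing any existing t."""
    parts = urlsplit(url)
    # Keep every query parameter except a previous timestamp.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "t"
    ]
    # Fractional seconds are truncated, not rounded.
    query.append(("t", f"{int(start_seconds)}s"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# "abc123" is a placeholder video ID, not a real video.
print(with_timestamp("https://youtube.com/watch?v=abc123", 90.7))
# prints https://youtube.com/watch?v=abc123&t=90s
```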

Installation

# Install with uv (recommended)
uv pip install audiorag

# Or with pip
pip install audiorag

Optional Dependencies

# Audio scraping utilities (yt-dlp, pydub)
# Quote the extras so shells such as zsh do not treat the brackets as a glob
uv pip install "audiorag[defaults]"  # or: pip install "audiorag[defaults]"

# All providers and utilities
uv pip install "audiorag[all]"  # or: pip install "audiorag[all]"

# Specific providers only
uv pip install "audiorag[openai,chromadb,scraping,cohere]"

Configuration

AudioRAG uses pydantic-settings with environment variable support. All settings use the AUDIORAG_ prefix.

# Example: Using OpenAI for STT, embeddings, and generation
export AUDIORAG_OPENAI_API_KEY="sk-..."
export AUDIORAG_STT_PROVIDER="openai"
export AUDIORAG_EMBEDDING_PROVIDER="openai"
export AUDIORAG_VECTOR_STORE_PROVIDER="chromadb"
export AUDIORAG_GENERATION_PROVIDER="openai"

# Example: Using different providers
export AUDIORAG_DEEPGRAM_API_KEY="..."
export AUDIORAG_STT_PROVIDER="deepgram"
export AUDIORAG_VOYAGE_API_KEY="..."
export AUDIORAG_EMBEDDING_PROVIDER="voyage"

# Processing settings
export AUDIORAG_CHUNK_DURATION_SECONDS="30"
export AUDIORAG_RETRIEVAL_TOP_K="10"
export AUDIORAG_RERANK_TOP_N="3"

See the Configuration Guide for all options.

Documentation

Development

# Clone and setup
git clone <repository-url>
cd audiorag
uv sync

# Run tests
uv run pytest

# Run checks
uv run ruff check . --fix
uv run ty check

# Install pre-commit hooks
uv run prek install

Pipeline Stages

  1. Download: Fetch audio from URL (YouTube supported)
  2. Split: Divide large audio files into smaller parts before transcription
  3. Transcribe: Convert audio to text using STT provider
  4. Chunk: Group transcription into time-based segments
  5. Embed: Generate vector embeddings for each chunk
  6. Store: Persist embeddings in vector database
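The Chunk stage can be sketched as follows. This is a minimal illustration of time-based segmentation, assuming the STT provider returns segments with start and end times in seconds. `Segment`, `chunk_segments`, and the 30-second default (taken from the configuration example) are assumptions, not AudioRAG's actual types or algorithm.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


def _merge(group: list[Segment]) -> Segment:
    """Combine a run of segments into one chunk covering their full span."""
    return Segment(group[0].start, group[-1].end, " ".join(s.text for s in group))


def chunk_segments(
    segments: list[Segment], chunk_duration: float = 30.0
) -> list[Segment]:
    """Merge consecutive segments into chunks of at most chunk_duration seconds.

    A single segment longer than chunk_duration is kept as its own chunk.
    """
    chunks: list[Segment] = []
    current: list[Segment] = []
    for seg in segments:
        # Close the current chunk if adding this segment would exceed the limit.
        if current and seg.end - current[0].start > chunk_duration:
            chunks.append(_merge(current))
            current = []
        current.append(seg)
    if current:
        chunks.append(_merge(current))
    return chunks
```

Because each chunk keeps its start and end times, a search hit can point back to the exact position in the source audio.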

License

MIT License
