Agentic Retrieval Augmented Generation (RAG) with LanceDB

Haiku RAG

mcp-name: io.github.ggozad/haiku-rag

Retrieval-Augmented Generation (RAG) library built on LanceDB.

haiku.rag is a Retrieval-Augmented Generation (RAG) library built to work with LanceDB as a local vector database. It uses LanceDB to store embeddings and performs both semantic (vector) search and full-text search, combining the two through LanceDB's native hybrid search with Reciprocal Rank Fusion (RRF). Both open-source (Ollama) and commercial (OpenAI, VoyageAI) embedding providers are supported.
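
Reciprocal Rank Fusion merges several ranked result lists by giving each document the score sum(1 / (k + rank)) over every list it appears in. The following is a minimal, library-independent sketch of that formula to show why hybrid search favours results that both retrievers agree on. It is not LanceDB's implementation; the constant k = 60 (the value commonly used for RRF) and the document IDs are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. Returns (doc_id, score) pairs, best first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from a vector search and a full-text search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
fulltext_hits = ["doc_c", "doc_a", "doc_d"]

for doc_id, score in reciprocal_rank_fusion([vector_hits, fulltext_hits]):
    print(f"{doc_id}: {score:.5f}")
```

Here doc_a and doc_c, which appear in both lists, rank above doc_b and doc_d, which only one retriever found.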

Note: Configuration now uses YAML files instead of environment variables. If you're upgrading from an older version, run haiku-rag init-config --from-env to migrate your .env file to haiku.rag.yaml. See Configuration for details.

Features

  • Local LanceDB: No external servers required; also supports LanceDB Cloud as well as S3, Google Cloud, and Azure storage
  • Multiple embedding providers: Ollama, VoyageAI, OpenAI, vLLM
  • Multiple QA providers: Any provider/model supported by Pydantic AI
  • Research graph (multi-agent): An agentic Plan → Search → Evaluate → Synthesize workflow
  • Native hybrid search: Vector + full-text search, fused with LanceDB's native Reciprocal Rank Fusion (RRF)
  • Reranking: Search results are reranked by default, using MixedBread AI, Cohere, or vLLM
  • Question answering: Built-in QA agents that answer questions over your documents
  • File monitoring: Automatically index files when running as a server
  • 40+ file formats: PDF, DOCX, HTML, Markdown, code files, URLs
  • MCP server: Expose haiku.rag as tools for AI assistants
  • A2A agent: Conversational agent with context and multi-turn dialogue
  • CLI & Python API: Use from command line or Python
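
The file-monitoring feature keeps the index in sync as files change. One simple way to detect such changes, shown here purely as an illustration, is to scan a directory and compare modification times between scans. This is a hypothetical sketch: haiku.rag's server may rely on filesystem events or a different strategy, and `scan` and `diff_snapshots` are names invented for this example.

```python
import os
import tempfile
from pathlib import Path


def scan(root: Path) -> dict[Path, float]:
    """Map every file under `root` to its last-modification time."""
    return {p: p.stat().st_mtime for p in root.rglob("*") if p.is_file()}


def diff_snapshots(
    before: dict[Path, float], after: dict[Path, float]
) -> tuple[set[Path], set[Path], set[Path]]:
    """Return (added, modified, deleted) paths between two scans."""
    added = set(after) - set(before)
    deleted = set(before) - set(after)
    modified = {p for p in set(before) & set(after) if after[p] != before[p]}
    return added, modified, deleted


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.md").write_text("first draft")
    (root / "old.txt").write_text("to be removed")
    before = scan(root)

    (root / "report.pdf").write_bytes(b"%PDF-1.7")  # new file
    notes = root / "notes.md"
    notes.write_text("second draft")
    # Bump the mtime explicitly so the edit is visible even with coarse timestamps.
    os.utime(notes, (before[notes] + 10, before[notes] + 10))
    (root / "old.txt").unlink()  # deleted file

    added, modified, deleted = diff_snapshots(before, scan(root))
    # A monitor would index `added`, re-index `modified`, and drop `deleted`.
    print(
        sorted(p.name for p in added),
        sorted(p.name for p in modified),
        sorted(p.name for p in deleted),
    )
```

Polling is easy to reason about but rescans the whole tree each time; event-based watchers avoid that cost at the price of platform-specific behaviour.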

Quick Start

# Install
# Python 3.12 or newer required
uv pip install haiku.rag

# Add documents
haiku-rag add "Your content here"
haiku-rag add "Your content here" --meta author=alice --meta topic=notes
haiku-rag add-src document.pdf --meta source=manual

# Search
haiku-rag search "query"

# Ask questions
haiku-rag ask "Who is the author of haiku.rag?"

# Ask questions with citations
haiku-rag ask "Who is the author of haiku.rag?" --cite

# Deep QA (multi-agent question decomposition)
haiku-rag ask "Who is the author of haiku.rag?" --deep --cite

# Deep QA with verbose output
haiku-rag ask "Who is the author of haiku.rag?" --deep --verbose

# Multi‑agent research (iterative plan/search/evaluate)
haiku-rag research \
  "What are the main drivers and trends of global temperature anomalies since 1990?" \
  --max-iterations 2 \
  --confidence-threshold 0.8 \
  --max-concurrency 3 \
  --verbose

# Rebuild database (re-chunk and re-embed all documents)
haiku-rag rebuild

# Start server with file monitoring
haiku-rag serve --monitor

To customize settings, create a haiku.rag.yaml config file (see Configuration).
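
The `--meta` flag in the Quick Start can be repeated, with each occurrence supplying one `key=value` pair. The sketch below shows one way such repeated flags could be collected into a metadata dictionary with the standard library's argparse. It is a hypothetical illustration, not haiku-rag's own CLI code; the `parse_meta` helper and the choice to split on the first `=` only are assumptions.

```python
import argparse


def parse_meta(pairs: list[str]) -> dict[str, str]:
    """Turn ["author=alice", "topic=notes"] into {"author": "alice", "topic": "notes"}."""
    meta: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")  # split on the first "=" only
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        meta[key] = value  # a repeated key keeps the last value
    return meta


# Hypothetical stand-in for `haiku-rag add`, used only to demonstrate parsing.
parser = argparse.ArgumentParser(prog="add-demo")
parser.add_argument("content")
parser.add_argument("--meta", action="append", default=[], metavar="KEY=VALUE")

args = parser.parse_args(
    ["Your content here", "--meta", "author=alice", "--meta", "topic=notes"]
)
print(parse_meta(args.meta))
```

Splitting on the first `=` keeps values such as URLs with query strings intact.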

Python Usage

import asyncio

from haiku.rag.client import HaikuRAG
from haiku.rag.research import (
    PlanNode,
    ResearchContext,
    ResearchDeps,
    ResearchState,
    build_research_graph,
    stream_research_graph,
)

QUESTION = (
    "What are the main drivers and trends of global temperature "
    "anomalies since 1990?"
)


def new_research_state() -> ResearchState:
    # Build a fresh state for each run instead of reusing one a previous run has advanced.
    return ResearchState(
        context=ResearchContext(original_question=QUESTION),
        max_iterations=2,
        confidence_threshold=0.8,
        max_concurrency=2,
    )


async def main() -> None:
    async with HaikuRAG("database.lancedb") as client:
        # Add document
        doc = await client.create_document("Your content")

        # Search (reranking enabled by default)
        results = await client.search("query")
        for chunk, score in results:
            print(f"{score:.3f}: {chunk.content}")

        # Ask questions
        answer = await client.ask("Who is the author of haiku.rag?")
        print(answer)

        # Ask questions with citations
        answer = await client.ask("Who is the author of haiku.rag?", cite=True)
        print(answer)

        # Multi-agent research pipeline (Plan → Search → Evaluate → Synthesize)
        graph = build_research_graph()
        deps = ResearchDeps(client=client)

        # Blocking run (final result only)
        result = await graph.run(
            PlanNode(provider="openai", model="gpt-4o-mini"),
            state=new_research_state(),
            deps=deps,
        )
        print(result.output.title)

        # Streaming progress (log/report/error events)
        state = new_research_state()
        async for event in stream_research_graph(
            graph,
            PlanNode(provider="openai", model="gpt-4o-mini"),
            state,
            deps,
        ):
            if event.type == "log":
                iteration = event.state.iterations if event.state else state.iterations
                print(f"[{iteration}] {event.message}")
            elif event.type == "report":
                print("\nResearch complete!\n")
                print(event.report.title)
                print(event.report.executive_summary)


if __name__ == "__main__":
    # `async with` needs a running event loop, so the example runs inside asyncio.run().
    asyncio.run(main())
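
The research settings above (`max_iterations`, `confidence_threshold`, `max_concurrency`) bound an iterative Plan → Search → Evaluate → Synthesize loop. The self-contained sketch below illustrates how such a loop could use them, with stub functions standing in for the LLM-backed agents. The function names, the stub confidence values, and the exact stopping rule are assumptions for illustration; this is not haiku.rag's implementation.

```python
import asyncio


async def plan(question: str) -> list[str]:
    # Stub planner: a real system would ask an LLM to decompose the question.
    return [f"{question} (aspect {i})" for i in range(1, 5)]


async def search(sub_question: str) -> str:
    # Stub search: a real system would query the document store.
    await asyncio.sleep(0)
    return f"notes on {sub_question}"


def evaluate(findings: list[str], iteration: int) -> float:
    # Stub evaluator: pretend confidence grows with each iteration.
    # A real evaluator would also propose follow-up sub-questions.
    return min(1.0, 0.5 + 0.2 * iteration)


async def research(
    question: str,
    max_iterations: int = 2,
    confidence_threshold: float = 0.8,
    max_concurrency: int = 2,
) -> dict:
    semaphore = asyncio.Semaphore(max_concurrency)  # caps parallel searches
    findings: list[str] = []
    confidence = 0.0
    iterations = 0

    async def bounded_search(q: str) -> str:
        async with semaphore:
            return await search(q)

    sub_questions = await plan(question)  # Plan
    while iterations < max_iterations:
        iterations += 1
        # Search: run sub-questions concurrently, at most max_concurrency at once.
        findings += await asyncio.gather(*(bounded_search(q) for q in sub_questions))
        confidence = evaluate(findings, iterations)  # Evaluate
        if confidence >= confidence_threshold:
            break  # confident enough: stop before max_iterations
    # Synthesize: a real system would have an LLM write the final report here.
    return {"iterations": iterations, "confidence": confidence, "findings": len(findings)}


report = asyncio.run(research("Drivers of temperature anomalies since 1990?"))
print(report)
```

With these stubs the loop stops after the second iteration, because that is the first point where confidence reaches the 0.8 threshold; `max_iterations` is the hard ceiling when confidence never gets there.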

MCP Server

Use haiku.rag with AI assistants such as Claude Desktop by running it as an MCP server over stdio:

haiku-rag serve --stdio

The server provides tools for document management and search directly within your AI assistant.

A2A Agent

Run as a conversational agent with the Agent-to-Agent protocol:

# Start the A2A server
haiku-rag serve --a2a

# Connect with the interactive client (in another terminal)
haiku-rag a2aclient

The A2A agent provides:

  • Multi-turn dialogue with context
  • Intelligent multi-search for complex questions
  • Source citations with titles and URIs
  • Full document retrieval on request

Examples

See the examples directory for working examples:

  • Interactive Research Assistant - Full-stack research assistant with Pydantic AI and AG-UI featuring human-in-the-loop approval and real-time state synchronization
  • Docker Setup - Complete Docker deployment with file monitoring, MCP server, and A2A agent
  • A2A Security - Authentication examples (API key, OAuth2, GitHub)

Documentation

Full documentation is available at https://ggozad.github.io/haiku.rag/

Download files

Source Distribution

haiku_rag-0.13.1.tar.gz (276.0 kB)

Built Distribution

haiku_rag-0.13.1-py3-none-any.whl (94.1 kB)

File details

Details for the file haiku_rag-0.13.1.tar.gz.

File metadata

  • Download URL: haiku_rag-0.13.1.tar.gz
  • Size: 276.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for haiku_rag-0.13.1.tar.gz

  • SHA256: d7d392f1ffdad9dd40100dbb1f7459ddc258fc95f54a7197c455b9bfeb5b894d
  • MD5: 8d12b4b3ce235296d780bec9ce7b4503
  • BLAKE2b-256: 7978c8f7be04d290182ae1c8c6b98edcb4c220a454ab74c30052c05d03669ca0

File details

Details for the file haiku_rag-0.13.1-py3-none-any.whl.

File metadata

  • Download URL: haiku_rag-0.13.1-py3-none-any.whl
  • Size: 94.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for haiku_rag-0.13.1-py3-none-any.whl

  • SHA256: 3e1bb900890e3662ab51b3ecce3c094f8f6e5ce00d1880220cd5be1e02531352
  • MD5: 18a217f0c5f72955aec7a8ff6a0dc278
  • BLAKE2b-256: f60858e9f6a152555134e9bbcfd2b2e02b8f53035fb6930c3b8837ea68e11b7c
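
The published digests let you check that a downloaded file is intact. A minimal sketch using Python's standard `hashlib`, reading the file in chunks so large archives need not fit in memory. The local file path is an assumption about where you saved the download, and `sha256_of` and `verify` are helper names chosen for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected digest before comparing.
    return sha256_of(path) == expected.strip().lower()


# SHA256 published above for the wheel; assumes it was saved to the working directory.
WHEEL = Path("haiku_rag-0.13.1-py3-none-any.whl")
WHEEL_SHA256 = "3e1bb900890e3662ab51b3ecce3c094f8f6e5ce00d1880220cd5be1e02531352"

if WHEEL.exists():
    print("OK" if verify(WHEEL, WHEEL_SHA256) else "MISMATCH: do not install")
```

pip can enforce the same check at install time through its hash-checking mode, by adding `--hash=sha256:<digest>` to the requirement line in a requirements file.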
