
RAG-in-a-Box: Zero-Configuration Self-Building Agentic RAG System


RAGBox-Core

License: MIT | Python 3.11+

📚 Read the Official Documentation


RAGBox is a production-ready, auto-configuring, async-first RAG engine that combines Vector Search, Agentic Orchestration, and Graph Retrieval natively.

Installation

pip install ragbox-core

Note on Dependencies: Advanced document processing features like OCR and complex PDF parsing require system-level dependencies. Depending on your OS, you may need to install standard C++ build tools or Tesseract for paddleocr and pdfplumber to function optimally.

Configuration (API Keys)

RAGBox auto-detects your LLM provider from environment variables. For the best experience, set one of the following before running:

export OPENAI_API_KEY="sk-..."
# OR
export ANTHROPIC_API_KEY="sk-ant-..."
# OR
export GROQ_API_KEY="gsk_..."

If no keys are found, RAGBox falls back to a local LLaMA model (requires manual model download to models/llama-3.1-8b-instruct.gguf).
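The auto-detection described above can be sketched as a precedence check over environment variables. This is a hypothetical illustration, not RAGBox's actual code: the function `detect_provider` and the precedence order (OpenAI, then Anthropic, then Groq) are assumptions; only the variable names and the local model path come from this README.

```python
import os
from pathlib import Path

# Hypothetical sketch. Env var names come from the README; the precedence
# order (OpenAI > Anthropic > Groq) is an assumption.
PROVIDER_ENV_VARS = [
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("groq", "GROQ_API_KEY"),
]
LOCAL_MODEL_PATH = Path("models/llama-3.1-8b-instruct.gguf")


def detect_provider(env=None, local_model=LOCAL_MODEL_PATH):
    """Return (provider_name, detail) for the first configured provider."""
    env = os.environ if env is None else env
    for name, var in PROVIDER_ENV_VARS:
        if env.get(var):  # empty strings count as "not set"
            return name, var
    # No cloud key found: fall back to the manually downloaded local GGUF model.
    if Path(local_model).exists():
        return "local", str(local_model)
    raise RuntimeError(f"No API key found and no local model at {local_model}")


print(detect_provider({"GROQ_API_KEY": "gsk_..."}))  # ('groq', 'GROQ_API_KEY')
```

A fixed precedence list keeps the choice deterministic when several keys are set at once.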

Quick Start (3-Line API)

from ragbox import RAGBox

# Automatically ingests and chunks documents, builds the knowledge graph, and configures the vector DB
rag = RAGBox("./company-docs")

# Intelligent routing via query classification
answer = rag.query("What's our vacation policy?")
print(answer)
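The query-classification step can be approximated with a rule-based router. RAGBox's real classifier is not documented here (it may well be LLM-based); this sketch uses a hypothetical `classify` function and hypothetical regex heuristics purely to show how a query could map onto the six pipelines listed under Features.

```python
import re
from enum import Enum


class Route(Enum):
    VECTOR = "vector"
    KEYWORD = "keyword"
    GRAPH = "graph"
    MULTI_QUERY = "multi_query"
    TIME_BASED = "time_based"
    AGENTIC = "agentic"


# Hypothetical heuristics standing in for RAGBox's undocumented classifier.
_EXACT = re.compile(r'"[^"]+"|[A-Z0-9_-]{3,}')
_AGENTIC = re.compile(r"\b(compare|and then|step by step|summari[sz]e all)\b", re.I)
_TIME = re.compile(r"\b(latest|recent|since|before|after|last (week|month|year)|\d{4})\b", re.I)
_GRAPH = re.compile(r"\b(related|relationship|connected|between|who reports to)\b", re.I)
_BROAD = re.compile(r"\b(overview|everything|all about|tell me about)\b", re.I)


def classify(query: str) -> Route:
    q = query.strip()
    if _EXACT.fullmatch(q):
        return Route.KEYWORD      # quoted phrases or bare IDs need exact matching
    if _AGENTIC.search(q):
        return Route.AGENTIC      # multi-step questions go to the agent loop
    if _TIME.search(q):
        return Route.TIME_BASED   # explicit dates or recency cues
    if _GRAPH.search(q):
        return Route.GRAPH        # entity relationships favour the knowledge graph
    if _BROAD.search(q):
        return Route.MULTI_QUERY  # broad intent benefits from query expansion
    return Route.VECTOR           # default: dense semantic search


print(classify("What's our vacation policy?"))  # Route.VECTOR
```

Checking exact-match patterns first keeps identifiers such as error codes from being misread as years by the time-based rule.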

CLI Interface

RAGBox provides a dead-simple CLI for running locally without writing code:

# Point to your documents. RAGBox will self-build the index and graph.
ragbox init ./company-docs

# Query the active index
ragbox query "What's our vacation policy?" -d ./company-docs

Architecture

graph TD
    A[Local Documents] --> B{Document Processor Auto-Router}
    B --> C[AST / OCR / PDF Parsing]
    C --> D[Chunking Engine]
    D --> E[(Vector Store)]
    C --> F[(Knowledge Graph)]
    
    Q[User Query] --> G[Agentic Orchestrator]
    G --> H[Retrieval Fusion Engine]
    E --> H
    F --> H
    H --> G
    G --> I[Final Answer]
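The Retrieval Fusion Engine merges ranked results from the vector store and the knowledge graph; per the feature list, it uses Reciprocal Rank Fusion (RRF). A minimal sketch of RRF, assuming each retriever returns an ordered list of document IDs and using k=60 (the default from the original RRF paper, not a documented RAGBox setting):

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank).

    Ranks are 1-based. k=60 is an assumed default; RAGBox's value is undocumented.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["doc_a", "doc_b", "doc_c"]
graph_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([vector_hits, graph_hits])
print([doc for doc, _ in fused])  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because RRF uses only ranks, it needs no score calibration between the dense retriever and graph search. Documents found by both retrievers rise to the top, and the cross-encoder reranking step would then operate on this fused list.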

Risk Surface Analysis

  • Temporal Edges (T=0 vs. T=scale): At T=0, ragbox init blocks until indexing finishes, guaranteeing that the index is available. At T=scale, a background daemon applies delta updates (via watchdog) to prevent index staleness and thundering-herd rebuilds.
  • Adversarial Edges: The system is subject to standard prompt injection if raw queries from external users are passed through; the Orchestrator currently assumes trusted inputs.
  • Resource Edges: High-concurrency read/write workloads cause memory spikes because both the local vector DB and the knowledge graph must be maintained at the same time.
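One common way to avoid thundering-herd rebuilds is to debounce file-system events so that a burst of changes produces a single delta update. The sketch below is a hypothetical illustration of that idea (`ChangeDebouncer` is not a RAGBox class); the clock is injectable so the behaviour can be tested without sleeping.

```python
import time


class ChangeDebouncer:
    """Hypothetical sketch: coalesce bursts of file-change events into one batch.

    Events are buffered per path (the latest event kind wins) and released as a
    single batch only after `quiet_period` seconds pass with no new events.
    """

    def __init__(self, quiet_period=2.0, clock=time.monotonic):
        self.quiet_period = quiet_period
        self.clock = clock
        self._pending = {}          # path -> latest event kind
        self._last_event_at = None

    def record(self, path, kind):
        # A later event for the same path overrides an earlier one.
        self._pending[path] = kind
        self._last_event_at = self.clock()

    def ready_batch(self):
        """Return pending changes once the burst has settled, else an empty dict."""
        if not self._pending:
            return {}
        if self.clock() - self._last_event_at < self.quiet_period:
            return {}
        batch, self._pending = self._pending, {}
        return batch
```

In a watchdog `FileSystemEventHandler`, `on_any_event` could call `record(event.src_path, event.event_type)`, while a background loop polls `ready_batch()` and re-indexes only the returned paths.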

Features

  • Self-Healing Infrastructure: A watchdog-based file watcher detects document changes and updates the vector store and knowledge graph incrementally, preventing index staleness and full-rebuild storms.
  • Auto-Document Intelligence: Automatically detects PDFs, plain text, images, and source code, and routes each to the appropriate parser: AST parsing for code, OCR (paddleocr) for images, or layout-aware extraction (pdfplumber) for PDFs.
  • Cost Estimator: See the expected USD cost of indexing before it runs.
  • Auto-Knowledge-Graph (GraphRAG): Automatically extracts entities and groups them into communities using the Leiden algorithm, enabling structured reasoning.
  • Retrieval Fusion & Reranking: Merges dense-vector and graph search results using Reciprocal Rank Fusion, then reranks the fused candidate pool with an ms-marco cross-encoder.
  • Late Chunking: Contextual chunk embeddings. Token embeddings are computed over the full document and only then pooled into chunks, so each chunk vector preserves document-level semantic context.
  • Agentic Orchestrator & Intelligent Routing: Automatically routes each incoming query to one of six pipelines: Vector, Keyword, Graph, Multi-Query, Time-Based, or Agentic.
  • Multi-Query Expansion: Broad-intent queries are expanded by the LLM into multiple variations; results are retrieved for every variation and fused to improve recall.
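Late chunking differs from naive chunking, which embeds each chunk in isolation. Instead, token embeddings are produced for the whole document first and then pooled per chunk. A minimal NumPy sketch, assuming some embedding model has already returned a `(num_tokens, dim)` array for the full document; the function `late_chunk`, mean pooling, and L2 normalisation are assumptions, not RAGBox's documented implementation.

```python
import numpy as np


def late_chunk(token_embeddings, chunk_spans):
    """Pool document-level token embeddings into one vector per chunk.

    token_embeddings: (num_tokens, dim) array from running the embedding model
        over the whole document, so each token vector carries document context.
    chunk_spans: list of (start, end) token offsets, end exclusive.
    """
    token_embeddings = np.asarray(token_embeddings, dtype=np.float64)
    chunks = []
    for start, end in chunk_spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid span ({start}, {end})")
        vec = token_embeddings[start:end].mean(axis=0)  # mean-pool the span
        norm = np.linalg.norm(vec)
        chunks.append(vec / norm if norm > 0 else vec)  # unit-length chunk vector
    return np.stack(chunks)


# Toy 4-token "document" with 2-dimensional embeddings, split into two chunks.
tokens = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
print(late_chunk(tokens, [(0, 2), (2, 4)]))
```

The chunk boundaries only decide which token vectors are averaged; the contextual information comes from the single full-document forward pass that produced them.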

Contributing

We welcome contributions to RAGBox-Core! Please see our CONTRIBUTING.md for details on how to set up your development environment, run the test suite, and submit Pull Requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files


Source Distribution

ragbox_core-1.0.6.tar.gz (34.6 kB)

Built Distribution


ragbox_core-1.0.6-py3-none-any.whl (42.2 kB)

File details

Details for the file ragbox_core-1.0.6.tar.gz.

File metadata

  • Download URL: ragbox_core-1.0.6.tar.gz
  • Size: 34.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.17.0-14-generic

File hashes

Hashes for ragbox_core-1.0.6.tar.gz
Algorithm Hash digest
SHA256 12d37e1a519f6e227dcbf2389ec3392612ec986c1f69e8ae703735b575f7b8df
MD5 c27efd0a739c8be003b4963a54532370
BLAKE2b-256 99b9cdc74910c7a21efe66e25600428ee7a4c1fd95681000231b2204ffc667b6


File details

Details for the file ragbox_core-1.0.6-py3-none-any.whl.

File metadata

  • Download URL: ragbox_core-1.0.6-py3-none-any.whl
  • Size: 42.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.17.0-14-generic

File hashes

Hashes for ragbox_core-1.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 1f788edccbc99439a454537702936d80609cd03b7f071e46ce0725683a43041c
MD5 96f6529c0d9d297e6afa0ca5ad80960d
BLAKE2b-256 4f13fe755b9fb8b52f4715ec0f0db222f6a25256317196c8cb391171ed21dd36

