A production-grade, local-first Agentic RAG library using structural document navigation.

ApexRAG

Production-grade, local-first Agentic RAG Library. Replaces vector similarity search with structural, agentic navigation of documents.


🧠 The Core Idea

Traditional RAG embeds text into vectors and finds the "closest" chunks. This creates retrieval hallucinations: the model returns semantically-similar-but-wrong content because it has no understanding of document structure.

ApexRAG takes a fundamentally different approach:

  1. Parse the document into a structural tree (based on headings) and extract page numbers.
  2. Synthesize a 30-word Semantic Map for every node using a local LLM.
  3. Navigate the tree with an LLM agent that reads summaries and decides which branch to enter, trying multiple candidates if necessary.
  4. Verify that the chosen leaf node answers the query via a strict secondary LLM check (the project claims 99.999% accuracy).
  5. Return the exact leaf node content, not a blended, hallucinated average.

Query: "What were Q3 revenues?"
         │
    Root (Annual Report)
    ├── Chapter 1: Executive Summary  ← LLM: "Not here"
    └── Chapter 2: Revenue Analysis   ← LLM: "Enter this"
        ├── Q1 Revenue                ← LLM: "Not Q3"
        ├── Q2 Revenue                ← LLM: "Not Q3"
        └── Q3 Revenue                ← LLM: "This is it!" → Return content

📁 Project Structure

apex_rag/
├── src/
│   ├── __init__.py       # Public API exports
│   ├── api.py            # FastAPI App & UI dashboard
│   ├── client.py         # Thread-safe user-facing ApexIndex class
│   ├── ingestion.py      # Document parsing & tree synthesis
│   ├── navigation.py     # Recursive LLM navigation agent
│   ├── storage.py        # SQLAlchemy async ORM & PageIndexEntry
│   └── utils.py          # ReasoningTrace, retry decorator, helpers
├── tests/
│   ├── test_tree.py      # Parser & storage unit tests
│   └── test_search.py    # Navigation agent unit tests (no Ollama needed)
├── examples/
│   └── basic_usage.py    # End-to-end demo
├── pyproject.toml
└── docker-compose.yml

⚡ Quick Start

1. Install

# Clone and set up
cd ApexRAG
pip install -e ".[dev]"

2. Start Ollama

ollama serve
ollama pull llama3.1   # or phi3, mistral, etc.

3. Ingest & Query

import asyncio
from src.client import ApexIndex

async def main():
    async with await ApexIndex.create(
        db_url="sqlite+aiosqlite:///apex.db",
        model="llama3.1",
    ) as index:
        # Ingest a PDF
        doc_id = await index.ingest("path/to/your/report.pdf")

        # Query it
        result = await index.query(
            "What are the Q3 revenue figures?",
            doc_id,
        )

        if result:
            print(result.content)
            print(f"Found at path: {result.path}")
            print(f"Navigation trace: {result.trace}")

asyncio.run(main())

4. Start the FastAPI Server & Visual Index Dashboard

uvicorn src.api:app --reload

Open your browser to:

From the dashboard, you can click on an ingested document to view its full structural tree and its book-style alphabetical page index!


🏗️ Architecture Deep Dive

Ingestion Engine (ingestion.py)

| Step | Description |
| --- | --- |
| Convert | markitdown or docling converts PDF/DOCX → Markdown |
| Parse | Regex walks ATX headings (#, ##, ###) to build a ParsedSection tree |
| Persist | Nodes are written to the DB with an LTree-style path (1.2.3) |
| Synthesize | Ollama generates 30-word summaries in parallel (bounded by a semaphore) |
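
The Parse and Persist steps can be illustrated with a minimal sketch. It assumes headings are ATX-style and that the root node gets path "1" (as in the reasoning trace later in this README). The `Section` class and `parse_markdown` function are hypothetical stand-ins, not the library's actual `ParsedSection` or parser API.

```python
import re
from dataclasses import dataclass, field

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$")


@dataclass
class Section:
    """Hypothetical stand-in for the library's ParsedSection."""
    title: str
    level: int
    path: str = ""
    content: str = ""
    children: list["Section"] = field(default_factory=list)


def parse_markdown(text: str, doc_title: str = "Document") -> Section:
    """Build a section tree from Markdown and assign LTree-style paths."""
    root = Section(title=doc_title, level=0, path="1")
    stack = [root]  # stack[-1] is the section currently being filled
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match is None:
            stack[-1].content += line + "\n"
            continue
        level = len(match.group(1))
        # Pop back to the nearest ancestor that is shallower than this heading.
        while stack[-1].level >= level:
            stack.pop()
        parent = stack[-1]
        node = Section(title=match.group(2), level=level)
        parent.children.append(node)
        # The path is the parent's path plus the 1-based sibling position.
        node.path = f"{parent.path}.{len(parent.children)}"
        stack.append(node)
    return root
```

This sketch treats any line starting with `#` as a heading, so a `#` comment inside a fenced code block would be misread; a real parser would need to track fences.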

Storage Layer (storage.py)

DocumentNode table:
  id          BIGINT PRIMARY KEY
  doc_id      VARCHAR(255)       -- logical document identifier
  parent_id   BIGINT FK (self)   -- NULL for root nodes
  path        VARCHAR(512)       -- "1.2.3" LTree-style
  title       VARCHAR(512)       -- section heading
  summary     TEXT               -- 30-word Semantic Map
  content     TEXT               -- leaf content (NULL for intermediate)
  metadata    TEXT (JSON)        -- page numbers, char count, source file
  depth       INTEGER            -- nesting level (0 = root)
  position    INTEGER            -- sibling order
  created_at  TIMESTAMP

Supports both sqlite+aiosqlite:// (local) and postgresql+asyncpg:// (production).
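
To show how the `parent_id`, `path`, and `position` columns support navigation, here is a small sketch using the standard-library `sqlite3` module instead of the SQLAlchemy async ORM the library itself uses. The table name `document_node`, the reduced column set, the sample rows, and the helpers `children_of` and `subtree_of` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document_node (
        id        INTEGER PRIMARY KEY,
        doc_id    TEXT NOT NULL,
        parent_id INTEGER REFERENCES document_node(id),
        path      TEXT NOT NULL,
        title     TEXT NOT NULL,
        summary   TEXT,
        content   TEXT,
        depth     INTEGER NOT NULL,
        position  INTEGER NOT NULL
    )
    """
)
# Sample rows, invented for this sketch.
rows = [
    (1, "report", None, "1", "Annual Report", "Full report", None, 0, 0),
    (2, "report", 1, "1.1", "Executive Summary", "Overview", "Summary text", 1, 0),
    (3, "report", 1, "1.2", "Revenue Analysis", "Quarterly revenue", None, 1, 1),
    (4, "report", 3, "1.2.1", "Q1 Revenue", "Q1 figures", "Q1 text", 2, 0),
    (5, "report", 3, "1.2.2", "Q3 Revenue", "Q3 figures", "Q3 text", 2, 1),
]
conn.executemany(
    "INSERT INTO document_node VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
)


def children_of(node_id: int) -> list[tuple[int, str, str]]:
    """Direct children in sibling order: the summaries an agent would compare."""
    return conn.execute(
        "SELECT id, title, summary FROM document_node "
        "WHERE parent_id = ? ORDER BY position",
        (node_id,),
    ).fetchall()


def subtree_of(doc_id: str, path: str) -> list[str]:
    """Paths of all descendants of a node, selected by path prefix."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT path FROM document_node "
            "WHERE doc_id = ? AND path LIKE ? ORDER BY path",
            (doc_id, path + ".%"),
        )
    ]
```

Note that `ORDER BY path` sorts as text, so "1.10" would sort before "1.2"; ordering siblings by `position` avoids that problem.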

Navigation Agent (navigation.py)

find(query, doc_id)
  └── _navigate(current_node)
        ├── [Leaf?] → return content immediately
        ├── fetch children
        ├── _ask_llm(query, child_summaries)
        │     └── "Which child ID contains the answer?"
        ├── [ID returned] → recurse into chosen child
        │     └── [child returns None] → try siblings
        └── [NONE returned] → backtrack to parent

LLM response parsing uses a 4-tier fallback:

  1. Strict json.loads()
  2. Regex extraction from prose-wrapped JSON
  3. Explicit "NONE" keyword detection
  4. Heuristic: scan for any valid child ID number in the response
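
The four tiers above could be sketched as follows. This is an illustration under assumptions, not the code in navigation.py: the function names `parse_choice` and `_choice_from` are hypothetical, and so is the response shape (a JSON object with an integer `"id"` key).

```python
import json
import re
from typing import Optional


def _choice_from(data: object, valid_ids: set[int]) -> Optional[int]:
    """Extract a valid child ID from a decoded {"id": ...} object, if any."""
    if isinstance(data, dict):
        value = data.get("id")
        if isinstance(value, int) and not isinstance(value, bool) and value in valid_ids:
            return value
    return None


def parse_choice(response: str, valid_ids: set[int]) -> Optional[int]:
    """Return the chosen child ID, or None when no valid choice is found."""
    text = response.strip()

    # Tier 1: the whole response is strict JSON.
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None
    choice = _choice_from(data, valid_ids)
    if choice is not None:
        return choice

    # Tier 2: a JSON object embedded in surrounding prose.
    match = re.search(r"\{.*?\}", text, flags=re.DOTALL)
    if match:
        try:
            choice = _choice_from(json.loads(match.group(0)), valid_ids)
        except json.JSONDecodeError:
            choice = None
        if choice is not None:
            return choice

    # Tier 3: an explicit NONE keyword means "no child matches".
    if re.search(r"\bNONE\b", text, flags=re.IGNORECASE):
        return None

    # Tier 4: heuristic scan for any number that is a valid child ID.
    for number in re.findall(r"\d+", text):
        if int(number) in valid_ids:
            return int(number)
    return None
```

Checking every candidate against `valid_ids` means an ID the model invented is never followed, whichever tier produced it.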

Verification (claimed 99.999% accuracy): At the leaf level, a second LLM prompt strictly checks whether the leaf content answers the query. If the check fails, the agent backtracks and explores the fallback candidates (second-best choices) further up the tree.
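
The navigation loop and the leaf verification described above can be combined in one small recursive sketch. The two LLM calls are replaced by plain callables so the example runs without Ollama. `Node`, `navigate`, `rank`, and `verify` are hypothetical names, and the tree below is invented sample data.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    id: int
    summary: str
    content: Optional[str] = None  # set only on leaves
    children: list["Node"] = field(default_factory=list)


# rank(query, children) returns child IDs, best candidate first (may be empty).
Ranker = Callable[[str, list[Node]], list[int]]
# verify(query, content) returns True when the leaf content answers the query.
Verifier = Callable[[str, str], bool]


def navigate(node: Node, query: str, rank: Ranker, verify: Verifier) -> Optional[Node]:
    """Depth-first search guided by rank(), accepting only verified leaves."""
    if not node.children:
        # Leaf: accept it only if the secondary check passes.
        if node.content is not None and verify(query, node.content):
            return node
        return None
    by_id = {child.id: child for child in node.children}
    for child_id in rank(query, node.children):
        child = by_id.get(child_id)
        if child is None:
            continue  # ignore IDs that are not children of this node
        found = navigate(child, query, rank, verify)
        if found is not None:
            return found
    return None  # every candidate failed, so the caller tries its next one


# Invented sample tree for the demo.
tree = Node(1, "annual report", children=[
    Node(2, "executive summary", content="Highlights of the year."),
    Node(3, "revenue analysis", children=[
        Node(4, "Q1 revenue", content="Q1 revenue text."),
        Node(6, "Q3 revenue", content="Q3 revenue text."),
    ]),
])


def document_order(query: str, children: list[Node]) -> list[int]:
    """Stub ranker: try children in document order."""
    return [child.id for child in children]


def mentions_q3(query: str, content: str) -> bool:
    """Stub verifier: a real one would ask the LLM."""
    return "Q3" in content
```

With these stubs, `navigate(tree, "What were Q3 revenues?", document_order, mentions_q3)` first rejects the Executive Summary leaf, backtracks, and ends on node 6. Returning `None` from a failed branch is what lets the parent move on to its second-best candidate.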

Reasoning Trace (utils.py)

Every navigation decision is printed with color-coded indicators:

━━━ ApexRAG Navigation Start ━━━
Query : What are the Q3 revenue figures?
Root  : node_id=1

  ↳ ENTER node=1 path=1
    Covers the full annual financial report for 2024…
    ⟳ EXPLORE node=1 → evaluating 2 child summaries
    ✔ AGENT → node=3  reason: Revenue Analysis contains quarterly breakdown
    ↳ ENTER node=3 path=1.2
      ⟳ EXPLORE node=3 → evaluating 4 child summaries
      ✔ AGENT → node=6  reason: Q3 Revenue section is exactly what's needed
      ★ LEAF REACHED node=6
        preview: Q3 revenue was $165M. Growth slowed slightly…

━━━ Navigation Complete ━━━  result=SUCCESS  elapsed=3.41s

🧪 Testing

# Run all tests (no Ollama required)
pytest

# With coverage
pytest --cov=src --cov-report=term-missing

# Specific test file
pytest tests/test_search.py -v

Tests use an in-memory SQLite database and mock LLM responses, with zero external dependencies.


🐳 Production Deployment

# Copy and edit environment
cp .env.example .env

# Start everything (Ollama + PostgreSQL + API)
docker-compose up -d

# Pull the model inside the Ollama container
docker exec apex_ollama ollama pull llama3.1

Environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| APEX_DB_URL | sqlite+aiosqlite:///apex.db | SQLAlchemy async DB URL |
| APEX_OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| APEX_MODEL | llama3.1 | Ollama model for navigation |
| APEX_LOG_LEVEL | INFO | Logging verbosity |
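
As a rough sketch of how these variables and their defaults might be read at startup: the `Settings` class and `load_settings` function below are hypothetical and only mirror the table above; the library's own loading code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four APEX_* variables."""
    db_url: str
    ollama_host: str
    model: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each variable, falling back to the documented default."""
    return Settings(
        db_url=env.get("APEX_DB_URL", "sqlite+aiosqlite:///apex.db"),
        ollama_host=env.get("APEX_OLLAMA_HOST", "http://localhost:11434"),
        model=env.get("APEX_MODEL", "llama3.1"),
        log_level=env.get("APEX_LOG_LEVEL", "INFO"),
    )
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.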

🔧 Configuration Reference

await ApexIndex.create(
    db_url="postgresql+asyncpg://user:pass@host/db",  # Production DB
    ollama_host="http://localhost:11434",
    model="llama3.1",              # Navigation model
    summariser_model="phi3",       # Cheaper model for ingestion summaries
    max_concurrent_summaries=8,    # Parallelism (tune to your GPU VRAM)
    parser_backend="markitdown",   # "markitdown" | "docling" | "plaintext"
    trace_enabled=True,            # Color-coded console output
    db_echo=False,                 # SQL query logging
)
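
The `max_concurrent_summaries` setting bounds how many summary requests run at once. A minimal sketch of that pattern with `asyncio.Semaphore` follows. `summarise_all` and `fake_summarise` are hypothetical names, and the fake summariser simply truncates to 30 words rather than calling a model.

```python
import asyncio
from typing import Awaitable, Callable


async def summarise_all(
    texts: list[str],
    summarise: Callable[[str], Awaitable[str]],
    max_concurrent: int = 8,
) -> list[str]:
    """Summarise every text, with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(text: str) -> str:
        async with semaphore:  # waits here once the limit is reached
            return await summarise(text)

    # gather preserves input order, so summaries line up with their nodes.
    return await asyncio.gather(*(bounded(text) for text in texts))


async def fake_summarise(text: str) -> str:
    """Stand-in for a model call: keeps the first 30 words."""
    await asyncio.sleep(0)
    return " ".join(text.split()[:30])
```

Usage would look like `asyncio.run(summarise_all(sections, fake_summarise, max_concurrent=8))`. A lower limit trades ingestion speed for less pressure on GPU memory.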

📋 Roadmap

  • FastAPI REST API wrapper (/documents/ingest/file, /query, /documents)
  • Book-style Page Index and Visual tree dashboard
  • Unlimited navigation depth with backtrack and verification
  • Streaming query responses via SSE
  • Multi-document cross-reference queries
  • Alembic migrations for schema versioning
  • Support for docling table extraction (structured data cells as leaf nodes)

📄 License

MIT License. See LICENSE.
