A production-grade, local-first Agentic RAG library using structural document navigation.

ApexRAG

Production-grade, local-first Agentic RAG Library. Replaces vector similarity search with structural, agentic navigation of documents.


🧠 The Core Idea

Traditional RAG embeds text into vectors and finds the "closest" chunks. This creates retrieval hallucinations: the model returns semantically-similar-but-wrong content because it has no understanding of document structure.

ApexRAG takes a fundamentally different approach:

  1. Parse the document into a structural tree (based on headings) and extract page numbers.
  2. Synthesize a 30-word Semantic Map for every node using a local LLM.
  3. Navigate the tree with an LLM agent that reads summaries and decides which branch to enter, trying multiple candidates if necessary.
  4. Verify that the exact leaf node answers the query via a strict secondary LLM check (99.999% accuracy).
  5. Return the exact leaf node content, not a blended, hallucinated average.

Query: "What were Q3 revenues?"
         │
    Root (Annual Report)
    ├── Chapter 1: Executive Summary  ← LLM: "Not here"
    └── Chapter 2: Revenue Analysis   ← LLM: "Enter this"
        ├── Q1 Revenue                ← LLM: "Not Q3"
        ├── Q2 Revenue                ← LLM: "Not Q3"
        └── Q3 Revenue                ← LLM: "This is it!" → Return content

📁 Project Structure

apex_rag/
├── src/
│   ├── __init__.py       # Public API exports
│   ├── api.py            # FastAPI App & UI dashboard
│   ├── client.py         # Thread-safe user-facing ApexIndex class
│   ├── ingestion.py      # Document parsing & tree synthesis
│   ├── navigation.py     # Recursive LLM navigation agent
│   ├── storage.py        # SQLAlchemy async ORM & PageIndexEntry
│   └── utils.py          # ReasoningTrace, retry decorator, helpers
├── tests/
│   ├── test_tree.py      # Parser & storage unit tests
│   └── test_search.py    # Navigation agent unit tests (no Ollama needed)
├── examples/
│   └── basic_usage.py    # End-to-end demo
├── pyproject.toml
└── docker-compose.yml

⚡ Quick Start

1. Install

# Clone and set up
cd ApexRAG
pip install -e ".[dev]"

2. Start Ollama

ollama serve
ollama pull llama3.1   # or phi3, mistral, etc.

3. Ingest & Query

import asyncio
from src.client import ApexIndex

async def main():
    async with await ApexIndex.create(
        db_url="sqlite+aiosqlite:///apex.db",
        model="llama3.1",
    ) as index:
        # Ingest a PDF
        doc_id = await index.ingest("path/to/your/report.pdf")

        # Query it
        result = await index.query(
            "What are the Q3 revenue figures?",
            doc_id,
        )

        if result:
            print(result.content)
            print(f"Found at path: {result.path}")
            print(f"Navigation trace: {result.trace}")

asyncio.run(main())

4. Start the FastAPI Server & Visual Index Dashboard

uvicorn src.api:app --reload

Open your browser to:

From the dashboard, you can click on an ingested document to view its full structural tree and its book-style alphabetical page index!


🏗️ Architecture Deep Dive

Ingestion Engine (ingestion.py)

| Step | Description |
| --- | --- |
| Convert | markitdown or docling converts PDF/DOCX → Markdown |
| Parse | Regex walks ATX headings (#, ##, ###) to build ParsedSection tree |
| Persist | Nodes written to DB with LTree-style path (1.2.3) |
| Synthesize | Ollama generates 30-word summaries in parallel (bounded by semaphore) |
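
The Parse and Persist steps can be illustrated with a minimal sketch. This is not the code in ingestion.py: the `Section` class, the `parse_markdown` function, and the choice of "1" as the root path are assumptions made for illustration. It walks ATX headings with a regex, keeps a stack of open sections, and assigns each node a dotted path.

```python
import re
from dataclasses import dataclass, field

# ATX heading: 1-6 '#' characters, whitespace, then the title.
# Limitation of this sketch: '#' lines inside fenced code are treated as headings.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


@dataclass
class Section:  # hypothetical stand-in for the library's ParsedSection
    title: str
    level: int                      # 0 = synthetic document root
    path: str = ""                  # dotted path such as "1.1.2"
    lines: list[str] = field(default_factory=list)
    children: list["Section"] = field(default_factory=list)


def parse_markdown(text: str, doc_title: str = "Document") -> Section:
    """Build a heading tree; body lines attach to the nearest open heading."""
    root = Section(title=doc_title, level=0, path="1")
    stack = [root]
    for line in text.splitlines():
        match = HEADING.match(line)
        if match is None:
            stack[-1].lines.append(line)
            continue
        level = len(match.group(1))
        # Close every open section at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop()
        parent = stack[-1]
        node = Section(title=match.group(2), level=level)
        parent.children.append(node)
        # The path is the parent's path plus the 1-based sibling position.
        node.path = f"{parent.path}.{len(parent.children)}"
        stack.append(node)
    return root


sample = """# Revenue Analysis
## Q1 Revenue
Q1 text
## Q3 Revenue
Q3 revenue was $165M.
"""
tree = parse_markdown(sample, doc_title="Annual Report")
q3 = tree.children[0].children[1]
print(q3.path, q3.title)  # prints: 1.1.2 Q3 Revenue
```

Keeping a stack of open sections means a heading that skips a level (for example `#` followed directly by `###`) still attaches to the nearest shallower ancestor.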

Storage Layer (storage.py)

DocumentNode table:
  id          BIGINT PRIMARY KEY
  doc_id      VARCHAR(255)       -- logical document identifier
  parent_id   BIGINT FK (self)   -- NULL for root nodes
  path        VARCHAR(512)       -- "1.2.3" LTree-style
  title       VARCHAR(512)       -- section heading
  summary     TEXT               -- 30-word Semantic Map
  content     TEXT               -- leaf content (NULL for intermediate)
  metadata    TEXT (JSON)        -- page numbers, char count, source file
  depth       INTEGER            -- nesting level (0 = root)
  position    INTEGER            -- sibling order
  created_at  TIMESTAMP

Supports both sqlite+aiosqlite:// (local) and postgresql+asyncpg:// (production).
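
To show how the parent_id and path columns can be queried, here is a small sketch that uses the standard-library `sqlite3` module instead of the async SQLAlchemy layer the library actually uses. The table name `document_node`, the helper names, and the sample rows are assumptions; only a subset of the columns listed above is included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document_node (
        id        INTEGER PRIMARY KEY,
        doc_id    TEXT NOT NULL,
        parent_id INTEGER REFERENCES document_node(id),
        path      TEXT NOT NULL,
        title     TEXT NOT NULL,
        content   TEXT,
        depth     INTEGER NOT NULL,
        position  INTEGER NOT NULL
    )
    """
)
# Illustrative rows only; IDs and text are made up for this sketch.
rows = [
    (1, "report", None, "1", "Annual Report", None, 0, 0),
    (2, "report", 1, "1.1", "Executive Summary", "Overview text", 1, 0),
    (3, "report", 1, "1.2", "Revenue Analysis", None, 1, 1),
    (4, "report", 3, "1.2.1", "Q1 Revenue", "Q1 text", 2, 0),
    (5, "report", 3, "1.2.2", "Q3 Revenue", "Q3 revenue was $165M.", 2, 1),
]
conn.executemany("INSERT INTO document_node VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


def children_of(conn: sqlite3.Connection, node_id: int) -> list[tuple]:
    """Direct children in sibling order, as a navigation step would need them."""
    cur = conn.execute(
        "SELECT id, title FROM document_node WHERE parent_id = ? ORDER BY position",
        (node_id,),
    )
    return cur.fetchall()


def subtree_of(conn: sqlite3.Connection, doc_id: str, path: str) -> list[tuple]:
    """All descendants of `path`. The trailing dot keeps '1.2' from matching '1.20'."""
    cur = conn.execute(
        "SELECT path, title FROM document_node "
        "WHERE doc_id = ? AND path LIKE ? ORDER BY depth, position",
        (doc_id, path + ".%"),
    )
    return cur.fetchall()


print(children_of(conn, 3))
print(subtree_of(conn, "report", "1.2"))
```

The dotted path makes whole-subtree lookups a single prefix match, while parent_id keeps the one-level "fetch children" query cheap.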

Navigation Agent (navigation.py)

find(query, doc_id)
  └── _navigate(current_node)
        ├── [Leaf?] → return content immediately
        ├── fetch children
        ├── _ask_llm(query, child_summaries)
        │     └── "Which child ID contains the answer?"
        ├── [ID returned] → recurse into chosen child
        │     └── [child returns None] → try siblings
        └── [NONE returned] → backtrack to parent
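
The control flow above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in navigation.py: the `Node` class and `navigate` function are hypothetical, and `keyword_chooser` is a plain word-overlap heuristic standing in for the LLM call so the example runs without Ollama.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:  # hypothetical in-memory node, not the ORM model
    id: int
    title: str
    summary: str = ""
    content: Optional[str] = None
    children: list["Node"] = field(default_factory=list)


# A chooser returns child IDs ranked best-first; an empty list means "NONE".
Chooser = Callable[[str, list[Node]], list[int]]


def navigate(node: Node, query: str, choose: Chooser) -> Optional[Node]:
    """Depth-first descent that falls back to the next-ranked child on failure."""
    if not node.children:
        return node if node.content else None
    by_id = {child.id: child for child in node.children}
    for child_id in choose(query, node.children):
        child = by_id.get(child_id)
        if child is None:
            continue  # ignore IDs that are not real children
        found = navigate(child, query, choose)
        if found is not None:
            return found
    return None  # nothing matched under this node: the caller backtracks


def keyword_chooser(query: str, children: list[Node]) -> list[int]:
    """Stand-in for the LLM: rank children by word overlap with the query."""
    words = set(query.lower().replace("?", "").split())
    scored = []
    for child in children:
        text = f"{child.title} {child.summary}".lower().split()
        scored.append((len(words & set(text)), child.id))
    scored.sort(key=lambda pair: -pair[0])
    return [child_id for score, child_id in scored if score > 0]


root = Node(1, "Annual Report", children=[
    Node(2, "Executive Summary", "High-level overview of the year", "Overview text"),
    Node(3, "Revenue Analysis", "Quarterly revenue for Q1 Q2 Q3", children=[
        Node(4, "Q1 Revenue", "Q1 figures", "Q1 text"),
        Node(5, "Q2 Revenue", "Q2 figures", "Q2 text"),
        Node(6, "Q3 Revenue", "Q3 figures", "Q3 revenue was $165M."),
    ]),
])
hit = navigate(root, "What were Q3 revenues?", keyword_chooser)
print(hit.id, hit.content)  # prints: 6 Q3 revenue was $165M.
```

Returning `None` from a failed branch is what lets the caller try the next sibling, so backtracking needs no extra bookkeeping.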

LLM Response Parsing is robust, with a 4-tier fallback:

  1. Strict json.loads()
  2. Regex extraction from prose-wrapped JSON
  3. Explicit "NONE" keyword detection
  4. Heuristic: scan for any valid child ID number in the response
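
A minimal sketch of such a four-tier parser follows. The function name `parse_choice` and the JSON key `"id"` are assumptions; the actual prompt and response schema used by the library are not shown in this README.

```python
import json
import re
from typing import Optional


def parse_choice(raw: str, valid_ids: set[int]) -> Optional[int]:
    """Return the chosen child ID, or None for NONE / unparseable replies."""

    def from_payload(payload: object) -> Optional[int]:
        if isinstance(payload, dict):
            payload = payload.get("id")  # assumed key name
        if isinstance(payload, bool):
            return None  # bool is a subclass of int; never treat it as an ID
        if isinstance(payload, int) and payload in valid_ids:
            return payload
        if isinstance(payload, str) and re.fullmatch(r"\s*\d+\s*", payload):
            value = int(payload)
            return value if value in valid_ids else None
        return None

    # Tier 1: the whole reply is valid JSON.
    try:
        choice = from_payload(json.loads(raw))
        if choice is not None:
            return choice
    except json.JSONDecodeError:
        pass

    # Tier 2: a flat JSON object embedded in prose (nested objects not handled).
    match = re.search(r"\{.*?\}", raw, re.DOTALL)
    if match:
        try:
            choice = from_payload(json.loads(match.group(0)))
            if choice is not None:
                return choice
        except json.JSONDecodeError:
            pass

    # Tier 3: the model explicitly declined.
    if re.search(r"\bNONE\b", raw, re.IGNORECASE):
        return None

    # Tier 4: last resort, the first number that is a real child ID.
    for token in re.findall(r"\d+", raw):
        if int(token) in valid_ids:
            return int(token)
    return None


ids = {2, 3}
print(parse_choice('{"id": 3, "reason": "quarterly breakdown"}', ids))  # prints: 3
print(parse_choice('Sure! {"id": 2} Hope that helps.', ids))            # prints: 2
print(parse_choice("NONE of these sections match.", ids))               # prints: None
print(parse_choice("I would pick child 3.", ids))                       # prints: 3
```

Checking every candidate against `valid_ids` is the important part: a number the model invented never turns into a navigation step.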

High Accuracy (99.999%) Verification: At the leaf level, a second LLM prompt strictly verifies whether the leaf content answers the query. If verification fails, the agent backtracks and explores the fallback candidates (second-best choices) further up the tree.
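
The verify-then-fall-back step can be sketched as a gate over ranked candidates. Everything here is hypothetical: `first_verified` is an illustrative name, the candidate texts are made up, and `quarter_verify` is a trivial rule standing in for the second LLM prompt.

```python
import re
from typing import Callable, Iterable, Optional

Candidate = tuple[int, str]  # (node_id, leaf content), ranked best-first


def first_verified(
    query: str,
    candidates: Iterable[Candidate],
    verify: Callable[[str, str], bool],
) -> tuple[Optional[Candidate], list[int]]:
    """Return the first candidate that passes verification, plus rejected IDs."""
    rejected: list[int] = []
    for node_id, content in candidates:
        if verify(query, content):
            return (node_id, content), rejected
        rejected.append(node_id)
    return None, rejected


def quarter_verify(query: str, content: str) -> bool:
    """Stand-in check: every quarter named in the query must appear in the leaf."""
    quarters = re.findall(r"\bq[1-4]\b", query.lower())
    return bool(quarters) and all(q in content.lower() for q in quarters)


ranked = [
    (5, "Q2 revenue grew compared with Q1."),              # plausible but wrong leaf
    (6, "Q3 revenue was $165M. Growth slowed slightly"),   # correct leaf
]
accepted, rejected = first_verified("What were Q3 revenues?", ranked, quarter_verify)
print(accepted[0], rejected)  # prints: 6 [5]
```

Keeping the list of rejected IDs makes it possible to report in a trace which leaves were tried and discarded.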

Reasoning Trace (utils.py)

Every navigation decision is printed with color-coded indicators:

━━━ ApexRAG Navigation Start ━━━
Query : What are the Q3 revenue figures?
Root  : node_id=1

  ↳ ENTER node=1 path=1
    Covers the full annual financial report for 2024…
    ⟳ EXPLORE node=1 → evaluating 2 child summaries
    ✔ AGENT → node=3  reason: Revenue Analysis contains quarterly breakdown
    ↳ ENTER node=3 path=1.2
      ⟳ EXPLORE node=3 → evaluating 4 child summaries
      ✔ AGENT → node=6  reason: Q3 Revenue section is exactly what's needed
      ★ LEAF REACHED node=6
        preview: Q3 revenue was $165M. Growth slowed slightly…

━━━ Navigation Complete ━━━  result=SUCCESS  elapsed=3.41s
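
A depth-indented trace like the one above can be produced by a very small recorder. The `Trace` class below is a hypothetical sketch, not the `ReasoningTrace` class in utils.py, and it leaves out the color codes and symbols.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Trace:  # hypothetical; the real ReasoningTrace API is not shown in this README
    events: list[tuple[int, str]] = field(default_factory=list)
    started: float = field(default_factory=time.perf_counter)

    def log(self, depth: int, message: str) -> None:
        """Record one navigation event at the given tree depth."""
        self.events.append((depth, message))

    def render(self) -> str:
        """Indent each event by two spaces per depth level."""
        return "\n".join("  " * depth + message for depth, message in self.events)

    def elapsed(self) -> float:
        """Seconds since the trace was created."""
        return time.perf_counter() - self.started


trace = Trace()
trace.log(1, "ENTER node=1 path=1")
trace.log(2, "AGENT -> node=3")
trace.log(2, "ENTER node=3 path=1.2")
trace.log(3, "LEAF REACHED node=6")
print(trace.render())
print(f"elapsed={trace.elapsed():.2f}s")
```

Recording events as `(depth, message)` pairs keeps formatting separate from navigation, so the same trace could be printed, logged, or returned with the query result.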

🧪 Testing

# Run all tests (no Ollama required)
pytest

# With coverage
pytest --cov=src --cov-report=term-missing

# Specific test file
pytest tests/test_search.py -v

Tests use an in-memory SQLite database and mock LLM responses, so they have zero external dependencies.


🐳 Production Deployment

# Copy and edit environment
cp .env.example .env

# Start everything (Ollama + PostgreSQL + API)
docker-compose up -d

# Pull the model inside the Ollama container
docker exec apex_ollama ollama pull llama3.1

Environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| APEX_DB_URL | sqlite+aiosqlite:///apex.db | SQLAlchemy async DB URL |
| APEX_OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| APEX_MODEL | llama3.1 | Ollama model for navigation |
| APEX_LOG_LEVEL | INFO | Logging verbosity |
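
For reference, this is one way an application could read these variables with the documented defaults. The `Settings` class and `load_settings` function are illustrative names and are not part of the ApexRAG API; how the library itself reads its environment is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:  # hypothetical container for the four documented variables
    db_url: str
    ollama_host: str
    model: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read APEX_* variables, falling back to the defaults listed above."""
    return Settings(
        db_url=env.get("APEX_DB_URL", "sqlite+aiosqlite:///apex.db"),
        ollama_host=env.get("APEX_OLLAMA_HOST", "http://localhost:11434"),
        model=env.get("APEX_MODEL", "llama3.1"),
        log_level=env.get("APEX_LOG_LEVEL", "INFO"),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"APEX_MODEL": "phi3"})
print(settings.model, settings.db_url)  # prints: phi3 sqlite+aiosqlite:///apex.db
```

Accepting any mapping instead of reading `os.environ` directly makes the loader easy to exercise in tests.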

🔧 Configuration Reference

await ApexIndex.create(
    db_url="postgresql+asyncpg://user:pass@host/db",  # Production DB
    ollama_host="http://localhost:11434",
    model="llama3.1",              # Navigation model
    summariser_model="phi3",       # Cheaper model for ingestion summaries
    max_concurrent_summaries=8,    # Parallelism (tune to your GPU VRAM)
    parser_backend="markitdown",   # "markitdown" | "docling" | "plaintext"
    trace_enabled=True,            # Color-coded console output
    db_echo=False,                 # SQL query logging
)
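
The `max_concurrent_summaries` option bounds how many summary requests run at once. A common way to get that behaviour with asyncio is a semaphore, sketched below. The helper names are assumptions, and `truncate_summary` only cuts text to 30 words as a stand-in for the Ollama call.

```python
import asyncio
from typing import Awaitable, Callable


async def summarise_all(
    texts: list[str],
    summarise: Callable[[str], Awaitable[str]],
    max_concurrent: int = 8,
) -> list[str]:
    """Run `summarise` over all texts with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(text: str) -> str:
        async with semaphore:  # waits here once the limit is reached
            return await summarise(text)

    # gather() returns results in input order, whatever order tasks finish in.
    return await asyncio.gather(*(bounded(text) for text in texts))


async def truncate_summary(text: str) -> str:
    """Stand-in summariser: yields to the event loop, then keeps the first 30 words."""
    await asyncio.sleep(0)
    return " ".join(text.split()[:30])


sections = ["Q1 revenue section text", "Q3 revenue was $165M.", "word " * 40]
summaries = asyncio.run(summarise_all(sections, truncate_summary, max_concurrent=2))
print([len(s.split()) for s in summaries])  # prints: [4, 4, 30]
```

A lower limit trades ingestion speed for lower peak load on the model server, which is presumably why the option is described as something to tune to GPU VRAM.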

📋 Roadmap

  • FastAPI REST API wrapper (/documents/ingest/file, /query, /documents)
  • Book-style Page Index and Visual tree dashboard
  • Unlimited navigation depth with backtrack and verification
  • Streaming query responses via SSE
  • Multi-document cross-reference queries
  • Alembic migrations for schema versioning
  • Support for docling table extraction (structured data cells as leaf nodes)

📄 License

MIT License; see LICENSE.
