A production-grade, local-first Agentic RAG library using structural document navigation.
ApexRAG
Production-grade, local-first Agentic RAG Library. Replaces vector similarity search with structural, agentic navigation of documents.
The Core Idea
Traditional RAG embeds text into vectors and retrieves the "closest" chunks. This can cause retrieval hallucinations: the retriever returns semantically similar but wrong content because it has no understanding of document structure.
ApexRAG takes a fundamentally different approach:
- Parse the document into a structural tree (based on headings) and extract page numbers.
- Synthesize a 30-word Semantic Map for every node using a local LLM.
- Navigate the tree with an LLM agent that reads summaries and decides which branch to enter, trying multiple candidates if necessary.
- Verify the exact leaf node answers the query via a strict secondary LLM check (99.999% accuracy).
- Return the exact leaf node content, not a blended, hallucinated average.
Query: "What were Q3 revenues?"
  │
Root (Annual Report)
├── Chapter 1: Executive Summary → LLM: "Not here"
└── Chapter 2: Revenue Analysis → LLM: "Enter this"
    ├── Q1 Revenue → LLM: "Not Q3"
    ├── Q2 Revenue → LLM: "Not Q3"
    └── Q3 Revenue → LLM: "This is it!" → Return content
Project Structure
apex_rag/
├── src/
│   ├── __init__.py        # Public API exports
│   ├── api.py             # FastAPI App & UI dashboard
│   ├── client.py          # Thread-safe user-facing ApexIndex class
│   ├── ingestion.py       # Document parsing & tree synthesis
│   ├── navigation.py      # Recursive LLM navigation agent
│   ├── storage.py         # SQLAlchemy async ORM & PageIndexEntry
│   └── utils.py           # ReasoningTrace, retry decorator, helpers
├── tests/
│   ├── test_tree.py       # Parser & storage unit tests
│   └── test_search.py     # Navigation agent unit tests (no Ollama needed)
├── examples/
│   └── basic_usage.py     # End-to-end demo
├── pyproject.toml
└── docker-compose.yml
Quick Start
1. Install
# Clone and set up
cd ApexRAG
pip install -e ".[dev]"
2. Start Ollama
ollama serve
ollama pull llama3.1 # or phi3, mistral, etc.
3. Ingest & Query
import asyncio
from src.client import ApexIndex

async def main():
    async with await ApexIndex.create(
        db_url="sqlite+aiosqlite:///apex.db",
        model="llama3.1",
    ) as index:
        # Ingest a PDF
        doc_id = await index.ingest("path/to/your/report.pdf")

        # Query it
        result = await index.query(
            "What are the Q3 revenue figures?",
            doc_id,
        )
        if result:
            print(result.content)
            print(f"Found at path: {result.path}")
            print(f"Navigation trace: {result.trace}")

asyncio.run(main())
4. Start the FastAPI Server & Visual Index Dashboard
uvicorn src.api:app --reload
Open your browser to:
- Dashboard: http://localhost:8000
- API Docs: http://localhost:8000/docs
From the dashboard, you can click on an ingested document to view its full structural tree and its book-style alphabetical page index!
Architecture Deep Dive
Ingestion Engine (ingestion.py)
| Step | Description |
|---|---|
| Convert | markitdown or docling converts PDF/DOCX to Markdown |
| Parse | Regex walks ATX headings (#, ##, ###) to build ParsedSection tree |
| Persist | Nodes written to DB with LTree-style path (1.2.3) |
| Synthesize | Ollama generates 30-word summaries in parallel (bounded by semaphore) |
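The Parse and Persist steps can be illustrated with a minimal sketch. It is not the code in ingestion.py: the `Section` class is a hypothetical stand-in for ParsedSection, and the path scheme (root is "1", each child appends its 1-based sibling index) is an assumption based on the "1.2.3" example. It also ignores the fact that `#` lines inside code blocks are not headings.

```python
import re
from dataclasses import dataclass, field

# ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


@dataclass
class Section:  # hypothetical stand-in for ParsedSection
    title: str
    level: int
    path: str = ""
    lines: list = field(default_factory=list)
    children: list = field(default_factory=list)


def parse_markdown(text: str, doc_title: str = "Document") -> Section:
    """Build a heading tree and assign dot-separated paths."""
    root = Section(title=doc_title, level=0, path="1")
    stack = [root]  # chain of currently open sections, shallowest first
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match is None:
            stack[-1].lines.append(line)  # body text belongs to the open section
            continue
        level = len(match.group(1))
        # Close sections at the same or deeper level; the root (level 0) always stays.
        while stack[-1].level >= level:
            stack.pop()
        parent = stack[-1]
        node = Section(title=match.group(2), level=level)
        parent.children.append(node)
        node.path = f"{parent.path}.{len(parent.children)}"
        stack.append(node)
    return root
```

Keeping a stack of open sections means each heading is attached in a single pass, without re-scanning the document for its parent.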
Storage Layer (storage.py)
DocumentNode table:
id BIGINT PRIMARY KEY
doc_id VARCHAR(255) -- logical document identifier
parent_id BIGINT FK (self) -- NULL for root nodes
path VARCHAR(512) -- "1.2.3" LTree-style
title VARCHAR(512) -- section heading
summary TEXT -- 30-word Semantic Map
content TEXT -- leaf content (NULL for intermediate)
metadata TEXT (JSON) -- page numbers, char count, source file
depth INTEGER -- nesting level (0 = root)
position INTEGER -- sibling order
created_at TIMESTAMP
Supports both sqlite+aiosqlite:// (local) and postgresql+asyncpg:// (production).
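One reason to store an LTree-style path is that a whole subtree can be fetched with a prefix match. The sketch below shows the idea with the standard-library `sqlite3` module instead of the library's async SQLAlchemy layer. The table name `document_node`, the reduced column set, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document_node (
        id INTEGER PRIMARY KEY,
        doc_id TEXT NOT NULL,
        parent_id INTEGER REFERENCES document_node(id),
        path TEXT NOT NULL,
        title TEXT,
        content TEXT,
        depth INTEGER,
        position INTEGER
    )
    """
)
rows = [
    (1, "report", None, "1", "Annual Report", None, 0, 0),
    (2, "report", 1, "1.1", "Executive Summary", "Summary text", 1, 0),
    (3, "report", 1, "1.2", "Revenue Analysis", None, 1, 1),
    (4, "report", 3, "1.2.1", "Q1 Revenue", "Q1 text", 2, 0),
    (5, "report", 3, "1.2.2", "Q2 Revenue", "Q2 text", 2, 1),
    (6, "report", 3, "1.2.3", "Q3 Revenue", "Q3 text", 2, 2),
    (7, "report", 3, "1.2.4", "Q4 Revenue", "Q4 text", 2, 3),
]
conn.executemany("INSERT INTO document_node VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


def children(conn, parent_id):
    """Direct children of a node, in sibling order."""
    cur = conn.execute(
        "SELECT id, title FROM document_node WHERE parent_id = ? ORDER BY position",
        (parent_id,),
    )
    return cur.fetchall()


def subtree(conn, doc_id, path):
    """The node at `path` plus all descendants, matched by path prefix."""
    cur = conn.execute(
        "SELECT path, title FROM document_node "
        "WHERE doc_id = ? AND (path = ? OR path LIKE ?) "
        "ORDER BY depth, position",
        (doc_id, path, path + ".%"),
    )
    return cur.fetchall()
```

Matching on `path + ".%"` rather than `path + "%"` keeps "1.2" from also matching a sibling such as "1.20".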
Navigation Agent (navigation.py)
find(query, doc_id)
└── _navigate(current_node)
    ├── [Leaf?] → return content immediately
    ├── fetch children
    ├── _ask_llm(query, child_summaries)
    │   └── "Which child ID contains the answer?"
    ├── [ID returned] → recurse into chosen child
    │   └── [child returns None] → try siblings
    └── [NONE returned] → backtrack to parent
LLM response parsing is robust, with a 4-tier fallback:
- Strict json.loads()
- Regex extraction from prose-wrapped JSON
- Explicit "NONE" keyword detection
- Heuristic: scan for any valid child ID number in the response
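The four tiers above could look roughly like the sketch below. It is an illustration, not the code in navigation.py: the function name `parse_choice` and the `child_id` JSON key are assumptions, since the actual prompt and response schema are not documented here.

```python
import json
import re
from typing import Optional, Sequence


def parse_choice(response: str, valid_ids: Sequence[int]) -> Optional[int]:
    """Return the chosen child ID, or None for NONE / unparseable replies."""
    valid = set(valid_ids)

    def from_payload(payload) -> Optional[int]:
        # Accept only a JSON object whose (assumed) "child_id" is a known ID.
        if isinstance(payload, dict):
            value = payload.get("child_id")
            if isinstance(value, int) and not isinstance(value, bool) and value in valid:
                return value
        return None

    # Tier 1: the whole response is strict JSON.
    try:
        choice = from_payload(json.loads(response))
        if choice is not None:
            return choice
    except json.JSONDecodeError:
        pass

    # Tier 2: a flat JSON object embedded in surrounding prose.
    match = re.search(r"\{.*?\}", response, re.DOTALL)
    if match:
        try:
            choice = from_payload(json.loads(match.group(0)))
            if choice is not None:
                return choice
        except json.JSONDecodeError:
            pass

    # Tier 3: explicit NONE keyword.
    if re.search(r"\bNONE\b", response):
        return None

    # Tier 4: heuristic, first number in the text that is a valid child ID.
    for token in re.findall(r"\d+", response):
        if int(token) in valid:
            return int(token)
    return None
```

Note that this sketch returns `None` both for an explicit NONE and for a reply it cannot parse. A real agent may want to distinguish the two so it can retry the prompt in the second case.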
High Accuracy (99.999%) Verification: At the leaf level, a second LLM prompt strictly verifies if the leaf content answers the query. If it fails, the agent backtracks and explores the fallback candidates (second-best choices) up the tree.
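The combination of ranked candidates, leaf verification, and backtracking can be sketched as a depth-first search. The version below is synchronous and replaces both LLM calls with toy stand-ins (`overlap_rank` and `contains_all_terms`, both hypothetical), so it shows only the control flow, not the library's actual prompts or async implementation. The sample tree and leaf text are made up for the demo.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    title: str
    content: Optional[str] = None  # set on leaves only
    children: List["Node"] = field(default_factory=list)


def navigate(query, node, rank, verify, trail):
    """Depth-first search guided by `rank`; a leaf is accepted only if `verify` passes."""
    trail.append(node.node_id)
    if not node.children:
        return node if verify(query, node) else None
    for child in rank(query, node.children):  # best candidate first, then fallbacks
        found = navigate(query, child, rank, verify, trail)
        if found is not None:
            return found
    return None  # every candidate failed, so the caller backtracks


def overlap_rank(query, children):
    """Stand-in for the navigation LLM: order children by title/query word overlap."""
    terms = set(query.lower().split())
    return sorted(children, key=lambda c: -len(terms & set(c.title.lower().split())))


def contains_all_terms(query, node):
    """Stand-in for the verification LLM: every query term must appear in the leaf."""
    text = (node.content or "").lower()
    return all(term in text for term in query.lower().split())


root = Node(1, "Annual Report", children=[
    Node(2, "Executive Summary", "Overview of the year."),
    Node(3, "Revenue Analysis", children=[
        Node(4, "Q1 Revenue", "First quarter revenue details."),
        Node(5, "Q2 Revenue", "Second quarter revenue details."),
        Node(6, "Q3 Revenue", "Third quarter revenue was $165M."),
    ]),
])

trail = []
result = navigate("third quarter revenue", root, overlap_rank, contains_all_terms, trail)
print(result.title, trail)  # Q3 Revenue [1, 3, 4, 5, 6]
```

In this run the toy ranker cannot tell the three quarterly sections apart, so the Q1 and Q2 leaves are visited and rejected by the verifier before Q3 is accepted. That is the backtracking path described above.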
Reasoning Trace (utils.py)
Every navigation decision is printed with color-coded indicators:
═══ ApexRAG Navigation Start ═══
Query : What are the Q3 revenue figures?
Root : node_id=1
↳ ENTER node=1 path=1
  Covers the full annual financial report for 2024…
↳ EXPLORE node=1: evaluating 2 child summaries
✓ AGENT → node=3 reason: Revenue Analysis contains quarterly breakdown
↳ ENTER node=3 path=1.2
↳ EXPLORE node=3: evaluating 4 child summaries
✓ AGENT → node=6 reason: Q3 Revenue section is exactly what's needed
✓ LEAF REACHED node=6
  preview: Q3 revenue was $165M. Growth slowed slightly…
═══ Navigation Complete ═══ result=SUCCESS elapsed=3.41s
Testing
# Run all tests (no Ollama required)
pytest
# With coverage
pytest --cov=src --cov-report=term-missing
# Specific test file
pytest tests/test_search.py -v
Tests use an in-memory SQLite database and mock LLM responses, so they require no external services.
Production Deployment
# Copy and edit environment
cp .env.example .env
# Start everything (Ollama + PostgreSQL + API)
docker-compose up -d
# Pull the model inside the Ollama container
docker exec apex_ollama ollama pull llama3.1
Environment variables:
| Variable | Default | Description |
|---|---|---|
| APEX_DB_URL | sqlite+aiosqlite:///apex.db | SQLAlchemy async DB URL |
| APEX_OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| APEX_MODEL | llama3.1 | Ollama model for navigation |
| APEX_LOG_LEVEL | INFO | Logging verbosity |
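For code that runs alongside the API, the same variables and defaults can be read with a small helper. This is a hedged sketch: the `Settings` class and `load_settings` function are hypothetical and not part of the library. Only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:  # hypothetical container, not an ApexRAG class
    db_url: str
    ollama_host: str
    model: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the APEX_* variables, falling back to the documented defaults."""
    return Settings(
        db_url=env.get("APEX_DB_URL", "sqlite+aiosqlite:///apex.db"),
        ollama_host=env.get("APEX_OLLAMA_HOST", "http://localhost:11434"),
        model=env.get("APEX_MODEL", "llama3.1"),
        log_level=env.get("APEX_LOG_LEVEL", "INFO"),
    )
```

Passing the environment in as a mapping keeps the helper easy to test: `load_settings({"APEX_MODEL": "phi3"})` overrides one value without touching the real process environment.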
Configuration Reference
await ApexIndex.create(
    db_url="postgresql+asyncpg://user:pass@host/db",  # Production DB
    ollama_host="http://localhost:11434",
    model="llama3.1",                  # Navigation model
    summariser_model="phi3",           # Cheaper model for ingestion summaries
    max_concurrent_summaries=8,        # Parallelism (tune to your GPU VRAM)
    parser_backend="markitdown",       # "markitdown" | "docling" | "plaintext"
    trace_enabled=True,                # Color-coded console output
    db_echo=False,                     # SQL query logging
)
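The `max_concurrent_summaries` setting corresponds to the semaphore-bounded summary step in the ingestion table. A minimal sketch of that pattern with `asyncio.Semaphore` follows. The names `summarise_all` and `fake_summarise` are hypothetical, and the fake summariser truncates text instead of calling Ollama.

```python
import asyncio


async def summarise_all(texts, summarise, max_concurrent=8):
    """Run `summarise` over all texts with at most `max_concurrent` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(text):
        async with semaphore:  # waits here while the limit is reached
            return await summarise(text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(t) for t in texts))


async def fake_summarise(text, limit=30):
    """Stand-in for the LLM call: keep the first `limit` words."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return " ".join(text.split()[:limit])


if __name__ == "__main__":
    sections = ["alpha beta gamma", " ".join(["word"] * 40)]
    summaries = asyncio.run(summarise_all(sections, fake_summarise, max_concurrent=2))
    print([len(s.split()) for s in summaries])  # [3, 30]
```

All tasks are created up front, but only `max_concurrent` of them hold the semaphore at any moment, which caps the number of simultaneous model requests.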
Roadmap
- FastAPI REST API wrapper (/documents/ingest/file, /query, /documents)
- Book-style Page Index and Visual tree dashboard
- Unlimited navigation depth with backtrack and verification
- Streaming query responses via SSE
- Multi-document cross-reference queries
- Alembic migrations for schema versioning
- Support for docling table extraction (structured data cells as leaf nodes)
License
MIT License; see LICENSE.