Vectorless, Reasoning-based RAG — multi-document agentic retrieval with zero hallucination
Project description
AgenticRAG
Vectorless, Reasoning-based RAG for Python
No Vector DB · No Chunking · No Embeddings
Just pure LLM reasoning over your documents.
Installation · Quick Start · Web UI · API Reference · Models
AgenticRAG is a Python library that lets you ask questions about your PDF documents using AI — without any vector databases, embeddings, or chunking.
Instead of the traditional RAG approach, AgenticRAG:
- Builds a smart tree index from your document (like a Table of Contents, but smarter)
- Uses AI agents to reason over the tree and find the right sections
- Verifies every answer against the source text — zero hallucinations
It works with Groq Cloud (free API key) or local LLMs via Ollama (100% free, runs on your machine).
Installation
pip install agenticrag
That's it. You're ready to go.
Optional extras:
pip install agenticrag[web] # Web UI (includes FastAPI server)
pip install agenticrag[gcs] # Google Cloud Storage backend
pip install agenticrag[neo4j] # Neo4j graph backend
pip install agenticrag[all] # Everything
Note: If you clone the repo and want to run
server.py, install with:pip install -e ".[web]"
Quick Start (30 seconds)
Step 1: Get a free API key
Go to console.groq.com -- Create an API Key -- Copy it.
Step 2: Set your API key
Create a .env file in your project folder:
GROQ_API_KEY=gsk_your_key_here
Step 3: Ask questions about any PDF
from agenticrag import Forest
# Create a knowledge base and add your PDF
forest = Forest(verbose=True)
forest.add("report.pdf")
# Ask a question
result = forest.ask("What was the net income?")
print(result.text)
That's the entire setup. Three lines of real code. No vector DB to configure, no embeddings to generate, no chunks to tune.
Use AgenticRAG in Your Own Project
AgenticRAG is designed to be a drop-in RAG engine for any Python project. You don't need to understand how RAG works, build any retrieval pipelines, or set up any infrastructure. Just install, import, and ask questions.
How It Works (The Simple Version)
Your app --> agenticrag --> Answer with sources
(handles everything:
PDF parsing, indexing, multi-agent search,
hallucination checking, citations)
You write zero retrieval code. AgenticRAG handles all of it internally.
Example 1: Add Document Q&A to a Flask App
# app.py
from flask import Flask, request, jsonify
from agenticrag import Forest
app = Flask(__name__)
# Create the knowledge base ONCE when the app starts
forest = Forest(verbose=True)
forest.add("company_handbook.pdf")
forest.add("product_docs.pdf")
@app.route("/ask", methods=["POST"])
def ask():
question = request.json["question"]
result = forest.ask(question)
return jsonify({
"answer": result.text,
"confidence": result.confidence,
"sources": result.sources,
})
if __name__ == "__main__":
app.run(port=5000)
# Install
pip install agenticrag flask
# Run
python app.py
# Test
curl -X POST http://localhost:5000/ask \
-H "Content-Type: application/json" \
-d '{"question": "What is our refund policy?"}'
Example 2: Build a FastAPI Document API
# api.py
from fastapi import FastAPI
from pydantic import BaseModel
from agenticrag import Forest
app = FastAPI(title="My Document API")
forest = Forest(data_dir="./knowledge_base", verbose=True)
# Add all your documents at startup
forest.add_directory("./docs/", pattern="*.pdf")
class Question(BaseModel):
text: str
@app.post("/query")
async def query(q: Question):
result = forest.ask(q.text)
return {
"answer": result.text,
"confidence": result.confidence,
"pages_used": result.sources,
"time_seconds": result.elapsed_seconds,
}
pip install agenticrag fastapi uvicorn
uvicorn api:app --reload
Example 3: Streamlit Chat App (10 lines)
# chat.py
import streamlit as st
from agenticrag import Forest
st.title("Chat with your Documents")
# Initialize once
if "forest" not in st.session_state:
st.session_state.forest = Forest(verbose=True)
st.session_state.forest.add("report.pdf")
question = st.text_input("Ask a question:")
if question:
result = st.session_state.forest.ask(question)
st.write(result.text)
st.caption(f"Confidence: {result.confidence:.0%} | {result.elapsed_seconds:.1f}s")
pip install agenticrag streamlit
streamlit run chat.py
Example 4: Simple Python Script
The simplest possible usage — no web framework, no server, just a script:
# ask.py
from agenticrag import Forest
# Point to your documents
forest = Forest()
forest.add("quarterly_report.pdf")
forest.add("annual_report.pdf")
# Ask questions
questions = [
"What was the total revenue?",
"What are the main risk factors?",
"How did expenses change year over year?",
]
for q in questions:
result = forest.ask(q)
print(f"\nQ: {q}")
print(f"A: {result.text}")
print(f" Confidence: {result.confidence:.0%}")
print(f" Sources: {[s['doc_title'] + ' p.' + s['pages'] for s in result.sources]}")
pip install agenticrag
python ask.py
Example 5: Add to an Existing Django Project
# views.py
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from agenticrag import Forest
import json
# Initialize once — reused across all requests
_forest = Forest(data_dir="./django_knowledge_base")
_forest.add_directory("./media/documents/", pattern="*.pdf")
@csrf_exempt
def document_qa(request):
if request.method == "POST":
body = json.loads(request.body)
result = _forest.ask(body["question"])
return JsonResponse({
"answer": result.text,
"confidence": result.confidence,
})
Example 6: Use with Local LLMs (100% Free, No API Key)
from agenticrag import Forest, LocalModel
# No API key needed — runs on your machine
forest = Forest(
model=LocalModel.QWEN3_4B,
base_url="http://localhost:11434/v1",
)
forest.add("confidential_report.pdf") # data never leaves your machine
result = forest.ask("What are the projected earnings?")
print(result.text)
Example 7: Build a CLI Tool
# doctool.py
import sys
from agenticrag import Forest
forest = Forest(data_dir="./my_index")
if len(sys.argv) > 2 and sys.argv[1] == "add":
result = forest.add(sys.argv[2])
print(f"Indexed: {result.title} ({result.page_count} pages)")
elif len(sys.argv) > 2 and sys.argv[1] == "ask":
question = " ".join(sys.argv[2:])
result = forest.ask(question)
print(f"\n{result.text}\n")
print(f"Confidence: {result.confidence:.0%} | Sources: {len(result.sources)}")
else:
print("Usage:")
print(" python doctool.py add report.pdf")
print(' python doctool.py ask "What was the revenue?"')
python doctool.py add report.pdf
python doctool.py ask "What was the total revenue in Q4?"
The Key Idea
No matter what you're building — a web app, an API, a chatbot, a CLI tool, or a data pipeline — the pattern is always the same:
from agenticrag import Forest
# 1. Create a Forest (one time)
forest = Forest()
# 2. Add your documents (one time — they're cached)
forest.add("your_file.pdf")
# 3. Ask questions (as many times as you want)
result = forest.ask("your question here")
print(result.text) # the answer
print(result.confidence) # how sure it is
print(result.sources) # where it found the info
That's it. Three steps. No RAG architecture knowledge needed. AgenticRAG handles all the indexing, multi-agent search, synthesis, and hallucination checking internally.
Using AgenticRAG as a Library
Single Document — PageIndex
If you only have one document, use PageIndex for the simplest experience:
from agenticrag import PageIndex
# Load and index a PDF (one-time, takes ~30 seconds)
pi = PageIndex()
pi.load("annual_report.pdf")
# Ask questions
result = pi.ask("What was the net income in 2023?")
print(result.text)
# Save the index so you don't have to rebuild it next time
pi.save("report_index.json")
# Later, load it instantly
pi = PageIndex()
pi.load_json("report_index.json")
result = pi.ask("What about revenue?")
Multiple Documents — Forest
For multiple documents, use Forest. It searches across ALL your documents at once:
from agenticrag import Forest
# Create a knowledge base
forest = Forest(verbose=True)
# Add documents one by one
forest.add("report_2023.pdf")
forest.add("report_2024.pdf")
# Or add an entire folder of PDFs
forest.add_directory("./contracts/", pattern="*.pdf")
# Ask questions across ALL documents
result = forest.ask("Compare revenue growth between 2023 and 2024")
# Rich result object
print(result.text) # The answer
print(result.confidence) # 0.0-1.0 confidence score
print(result.sources) # Which documents & pages were used
print(result.documents_searched) # Which doc IDs were searched
print(result.elapsed_seconds) # How long it took
Using Local LLMs (Free, No API Key Needed)
You can run AgenticRAG 100% locally with Ollama:
# 1. Install Ollama from https://ollama.com/download
# 2. Pull a model (Qwen3 4B recommended — only 2.5 GB download)
ollama pull qwen3:4b
from agenticrag import Forest, LocalModel
forest = Forest(
model=LocalModel.QWEN3_4B,
base_url="http://localhost:11434/v1",
verbose=True,
)
forest.add("report.pdf")
result = forest.ask("What are the key risks?")
print(result.text)
Batch Ingestion (100+ Documents)
For large collections, use the batch pipeline — it's 2-4x faster:
from agenticrag import Forest, LocalModel
forest = Forest(
model=LocalModel.QWEN3_4B,
base_url="http://localhost:11434/v1",
data_dir="./my_knowledge_base",
verbose=True,
)
# Ingest all PDFs with progress logging
result = forest.add_directory_batch(
"./papers/",
pattern="*.pdf",
resume=True, # skip already-indexed docs
skip_description=True, # halves LLM calls (faster)
max_llm_concurrent=2, # concurrent LLM requests
)
print(result)
# BatchResult(total=14000, succeeded=13950, failed=50, skipped=0)
# Now query across all documents
answer = forest.ask("What are the latest findings on X?")
Multi-Turn Conversations
AgenticRAG remembers your conversation automatically:
forest = Forest(verbose=True)
forest.add("report.pdf")
# First question
result = forest.ask("What was the revenue?")
print(result.text) # "Revenue was $5.2 billion..."
# Follow-up — it remembers the context
result = forest.ask("How does that compare to last year?")
print(result.text) # "Compared to last year's $4.8 billion..."
# Reset when you want to start fresh
forest.clear_history()
Web UI
AgenticRAG includes a beautiful web interface for chatting with your documents:
# Install web dependencies (if not already)
pip install agenticrag[web]
# Start the server
python server.py
This opens a web app at http://localhost:8000 where you can:
- Create notebooks to organize your documents
- Upload PDFs via drag-and-drop
- Chat with your documents (with source citations)
- Switch between Groq Cloud and local LLM providers
- Share over LAN — anyone on your network can access it
# Custom port
python server.py --port 9000
# Or use the module command
python -m agenticrag serve --port 9000
Split Architecture (GPU on one machine, UI on another)
# Machine A (GPU): Run Ollama
set OLLAMA_HOST=0.0.0.0
ollama serve
# Machine B (laptop): Point AgenticRAG to Machine A
python server.py
# Then in Settings > Local LLM > Base URL: http://MACHINE_A_IP:11434/v1
API Reference
High-Level Classes
| Class | What it does | When to use |
|---|---|---|
Forest |
Multi-document knowledge base | Most common — use this for everything |
PageIndex |
Single-document index | When you only have one document |
ForestResult |
Result from Forest.ask() |
Access .text, .sources, .confidence |
Forest Methods
| Method | Description |
|---|---|
Forest(model=..., verbose=True) |
Create a new knowledge base |
.add("file.pdf") |
Add a single document |
.add_directory("./docs/") |
Add all PDFs from a folder |
.add_directory_batch("./docs/") |
Fast batch add (100+ docs) |
.ask("question") |
Ask a question across all docs |
.documents() |
List all indexed documents |
.remove(doc_id) |
Remove a document |
.size |
Number of documents |
.clear_history() |
Reset conversation memory |
.info() |
Forest status summary |
PageIndex Methods
| Method | Description |
|---|---|
PageIndex(model=..., verbose=True) |
Create a new single-doc index |
.load("file.pdf") |
Index a document |
.save("index.json") |
Save the index to disk |
.load_json("index.json") |
Load a saved index |
.ask("question") |
Ask a question |
ForestResult Fields
| Field | Type | Description |
|---|---|---|
.text |
str |
The final verified answer |
.confidence |
float |
0.0 to 1.0 confidence score |
.sources |
list |
Which documents/pages were used |
.documents_searched |
list |
Which doc IDs were searched |
.reasoning_trace |
list |
Step-by-step agent pipeline trace |
.was_rewritten |
bool |
Whether the Critic modified the answer |
.hallucinations |
list |
Any hallucinations that were caught |
.elapsed_seconds |
float |
Total time taken |
Configuration
from agenticrag import ForestConfig, GroqModel
config = ForestConfig(
model = GroqModel.GPT_OSS_20B, # Which AI model to use
data_dir = "./my_data", # Where to store indices
max_docs_per_query = 5, # Max docs to search per question
max_hunt_workers = 5, # Parallel search threads
enable_critic = True, # Zero-hallucination checking
verbose = True, # Print progress
)
Or pass these directly to Forest():
forest = Forest(
model=GroqModel.GPT_OSS_20B,
data_dir="./my_data",
verbose=True,
)
Supported Models
Cloud Models (Groq — Free API)
from agenticrag import Forest, GroqModel
forest = Forest(model=GroqModel.GPT_OSS_20B) # Fast, recommended default
forest = Forest(model=GroqModel.GPT_OSS_120B) # Largest, best reasoning
forest = Forest(model=GroqModel.LLAMA4_SCOUT) # Llama 4 Scout
forest = Forest(model=GroqModel.LLAMA3_3_70B) # Llama 3.3 70B
forest = Forest(model=GroqModel.QWEN3_32B) # Qwen 3 32B
forest = Forest(model=GroqModel.DEEPSEEK_R1_DISTILL_LLAMA_70B) # DeepSeek R1
Local Models (Ollama — 100% Free)
# Install from https://ollama.com/download, then:
ollama pull qwen3:4b # 2.5 GB — recommended
ollama pull qwen3:8b # 5.2 GB — better quality
ollama pull qwen3:14b # 9.3 GB — even better
from agenticrag import Forest, LocalModel
forest = Forest(
model=LocalModel.QWEN3_4B,
base_url="http://localhost:11434/v1",
)
| Model | Download Size | VRAM Needed | Best For |
|---|---|---|---|
LocalModel.QWEN3_4B |
2.5 GB | 5 GB or less | Low-VRAM GPUs, fastest |
LocalModel.QWEN3_8B |
5.2 GB | 8 GB or less | Best quality/size ratio |
LocalModel.QWEN3_14B |
9.3 GB | 12 GB or less | Higher quality |
LocalModel.QWEN3_30B |
19 GB | 24 GB or less | Strong reasoning |
LocalModel.LLAMA3_2_3B |
2.0 GB | 4 GB or less | Ultra-lightweight |
LocalModel.MISTRAL |
4.1 GB | 6 GB or less | General purpose |
LocalModel.GEMMA3_12B |
8.1 GB | 12 GB or less | Alternative mid-range |
How It Works
AgenticRAG uses a multi-agent pipeline — like a team of AI researchers working together:
Your Question
|
v
+---------+ Looks at the document graph to find
| Planner |--> which documents might have the answer
+---------+
|
v
+---------+ Searches those documents IN PARALLEL
| Hunters |--> using tree-based reasoning (not keywords!)
+---------+
|
v
+--------------+ Combines evidence from multiple docs
| Synthesizer |--> into a single, coherent answer
+--------------+
|
v
+--------+ Checks every claim against the source text
| Critic |--> removes anything not backed by evidence
+--------+
|
v
Verified Answer
This is why AgenticRAG can answer complex questions across many documents — it doesn't just find similar text, it actually reasons about what's relevant.
Project Structure
agenticrag/
|-- __init__.py # Public API (Forest, PageIndex, etc.)
|-- __main__.py # python -m agenticrag entry point
|-- config.py # GroqModel, LocalModel, configuration
|-- groq_client.py # LLM wrapper (Groq + OpenAI-compatible)
|-- pdf_parser.py # PDF/Markdown text extraction
|-- pdf_to_markdown.py # PDF to Markdown converter (no LLM)
|-- prompts.py # All LLM prompts
|-- tree_builder.py # Build hierarchical tree index
|-- tree_search.py # Single-document tree search
|-- pageindex.py # PageIndex (single-doc wrapper)
|-- forest.py # Forest (multi-doc entry point)
|-- agents/
| |-- planner.py # Document selection from graph
| |-- hunter.py # Parallel document searching
| |-- synthesizer.py # Multi-doc answer synthesis
| |-- evaluator.py # Retrieval sufficiency checking
| |-- critic.py # Zero-hallucination enforcer
| +-- orchestrator.py # Agentic loop state machine
|-- storage/
| |-- base.py # Abstract TreeStore interface
| |-- local.py # Local filesystem (default)
| +-- gcs.py # Google Cloud Storage
|-- graph/
| |-- base.py # Abstract DocumentGraph interface
| |-- sqlite_graph.py # SQLite + FTS5 (default)
| +-- neo4j_graph.py # Neo4j (production scale)
+-- ingestion/
|-- metadata.py # LLM metadata extraction
|-- pipeline.py # Single-document ingestion
+-- batch.py # Batch ingestion (100K+ docs)
Storage Backends
Local Filesystem (Default)
# Automatic — just use Forest() and it stores in ./pageindex_data/
forest = Forest()
Google Cloud Storage
pip install agenticrag[gcs]
from agenticrag import Forest
from agenticrag.storage import GCSStore
store = GCSStore(
bucket_name="my-bucket",
prefix="trees/",
credentials="path/to/service-account.json",
)
forest = Forest(store=store)
Neo4j Graph (Production)
pip install agenticrag[neo4j]
from agenticrag import Forest
from agenticrag.graph import Neo4jGraph
graph = Neo4jGraph(
uri="bolt://localhost:7687",
user="neo4j",
password="your_password",
)
forest = Forest(graph=graph)
Tips and Best Practices
| Tip | Why |
|---|---|
Start with Forest() |
Works out of the box, zero config |
Use verbose=True |
See exactly what the AI agents are doing |
| Use batch ingestion for 100+ docs | forest.add_directory_batch() is 2-4x faster |
Set skip_description=True in batch |
Halves the number of LLM calls |
Use resume=True in batch |
Safely restart interrupted runs |
Use skip_critic=True for speed |
Faster answers (but less hallucination protection) |
| Try Qwen3 4B for local LLMs | Best quality-per-VRAM model available (2.5 GB) |
Hardware Guide (Local LLMs)
| Setup | GPU | VRAM | Recommended Model | Speed |
|---|---|---|---|---|
| Minimum | Any NVIDIA (CUDA >= 5.0) | 4-5 GB | Qwen3 4B | Good |
| Good | RTX 3060 / Quadro P2000 | 5-12 GB | Qwen3 4B or 8B | Better |
| Recommended | RTX 4070 Ti | 16 GB | Qwen3 8B or 14B | Fast |
| Ideal | RTX 4090 | 24 GB | Qwen3 30B | Fastest |
Note: AMD GPUs with Polaris/GCN architecture (like RX 580) are not supported by Ollama. Only NVIDIA GPUs with CUDA >= 5.0 and AMD RDNA (RX 5000+) work.
FAQ
What file types are supported?
PDF, Markdown (.md), and plain text (.txt) files.
Do I need a GPU?
No. If you use Groq Cloud (free API), everything runs in the cloud. You only need a GPU if you want to run local LLMs via Ollama.
How is this different from LangChain / LlamaIndex?
Traditional RAG (LangChain, LlamaIndex) splits documents into chunks and uses vector similarity to find relevant pieces. This breaks down on professional documents because similarity is not the same as relevance.
AgenticRAG builds a hierarchical tree index and uses LLM reasoning to navigate it — like a human expert flipping through a report. No vectors, no embeddings, no chunking.
How do I get a Groq API key?
- Go to console.groq.com
- Sign up (free)
- Go to API Keys > Create API Key
- Copy the key that starts with
gsk_
Can I use OpenAI / Anthropic / other providers?
AgenticRAG works with any OpenAI-compatible API. Set the base_url parameter to point to your provider's endpoint.
License
MIT — free to use in personal and commercial projects.
Credits
Architecture inspired by VectifyAI/PageIndex.
Built by Arham Mirkar.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file agentic_rag_core-2.0.1.tar.gz.
File metadata
- Download URL: agentic_rag_core-2.0.1.tar.gz
- Upload date:
- Size: 73.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a37f5f5fa5700a9f9bfb04bf4464d57f66129436ae273a2ded3af478a2c8dd15
|
|
| MD5 |
8fda807e9cdf3fe047c15af035c0f487
|
|
| BLAKE2b-256 |
9e5926d4b99f5785fd452dc08f40123d9e05ec9f5fc3a5a6b8c715976647b5b3
|
File details
Details for the file agentic_rag_core-2.0.1-py3-none-any.whl.
File metadata
- Download URL: agentic_rag_core-2.0.1-py3-none-any.whl
- Upload date:
- Size: 84.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
92c658427f908f313c7a20f3b623d4311d7500173c85c802ed8389523ef74220
|
|
| MD5 |
606e5ab8cf7b4881f1d230606b13c9ce
|
|
| BLAKE2b-256 |
22e6fea8983b45dfb64dbe1dfe01e0cd5970ac950b650814199346167560919f
|