Agentic RAG Framework built with LangGraph and Ollama
AdaptiveRAG — Agentic RAG Framework for Local LLMs
Self-optimising, fully local Retrieval-Augmented Generation built with LangGraph. AdaptiveRAG analyses your knowledge base, auto-tunes the pipeline, and routes every query through the best retrieval strategy — all without sending data to any external API.
Table of Contents
- What makes it agentic
- How it works
- Installation
- Quick start
- Configuration
- Supported document formats
- Validation queries
- API reference
- Project structure
- Contributing
- License
What makes it agentic
Most RAG pipelines execute the same fixed sequence regardless of what you ask. AdaptiveRAG uses LLM-driven decision nodes at every step so the path through the graph changes per query and per knowledge base.
| Capability | Fixed RAG pipeline | AdaptiveRAG |
|---|---|---|
| Chunking strategy | Hard-coded | Chosen per document type (sentence / paragraph / code) |
| Chunk size | Fixed | Auto-tuned against your actual documents |
| Query expansion | None | HyDE (hypothetical document embedding) for vague queries |
| Retrieval passes | Single | Multi-hop follow-up when first pass is insufficient |
| Result reranking | None | Cross-encoder reranking for analytical / comparison queries |
| Answer quality | Not checked | Critic node scores the answer; retries with a new strategy if confidence is low |
| Parameter tuning | Manual | Optimizer agent tunes chunk size, top-k, temperature, and reranking automatically |
| Privacy | Requires external API | 100% local — no data leaves your machine |
How it works
AdaptiveRAG is composed of two LangGraph state machines.
Setup graph — runs once at startup
load docs ──► profile KB ──► plan config ──► index ──► evaluate ──► orchestrate ──┐
▲ │
critique ◄── tune_*
(chunk / retrieval /
generation / reranking)
- Profile — the LLM classifies domain, structure type, and complexity of your documents
- Plan — heuristic config is derived from the profile (chunk size, strategy, top-k, temperature)
- Index — documents are chunked and embedded into ChromaDB
- Evaluate — generated answers are scored against the expected answers of the validation queries using cosine similarity
- Orchestrate — the LLM picks which parameter to tune next and loops until scores plateau
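The Evaluate and Orchestrate steps above can be pictured with a small sketch. It is not AdaptiveRAG's actual code: it assumes answers have already been embedded as plain lists of floats, and the `has_plateaued` helper with its `min_gain` and `patience` parameters is hypothetical, chosen only to illustrate one reasonable reading of "loops until scores plateau".

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def has_plateaued(scores, min_gain=0.01, patience=2):
    """Hypothetical stop rule: the last `patience` rounds beat the earlier best by < min_gain."""
    if len(scores) <= patience:
        return False  # not enough history to judge
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent - best_before < min_gain


if __name__ == "__main__":
    # Toy vectors standing in for embeddings of a generated and an expected answer.
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
    # Made-up score history, only to exercise the stop rule.
    print(has_plateaued([0.5, 0.7, 0.705, 0.703]))
```

A tuning loop built this way would append the mean validation score after each round and stop once `has_plateaued` returns true.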
Query graph — runs for every question
classify ──► strategize ──► expand ──► retrieve ──► retrieval critic ──┐
▲ │ │
│ multihop ◄──── ┘
│ │
└──── retry ◄──── reflect ◄──── generate ◄──── rerank ◄─┘
- Classify — query type detected (factual / analytical / code / comparison / summarisation)
- Strategize — LLM decides which tools to use (HyDE, rerank, multihop, top-k)
- Retrieve — vector search, optionally with expanded queries
- Critic — retrieval quality is scored; if too low, a follow-up multi-hop query is issued
- Generate — answer produced using the style matching the query type
- Reflect — answer critic checks groundedness and completeness; retries if below threshold
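The two branch points in the query graph (after the retrieval critic and after reflection) can be sketched as plain routing functions. This is an illustration, not the project's code: the `QueryState` fields, the thresholds, and the retry and hop limits are all assumptions. In LangGraph, functions of this shape are typically registered with `StateGraph.add_conditional_edges`, which is left out here so the sketch runs without dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class QueryState:
    """Hypothetical per-query state; the real graph state may differ."""
    question: str
    retrieval_score: float = 0.0
    answer_score: float = 0.0
    hops: int = 0
    retries: int = 0
    trace: list[str] = field(default_factory=list)


def after_retrieval_critic(state, min_retrieval=0.5, max_hops=1):
    """Issue a follow-up multi-hop query when retrieval looks weak, else move on."""
    if state.retrieval_score < min_retrieval and state.hops < max_hops:
        return "multihop"
    return "rerank"


def after_reflect(state, min_answer=0.6, max_retries=2):
    """Retry with a new strategy when the answer critic's score is low, else finish."""
    if state.answer_score < min_answer and state.retries < max_retries:
        return "retry"
    return "end"


if __name__ == "__main__":
    weak = QueryState("What changed between v1 and v2?", retrieval_score=0.2)
    print(after_retrieval_critic(weak))
    print(after_reflect(QueryState("q", answer_score=0.9)))
```

Capping hops and retries keeps the loop from cycling indefinitely when the knowledge base simply does not contain the answer.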
Installation
pip install adaptiverag
With cross-encoder reranking (recommended for analytical or comparison queries):
pip install "adaptiverag[reranker]"
Prerequisite: Ollama must be running locally. Any missing models are pulled automatically the first time build_rag() is called, so no manual ollama pull is required.
Quick start
1. Add documents
Create a knowledge_base/ folder and drop in your files (.txt, .pdf, .md, .docx):
knowledge_base/
├── report.pdf
├── notes.md
└── spec.txt
2. Run
from adaptiverag import build_rag
# Indexes knowledge_base/, auto-tunes the pipeline, returns a ready instance
rag = build_rag()
result = rag.ask("What are the main findings?")
print(result) # the answer (str(result) also works)
print(result.confidence) # 0.0 – 1.0 self-assessed confidence
print(result.strategy) # why the agent chose this retrieval path
print(result.trace) # full step-by-step reasoning log
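Because each result carries a self-assessed confidence, calling code may want to flag weak answers rather than show them as-is. The helper below is a sketch under assumptions: `format_result` and its `min_confidence` cut-off are hypothetical, and a `SimpleNamespace` with the documented QueryResult fields stands in for a real result so the snippet runs without Ollama.

```python
from types import SimpleNamespace


def format_result(result, min_confidence=0.5):
    """Render an answer, adding a warning line when confidence is below the cut-off."""
    lines = [str(result.answer)]
    if result.confidence < min_confidence:
        lines.append(f"(low confidence: {result.confidence:.2f}; consider rephrasing)")
    lines.append(f"strategy: {result.strategy}")
    lines.append(f"retries: {result.retries}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only; a real object would come from rag.ask(...).
    stand_in = SimpleNamespace(
        answer="<answer text>",
        confidence=0.42,
        retries=1,
        strategy="<strategy explanation>",
        trace=[],
    )
    print(format_result(stand_in))
```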
3. CLI
adaptiverag
Interactive prompt with the same agentic graph — type trace to see the last query's reasoning.
Configuration
rag = build_rag(
llm_model = "gemma4:latest", # any Ollama chat model (auto-pulled)
embed_model = "nomic-embed-text:latest", # any Ollama embedding model (auto-pulled)
kb_path = "./knowledge_base", # path to your documents
val_queries_path = "./validation_queries.json", # optional — auto-generated from KB if omitted
)
Restrict retrieval to a single source file
# keyword prefix
result = rag.ask("from:report.pdf Summarise the methodology")
# or the parameter
result = rag.ask("Summarise the methodology", source_filter="report.pdf")
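To make the keyword-prefix form concrete, here is one way such a prefix could be split from the question. This is a hypothetical parser, not the one AdaptiveRAG ships: `split_source_prefix` is an invented name, and it assumes the filename contains no spaces.

```python
def split_source_prefix(question):
    """Return (source_filter, question); source_filter is None without a 'from:<file>' prefix."""
    text = question.strip()
    head, _, rest = text.partition(" ")
    prefix = "from:"
    # Require a non-empty filename and a non-empty question after the prefix.
    if head.startswith(prefix) and len(head) > len(prefix) and rest.strip():
        return head[len(prefix):], rest.strip()
    return None, text


if __name__ == "__main__":
    print(split_source_prefix("from:report.pdf Summarise the methodology"))
    print(split_source_prefix("Summarise the methodology"))
```

For filenames that do contain spaces, the source_filter parameter avoids the ambiguity entirely.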
Supported Ollama models
Any model available at ollama.com/library works. Recommended:
| Role | Model |
|---|---|
| LLM (routing + answers) | gemma4, llama3.2, mistral, qwen2.5 |
| Embeddings | nomic-embed-text, mxbai-embed-large |
Supported document formats
| Format | Extension | Notes |
|---|---|---|
| Plain text | .txt | UTF-8 |
| PDF | .pdf | Text-based; scanned PDFs not supported |
| Markdown | .md | Code blocks, headings, and links stripped cleanly |
| Word | .docx | Requires python-docx (included) |
Mixed formats in the same folder are fully supported.
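Before the first run it can help to see which files in the folder would be eligible. The sketch below filters by the four extensions listed above. `find_documents` is a hypothetical helper, and looking only at the top level of the folder is an assumption; the library's own loader may behave differently.

```python
from pathlib import Path

# Extensions taken from the table above.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".md", ".docx"}


def find_documents(kb_path="./knowledge_base"):
    """List supported files directly inside kb_path, sorted by path."""
    root = Path(kb_path)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    for path in find_documents():
        print(path.name)
```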
Validation queries
The setup graph tunes pipeline parameters by scoring generated answers against expected answers. Provide your own queries for best results:
[
{
"query": "What problem does this research solve?",
"expected_answer": "The research addresses the challenge of ..."
},
{
"query": "What method is used for data collection?",
"expected_answer": "Data was collected through ..."
}
]
Pass the path via val_queries_path. If you omit it:
- AdaptiveRAG checks for ./validation_queries.json
- If not found, the LLM auto-generates queries from your documents and saves them to that path
- You can then open the file, edit or extend the queries, and they will be used on the next run
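Since the file is meant to be edited by hand, a quick structural check before the next run can catch typos early. The function below is a sketch: `check_validation_queries` is a hypothetical helper that only verifies the shape shown above (a JSON array of objects with non-empty "query" and "expected_answer" strings); whether AdaptiveRAG performs similar checks itself is not stated here.

```python
import json


def check_validation_queries(text):
    """Parse validation-query JSON and raise ValueError if an entry is malformed."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("top level must be a JSON array")
    for index, item in enumerate(data):
        for key in ("query", "expected_answer"):
            value = item.get(key) if isinstance(item, dict) else None
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"entry {index}: missing or empty {key!r}")
    return data


if __name__ == "__main__":
    sample = '[{"query": "What method is used?", "expected_answer": "Data was collected through ..."}]'
    print(len(check_validation_queries(sample)), "valid entries")
```

To check a file on disk, pass it the result of `Path("validation_queries.json").read_text(encoding="utf-8")`.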
API reference
build_rag(...) → AdaptiveRAG
| Parameter | Type | Default | Description |
|---|---|---|---|
| llm_model | str | "gemma4:latest" | Ollama model for routing and answer generation |
| embed_model | str | "nomic-embed-text:latest" | Ollama model for embeddings |
| kb_path | str \| None | "./knowledge_base" | Folder containing your documents |
| val_queries_path | str \| None | "./validation_queries.json" | Validation Q&A file (auto-generated if missing) |
AdaptiveRAG.ask(question, source_filter=None) → QueryResult
| Parameter | Type | Description |
|---|---|---|
| question | str | Natural-language question. Prefix with from:<file> to filter by source. |
| source_filter | str \| None | Restrict retrieval to a single filename |
QueryResult fields
| Field | Type | Description |
|---|---|---|
| answer | str | The generated answer (str(result) also works) |
| confidence | float | Self-assessed confidence, 0.0 – 1.0 |
| retries | int | Number of reflection retries used |
| strategy | str | One-line explanation of the retrieval strategy chosen |
| trace | list[str] | Complete step-by-step decision log |
Project structure
adaptiverag/
├── core/
│ ├── config.py # constants and defaults
│ ├── models.py # KBProfile, PipelineConfig dataclasses
│ └── runtime.py # shared runtime singleton (RT)
├── components/
│ ├── chunker.py # content-aware chunking strategies
│ ├── embedder.py # Ollama embedding wrapper
│ ├── retriever.py # ChromaDB retrieval
│ └── reranker.py # cross-encoder reranking (optional)
├── pipeline/
│ ├── tools.py # LangChain tools (retrieve, rerank, HyDE, generate)
│ ├── kb_analysis.py # KB profiling and heuristic config planning
│ └── file_loader.py # document loading (.txt, .pdf, .md, .docx)
├── graphs/
│ ├── setup_graph.py # build-time LangGraph agent
│ └── query_graph.py # per-query LangGraph agent
├── api.py # public Python API (build_rag, AdaptiveRAG, QueryResult)
└── main.py # CLI entry point
Requirements
- Python ≥ 3.10
- Ollama running at http://localhost:11434
- Dependencies installed automatically via pip: langgraph, langchain-ollama, chromadb, pypdf, python-docx, numpy, tqdm
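A quick way to confirm the Ollama requirement is met is to probe the address above before calling build_rag(). The helper below is a sketch using only the standard library; `ollama_is_running` is a hypothetical name, and treating an HTTP 200 from the root URL as "running" is an assumption about the server's behaviour.

```python
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url, else False."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    if ollama_is_running():
        print("Ollama is reachable")
    else:
        print("Ollama is not reachable; start it before building the pipeline")
```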
Contributing
Contributions are welcome. Please open an issue first to discuss what you would like to change.
git clone https://github.com/navid72m/adaptiveRAG.git
cd adaptiveRAG
pip install -e ".[dev]"
License
MIT © navid72m