Lightweight structural RAG library — index documents as trees, query without a vector DB.
treerag 🌳
RAG without vector databases.
A lightweight structural RAG library that indexes documents as hierarchical trees and retrieves answers using structure instead of embeddings.
Works with any LangChain-compatible chat model, including OpenAI, Anthropic, Gemini, and Ollama.
🚀 Why treerag?
- ❌ No vector database required
- ⚡ Fast hierarchical retrieval
- 🧠 Uses document structure instead of embeddings
- 🔌 Works with any LLM
- 🪶 Lightweight and easy to integrate
📦 Install
# Base
pip install treerag
# With OpenAI
pip install "treerag[openai]"
# With Anthropic
pip install "treerag[anthropic]"
# Everything
pip install "treerag[all]"
⚡ Quick Start
from treerag import index_document, make_summarizer, ask, make_retriever
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")
# 1. Index document
doc = index_document("my_doc.md", summarizer=make_summarizer(llm))
# 2. Ask question
result = ask("What does this document cover?", doc, make_retriever(llm))
print(result.content) # answer
print(result.references) # sections used
print(result.response_metadata) # token usage
🧠 How It Works
Indexing
File / URL
↓ read_file()
↓ parse_sections()
↓ make_summarizer()
↓ build_hierarchy()
↓ flatten_tree()
↓ save_registry()
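The parsing and hierarchy steps above can be sketched in a few lines. This is a minimal illustration, not treerag's implementation: the `Node` class and the `build_tree` and `flatten` helpers are hypothetical names, and the sketch assumes Markdown input where `#` headings mark section boundaries. It does not special-case `#` lines inside code fences.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section of a document (hypothetical structure, not treerag's own)."""
    title: str
    level: int
    content: str = ""
    children: list["Node"] = field(default_factory=list)


HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def build_tree(markdown: str) -> Node:
    """Nest each section under the nearest preceding heading of a lower level."""
    root = Node(title="(root)", level=0)
    stack = [root]  # path from the root to the section currently being filled
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            stack[-1].content += line + "\n"
            continue
        level = len(match.group(1))
        # Pop until the top of the stack can be this heading's parent.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(title=match.group(2).strip(), level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def flatten(node: Node, path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return the title path of every section, depth first."""
    paths = []
    for child in node.children:
        child_path = path + (child.title,)
        paths.append(child_path)
        paths.extend(flatten(child, child_path))
    return paths


sample = "# Guide\nIntro text.\n## Install\npip install x\n## Usage\nRun it.\n# FAQ\nNone yet.\n"
tree = build_tree(sample)
paths = flatten(tree)
```

Keeping the full title path for each section is one way to give a later retrieval step something readable to cite as a reference.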
Querying
Query
↓ tree search (LLM selects relevant nodes)
↓ fetch content
↓ answer generation
↓ AIMessage with references
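The tree search step can be pictured as a recursive descent in which a selector chooses which children to open at each level. The sketch below is illustrative only: the node dict shape, `tree_search`, and `keyword_selector` are assumptions, and the keyword matcher stands in for the LLM call that treerag uses to select nodes.

```python
from typing import Callable

# A node is a plain dict here: {"title", "summary", "content", "children"}.
# This shape is an assumption for illustration, not treerag's schema.
Selector = Callable[[str, list[dict]], list[int]]


def keyword_selector(query: str, candidates: list[dict]) -> list[int]:
    """Stand-in for the LLM: pick candidates whose summary shares a query word."""
    words = set(query.lower().split())
    return [
        i for i, node in enumerate(candidates)
        if words & set(node["summary"].lower().split())
    ]


def tree_search(query: str, node: dict, select: Selector) -> list[dict]:
    """Descend only into selected children and collect the leaves reached."""
    children = node.get("children", [])
    if not children:
        return [node]
    hits = []
    for index in select(query, children):
        hits.extend(tree_search(query, children[index], select))
    return hits


def leaf(title: str, summary: str, content: str) -> dict:
    return {"title": title, "summary": summary, "content": content, "children": []}


doc_tree = {
    "title": "Handbook",
    "summary": "company handbook",
    "content": "",
    "children": [
        {
            "title": "Finance",
            "summary": "budget and expenses policy",
            "content": "",
            "children": [
                leaf("Budget cap", "annual budget limits", "Sample text about the cap."),
                leaf("Travel", "travel expenses rules", "Sample text about travel."),
            ],
        },
        {
            "title": "HR",
            "summary": "leave and hiring",
            "content": "",
            "children": [leaf("Leave", "vacation days", "Sample text about leave.")],
        },
    ],
}

hits = tree_search("budget cap", doc_tree, keyword_selector)
references = [node["title"] for node in hits]
```

Because only selected branches are opened, the number of selector calls grows with the depth of the relevant path rather than with the total number of sections.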
📂 Supported Inputs
# Files
index_document("file.md")
index_document("report.pdf")
index_document("document.docx")
index_document("notes.txt")
# URLs
index_document("https://docs.example.com")
📊 Response Formats
# Default (LangChain AIMessage)
result = ask("What is this?", doc, retriever)
print(result.content)
print(result.references)
# Plain dict
result = ask("What is this?", doc, retriever, return_raw=False)
print(result["answer"])
# Streaming
for chunk in ask("What is this?", doc, retriever, stream=True):
    if isinstance(chunk, dict):
        print(chunk["__references__"])
    else:
        print(chunk, end="")
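If you need the full answer as one string rather than printing chunks as they arrive, the stream can be collected with a small helper. This is a sketch under assumptions: `collect_stream` and `fake_stream` are hypothetical names, the fake generator only mimics the chunk shapes shown above (text chunks plus a dict carrying `__references__`), and the references value is assumed to be a list.

```python
from typing import Iterable, Iterator, Union

Chunk = Union[str, dict]


def collect_stream(chunks: Iterable[Chunk]) -> tuple[str, list]:
    """Join text chunks into one answer and pull out the references."""
    parts = []
    references = []
    for chunk in chunks:
        if isinstance(chunk, dict):
            # Assumed shape: {"__references__": [...]}
            references = chunk.get("__references__", [])
        else:
            parts.append(chunk)
    return "".join(parts), references


def fake_stream() -> Iterator[Chunk]:
    """Hypothetical stand-in for ask(..., stream=True)."""
    yield "Tree "
    yield "search."
    yield {"__references__": ["Section 1"]}


answer, references = collect_stream(fake_stream())
```

In real use you would pass the generator returned by `ask(..., stream=True)` in place of `fake_stream()`.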
⚡ Async Support
import asyncio
from treerag import aask, make_async_retriever

async def main():
    retriever = make_async_retriever(llm)
    result = await aask("What is this?", doc, retriever)
    print(result.content)

asyncio.run(main())  # in a notebook, use `await main()` instead
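One reason to use the async API is to run several questions concurrently. The sketch below shows the pattern with `asyncio.gather`; `fake_aask` is a hypothetical stand-in that sleeps instead of calling an LLM, so the example runs without treerag or an API key.

```python
import asyncio


async def fake_aask(question: str) -> str:
    """Hypothetical stand-in for aask(); sleeps instead of calling an LLM."""
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


async def ask_all(questions: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_aask(q) for q in questions))


answers = asyncio.run(ask_all(["What is this?", "Who wrote it?"]))
```

With the real library, each `fake_aask(q)` call would presumably become `aask(q, doc, retriever)`; provider rate limits still apply when many requests run at once.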
📚 Multi-Document Q&A
from treerag import ask_multi, get_document_by_id
doc1 = get_document_by_id("uuid-1")
doc2 = get_document_by_id("uuid-2")
result = ask_multi("What is the budget cap?", [doc1, doc2], retriever)
print(result.content)
print(result.references)
🗂️ Registry Management
from treerag import list_documents, get_document_by_id, delete_document
for doc in list_documents():
    print(doc["name"], doc["doc_id"])
doc = get_document_by_id("uuid")
delete_document("uuid")
🎯 Custom Prompts
summarizer = make_summarizer(
    llm,
    system_prompt="You are a legal expert. Summarize clauses and obligations."
)
retriever = make_retriever(
    llm,
    answer_system_prompt=(
        "You are a legal assistant. "
        "Answer using ONLY the provided context."
    )
)
result = ask(
    "What are the key obligations?",
    doc,
    retriever,
    extra_context="This is a legal agreement."
)
🔌 Supported Providers
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_ollama import ChatOllama
# OpenAI
make_retriever(ChatOpenAI(model="gpt-4o"))
# Anthropic
make_retriever(ChatAnthropic(model="claude-3-5-haiku-latest"))
# Gemini
make_retriever(ChatGoogleGenerativeAI(model="gemini-2.0-flash"))
# Ollama (local)
make_retriever(ChatOllama(model="llama3"))
🏗️ Production Usage
doc = index_document("file.md", persist=False)
# Store externally (`db` stands for your own storage client)
doc_id = doc["doc_id"]
db.save(doc_id, doc)
# Load later using the same ID
doc = db.get(doc_id)
result = ask("Your question?", doc, retriever)
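The `db` object above is a placeholder. As a rough sketch of what such a store could look like, the example below writes one JSON file per document. `JsonDocStore` is a hypothetical class, and the sketch assumes the indexed document is a JSON-serializable dict with a `doc_id` key; a stand-in dict is used so the example runs without treerag.

```python
import json
import tempfile
from pathlib import Path


class JsonDocStore:
    """Hypothetical file-backed store: one JSON file per document."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, doc_id: str, doc: dict) -> None:
        (self.root / f"{doc_id}.json").write_text(json.dumps(doc), encoding="utf-8")

    def get(self, doc_id: str) -> dict:
        return json.loads((self.root / f"{doc_id}.json").read_text(encoding="utf-8"))


# Stand-in for the dict returned by index_document(..., persist=False).
doc = {"doc_id": "demo-1", "name": "file.md", "tree": {"title": "root", "children": []}}

db = JsonDocStore(Path(tempfile.mkdtemp()))
db.save(doc["doc_id"], doc)
restored = db.get("demo-1")
```

The same `save`/`get` interface could sit in front of a SQL table or an object store. If document IDs come from untrusted input, validate them before using them in file names.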
📌 Example Use Cases
- 📄 Documentation Q&A
- 📚 Internal knowledge base
- 🤖 AI assistants without vector DB
- 🧾 Legal / contract analysis
⚔️ treerag vs Traditional RAG
| Feature | treerag | Vector RAG |
|---|---|---|
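| Setup | `pip install` plus an LLM | Embedding model plus a vector store |
| DB required | ❌ | ✅ |
| Cost | LLM calls for summaries and node selection; no embedding or vector-store costs | Embedding calls plus vector-store hosting |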
| Retrieval | Structure-based | Embeddings |
📄 License
MIT