TreeDex
Tree-based, vectorless document RAG framework. Connect any LLM via URL/API key.
Index any document into a navigable tree structure, then retrieve relevant sections using any LLM. No vector databases, no embeddings — just structured tree retrieval.
How It Works
- Load — Extract pages from any supported format
- Index — LLM analyzes page groups and extracts hierarchical structure
- Build — Flat sections become a tree with page ranges and embedded text
- Query — LLM selects relevant tree nodes for your question
- Return — Get context text, source pages, and reasoning
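A minimal sketch of that flow, using the TreeDex.from_file, show_tree, and query calls documented in the API Reference below (the file name and question are placeholders):

from treedex import TreeDex, GeminiLLM

llm = GeminiLLM(api_key="YOUR_KEY")

# Load + Index + Build: extract pages and let the LLM derive the section tree
index = TreeDex.from_file("document.pdf", llm=llm)
index.show_tree()  # inspect the extracted hierarchy

# Query + Return: the LLM selects tree nodes; you get text, pages, and reasoning
result = index.query("What is the main argument?")
print(result.context)
print(result.pages_str)
print(result.reasoning)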
Why TreeDex instead of a Vector DB?
No embedding model to run, no vector store to host, and exact page attribution; see the comparison table in the Benchmarks section below.
Supported LLM Providers
TreeDex works with every major AI provider out of the box. Pick what works for you:
One-liner backends (zero config)
| Backend | Provider | Default Model | Dependencies |
|---|---|---|---|
| GeminiLLM | Google | gemini-2.0-flash | google-generativeai |
| OpenAILLM | OpenAI | gpt-4o | openai |
| ClaudeLLM | Anthropic | claude-sonnet-4-20250514 | anthropic |
| MistralLLM | Mistral AI | mistral-large-latest | mistralai |
| CohereLLM | Cohere | command-r-plus | cohere |
| GroqLLM | Groq | llama-3.3-70b-versatile | None (stdlib) |
| TogetherLLM | Together AI | Llama-3-70b-chat-hf | None (stdlib) |
| FireworksLLM | Fireworks | llama-v3p1-70b-instruct | None (stdlib) |
| OpenRouterLLM | OpenRouter | claude-sonnet-4 | None (stdlib) |
| DeepSeekLLM | DeepSeek | deepseek-chat | None (stdlib) |
| CerebrasLLM | Cerebras | llama-3.3-70b | None (stdlib) |
| SambanovaLLM | SambaNova | Llama-3.1-70B-Instruct | None (stdlib) |
| HuggingFaceLLM | HuggingFace | Mistral-7B-Instruct | None (stdlib) |
| OllamaLLM | Ollama (local) | llama3 | None (stdlib) |
Universal backends
| Backend | Use case | Dependencies |
|---|---|---|
| OpenAICompatibleLLM | Any OpenAI-compatible endpoint (URL + key) | None (stdlib) |
| LiteLLM | 100+ providers via the litellm library | litellm |
| FunctionLLM | Wrap any callable(str) -> str | None |
| BaseLLM | Subclass to build your own | None |
Quick Start
Install
# pip
pip install treedex
# uv (faster)
uv pip install treedex
# With optional LLM SDK
pip install treedex[gemini] # Google Gemini
pip install treedex[openai] # OpenAI
pip install treedex[claude] # Anthropic Claude
pip install treedex[mistral] # Mistral AI
pip install treedex[cohere] # Cohere
pip install treedex[litellm] # LiteLLM (100+ providers)
pip install treedex[all] # Everything
# From source
pip install git+https://github.com/mithun50/TreeDex.git
# Development
git clone https://github.com/mithun50/TreeDex.git
cd TreeDex
pip install -e ".[dev]"
Pick your LLM and go
from treedex import TreeDex
# --- Google Gemini ---
from treedex import GeminiLLM
llm = GeminiLLM(api_key="YOUR_KEY")
# --- OpenAI ---
from treedex import OpenAILLM
llm = OpenAILLM(api_key="sk-...")
# --- Claude ---
from treedex import ClaudeLLM
llm = ClaudeLLM(api_key="sk-ant-...")
# --- Groq (free, fast) ---
from treedex import GroqLLM
llm = GroqLLM(api_key="gsk_...")
# --- Together AI ---
from treedex import TogetherLLM
llm = TogetherLLM(api_key="...")
# --- DeepSeek ---
from treedex import DeepSeekLLM
llm = DeepSeekLLM(api_key="...")
# --- Fireworks ---
from treedex import FireworksLLM
llm = FireworksLLM(api_key="...")
# --- OpenRouter (access any model) ---
from treedex import OpenRouterLLM
llm = OpenRouterLLM(api_key="...", model="anthropic/claude-sonnet-4")
# --- Cerebras ---
from treedex import CerebrasLLM
llm = CerebrasLLM(api_key="...")
# --- SambaNova ---
from treedex import SambanovaLLM
llm = SambanovaLLM(api_key="...")
# --- Mistral AI ---
from treedex import MistralLLM
llm = MistralLLM(api_key="...") # pip install mistralai
# --- Cohere ---
from treedex import CohereLLM
llm = CohereLLM(api_key="...") # pip install cohere
# --- HuggingFace ---
from treedex import HuggingFaceLLM
llm = HuggingFaceLLM(api_key="hf_...", model="mistralai/Mistral-7B-Instruct-v0.3")
# --- Local Ollama ---
from treedex import OllamaLLM
llm = OllamaLLM(model="llama3")
# Index and query (same for ALL providers)
index = TreeDex.from_file("document.pdf", llm=llm)
result = index.query("What is the main argument?")
print(result.context)
print(result.pages_str) # "pages 5-8, 12-15"
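Indexing is the expensive, LLM-driven step, so it is worth persisting the tree once it is built. A short sketch using index.save and TreeDex.load from the API Reference (the file name is a placeholder):

# Build once, save the human-readable JSON index
index.save("document.index.json")

# Later (or in another process): reload and query without re-indexing
index = TreeDex.load("document.index.json", llm=llm)
result = index.query("What is the main argument?")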
Any OpenAI-compatible endpoint
from treedex import OpenAICompatibleLLM
# Works with ANY service that speaks OpenAI format
llm = OpenAICompatibleLLM(
    base_url="https://your-provider.com/v1",
    api_key="...",
    model="model-name"
)
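The same backend also covers self-hosted servers that expose the OpenAI API shape (vLLM, LM Studio, llama.cpp's server, and similar); the URL and model name below are only examples:

# e.g. a local vLLM or LM Studio server
llm = OpenAICompatibleLLM(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # many local servers ignore the key
    model="meta-llama/Llama-3.1-8B-Instruct"
)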
100+ providers via LiteLLM
from treedex import LiteLLM
# pip install litellm
llm = LiteLLM("gpt-4o") # OpenAI
llm = LiteLLM("anthropic/claude-sonnet-4-20250514") # Claude
llm = LiteLLM("groq/llama-3.3-70b-versatile") # Groq
llm = LiteLLM("together_ai/meta-llama/Llama-3-70b-chat-hf")# Together
llm = LiteLLM("bedrock/anthropic.claude-3-sonnet") # AWS Bedrock
llm = LiteLLM("vertex_ai/gemini-pro") # Google Vertex
llm = LiteLLM("azure/gpt-4o") # Azure OpenAI
Wrap any function
from treedex import FunctionLLM
# Wrap any callable(str) -> str
llm = FunctionLLM(lambda prompt: my_custom_api(prompt))
# Or a named function
import requests

def call_my_model(prompt: str) -> str:
    return requests.post(url, json={"prompt": prompt}).json()["text"]

llm = FunctionLLM(call_my_model)
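FunctionLLM also works for fully local models. A sketch using a Hugging Face transformers pipeline; the model name and generation settings are just examples:

from transformers import pipeline
from treedex import FunctionLLM

# Any callable that maps a prompt string to a completion string will do
pipe = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct",
                max_new_tokens=512, return_full_text=False)

def local_generate(prompt: str) -> str:
    return pipe(prompt)[0]["generated_text"]

llm = FunctionLLM(local_generate)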
Build your own backend
from treedex import BaseLLM
class MyLLM(BaseLLM):
    def generate(self, prompt: str) -> str:
        # Your logic here — call any API, local model, etc.
        return my_api_call(prompt)
llm = MyLLM()
index = TreeDex.from_file("doc.pdf", llm=llm)
Swap LLM at query time
# Build index with one LLM
index = TreeDex.from_file("doc.pdf", llm=gemini_llm)
# Query with a different one — same index, different brain
result = index.query("...", llm=groq_llm)
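A common split is to index once with a stronger model and answer queries with a cheaper, faster one. A sketch with the one-liner backends from above (keys are placeholders):

from treedex import TreeDex, GeminiLLM, GroqLLM

gemini_llm = GeminiLLM(api_key="YOUR_KEY")  # stronger model for structure extraction
groq_llm = GroqLLM(api_key="gsk_...")       # fast, cheap model for node selection

index = TreeDex.from_file("doc.pdf", llm=gemini_llm)
result = index.query("Summarize the conclusion.", llm=groq_llm)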
Supported Document Formats
| Format | Loader | Extra Dependencies |
|---|---|---|
| PDF | PDFLoader | pymupdf |
| TXT / MD | TextLoader | None |
| HTML | HTMLLoader | None (stdlib) |
| DOCX | DOCXLoader | python-docx |
Use auto_loader(path) for automatic format detection, or pass a specific loader:
from treedex import TreeDex, TextLoader
index = TreeDex.from_file("notes.txt", llm=llm, loader=TextLoader())
API Reference
TreeDex
| Method | Description |
|---|---|
| TreeDex.from_file(path, llm, ...) | Build index from a file |
| TreeDex.from_pages(pages, llm, ...) | Build from pre-extracted pages |
| TreeDex.from_tree(tree, pages, llm?) | Create from an existing tree |
| index.query(question, llm?) | Retrieve relevant sections |
| index.save(path) | Save index to JSON |
| TreeDex.load(path, llm?) | Load index from JSON |
| index.show_tree() | Print tree structure |
| index.stats() | Get index statistics |
| index.find_large_sections(...) | Find oversized nodes |
QueryResult
| Property | Type | Description |
|---|---|---|
| .context | str | Concatenated text from relevant sections |
| .node_ids | list[str] | IDs of selected tree nodes |
| .page_ranges | list[tuple] | [(start, end), ...] page ranges |
| .pages_str | str | Human-readable: "pages 5-8, 12-15" |
| .reasoning | str | LLM's explanation for its selection |
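QueryResult is retrieval output, not a final answer. A common follow-up is to feed .context back through the backend, assuming it exposes the generate() contract described under "Build your own backend" above; the prompt wording is just an example:

question = "What is the main argument?"
result = index.query(question)

# Draft an answer grounded in the retrieved sections, then cite the source pages
answer = llm.generate(
    "Answer the question using only the context below.\n\n"
    f"Context:\n{result.context}\n\n"
    f"Question: {question}"
)
print(answer)
print(f"Sources: {result.pages_str}")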
LLM Backends
| Backend | Needs SDK? | One-liner |
|---|---|---|
| GeminiLLM(api_key) | Yes | GeminiLLM("key") |
| OpenAILLM(api_key) | Yes | OpenAILLM("sk-...") |
| ClaudeLLM(api_key) | Yes | ClaudeLLM("sk-ant-...") |
| MistralLLM(api_key) | Yes | MistralLLM("key") |
| CohereLLM(api_key) | Yes | CohereLLM("key") |
| GroqLLM(api_key) | No | GroqLLM("gsk_...") |
| TogetherLLM(api_key) | No | TogetherLLM("key") |
| FireworksLLM(api_key) | No | FireworksLLM("key") |
| OpenRouterLLM(api_key) | No | OpenRouterLLM("key") |
| DeepSeekLLM(api_key) | No | DeepSeekLLM("key") |
| CerebrasLLM(api_key) | No | CerebrasLLM("key") |
| SambanovaLLM(api_key) | No | SambanovaLLM("key") |
| HuggingFaceLLM(api_key) | No | HuggingFaceLLM("hf_...") |
| OllamaLLM(model) | No | OllamaLLM("llama3") |
| LiteLLM(model) | Yes | LiteLLM("gpt-4o") |
| FunctionLLM(fn) | No | FunctionLLM(my_fn) |
| OpenAICompatibleLLM(url, model) | No | Any endpoint |
| BaseLLM (subclass) | No | Your own logic |
Benchmarks
TreeDex vs Vector DB vs Naive Chunking
A real benchmark on the same document (NCERT Electromagnetic Waves, 14 pages, 10 queries): all three methods retrieve from the same content, and only the indexing and retrieval approach differs. Results are auto-generated by CI on every push.
| Feature | TreeDex | Vector RAG | Naive Chunking |
|---|---|---|---|
| Page Attribution | Exact source pages | Approximate | None |
| Structure Preserved | Full tree hierarchy | None | None |
| Index Format | Human-readable JSON | Opaque vectors | Text chunks |
| Embedding Model | Not needed | Required | Not needed |
| Infrastructure | None (JSON file) | Vector DB required | None |
| Core Dependencies | 2 (pymupdf, tiktoken) | 5-8+ | 2-5 |
Run your own:
python benchmarks/run_benchmark.py --help
python benchmarks/compare_vectordb.py --help
Running Tests
# Install dev dependencies
pip install -e ".[dev]"
# Run all tests
pytest
# With coverage
pytest --cov=treedex
# Run specific test file
pytest tests/test_core.py -v
License
MIT License — Mithun Gowda B