
TreeDex

Tree-based, vectorless document RAG framework. Connect any LLM via URL or API key.

Index any document into a navigable tree structure, then retrieve relevant sections using any LLM. No vector databases, no embeddings — just structured tree retrieval.



How It Works


  1. Load — Extract pages from any supported format
  2. Index — LLM analyzes page groups and extracts hierarchical structure
  3. Build — Flat sections become a tree with page ranges and embedded text
  4. Query — LLM selects relevant tree nodes for your question
  5. Return — Get context text, source pages, and reasoning
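
In code, the whole pipeline is two calls. A minimal sketch of the five steps using the API documented later in this README (the file name and question are placeholders; any backend from the provider table works in place of Gemini):

from treedex import TreeDex, GeminiLLM

llm = GeminiLLM(api_key="YOUR_KEY")

# Steps 1-3 (Load, Index, Build) all happen inside from_file
index = TreeDex.from_file("report.pdf", llm=llm)

# Step 4: the LLM selects relevant tree nodes for the question
result = index.query("What is the main argument?")

# Step 5: context text, source pages, and the LLM's reasoning
print(result.context)
print(result.pages_str)   # e.g. "pages 5-8, 12-15"
print(result.reasoning)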

Why TreeDex instead of Vector DB?

The full feature-by-feature comparison with vector RAG and naive chunking is in the Benchmarks section below.


Supported LLM Providers


TreeDex works with every major AI provider out of the box. Pick what works for you:

One-liner backends (zero config)

| Backend | Provider | Default Model | Dependencies |
| --- | --- | --- | --- |
| GeminiLLM | Google | gemini-2.0-flash | google-generativeai |
| OpenAILLM | OpenAI | gpt-4o | openai |
| ClaudeLLM | Anthropic | claude-sonnet-4-20250514 | anthropic |
| MistralLLM | Mistral AI | mistral-large-latest | mistralai |
| CohereLLM | Cohere | command-r-plus | cohere |
| GroqLLM | Groq | llama-3.3-70b-versatile | None (stdlib) |
| TogetherLLM | Together AI | Llama-3-70b-chat-hf | None (stdlib) |
| FireworksLLM | Fireworks | llama-v3p1-70b-instruct | None (stdlib) |
| OpenRouterLLM | OpenRouter | claude-sonnet-4 | None (stdlib) |
| DeepSeekLLM | DeepSeek | deepseek-chat | None (stdlib) |
| CerebrasLLM | Cerebras | llama-3.3-70b | None (stdlib) |
| SambanovaLLM | SambaNova | Llama-3.1-70B-Instruct | None (stdlib) |
| HuggingFaceLLM | HuggingFace | Mistral-7B-Instruct | None (stdlib) |
| OllamaLLM | Ollama (local) | llama3 | None (stdlib) |

Universal backends

| Backend | Use case | Dependencies |
| --- | --- | --- |
| OpenAICompatibleLLM | Any OpenAI-compatible endpoint (URL + key) | None (stdlib) |
| LiteLLM | 100+ providers via the litellm library | litellm |
| FunctionLLM | Wrap any callable(str) -> str | None |
| BaseLLM | Subclass to build your own | None |

Quick Start

Install

# pip
pip install treedex

# uv (faster)
uv pip install treedex

# With optional LLM SDK
pip install treedex[gemini]      # Google Gemini
pip install treedex[openai]      # OpenAI
pip install treedex[claude]      # Anthropic Claude
pip install treedex[mistral]     # Mistral AI
pip install treedex[cohere]      # Cohere
pip install treedex[litellm]     # LiteLLM (100+ providers)
pip install treedex[all]         # Everything

# From source
pip install git+https://github.com/mithun50/TreeDex.git

# Development
git clone https://github.com/mithun50/TreeDex.git
cd TreeDex
pip install -e ".[dev]"

Pick your LLM and go

from treedex import TreeDex

# --- Google Gemini ---
from treedex import GeminiLLM
llm = GeminiLLM(api_key="YOUR_KEY")

# --- OpenAI ---
from treedex import OpenAILLM
llm = OpenAILLM(api_key="sk-...")

# --- Claude ---
from treedex import ClaudeLLM
llm = ClaudeLLM(api_key="sk-ant-...")

# --- Groq (free, fast) ---
from treedex import GroqLLM
llm = GroqLLM(api_key="gsk_...")

# --- Together AI ---
from treedex import TogetherLLM
llm = TogetherLLM(api_key="...")

# --- DeepSeek ---
from treedex import DeepSeekLLM
llm = DeepSeekLLM(api_key="...")

# --- Fireworks ---
from treedex import FireworksLLM
llm = FireworksLLM(api_key="...")

# --- OpenRouter (access any model) ---
from treedex import OpenRouterLLM
llm = OpenRouterLLM(api_key="...", model="anthropic/claude-sonnet-4")

# --- Cerebras ---
from treedex import CerebrasLLM
llm = CerebrasLLM(api_key="...")

# --- SambaNova ---
from treedex import SambanovaLLM
llm = SambanovaLLM(api_key="...")

# --- Mistral AI ---
from treedex import MistralLLM
llm = MistralLLM(api_key="...")  # pip install mistralai

# --- Cohere ---
from treedex import CohereLLM
llm = CohereLLM(api_key="...")  # pip install cohere

# --- HuggingFace ---
from treedex import HuggingFaceLLM
llm = HuggingFaceLLM(api_key="hf_...", model="mistralai/Mistral-7B-Instruct-v0.3")

# --- Local Ollama ---
from treedex import OllamaLLM
llm = OllamaLLM(model="llama3")

# Index and query (same for ALL providers)
index = TreeDex.from_file("document.pdf", llm=llm)
result = index.query("What is the main argument?")
print(result.context)
print(result.pages_str)  # "pages 5-8, 12-15"

Any OpenAI-compatible endpoint

from treedex import OpenAICompatibleLLM

# Works with ANY service that speaks OpenAI format
llm = OpenAICompatibleLLM(
    base_url="https://your-provider.com/v1",
    api_key="...",
    model="model-name"
)

100+ providers via LiteLLM

from treedex import LiteLLM

# pip install litellm
llm = LiteLLM("gpt-4o")                                    # OpenAI
llm = LiteLLM("anthropic/claude-sonnet-4-20250514")         # Claude
llm = LiteLLM("groq/llama-3.3-70b-versatile")              # Groq
llm = LiteLLM("together_ai/meta-llama/Llama-3-70b-chat-hf")# Together
llm = LiteLLM("bedrock/anthropic.claude-3-sonnet")          # AWS Bedrock
llm = LiteLLM("vertex_ai/gemini-pro")                       # Google Vertex
llm = LiteLLM("azure/gpt-4o")                               # Azure OpenAI

Wrap any function

from treedex import FunctionLLM
import requests

# Wrap any callable(str) -> str
llm = FunctionLLM(lambda prompt: my_custom_api(prompt))  # my_custom_api is your own function

# Or a named function
def call_my_model(prompt: str) -> str:
    url = "https://your-model-endpoint.example/generate"  # your own endpoint
    return requests.post(url, json={"prompt": prompt}).json()["text"]

llm = FunctionLLM(call_my_model)

Build your own backend

from treedex import BaseLLM

class MyLLM(BaseLLM):
    def generate(self, prompt: str) -> str:
        # Your logic here — call any API, local model, etc.
        return my_api_call(prompt)

llm = MyLLM()
index = TreeDex.from_file("doc.pdf", llm=llm)

Swap LLM at query time

# Build index with one LLM
index = TreeDex.from_file("doc.pdf", llm=gemini_llm)

# Query with a different one — same index, different brain
result = index.query("...", llm=groq_llm)

Supported Document Formats

| Format | Loader | Extra Dependencies |
| --- | --- | --- |
| PDF | PDFLoader | pymupdf |
| TXT / MD | TextLoader | None |
| HTML | HTMLLoader | None (stdlib) |
| DOCX | DOCXLoader | python-docx |

Use auto_loader(path) for automatic format detection, or pass a specific loader:

from treedex import TreeDex, TextLoader

index = TreeDex.from_file("notes.txt", llm=llm, loader=TextLoader())

API Reference

TreeDex

| Method | Description |
| --- | --- |
| TreeDex.from_file(path, llm, ...) | Build index from a file |
| TreeDex.from_pages(pages, llm, ...) | Build from pre-extracted pages |
| TreeDex.from_tree(tree, pages, llm?) | Create from existing tree |
| index.query(question, llm?) | Retrieve relevant sections |
| index.save(path) | Save index to JSON |
| TreeDex.load(path, llm?) | Load index from JSON |
| index.show_tree() | Print tree structure |
| index.stats() | Get index statistics |
| index.find_large_sections(...) | Find oversized nodes |
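
A short sketch of the persistence and inspection methods above, assuming llm is any backend from the Quick Start (file names are placeholders):

# Build once and persist the index as human-readable JSON
index = TreeDex.from_file("doc.pdf", llm=llm)
index.save("doc_index.json")

# Reload later (optionally with a different LLM) and inspect it
index = TreeDex.load("doc_index.json", llm=llm)
index.show_tree()        # print the tree structure
print(index.stats())     # index statistics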

QueryResult

| Property | Type | Description |
| --- | --- | --- |
| .context | str | Concatenated text from relevant sections |
| .node_ids | list[str] | IDs of selected tree nodes |
| .page_ranges | list[tuple] | [(start, end), ...] page ranges |
| .pages_str | str | Human-readable: "pages 5-8, 12-15" |
| .reasoning | str | LLM's explanation for selection |
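
For example, a query result can be unpacked like this (the node IDs and page numbers shown in the comments are illustrative):

result = index.query("What is the main argument?")

print(result.context)      # concatenated text of the selected sections
print(result.node_ids)     # e.g. ["2.1", "4.3"] (illustrative)
print(result.page_ranges)  # e.g. [(5, 8), (12, 15)]
print(result.pages_str)    # "pages 5-8, 12-15"
print(result.reasoning)    # the LLM's explanation for the selection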

LLM Backends

| Backend | Needs SDK? | One-liner |
| --- | --- | --- |
| GeminiLLM(api_key) | Yes | GeminiLLM("key") |
| OpenAILLM(api_key) | Yes | OpenAILLM("sk-...") |
| ClaudeLLM(api_key) | Yes | ClaudeLLM("sk-ant-...") |
| MistralLLM(api_key) | Yes | MistralLLM("key") |
| CohereLLM(api_key) | Yes | CohereLLM("key") |
| GroqLLM(api_key) | No | GroqLLM("gsk_...") |
| TogetherLLM(api_key) | No | TogetherLLM("key") |
| FireworksLLM(api_key) | No | FireworksLLM("key") |
| OpenRouterLLM(api_key) | No | OpenRouterLLM("key") |
| DeepSeekLLM(api_key) | No | DeepSeekLLM("key") |
| CerebrasLLM(api_key) | No | CerebrasLLM("key") |
| SambanovaLLM(api_key) | No | SambanovaLLM("key") |
| HuggingFaceLLM(api_key) | No | HuggingFaceLLM("hf_...") |
| OllamaLLM(model) | No | OllamaLLM("llama3") |
| LiteLLM(model) | Yes | LiteLLM("gpt-4o") |
| FunctionLLM(fn) | No | FunctionLLM(my_fn) |
| OpenAICompatibleLLM(url, model) | No | Any endpoint |
| BaseLLM (subclass) | No | Your own logic |

Benchmarks

TreeDex vs Vector DB vs Naive Chunking


Results from a real benchmark on the same document (NCERT Electromagnetic Waves, 14 pages, 10 queries). All three methods retrieve from the same content; only the indexing and retrieval approach differs. The numbers are regenerated by CI on every push.


| Feature | TreeDex | Vector RAG | Naive Chunking |
| --- | --- | --- | --- |
| Page Attribution | Exact source pages | Approximate | None |
| Structure Preserved | Full tree hierarchy | None | None |
| Index Format | Human-readable JSON | Opaque vectors | Text chunks |
| Embedding Model | Not needed | Required | Not needed |
| Infrastructure | None (JSON file) | Vector DB required | None |
| Core Dependencies | 2 (pymupdf, tiktoken) | 5-8+ | 2-5 |

Run your own: python benchmarks/run_benchmark.py --help or python benchmarks/compare_vectordb.py --help


Architecture

The architecture mirrors the pipeline above: document loaders extract pages, an LLM-driven indexer builds the section tree, and the query layer has the LLM navigate that tree to select context.

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# With coverage
pytest --cov=treedex

# Run specific test file
pytest tests/test_core.py -v

License

MIT License — Mithun Gowda B
