A lightweight, completely local, zero-API, tree-based RAG framework

🗺️ NaviDoc

NaviDoc is a lightweight, completely local, zero-API, tree-based RAG framework that navigates documents by their structure. Instead of chopping your files into vector chunks, NaviDoc maps each document into a hierarchical tree of sections and uses a local LLM to walk that tree to the section that answers your question.

🔗 Links:

PyPI: https://pypi.org/project/navidoc/
GitHub: https://github.com/Bishwajitgarai/navidoc

✨ Features

🔒 100% Private & Offline: Your documents never leave your machine. Zero cloud APIs, zero telemetry.
🌳 Tree-Based Navigation: Follows the document's own structure (headers, font sizes), the way a human reader would, instead of retrieving chunks by vector similarity.
⚡ High Precision: Pinpoints specific structural sections, which keeps unrelated text out of the prompt and the context small.
📄 Multi-Format Support: Supports Markdown, PDF (with font-size analysis), DOCX (with style detection), PPTX, and Images (PNG, JPG) via GLM-OCR!
💾 Index Persistence: Save your indexed tree structures to JSON and reload them instantly.
💬 Persistent Chat SDK: Maintain conversation history with your documents SDK-style, backed by a persistent SQLite database!
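
The font-size analysis mentioned for PDFs can be pictured with a small heuristic. The sketch below is an assumption about how such an analysis might work, not NaviDoc's actual implementation: it treats the most common font size as body text and ranks every larger size as a heading level. The `heading_levels` function and the `(text, font_size)` span format are hypothetical.

```python
from collections import Counter


def heading_levels(spans):
    """Map each (text, font_size) span to a heading level.

    Level 0 means body text; level 1 is the largest heading size, level 2
    the next largest, and so on. Hypothetical heuristic for illustration.
    """
    if not spans:
        return []
    sizes = Counter(size for _, size in spans)
    # Assume the most frequent font size is the body text size.
    body_size = sizes.most_common(1)[0][0]
    # Every distinct size larger than the body size is a heading tier.
    tiers = sorted((s for s in sizes if s > body_size), reverse=True)
    level_of = {size: i + 1 for i, size in enumerate(tiers)}
    return [(text, level_of.get(size, 0)) for text, size in spans]


spans = [
    ("User Manual", 24.0),
    ("Installation", 16.0),
    ("Unpack the device.", 11.0),
    ("Connect the cable.", 11.0),
    ("Charging", 16.0),
    ("Plug in the charger.", 11.0),
]
for text, level in heading_levels(spans):
    print(level, text)
```

Real documents are messier (bold body text, captions, footnotes), so a production parser would likely combine size with style and position.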

🚀 Getting Started

1. Prerequisites

NaviDoc requires Ollama to host your local LLM engine.

Download and install Ollama from ollama.com.
Ensure the Ollama service is running in the background. On your first run, NaviDoc automatically pulls the required model (phi3 by default), which needs a network connection for the download.
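
To confirm that the Ollama service is reachable before using NaviDoc, a quick check like the one below may help. It is a sketch that assumes Ollama is listening on its default address, `http://localhost:11434`, and uses Ollama's `/api/tags` endpoint, which lists locally available models. The `ollama_models` helper is a hypothetical name and is not part of NaviDoc.

```python
import json
import urllib.error
import urllib.request


def ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the names of local Ollama models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


if __name__ == "__main__":
    models = ollama_models()
    if models is None:
        print("Ollama does not appear to be running on localhost:11434.")
    else:
        print("Ollama is running. Local models:", models)
```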

2. Installation

Install NaviDoc via pip:

pip install navidoc

Or using uv:

uv add navidoc

💡 Usage Examples

🔍 One-off Query

from navidoc import NaviDoc

# Initialize (defaults to phi3 or NAVIDOC_MODEL_NAME env var)
engine = NaviDoc()

# Ingest and structurally index any local document
status = engine.ingest("your_document.pdf")
print(status)

# Query your document offline
response = engine.query("What are the exact system requirements?")
print(response)

💬 Multi-turn Chat (SDK Style)

from navidoc import NaviDoc

engine = NaviDoc()
engine.ingest("manual.docx")

# First turn
print(engine.chat("How do I install the battery?"))

# Second turn (remembers context and history!)
print(engine.chat("Where can I buy a replacement?"))

# Clear history if needed
engine.clear_history()
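
For intuition about what a SQLite-backed chat history involves, here is a minimal sketch using Python's built-in `sqlite3` module. The `ChatHistory` class and its `messages` table are hypothetical and chosen for illustration; NaviDoc's internal schema may differ.

```python
import sqlite3


class ChatHistory:
    """Tiny SQLite-backed message log (hypothetical schema, not NaviDoc's)."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to persist across runs.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "role TEXT NOT NULL, "
            "content TEXT NOT NULL)"
        )
        self.conn.commit()

    def add(self, role, content):
        self.conn.execute(
            "INSERT INTO messages (role, content) VALUES (?, ?)", (role, content)
        )
        self.conn.commit()

    def messages(self):
        """Return all turns in insertion order, ready to prepend to a prompt."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages ORDER BY id"
        ).fetchall()
        return [{"role": role, "content": content} for role, content in rows]

    def clear(self):
        self.conn.execute("DELETE FROM messages")
        self.conn.commit()


history = ChatHistory()
history.add("user", "How do I install the battery?")
history.add("assistant", "Open the rear cover and slide the battery in.")
history.add("user", "Where can I buy a replacement?")
print(history.messages())
history.clear()
print(history.messages())  # []
```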

💾 Save & Fast Load Index

Avoid re-parsing large documents by saving the tree index.

from navidoc import NaviDoc

engine = NaviDoc()

# First time: Parse and Save
engine.ingest("massive_report.pdf")
engine.save_index("storage/indices/massive_report.json")

# Later runs: load the saved index instead of re-parsing the document
engine.load_index("storage/indices/massive_report.json")
response = engine.query("What is the revenue?")
print(response)
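
To illustrate why a tree index can be saved and reloaded cheaply, the sketch below round-trips a small section tree through JSON. The `Node` dataclass and the `save_tree` / `load_tree` helpers are hypothetical; the file layout that `save_index` actually writes is not documented here and may differ.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class Node:
    title: str
    content: str = ""
    children: list = field(default_factory=list)


def node_from_dict(data):
    """Rebuild a Node (and its subtree) from plain dictionaries."""
    return Node(
        title=data["title"],
        content=data.get("content", ""),
        children=[node_from_dict(c) for c in data.get("children", [])],
    )


def save_tree(root, path):
    # asdict() converts nested dataclasses into nested dicts and lists.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(root), f, ensure_ascii=False, indent=2)


def load_tree(path):
    with open(path, encoding="utf-8") as f:
        return node_from_dict(json.load(f))


root = Node("Annual Report", children=[Node("Revenue", "Revenue grew year over year.")])
path = os.path.join(tempfile.mkdtemp(), "report.json")
save_tree(root, path)
print(load_tree(path) == root)  # True
```

Loading skips PDF parsing and heading detection entirely, which is where most of the ingestion time would be expected to go.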

⚙️ Configuration

Environment Variables

You can configure NaviDoc without changing your code by setting environment variables:

NAVIDOC_MODEL_NAME: Set the default Ollama model to use (Default: phi3).

How to change it:

Windows (PowerShell): $env:NAVIDOC_MODEL_NAME="llama3"
Linux/Mac: export NAVIDOC_MODEL_NAME="llama3"
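
The variable can also be set from Python for the current process. The sketch below shows the usual lookup pattern with a fallback to phi3. `resolve_model_name` is a hypothetical helper that mirrors the documented behavior; it is not NaviDoc's own code, and it assumes the variable is read when `NaviDoc()` is constructed.

```python
import os

DEFAULT_MODEL = "phi3"


def resolve_model_name():
    """Return NAVIDOC_MODEL_NAME if set and non-empty, otherwise the default."""
    return os.environ.get("NAVIDOC_MODEL_NAME") or DEFAULT_MODEL


# Set the variable for this process only, before creating the engine.
os.environ["NAVIDOC_MODEL_NAME"] = "llama3"
print(resolve_model_name())  # llama3
```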

⌨️ CLI Usage

NaviDoc comes with a powerful CLI that acts as a helper for your local environment and Ollama:

Install Ollama: navidoc install-ollama (Auto-downloads and installs for your OS)
Run Models: navidoc run <model> (e.g., navidoc run phi3)
Pull Models: navidoc pull <model>
List Models: navidoc list
Forward Commands: navidoc ollama <args> (Forward any command directly to Ollama)
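
Command forwarding of this kind usually amounts to locating the `ollama` executable and passing the remaining arguments through unchanged. The sketch below shows that idea with the standard library; `forward_to_ollama` is a hypothetical function and is not taken from NaviDoc's source.

```python
import shutil
import subprocess
import sys


def forward_to_ollama(args, run=subprocess.run, which=shutil.which):
    """Run `ollama <args>` and return its exit code.

    Returns 127 (the conventional "command not found" code) if the
    ollama executable is not on PATH.
    """
    executable = which("ollama")
    if executable is None:
        print("ollama not found on PATH; install it first.")
        return 127
    completed = run([executable, *args])
    return completed.returncode


if __name__ == "__main__":
    # Example: `python forward.py list` behaves like `ollama list`.
    sys.exit(forward_to_ollama(sys.argv[1:]))
```

The `run` and `which` parameters exist only so the function can be exercised without Ollama installed.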

🧠 How Vectorless RAG Works

Traditional RAG (Retrieval-Augmented Generation) splits your documents into flat text chunks, converts each chunk into a numeric vector (an embedding), and retrieves the chunks whose vectors are closest to the vector of your query.
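
For contrast, the vector pipeline described above can be reduced to a few lines. This is a toy sketch: `embed` uses bag-of-words counts as a stand-in for a neural embedding model, and all function names are illustrative.

```python
import math
from collections import Counter


def chunk(text, size):
    """Split text into fixed-size word windows, ignoring document structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: word counts (real systems use a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


document = (
    "the battery slides into the rear slot of the device "
    "the warranty covers defects for two years after purchase"
)
chunks = chunk(document, 10)
print(retrieve("how many years is the warranty", chunks))
```

Note that the chunk boundaries fall wherever the word count says, not where a section ends, which is the behavior NaviDoc is designed to avoid.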

NaviDoc takes a different approach:

Structure Extraction: It reads your document and builds a logical tree of headers and content (e.g., Chapter 1 -> Section 1.1 -> Content).
Tree Navigation: When you ask a question, NaviDoc asks the local LLM to look at the top-level headers and choose the most relevant one. It then drills down the tree until it finds the exact content block.
No Context Blowout: By feeding only the relevant branch to the LLM, NaviDoc stays within context limits and keeps irrelevant text from other chapters from confusing the model.
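
The three steps above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not NaviDoc's implementation: it handles only Markdown headings, and `keyword_chooser` is a stand-in for the local LLM that picks the child title sharing the most words with the question. All names (`Section`, `build_tree`, `navigate`) are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    level: int
    content: str = ""
    children: list = field(default_factory=list)


def build_tree(markdown):
    """Structure extraction: nest sections by Markdown heading depth."""
    root = Section("ROOT", 0)
    stack = [root]
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            node = Section(match.group(2).strip(), len(match.group(1)))
            # Pop back up to the nearest ancestor with a shallower level.
            while stack[-1].level >= node.level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        elif line.strip():
            stack[-1].content += line.strip() + "\n"
    return root


def keyword_chooser(question, titles):
    """Stand-in for the LLM: index of the title with the most shared words."""
    q = set(re.findall(r"\w+", question.lower()))
    scores = [len(q & set(re.findall(r"\w+", t.lower()))) for t in titles]
    return scores.index(max(scores))


def navigate(root, question, choose=keyword_chooser):
    """Tree navigation: descend one level at a time until reaching a leaf."""
    node, path = root, []
    while node.children:
        index = choose(question, [c.title for c in node.children])
        node = node.children[index]
        path.append(node.title)
    return path, node.content


doc = """\
# Battery setup
## Installing the battery
Slide the battery into the rear slot.
## Charging
Use the supplied charger.
# Warranty
## Coverage period
The warranty lasts two years.
"""
path, content = navigate(build_tree(doc), "How do I install the battery?")
print(" -> ".join(path))
print(content)
```

Only the text of the selected leaf reaches the answering prompt; swapping `keyword_chooser` for a function that asks a local model to pick among the titles would give the LLM-driven behavior described above.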

📊 Vector RAG vs NaviDoc (Tree-Based RAG)

| Feature | Traditional Vector RAG | NaviDoc (Tree-Based) |
| --- | --- | --- |
| Data Processing | Chops text into arbitrary, structure-blind chunks | Parses the document into a logical tree hierarchy |
| Embeddings | Required (needs a separate embedding model) | None (zero embeddings required) |
| Database | Requires a vector database | None (uses simple JSON or SQLite) |
| Retrieval Method | Vector similarity (can pull irrelevant context) | Reasoning (asks the LLM to navigate the tree) |
| Context Preservation | Low (chunks lose their surrounding context) | High (each block keeps the section it belongs to) |
| Context Blowout Risk | High (often pulls too much noise) | Low (pinpoints exact sections) |

🤝 Contributing & Public Project

NaviDoc is an open-source public project and we welcome contributions from the global community!

If you want to help make local, private RAG better, please:

Star the repository on GitHub.
Open issues for bugs or feature requests.
Submit Pull Requests to add support for more formats or improve the tree navigation logic.

Let's build the best local RAG tool together!

📜 License

NaviDoc is open-source software distributed under the MIT License.

Version 0.1.2, released May 14, 2026
