Turn any PDF into searchable trees and knowledge graphs. No vectors, no chunks.

NanoIndex

Turn any PDF into searchable trees and visual knowledge graphs. Ask questions, get answers with page citations.

Benchmark	Documents	Avg Pages	Accuracy
FinanceBench (SEC 10-K filings)	84	143	94.5%
DocBench Legal (court filings, legislation)	51	54	96.0%

No vector databases. No chunk tuning. No embeddings.

NanoIndex reads your document, understands its structure (headings, sections, tables, figures), and builds a tree you can search with plain English. Built on Nanonets OCR-3 for extraction. Fully open source.

Quick Start

1. Install

pip install nanoindex

2. Set your API keys

export NANONETS_API_KEY=your_key    # Get free at docstrange.nanonets.com/app (10K pages free)
export OPENAI_API_KEY=your_key      # Or ANTHROPIC_API_KEY, GOOGLE_API_KEY, GROQ_API_KEY

3. Go

from nanoindex import NanoIndex

ni = NanoIndex()
tree = ni.index("report.pdf")
answer = ni.ask("What was the revenue?", tree)
print(answer.content)

That's it. API keys are read from the environment, and the LLM is selected automatically based on which keys are set.

What Can You Do With It

Ask questions, get cited answers

answer = ni.ask("What was Q3 gross margin?", tree)
print(answer.content)     # "Gross margin was 42.3% in Q3..."
print(answer.citations)   # [Citation(title="Income Statement", pages=[45, 46])]

From the command line

nanoindex index report.pdf -o tree.json
nanoindex ask report.pdf "What was the revenue?"
nanoindex viz tree.json

Pick your LLM

ni = NanoIndex(llm="openai:gpt-4o")
ni = NanoIndex(llm="anthropic:claude-sonnet-4-6")
ni = NanoIndex(llm="gemini:gemini-2.5-flash")
ni = NanoIndex(llm="groq:llama-3.3-70b-versatile")
ni = NanoIndex(llm="ollama:llama3")

Or just set the env var and NanoIndex picks the right one:

export ANTHROPIC_API_KEY=...   # NanoIndex uses Claude automatically
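
The auto-selection above can be pictured as a lookup over environment variables. The sketch below is not NanoIndex's actual logic: the `pick_llm` helper and the priority order are assumptions made for illustration. Only the environment variable names and the `provider:model` strings are taken from the examples above.

```python
import os

# (env var, llm string) pairs. The ordering is an ASSUMED priority,
# not NanoIndex's documented behavior.
CANDIDATES = [
    ("OPENAI_API_KEY", "openai:gpt-4o"),
    ("ANTHROPIC_API_KEY", "anthropic:claude-sonnet-4-6"),
    ("GOOGLE_API_KEY", "gemini:gemini-2.5-flash"),
    ("GROQ_API_KEY", "groq:llama-3.3-70b-versatile"),
]


def pick_llm(env=None):
    """Return the llm string for the first provider whose key is set, else None."""
    env = os.environ if env is None else env
    for var, llm in CANDIDATES:
        if env.get(var):  # treat empty strings as "not set"
            return llm
    return None
```

When the helper returns a string, it can be passed explicitly as `NanoIndex(llm=...)`, which keeps the choice visible in code rather than implicit in the environment.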

Save and reuse trees

from nanoindex.utils.tree_ops import save_tree, load_tree

save_tree(tree, "my_tree.json")
tree = load_tree("my_tree.json")  # instant, no API call
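
A common pattern is to index only when no saved tree exists yet. The helper below is a generic sketch, not part of NanoIndex: `load_or_build` is a hypothetical name, and it defaults to plain JSON so it runs on its own. It accepts `load` and `save` callables so that `load_tree` and `save_tree` can be plugged in.

```python
import json
from pathlib import Path


def load_or_build(path, build, load=None, save=None):
    """Return the object cached at `path`; on a miss, build it and save it."""
    path = Path(path)
    # Default to plain JSON so the sketch is self-contained.
    load = load or (lambda p: json.loads(Path(p).read_text()))
    save = save or (lambda obj, p: Path(p).write_text(json.dumps(obj)))
    if path.exists():
        return load(str(path))  # cache hit: no build call
    obj = build()               # cache miss: run the expensive step once
    save(obj, str(path))
    return obj
```

With NanoIndex this might be wired up as `load_or_build("my_tree.json", lambda: ni.index("report.pdf"), load=load_tree, save=save_tree)`, assuming `save_tree` and `load_tree` keep the argument order shown above.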

Search across multiple documents

from nanoindex import DocumentStore

store = DocumentStore()
for pdf in ["q1.pdf", "q2.pdf", "q3.pdf"]:
    store.add(ni.index(pdf))

answer = ni.multi_ask("Compare revenue across quarters", store)

Build a Knowledge Base

from nanoindex import KnowledgeBase

kb = KnowledgeBase("./my-research")
kb.add("report1.pdf")
kb.add("report2.pdf")

answer = kb.ask("How do these compare?")  # answers filed back into wiki

Open the my-research/ folder in Obsidian to browse the compiled wiki with [[backlinks]].

How It Works

Indexing: PDF to tree + graph

[Figure: Ingestion Pipeline]
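
To make the "PDF to tree" idea concrete, here is a toy sketch that nests a flat list of detected headings into a tree. It is an illustration only: the `Node` class, `build_tree`, and the `(level, title, page)` input format are assumptions, and NanoIndex's real tree format and pipeline may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    level: int   # heading depth; 1 = top-level section
    page: int
    children: list = field(default_factory=list)


def build_tree(headings):
    """Nest (level, title, page) headings under a synthetic root node."""
    root = Node("ROOT", 0, 0)
    stack = [root]  # path from the root to the most recent node
    for level, title, page in headings:
        node = Node(title, level, page)
        # Pop back up until the top of the stack is a valid parent.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node, depth=0):
    """Return indented 'title (p. N)' lines for every node below `node`."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + f"{child.title} (p. {child.page})")
        lines.extend(outline(child, depth + 1))
    return lines
```

Calling `outline(build_tree(headings))` gives a quick text view of the structure that a query step could then navigate. Heading levels are expected to start at 1.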

Querying: Agentic Mode (default)

[Figure: Query Pipeline - Agentic Mode]

Querying: Fast Mode (graph-based, cheaper)

[Figure: Query Pipeline - Fast Mode]

Query Modes

Mode	How it works	Best for
`agentic_vision` (default)	LLM navigates full tree + reads page images	Highest accuracy
`agentic`	Same as `agentic_vision`, without page images	Text-heavy docs
`fast`	Graph entity lookup, LLM sees ~20 nodes	Cheapest, fastest
`fast_vision`	Same as `fast`, plus page images	Charts and figures

# Default — agentic with vision
answer = ni.ask("What was revenue?", tree, pdf_path="report.pdf")

# Fast mode — graph-based, 3x cheaper
answer = ni.ask("What was revenue?", tree, mode="fast")
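
The table above reduces to two questions: does the document rely on charts or figures, and does cost matter more than accuracy? A small helper can encode that choice. `choose_mode` is a hypothetical function, not part of NanoIndex; the mode strings are the ones listed in the table.

```python
def choose_mode(needs_images: bool, prefer_cheap: bool) -> str:
    """Map two yes/no questions onto the four mode names from the table."""
    if prefer_cheap:
        # Graph-based lookup; add page images only when figures matter.
        return "fast_vision" if needs_images else "fast"
    # Full tree navigation; drop images for text-heavy documents.
    return "agentic_vision" if needs_images else "agentic"
```

The result would be passed as `mode=choose_mode(...)` to `ni.ask`. The vision modes presumably also need `pdf_path`, as in the default example above.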

Entity Graph

Every indexed document gets a knowledge graph built automatically using spaCy NLP (free, local, no API calls). Entities (companies, people, dates, money) and relationships are extracted from the tree.

If a reasoning LLM is configured, graph quality is enhanced with LLM-extracted entities on top of spaCy.

The graph powers:

Fast mode retrieval (entity keyword match + relationship expansion)
Knowledge Base concept articles
The Entities tab in the visualization dashboard
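
Fast mode retrieval, described above as entity keyword match plus relationship expansion, can be sketched on a toy graph. Everything below is illustrative: the data, the `retrieve` function, and the substring matching rule are assumptions, not NanoIndex's implementation. The cap of 20 nodes mirrors the "~20 nodes" figure in the Query Modes table.

```python
# Toy graph: entity name -> ids of tree nodes that mention it.
MENTIONS = {
    "acme corp": {"n1", "n4"},
    "jane doe": {"n2"},
    "revenue": {"n4", "n7"},
}
# (source entity, relation, target entity) triples.
RELATIONS = [("acme corp", "ceo", "jane doe")]


def retrieve(question, mentions, relations, max_nodes=20):
    """Return node ids for entities named in the question, plus one-hop neighbors."""
    q = question.lower()
    # 1. Entity keyword match: entity names that appear in the question.
    matched = {e for e in mentions if e in q}
    # 2. Relationship expansion: entities one hop away from a match.
    expanded = set(matched)
    for src, _rel, dst in relations:
        if src in matched:
            expanded.add(dst)
        if dst in matched:
            expanded.add(src)
    # 3. Collect node ids, directly matched entities first, without duplicates.
    ordered = []
    for ent in sorted(matched) + sorted(expanded - matched):
        for node in sorted(mentions.get(ent, ())):
            if node not in ordered:
                ordered.append(node)
    return ordered[:max_nodes]
```

Only the returned nodes would be shown to the LLM, which is why this style of retrieval is cheaper than navigating the full tree.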

Bounding Boxes and Citations

Every answer includes citations with exact bounding box coordinates:

for citation in answer.citations:
    print(f"Section: {citation.title}, Pages: {citation.pages}")
    for bb in citation.bounding_boxes:
        print(f"  Page {bb.page}: ({bb.x:.2f}, {bb.y:.2f}) — {bb.text}")

The citation resolver matches answer text back to specific regions on the PDF page, so you can highlight exactly where the answer came from.
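
For highlighting, it is usually convenient to group boxes by page before drawing. The sketch below uses a stand-in `Box` dataclass with only the fields printed in the loop above (`page`, `x`, `y`, `text`). The real bounding box objects may carry more fields, and the coordinate convention (y growing downward) is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Box:  # stand-in for a citation bounding box; fields taken from the loop above
    page: int
    x: float
    y: float
    text: str


def boxes_by_page(boxes):
    """Group boxes by page number, in reading order within each page."""
    pages = defaultdict(list)
    for bb in boxes:
        pages[bb.page].append(bb)
    # ASSUMPTION: smaller y is higher on the page, so sort by (y, x).
    return {
        page: sorted(items, key=lambda b: (b.y, b.x))
        for page, items in sorted(pages.items())
    }
```

With real citations, the input would be something like `[bb for c in answer.citations for bb in c.bounding_boxes]`.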

Open-Source Mode (No API Key for Parsing)

ni = NanoIndex(parser="pymupdf")
tree = ni.index("report.pdf")  # no API key needed

PyMuPDF provides basic text and table extraction. The resulting tree is simpler (no heading detection) but works for quick experiments. For production, use Nanonets OCR-3.

Benchmarks

Benchmark	Documents	Avg Pages	Accuracy
FinanceBench (SEC 10-K filings)	84	143	94.5%
DocBench Legal (court filings, legislation)	51	54	96.0%

Evidence page retrieval: 93.3%

[Figure: FinanceBench Architecture]

How It Compares

	Traditional RAG	NanoIndex
Indexing	Chunk + embed + vector DB	Extract + build tree
Retrieval	Similarity search	LLM reasons over structure
Tables	Poorly handled	Natively extracted
Figures	Not supported	Vision mode
Scanned docs	Needs separate OCR	Built-in
Structure-aware	No	Yes
Citations	Approximate	Exact page + bounding box

Development

git clone https://github.com/nanonets/nanoindex.git
cd nanoindex
pip install -e ".[dev]"
pytest

License

Apache 2.0 — see LICENSE.

