
VectorlessRAG

No vectors. No embeddings. No chunking. Just intelligence.

 ██╗   ██╗███████╗ ██████╗████████╗ ██████╗ ██████╗ ██╗     ███████╗███████╗███████╗
 ██║   ██║██╔════╝██╔════╝╚══██╔══╝██╔═══██╗██╔══██╗██║     ██╔════╝██╔════╝██╔════╝
 ██║   ██║█████╗  ██║        ██║   ██║   ██║██████╔╝██║     █████╗  ███████╗███████╗
 ╚██╗ ██╔╝██╔══╝  ██║        ██║   ██║   ██║██╔══██╗██║     ██╔══╝  ╚════██║╚════██║
  ╚████╔╝ ███████╗╚██████╗   ██║   ╚██████╔╝██║  ██║███████╗███████╗███████║███████║
   ╚═══╝  ╚══════╝ ╚═════╝   ╚═╝    ╚═════╝ ╚═╝  ╚═╝╚══════╝╚══════╝╚══════╝╚══════╝
                                    RAG

The problem with RAG today

You split your 200-page annual report into chunks. You embed them. You search by similarity. You get back 5 random paragraphs that kinda match the question. Your LLM hallucinates the rest.

Chunking destroys context. Embeddings find similar words, not answers. Vector search doesn't understand documents — it just pattern matches.

This problem gets worse with structured documents — financial statements, legal contracts, technical specifications — where every row in a table depends on the rows around it. Split them into chunks and you lose the table entirely.


How VectorlessRAG works

Instead of chunking, VectorlessRAG reads your document like a human analyst would:

Step 1 — Parse: Every page is extracted as raw text using pdfplumber (digital PDFs) or, in vision mode, rendered by PyMuPDF and read by an LLM (scanned/image PDFs). No chunking. Pages are stored in order.
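A minimal sketch of the text path using pdfplumber's public API (the package's real parser lives in parsers/pdf_parser.py and may differ):

import pdfplumber

def parse_pdf_text(path: str) -> list[str]:
    """Return one raw-text string per page, in document order."""
    with pdfplumber.open(path) as pdf:
        # No chunking: each page is kept whole and addressed by position.
        return [page.extract_text() or "" for page in pdf.pages]

pages = parse_pdf_text("microsoft_10k.pdf")
print(f"parsed {len(pages)} pages")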

Step 2 — Index: An LLM reads the document and builds a hierarchical JSON tree — like a smart table of contents with factual summaries at every level. Each node knows its exact page range.

Microsoft Annual Report (pages 1-220)
├── Financial Statements (pages 34-45)
│   ├── INCOME STATEMENTS (pages 36-37)       ← pages 36-37 stored
│   ├── BALANCE SHEETS (pages 38-39)          ← pages 38-39 stored
│   └── CASH FLOW STATEMENTS (pages 40-41)    ← pages 40-41 stored
├── Notes to Financial Statements (pages 46-130)
│   ├── Note 5 — Income Taxes (pages 58-60)
│   └── Note 16 — Other Income (pages 88-89)
└── ...

Large documents (>10 pages) are indexed in parallel batches using ThreadPoolExecutor. Each batch is given its absolute page range so the LLM never uses printed footer page numbers.
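A rough sketch of that batching, with a hypothetical index_batch helper (the real logic lives in indexer/indexer.py):

import json
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 10  # illustrative; the package may size batches differently

def index_batch(llm, batch_pages: list[str], start: int, end: int) -> list[dict]:
    # The prompt carries the absolute page range, so the LLM indexes by
    # document position rather than printed footer numbers.
    prompt = (
        f"These are pages {start}-{end - 1} of the document. "
        "Build index nodes with absolute page ranges as JSON:\n\n"
        + "\n\n".join(batch_pages)
    )
    return json.loads(llm.call(prompt))

def index_document(llm, pages: list[str]) -> list[dict]:
    spans = [(s, min(s + BATCH_SIZE, len(pages)))
             for s in range(0, len(pages), BATCH_SIZE)]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(index_batch, llm, pages[s:e], s, e)
                   for s, e in spans]
        return [node for f in futures for node in f.result()]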

Step 3 — Retrieve: At query time, the tree is flattened to a compact JSON list of {node_id, title, pages, summary}. An LLM reads this list and returns the node IDs most likely to contain the answer. The actual pages for those nodes are fetched with a ±3 page buffer to handle slight indexing offsets.
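Schematically, with illustrative names (the real logic is in retrievers/retriever.py):

import json

def iter_leaves(node: dict):
    """Yield the leaf nodes of the tree index."""
    if not node.get("sub_nodes"):
        yield node
    else:
        for child in node["sub_nodes"]:
            yield from iter_leaves(child)

def retrieve_pages(llm, tree: dict, pages: list[str], query: str,
                   buffer: int = 3) -> list[str]:
    # Flatten the tree to a compact node list for the selection LLM.
    flat = [{"node_id": n["node_id"], "title": n["title"],
             "pages": [n["start_index"], n["end_index"]],
             "summary": n["summary"]} for n in iter_leaves(tree)]
    prompt = (f"Question: {query}\nNodes: {json.dumps(flat)}\n"
              "Return a JSON list of the node_ids most likely to hold the answer.")
    chosen = set(json.loads(llm.call(prompt)))
    # Fetch the selected pages with a +/- buffer to absorb indexing offsets.
    out: list[str] = []
    for n in iter_leaves(tree):
        if n["node_id"] in chosen:
            out.extend(pages[max(0, n["start_index"] - buffer):
                             min(len(pages), n["end_index"] + buffer + 1)])
    return out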

Step 4 — Answer: The retrieved page text is sent to the LLM together with the full query. The LLM reasons over real document content — not chunks, not embeddings, not summaries. It can compute derived metrics, compare across sections, and produce structured output.
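The final call is a single prompt over real page text, roughly (a sketch, not the package's exact prompt):

def answer(llm, query: str, retrieved_pages: list[str]) -> str:
    context = "\n\n".join(retrieved_pages)
    return llm.call(
        "Answer the question using only the document pages below. "
        "Reproduce tables in full when asked.\n\n"
        f"Question: {query}\n\nPages:\n{context}"
    )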


What it can actually do

These are real queries run against the Microsoft FY2025 Annual Report (220 pages):

Fetch a complete financial statement

Query: "fetch the income statement"

Response: Full table with Product/Service revenue breakdown, COGS, Gross Margin,
R&D, S&M, G&A, Operating Income, Other Income, Tax, Net Income, EPS (basic + diluted),
weighted average shares — all three years.

Compute derived financial metrics

Query: "what is the net profit margin and tell me the trend"

Response: Calculated NPM for FY2023 (34.14%), FY2024 (35.96%), FY2025 (36.14%)
with trend interpretation.

Multi-statement analysis (DuPont)

Query: "is it a good company to invest? do dupont analysis"

Response: Full DuPont decomposition pulling from both income statement and balance sheet:
NPM × Asset Turnover × Equity Multiplier = ROE (29.65%), plus investment thesis
with risks, shareholder return analysis, and ESG commentary.

Build a financial model

Query: "build a 3-year DCF model with 10% discount rate and 3% terminal growth rate"

Response: Full projected FCF model with historical anchors, revenue growth assumptions,
CapEx projections, discounted cash flows, terminal value, and enterprise value estimate.

Compute a metric not printed in any statement

Query: "compute the free cash flow"

Response: OCF ($136,162M) − CapEx ($64,551M) = FCF $71,611M — correctly sourced
from the Cash Flow Statement.

Quickstart

1. Install

pip install vectorlessrag

Or from source:

git clone https://github.com/akhilajithnair4/vectorlessrag.git
cd vectorlessrag
pip install -r requirements.txt

2. Configure your LLM

# .env
LLM_PROVIDER=gemini
GEMINI_API_KEY=your-key-here

3. Use as a Python library

from vectorlessrag import VectorLessRag
from llms.gemini_llm import GeminiLLM

llm = GeminiLLM(api_key="your-key")
rag = VectorLessRag(llm=llm)

# Index a document (runs in background)
job_id = rag.add_document("microsoft_10k.pdf", topic_name="annual_reports", mode="text")

# Poll until indexed
status = rag.get_status(job_id)  # "processing" → "indexed"

# Query
answer = rag.query("What was the net profit margin trend?", topic_name="annual_reports")
print(answer)

4. Or run as a REST API

uvicorn api.api:app --host 0.0.0.0 --port 8000

Interactive docs available at http://localhost:8000/docs


REST API Reference

POST /add_document/

Upload a PDF and start indexing in the background.

Parameters (multipart form):

Field       Type    Description
file        File    PDF file to index
topic_name  string  Collection to add the document to
mode        string  text (default) or vision for scanned PDFs

Response:

{
  "job_id": "0248547e-a69e-46d6-987f-8612051957f6",
  "status": "processing"
}

GET /status/{job_id}

Check indexing progress.

Response:

{
  "job_id": "0248547e-...",
  "status": "indexed",
  "doc_id": "temp_2025_AnnualReport_0541c6"
}

Possible statuses: processing → parsed → indexed | error
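For example, polling with curl until the status flips to indexed:

curl "http://localhost:8000/status/0248547e-a69e-46d6-987f-8612051957f6"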


POST /query/

Query your indexed documents.

Parameters (query string):

Field       Type    Description
topic_name  string  Topic to search within
query       string  Natural language question

Response:

{
  "response": "The net profit margin for FY2025 was 36.14%..."
}

The system retrieves the relevant pages and synthesizes a complete answer. If the query asks for a table or statement, it reproduces all rows exactly.


GET /topics/

List all available topics.

GET /topics/{topic_name}/documents

List all documents indexed under a topic.

GET /llms/

Show the currently configured LLM provider.


Multi-document topics

Upload multiple documents to the same topic and query across all of them simultaneously. The retriever loops over every document in the topic and merges results before synthesis.

# Upload three annual reports to one topic
curl -X POST "http://localhost:8000/add_document/" \
  -F "file=@microsoft_2025.pdf" -F "topic_name=big_tech"

curl -X POST "http://localhost:8000/add_document/" \
  -F "file=@apple_2025.pdf" -F "topic_name=big_tech"

curl -X POST "http://localhost:8000/add_document/" \
  -F "file=@google_2025.pdf" -F "topic_name=big_tech"

# Query across all three
curl -X POST "http://localhost:8000/query/?topic_name=big_tech&query=compare+net+profit+margins+and+recommend+the+best+investment"
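Conceptually, this reuses the retrieve_pages and answer sketches from above: loop over the topic's documents, merge the retrieved pages, then synthesize once (illustrative, not the package's API):

def query_topic(llm, topic_docs: list[dict], query: str) -> str:
    merged: list[str] = []
    for doc in topic_docs:  # every document indexed under the topic
        merged.extend(retrieve_pages(llm, doc["tree"], doc["pages"], query))
    return answer(llm, query, merged)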

Supported LLMs

Set LLM_PROVIDER in .env and provide the corresponding API key. No code changes needed to switch providers.

Provider           LLM_PROVIDER   Key variable
OpenAI (GPT-4o)    openai         OPENAI_API_KEY
Google Gemini      gemini         GEMINI_API_KEY
Anthropic Claude   claude         ANTHROPIC_API_KEY
Ollama (local)     ollama         none (runs locally)

Example .env:

LLM_PROVIDER=gemini
GEMINI_API_KEY=AIza...

Vision mode (for scanned PDFs) requires a multimodal LLM. Gemini 1.5 Pro and GPT-4o both work. Ollama support depends on the local model.


Parsing modes

Mode    How it works                                           When to use
text    pdfplumber extracts text layer                         Digital/native PDFs (reports, docs)
vision  PyMuPDF renders each page to PNG, LLM reads the image  Scanned PDFs, image-based documents

# Text mode (fast, cheap)
rag.add_document("report.pdf", topic_name="docs", mode="text")

# Vision mode (slower, handles scans)
rag.add_document("scanned_contract.pdf", topic_name="legal", mode="vision")
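Under the hood, vision mode renders each page with PyMuPDF and hands the PNG to a multimodal LLM, roughly like this (a sketch; the actual parser is parsers/pdf_parser.py):

import fitz  # PyMuPDF

def parse_pdf_vision(llm, path: str) -> list[str]:
    doc = fitz.open(path)
    pages = []
    for page in doc:
        # Render the page to PNG bytes and let the LLM transcribe it.
        png_bytes = page.get_pixmap(dpi=150).tobytes("png")
        pages.append(llm.call_vision("Transcribe this page as plain text.", png_bytes))
    return pages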

Docker

docker build -t vectorlessrag .
docker run -p 8000:8000 --env-file .env vectorlessrag

Project structure

vectorlessindex/
├── vectorlessrag.py          # Main Python library entry point
├── api/
│   └── api.py                # FastAPI app — all endpoints
├── parsers/
│   └── pdf_parser.py         # Text + vision PDF parsing
├── indexer/
│   └── indexer.py            # Builds hierarchical JSON tree index
├── retrievers/
│   └── retriever.py          # Flattens tree, selects nodes, fetches pages
├── storage/
│   └── storage.py            # File-based JSON storage per topic
├── llms/
│   ├── base.py               # BaseLLM abstract class
│   ├── gemini_llm.py
│   ├── openai_llm.py
│   ├── claude_llm.py
│   └── ollama_llm.py
├── prompts/
│   ├── indexer_prompt.py     # Instructs LLM to build the tree
│   └── retriever_prompt.py   # Instructs LLM to select relevant nodes
└── core/
    └── config.py             # LLM provider config from .env

How the tree index works

Each document is stored as a nested JSON tree. Every node contains:

{
  "node_id": "0036",
  "title": "INCOME STATEMENTS",
  "start_index": 36,
  "end_index": 37,
  "summary": "Consolidated income statement for FY2025/24/23. Revenue $281.7B/$245.1B/$211.9B. Net income $101.8B/$88.1B/$72.4B. Diluted EPS $13.64/$11.80/$9.68.",
  "sub_nodes": []
}

Parent nodes use prefix_summary (broad overview). Leaf nodes use summary (specific facts, numbers, names). The retriever sends only leaf summaries to the selection LLM — keeping the prompt small while preserving enough detail to pick the right section.

For large documents, indexing runs in parallel batches. Each batch is given its absolute page range in the prompt so the LLM never mistakes printed footer page numbers (e.g. "Page 70" in a footer) for actual document position. This was a critical fix for annual reports and 10-Ks where internal page numbering differs from PDF page position.


Why not just use Claude/ChatGPT file upload?

                     Claude / ChatGPT Upload          VectorlessRAG
Document size        ~50 pages reliably               Unlimited (batched indexing)
Persistence          Gone when session ends           Stored, queryable forever
Multi-document       One at a time                    Multiple per topic
API access           Manual UI only                   REST API — any app can call it
Cost per query       Full doc reprocessed every time  Index once, cheap queries
Cross-doc analysis   Manual, session-scoped           Native
Programmatic         No                               Yes

Claude is a tool. VectorlessRAG is infrastructure — a queryable document intelligence backend that any application can integrate.


Adding a new LLM provider

Create a file in llms/ that extends BaseLLM:

from llms.base import BaseLLM

class MyLLM(BaseLLM):
    def call(self, prompt: str) -> str:
        # your LLM call here
        return response_text

    def call_vision(self, prompt: str, image_bytes: bytes) -> str:
        # optional — only needed for vision mode
        return response_text

Then add it to core/config.py under get_llm(). No other changes needed.
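The wiring might look roughly like this (hypothetical sketch; adapt it to the actual get_llm() body):

import os
from llms.myllm import MyLLM  # hypothetical module for the new provider

def get_llm():
    provider = os.getenv("LLM_PROVIDER", "gemini")
    if provider == "myllm":
        return MyLLM(api_key=os.getenv("MYLLM_API_KEY"))
    # ... existing branches for gemini / openai / claude / ollama ...
    raise ValueError(f"Unknown LLM_PROVIDER: {provider}")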


Contributing

Open source. PRs welcome.

  • Found a bug? Open an issue.
  • New LLM provider? Add a file to llms/.
  • New parser? Add a file to parsers/.

Built with the belief that documents deserve better than chunking.
