VectorlessRAG
No vectors. No embeddings. No chunking. Just intelligence.
The problem with RAG today
You split your 200-page annual report into chunks. You embed them. You search by similarity. You get back 5 random paragraphs that kinda match the question. Your LLM hallucinates the rest.
Chunking destroys context. Embeddings find similar words, not answers. Vector search doesn't understand documents — it just pattern matches.
This problem gets worse with structured documents — financial statements, legal contracts, technical specifications — where every row in a table depends on the rows around it. Split them into chunks and you lose the table entirely.
How VectorlessRAG works
Instead of chunking, VectorlessRAG reads your document like a human analyst would:
Step 1 — Parse
Every page is extracted as raw text with pdfplumber (digital PDFs) or, in vision mode, rendered with PyMuPDF and read by an LLM (scanned/image PDFs). No chunking. Pages are stored in order.
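Text-mode parsing is essentially this (a minimal sketch using pdfplumber directly; the library's version lives in parsers/pdf_parser.py):
import pdfplumber

def parse_pdf_text(path: str) -> list[str]:
    # One entry per page, in document order. No chunking.
    with pdfplumber.open(path) as pdf:
        return [page.extract_text() or "" for page in pdf.pages]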
Step 2 — Index
An LLM reads the document and builds a hierarchical JSON tree — like a smart table of contents with factual summaries at every level. Each node knows its exact page range.
Microsoft Annual Report (pages 1-220)
├── Financial Statements (pages 34-45)
│ ├── INCOME STATEMENTS (pages 36-37) ← pages 36-37 stored
│ ├── BALANCE SHEETS (pages 38-39) ← pages 38-39 stored
│ └── CASH FLOW STATEMENTS (pages 40-41) ← pages 40-41 stored
├── Notes to Financial Statements (pages 46-130)
│ ├── Note 5 — Income Taxes (pages 58-60)
│ └── Note 16 — Other Income (pages 88-89)
└── ...
Large documents (>10 pages) are indexed in parallel batches using ThreadPoolExecutor. Each batch is given its absolute page range so the LLM never uses printed footer page numbers.
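A sketch of that batching scheme, with hypothetical helper names (the real logic lives in indexer/indexer.py):
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 10  # pages per batch; illustrative value

def index_in_batches(pages, index_batch):
    # Each batch is handed its absolute (start, end) page range so the
    # LLM anchors summaries to PDF position, not printed footer numbers.
    batches = [
        (start, min(start + BATCH_SIZE, len(pages)), pages[start:start + BATCH_SIZE])
        for start in range(0, len(pages), BATCH_SIZE)
    ]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda args: index_batch(*args), batches))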
Step 3 — Retrieve
At query time, the tree is flattened to a compact JSON list of {node_id, title, pages, summary}. An LLM reads this list and returns the node IDs most likely to contain the answer. The actual pages for those nodes are fetched with a ±3 page buffer to handle slight indexing offsets.
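The buffered fetch amounts to this (illustrative helper, assuming pages is the 0-indexed page list from Step 1):
def fetch_pages(pages, start_index, end_index, buffer=3):
    # Widen the selected node's range by ±buffer pages, clamped to the
    # document bounds, to absorb small indexing offsets.
    lo = max(0, start_index - buffer)
    hi = min(len(pages), end_index + buffer + 1)
    return pages[lo:hi]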
Step 4 — Answer
The retrieved page text is sent to the LLM with the full query. The LLM reasons over real document content — not chunks, not embeddings, not summaries. It can compute derived metrics, compare across sections, and produce structured output.
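In effect the final step is plain prompt assembly over real pages (illustrative wording, not the library's actual prompt):
# page_texts: the buffered pages returned in Step 3
context = "\n\n".join(page_texts)
prompt = (
    "Answer the question using only the document pages below.\n\n"
    f"PAGES:\n{context}\n\nQUESTION: {query}"
)
answer = llm.call(prompt)  # BaseLLM.call, see llms/base.py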
What it can actually do
These are real queries run against the Microsoft FY2025 Annual Report (220 pages):
Fetch a complete financial statement
Query: "fetch the income statement"
Response: Full table with Product/Service revenue breakdown, COGS, Gross Margin,
R&D, S&M, G&A, Operating Income, Other Income, Tax, Net Income, EPS (basic + diluted),
weighted average shares — all three years.
Compute derived financial metrics
Query: "what is the net profit margin and tell me the trend"
Response: Calculated NPM for FY2023 (34.14%), FY2024 (35.96%), FY2025 (36.14%)
with trend interpretation.
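(Spot check for FY2025: net income of $101.8B divided by revenue of $281.7B gives 36.14%, consistent with the summary figures in the tree-index example further down.)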
Multi-statement analysis (DuPont)
Query: "is it a good company to invest? do dupont analysis"
Response: Full DuPont decomposition pulling from both income statement and balance sheet:
NPM × Asset Turnover × Equity Multiplier = ROE (29.65%), plus investment thesis
with risks, shareholder return analysis, and ESG commentary.
Build a financial model
Query: "build a 3-year DCF model with 10% discount rate and 3% terminal growth rate"
Response: Full projected FCF model with historical anchors, revenue growth assumptions,
CapEx projections, discounted cash flows, terminal value, and enterprise value estimate.
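For reference, the arithmetic such a model performs (placeholder FCF projections; the real inputs come from the document):
r, g = 0.10, 0.03                      # discount rate, terminal growth
fcf = [80_000, 88_000, 96_000]         # projected FCF in $M, illustrative
pv_fcf = sum(c / (1 + r) ** t for t, c in enumerate(fcf, start=1))
terminal = fcf[-1] * (1 + g) / (r - g) / (1 + r) ** len(fcf)
enterprise_value = pv_fcf + terminal   # discounted FCF + terminal value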
Compute a metric not printed in any statement
Query: "compute the free cash flow"
Response: OCF ($136,162M) − CapEx ($64,551M) = FCF $71,611M — correctly sourced
from the Cash Flow Statement.
Quickstart
1. Install
pip install vectorlessrag
Or from source:
git clone https://github.com/akhilajithnair4/vectorlessrag.git
cd vectorlessrag
pip install -r requirements.txt
2. Configure your LLM
# .env
LLM_PROVIDER=gemini
GEMINI_API_KEY=your-key-here
3. Use as a Python library
from vectorlessrag import VectorLessRag
from llms.gemini_llm import GeminiLLM
llm = GeminiLLM(api_key="your-key")
rag = VectorLessRag(llm=llm)
# Index a document (runs in background)
job_id = rag.add_document("microsoft_10k.pdf", topic_name="annual_reports", mode="text")
# Poll until indexed
status = rag.get_status(job_id) # "processing" → "indexed"
# Query
answer = rag.query("What was the net profit margin trend?", topic_name="annual_reports")
print(answer)
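To block until indexing completes, a simple polling loop works (sketch; the full status lifecycle is listed in the REST API reference below):
import time

while rag.get_status(job_id) not in ("indexed", "error"):
    time.sleep(5)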
4. Or run as a REST API
uvicorn api.api:app --host 0.0.0.0 --port 8000
Interactive docs available at http://localhost:8000/docs
REST API Reference
POST /add_document/
Upload a PDF and start indexing in the background.
Parameters (multipart form):
| Field | Type | Description |
|---|---|---|
| `file` | File | PDF file to index |
| `topic_name` | string | Collection to add the document to |
| `mode` | string | `text` (default) or `vision` for scanned PDFs |
Response:
{
"job_id": "0248547e-a69e-46d6-987f-8612051957f6",
"status": "processing"
}
GET /status/{job_id}
Check indexing progress.
Response:
{
"job_id": "0248547e-...",
"status": "indexed",
"doc_id": "temp_2025_AnnualReport_0541c6"
}
Possible statuses: processing → parsed → indexed, or error on failure.
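For example:
curl http://localhost:8000/status/0248547e-a69e-46d6-987f-8612051957f6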
POST /query/
Query your indexed documents.
Parameters (query string):
| Field | Type | Description |
|---|---|---|
| `topic_name` | string | Topic to search within |
| `query` | string | Natural language question |
Response:
{
"response": "The net profit margin for FY2025 was 36.14%..."
}
The system retrieves the relevant pages and synthesizes a complete answer. If the query asks for a table or statement, it reproduces all rows exactly.
GET /topics/
List all available topics.
GET /topics/{topic_name}/documents
List all documents indexed under a topic.
GET /llms/
Show the currently configured LLM provider.
Multi-document topics
Upload multiple documents to the same topic and query across all of them simultaneously. The retriever loops over every document in the topic and merges results before synthesis.
# Upload three annual reports to one topic
curl -X POST "http://localhost:8000/add_document/" \
-F "file=@microsoft_2025.pdf" -F "topic_name=big_tech"
curl -X POST "http://localhost:8000/add_document/" \
-F "file=@apple_2025.pdf" -F "topic_name=big_tech"
curl -X POST "http://localhost:8000/add_document/" \
-F "file=@google_2025.pdf" -F "topic_name=big_tech"
# Query across all three
curl -X POST "http://localhost:8000/query/?topic_name=big_tech&query=compare+net+profit+margins+and+recommend+the+best+investment"
Supported LLMs
Set LLM_PROVIDER in .env and provide the corresponding API key. No code changes needed to switch providers.
| Provider | `LLM_PROVIDER` | Key variable |
|---|---|---|
| OpenAI (GPT-4o) | `openai` | `OPENAI_API_KEY` |
| Google Gemini | `gemini` | `GEMINI_API_KEY` |
| Anthropic Claude | `claude` | `ANTHROPIC_API_KEY` |
| Ollama (local) | `ollama` | — |
# .env
LLM_PROVIDER=gemini
GEMINI_API_KEY=AIza...
Vision mode (for scanned PDFs) requires a multimodal LLM. Gemini 1.5 Pro and GPT-4o both work. Ollama support depends on the local model.
Parsing modes
| Mode | How it works | When to use |
|---|---|---|
| `text` | pdfplumber extracts the text layer | Digital/native PDFs (reports, docs) |
| `vision` | PyMuPDF renders each page to PNG; the LLM reads the image | Scanned PDFs, image-based documents |
# Text mode (fast, cheap)
rag.add_document("report.pdf", topic_name="docs", mode="text")
# Vision mode (slower, handles scans)
rag.add_document("scanned_contract.pdf", topic_name="legal", mode="vision")
Docker
docker build -t vectorlessrag .
docker run -p 8000:8000 --env-file .env vectorlessrag
Project structure
vectorlessindex/
├── vectorlessrag.py # Main Python library entry point
├── api/
│ └── api.py # FastAPI app — all endpoints
├── parsers/
│ └── pdf_parser.py # Text + vision PDF parsing
├── indexer/
│ └── indexer.py # Builds hierarchical JSON tree index
├── retrievers/
│ └── retriever.py # Flattens tree, selects nodes, fetches pages
├── storage/
│ └── storage.py # File-based JSON storage per topic
├── llms/
│ ├── base.py # BaseLLM abstract class
│ ├── gemini_llm.py
│ ├── openai_llm.py
│ ├── claude_llm.py
│ └── ollama_llm.py
├── prompts/
│ ├── indexer_prompt.py # Instructs LLM to build the tree
│ └── retriever_prompt.py # Instructs LLM to select relevant nodes
└── core/
└── config.py # LLM provider config from .env
How the tree index works
Each document is stored as a nested JSON tree. Every node contains:
{
"node_id": "0036",
"title": "INCOME STATEMENTS",
"start_index": 36,
"end_index": 37,
"summary": "Consolidated income statement for FY2025/24/23. Revenue $281.7B/$245.1B/$211.9B. Net income $101.8B/$88.1B/$72.4B. Diluted EPS $13.64/$11.80/$9.68.",
"sub_nodes": []
}
Parent nodes use prefix_summary (broad overview). Leaf nodes use summary (specific facts, numbers, names). The retriever sends only leaf summaries to the selection LLM — keeping the prompt small while preserving enough detail to pick the right section.
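That flattening step amounts to something like this (hypothetical helper; the actual logic lives in retrievers/retriever.py):
def flatten_leaf_nodes(node, out=None):
    # Collect {node_id, title, pages, summary} for every leaf node;
    # parent nodes (and their prefix_summary) are skipped.
    if out is None:
        out = []
    if node.get("sub_nodes"):
        for child in node["sub_nodes"]:
            flatten_leaf_nodes(child, out)
    else:
        out.append({
            "node_id": node["node_id"],
            "title": node["title"],
            "pages": [node["start_index"], node["end_index"]],
            "summary": node["summary"],
        })
    return out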
For large documents, indexing runs in parallel batches. Each batch is given its absolute page range in the prompt so the LLM never mistakes printed footer page numbers (e.g. "Page 70" in a footer) for actual document position. This was a critical fix for annual reports and 10-Ks where internal page numbering differs from PDF page position.
Why not just use Claude/ChatGPT file upload?
| | Claude / ChatGPT Upload | VectorlessRAG |
|---|---|---|
| Document size | ~50 pages reliably | Unlimited (batched indexing) |
| Persistence | Gone when session ends | Stored, queryable forever |
| Multi-document | One at a time | Multiple per topic |
| API access | Manual UI only | REST API — any app can call it |
| Cost per query | Full doc reprocessed every time | Index once, cheap queries |
| Cross-doc analysis | Manual, session-scoped | Native |
| Programmatic | No | Yes |
Claude is a tool. VectorlessRAG is infrastructure — a queryable document intelligence backend that any application can integrate.
Adding a new LLM provider
Create a file in llms/ that extends BaseLLM:
from llms.base import BaseLLM

class MyLLM(BaseLLM):
    def call(self, prompt: str) -> str:
        # Your LLM call here; must return the model's text response.
        response_text = ...  # e.g. my_sdk.generate(prompt)
        return response_text

    def call_vision(self, prompt: str, image_bytes: bytes) -> str:
        # Optional; only needed for vision mode.
        response_text = ...  # e.g. my_sdk.generate(prompt, image=image_bytes)
        return response_text
Then add it to core/config.py under get_llm(). No other changes needed.
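The registration might look roughly like this (hypothetical sketch of get_llm(); check core/config.py for the actual dispatch and key names):
import os
from llms.my_llm import MyLLM  # hypothetical module path

def get_llm():
    provider = os.getenv("LLM_PROVIDER", "gemini")
    if provider == "myllm":  # hypothetical provider name
        return MyLLM(api_key=os.getenv("MYLLM_API_KEY"))
    # ... existing providers: gemini, openai, claude, ollama ...
    raise ValueError(f"Unknown LLM_PROVIDER: {provider}")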
Contributing
Open source. PRs welcome.
- Found a bug? Open an issue.
- New LLM provider? Add a file to `llms/`.
- New parser? Add a file to `parsers/`.
Built with the belief that documents deserve better than chunking.