Hierarchical document intelligence without vectors
What is Vectorless?
Vectorless is a library for querying structured documents in natural language, without vector databases or embedding models. The core engine is written in Rust, with Python bindings.
Instead of chunking documents into vectors, Vectorless preserves each document's tree structure and navigates it with a hybrid algorithm-plus-LLM approach, much as a human reads a table of contents:
- Pilot (LLM) handles "where to go"
- Algorithm handles "how to walk"
How It Works
1. Index: Build a Navigable Tree
Technical Manual (root)
├── Chapter 1: Introduction
├── Chapter 2: Architecture
│   ├── 2.1 System Design
│   └── 2.2 Implementation
└── Chapter 3: API Reference
Each node gets an AI-generated summary, enabling fast navigation.
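To make the indexed structure concrete, here is a minimal Python sketch of a tree whose nodes carry a summary. It is an illustration only, not the Vectorless data model: the `Node` class, its fields, and the `render` helper are hypothetical names, and the summaries are left empty where the real index would store AI-generated text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One document section. `summary` stands in for the AI-generated summary."""
    title: str
    summary: str = ""
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child section and return it so calls can be chained."""
        self.children.append(child)
        return child


def render(node: Node, prefix: str = "", is_last: bool = True,
           is_root: bool = True) -> List[str]:
    """Return the tree as text lines drawn with box characters."""
    if is_root:
        lines = [node.title]
        child_prefix = ""
    else:
        lines = [prefix + ("└── " if is_last else "├── ") + node.title]
        child_prefix = prefix + ("    " if is_last else "│   ")
    for i, child in enumerate(node.children):
        last_child = i == len(node.children) - 1
        lines.extend(render(child, child_prefix, last_child, False))
    return lines


# Rebuild the example tree shown above (hypothetical in-memory form).
root = Node("Technical Manual (root)")
root.add(Node("Chapter 1: Introduction"))
arch = root.add(Node("Chapter 2: Architecture"))
arch.add(Node("2.1 System Design"))
arch.add(Node("2.2 Implementation"))
root.add(Node("Chapter 3: API Reference"))

print("\n".join(render(root)))
```

Keeping the summary on the node itself means a navigator can judge a branch from its title and summary alone, without loading the section body.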
2. Query: Navigate with LLM
When you ask "How do I reset the device?":
- Analyze — Understand query intent and complexity
- Navigate — LLM guides tree traversal (like reading a TOC)
- Retrieve — Return the exact section with context
- Verify — Check if more information is needed (backtracking)
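The navigate, retrieve, and verify steps can be sketched as a depth-first walk that visits the most promising child first and backtracks when a branch yields nothing. This is a rough sketch under stated assumptions, not the library's algorithm: `Section`, `navigate`, and `keyword_pilot` are hypothetical names, the sample manual is invented, and a simple keyword counter stands in for the LLM pilot.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Section:
    """A tree node: title and summary guide navigation, text is the payload."""
    title: str
    summary: str
    text: str = ""
    children: List["Section"] = field(default_factory=list)


# A pilot scores how promising a section looks for a query.
Pilot = Callable[[str, Section], float]


def keyword_pilot(query: str, section: Section) -> float:
    """Stand-in for the LLM: count longer query words found in title or summary."""
    words = {w.strip("?.,!").lower() for w in query.split()}
    haystack = (section.title + " " + section.summary).lower()
    return float(sum(1 for w in words if len(w) > 3 and w in haystack))


def navigate(query: str, node: Section, pilot: Pilot,
             path: Tuple[str, ...] = ()) -> Optional[Tuple[List[str], str]]:
    """Return (path, text) for the first relevant leaf, or None if none is found."""
    path = path + (node.title,)
    if not node.children:
        # Verify step: accept a leaf only if the pilot considers it relevant.
        return (list(path), node.text) if pilot(query, node) > 0 else None
    # Navigate step: try children from most to least promising.
    ranked = sorted(node.children, key=lambda c: pilot(query, c), reverse=True)
    for child in ranked:
        found = navigate(query, child, pilot, path)
        if found is not None:
            return found
        # Falling through to the next child is the backtracking step.
    return None


# Invented sample data for illustration only.
manual = Section("Technical Manual", "Device manual", children=[
    Section("Chapter 1: Introduction",
            "What the device is and where the reset button sits",
            children=[Section("1.1 Overview", "Product overview", "Welcome.")]),
    Section("Chapter 4: Maintenance", "Cleaning and reset procedures", children=[
        Section("4.1 Cleaning", "How to clean the casing", "Wipe with a dry cloth."),
        Section("4.2 Reset Procedure", "Steps to reset the device",
                "Hold the power button for 10 seconds until the LED flashes blue."),
    ]),
])

result = navigate("How do I reset the device?", manual, keyword_pilot)
if result is not None:
    found_path, found_text = result
    print("Source:", " > ".join(found_path))
    print("Answer:", found_text)
```

In this sample the introduction scores highest at the top level but contains no relevant leaf, so the walk backtracks into the maintenance chapter. Swapping `keyword_pilot` for a function that asks an LLM to score each summary would keep the traversal code unchanged.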
Traditional RAG vs Vectorless
| Aspect | Traditional RAG | Vectorless |
|---|---|---|
| Infrastructure | Vector DB + Embedding Model | LLM API only |
| Document Structure | Lost in chunking | Preserved |
| Context | Fragment only | Section + surrounding context |
| Setup Time | Hours to Days | Minutes |
| Best For | Unstructured text | Structured documents |
Example
Input:
Document: 100-page technical manual (PDF)
Query: "How do I reset the device?"
Output:
Answer: "To reset the device, hold the power button for 10 seconds
until the LED flashes blue, then release..."
Source: Chapter 4 > Section 4.2 > Reset Procedure
When to Use
✅ Good fit:
- Technical documentation
- Manuals and guides
- Structured reports
- Policy documents
- Any document with clear hierarchy
❌ Not ideal:
- Unstructured text (tweets, chat logs)
- Very short documents (< 1 page)
- Pure Q&A datasets without structure
Quick Start
Python
pip install vectorless
from vectorless import Engine, IndexContext
# Create engine (uses OPENAI_API_KEY env var)
engine = Engine(workspace="./data")
# Index a document
ctx = IndexContext.from_file("./report.pdf")
doc_id = engine.index(ctx)
# Query
result = engine.query(doc_id, "What is the total revenue?")
print(f"Answer: {result.content}")
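Because the engine reads its key from the `OPENAI_API_KEY` environment variable, a script can check for it up front and fail with a clear message. The sketch below uses only the standard library; `require_api_key` is a hypothetical helper, not part of the vectorless package.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or raise a clear error if it is missing."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the engine")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found")
    except RuntimeError as err:
        print(f"Configuration error: {err}")
```

Calling this before constructing the engine surfaces a configuration problem immediately rather than at the first LLM request.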
Rust
[dependencies]
vectorless = "0.1"
cp vectorless.example.toml ./vectorless.toml
use vectorless::Engine;

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    let client = Engine::builder()
        .with_workspace("./workspace")
        .build()?;

    let doc_id = client.index("./document.pdf").await?;
    let result = client.query(&doc_id,
        "What are the system requirements?").await?;

    println!("Answer: {}", result.content);
    println!("Source: {}", result.path);
    Ok(())
}
Features
| Feature | Description |
|---|---|
| Zero Infrastructure | No vector DB, no embedding model — just an LLM API |
| Multi-format Support | PDF, Markdown, DOCX, HTML out of the box |
| Incremental Updates | Add/remove documents without full re-index |
| Traceable Results | See the exact navigation path taken |
| Feedback Learning | Improves from user feedback over time |
| Multi-turn Queries | Handles complex questions with decomposition |
Architecture
Core Components
- Index Pipeline — Parses documents, builds tree, generates summaries
- Retrieval Pipeline — Analyzes query, navigates tree, returns results
- Pilot — LLM-powered navigator that guides retrieval decisions
- Metrics Hub — Unified observability for LLM calls, retrieval, and feedback
Examples
See the examples/ directory.
Contributing
Contributions welcome! If you find this useful, please ⭐ the repo — it helps others discover it.
License
Apache License 2.0
File details
Details for the file vectorless-0.1.0.tar.gz.
File metadata
- Download URL: vectorless-0.1.0.tar.gz
- Upload date:
- Size: 342.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e32915ddd7e46321979b423fd5751ada9dd24abfe4c9be0268418327b0dfda25 |
| MD5 | 28eb9fccf606d244ad59bfc794a15dcf |
| BLAKE2b-256 | 01ecbb60c7936e126138aa65aaac4b90803a29f8355ef208b6e23b1d38fee5a5 |
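A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of` and `matches` are made up for illustration, and the file is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Published SHA256 for vectorless-0.1.0.tar.gz, copied from the table above.
EXPECTED = "e32915ddd7e46321979b423fd5751ada9dd24abfe4c9be0268418327b0dfda25"
archive = Path("vectorless-0.1.0.tar.gz")  # assumed download location

if archive.exists():
    print("OK" if matches(archive, EXPECTED) else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```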
File details
Details for the file vectorless-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: vectorless-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 5.5 MB
- Tags: CPython 3.10, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f9c5d076abd8349f801ef1722ae768c12d8ef5443170d3ac7fca1914f4273b2 |
| MD5 | c8051d60d4def6eacfc68665a7c185cd |
| BLAKE2b-256 | 3997f2635cc82ee7267066fcdbc4486214a853efdbaf78835e25415f88d2c012 |