Hierarchical document intelligence without vectors
What is Vectorless?
Vectorless is a library for querying structured documents in natural language, without vector databases or embedding models. The core engine is written in Rust, with Python bindings.
Instead of chunking documents into vectors, Vectorless preserves the document's tree structure and uses an LLM to navigate it, much as a human reads a table of contents.
How It Works
1. Index: Build a Navigable Tree
```
Technical Manual (root)
├── Chapter 1: Introduction
├── Chapter 2: Architecture
│   ├── 2.1 System Design
│   └── 2.2 Implementation
└── Chapter 3: API Reference
```
Each node gets an AI-generated summary, enabling fast navigation.
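To make the idea of a navigable tree concrete, here is a minimal sketch of one way such a tree could be represented. It is not Vectorless's internal data model: `Node`, `summary`, and `outline` are hypothetical names chosen for illustration, and the summaries are left empty where the real index would hold AI-generated text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One document section (hypothetical model, not the library's own type)."""
    title: str
    summary: str = ""  # placeholder for the AI-generated summary
    children: list["Node"] = field(default_factory=list)


def outline(node: Node, depth: int = 0) -> list[str]:
    """Flatten the tree into indented title lines, like a table of contents."""
    lines = ["  " * depth + node.title]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# The same hierarchy as the tree shown above.
manual = Node("Technical Manual", children=[
    Node("Chapter 1: Introduction"),
    Node("Chapter 2: Architecture", children=[
        Node("2.1 System Design"),
        Node("2.2 Implementation"),
    ]),
    Node("Chapter 3: API Reference"),
])

print("\n".join(outline(manual)))
```

Keeping children nested under their parent is what lets a query later descend one level at a time instead of scanning every section.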
2. Query: Navigate with LLM
When you ask "How do I reset the device?":
- Analyze — Understand query intent and complexity
- Navigate — LLM guides tree traversal
- Retrieve — Return the exact section with context
- Verify — Check if more information is needed
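The Navigate step can be pictured as a greedy descent through the tree. The sketch below is an illustration under stated assumptions, not the library's algorithm: `Node`, `overlap_score`, and `navigate` are hypothetical names, the sample manual is made up, and a simple keyword-overlap scorer stands in for the LLM call that would judge each child's summary.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, node: Node) -> int:
    """Stand-in for the LLM: count query words shared with title and summary."""
    return len(tokens(query) & tokens(f"{node.title} {node.summary}"))


def navigate(root: Node, query: str, choose=overlap_score) -> tuple[Node, list[str]]:
    """Follow the best-scoring child at each level until a leaf is reached."""
    node, path = root, [root.title]
    while node.children:
        node = max(node.children, key=lambda child: choose(query, child))
        path.append(node.title)
    return node, path


manual = Node("Manual", "Device manual", [
    Node("Chapter 1: Setup", "Unboxing and first power on"),
    Node("Chapter 4: Maintenance", "Cleaning, reset, and firmware updates", [
        Node("4.1 Cleaning", "How to clean the housing"),
        Node("4.2 Reset Procedure", "Reset the device with the power button"),
    ]),
])

leaf, path = navigate(manual, "How do I reset the device?")
print(" > ".join(path))  # Manual > Chapter 4: Maintenance > 4.2 Reset Procedure
```

Swapping `choose` for a function that asks an LLM to rate each child's summary would keep the same traversal while changing only how a branch is picked.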
Traditional RAG vs Vectorless
| Aspect | Traditional RAG | Vectorless |
|---|---|---|
| Infrastructure | Vector DB + Embedding Model | Just LLM API |
| Document Structure | Lost in chunking | Preserved |
| Context | Fragment only | Section + surrounding context |
| Setup Time | Hours to Days | Minutes |
| Best For | Unstructured text | Structured documents |
Example
Input:
Document: 100-page technical manual (PDF)
Query: "How do I reset the device?"
Output:
Answer: "To reset the device, hold the power button for 10 seconds
until the LED flashes blue, then release..."
Source: Chapter 4 > Section 4.2 > Reset Procedure
When to Use
✅ Good fit:
- Technical documentation
- Manuals and guides
- Structured reports
- Policy documents
- Any document with clear hierarchy
❌ Not ideal:
- Unstructured text (tweets, chat logs)
- Very short documents (< 1 page)
- Pure Q&A datasets without structure
Quick Start
Python
```shell
pip install vectorless
```

```python
from vectorless import Engine, IndexContext

# Create engine (uses OPENAI_API_KEY env var)
engine = Engine(workspace="./data")

# Index a document
ctx = IndexContext.from_file("./report.pdf")
doc_id = engine.index(ctx)

# Query
result = engine.query(doc_id, "What is the total revenue?")
print(f"Answer: {result.content}")
```
Rust
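```toml
[dependencies]
vectorless = "0.1"
# Needed for the #[tokio::main] attribute used below
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```

```shell
cp vectorless.example.toml ./vectorless.toml
```

```rust
use vectorless::Engine;

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    // Build the engine
    let client = Engine::builder()
        .with_workspace("./workspace")
        .build()
        .await?;

    // Index a document
    let doc_id = client.index("./document.pdf").await?;

    // Query it, then print the answer and the path it came from
    let result = client
        .query(&doc_id, "What are the system requirements?")
        .await?;
    println!("Answer: {}", result.content);
    println!("Source: {}", result.path);
    Ok(())
}
```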
Features
| Feature | Description |
|---|---|
| Zero Infrastructure | No vector DB, no embedding model — just an LLM API |
| Multi-format Support | PDF, Markdown, DOCX, HTML out of the box |
| Incremental Updates | Add/remove documents without full re-index |
| Traceable Results | See the exact navigation path taken |
| Feedback Learning | Improves from user feedback over time |
| Multi-turn Queries | Handles complex questions with decomposition |
Configuration
Zero Configuration (Recommended)
Just set OPENAI_API_KEY and you're ready to go:
```shell
export OPENAI_API_KEY="sk-..."
```

Python

```python
from vectorless import Engine

# Uses OPENAI_API_KEY from environment
engine = Engine(workspace="./data")
```

Rust

```rust
use vectorless::Engine;

let client = Engine::builder()
    .with_workspace("./workspace")
    .build()
    .await?;
```
Environment Variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | LLM API key |
| VECTORLESS_MODEL | Default model (e.g., gpt-4o-mini) |
| VECTORLESS_ENDPOINT | API endpoint URL |
| VECTORLESS_WORKSPACE | Workspace directory |
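Before creating an engine, it can help to check which of these variables are actually set in the current shell. The helper below is a small sketch using only the standard library; `env_report` and `DOCUMENTED_VARS` are hypothetical names, not part of Vectorless, and the script does not touch the library at all.

```python
import os

# The four variables listed in the table above.
DOCUMENTED_VARS = (
    "OPENAI_API_KEY",
    "VECTORLESS_MODEL",
    "VECTORLESS_ENDPOINT",
    "VECTORLESS_WORKSPACE",
)


def env_report(environ=None) -> dict[str, str]:
    """Map each variable to its value, '<set>' for the API key, or '<unset>'."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in DOCUMENTED_VARS:
        value = environ.get(name)
        if not value:
            report[name] = "<unset>"
        elif name.endswith("_KEY"):
            report[name] = "<set>"  # never echo the secret itself
        else:
            report[name] = value
    return report


for name, value in env_report().items():
    print(f"{name:22} {value}")
```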
Advanced Configuration
For fine-grained control, use a config file:
```shell
cp config.toml ./vectorless.toml
```

Python

```python
from vectorless import Engine

# Use full configuration file
engine = Engine(config_path="./vectorless.toml")

# Or override specific settings
engine = Engine(
    config_path="./vectorless.toml",
    model="gpt-4o",  # Override model from config
)
```

Rust

```rust
use vectorless::Engine;

// Use full configuration file
let client = Engine::builder()
    .with_config_path("./vectorless.toml")
    .build()
    .await?;

// Or override specific settings
let client = Engine::builder()
    .with_config_path("./vectorless.toml")
    .with_model("gpt-4o", None) // Override model
    .build()
    .await?;
```
Configuration Priority
Later overrides earlier:
- Default configuration
- Auto-detected config file (vectorless.toml, config.toml, .vectorless.toml)
- Explicit config file (config_path / with_config_path)
- Environment variables
- Constructor/builder parameters (highest priority)
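The "later overrides earlier" rule amounts to merging layers in order. The following is a minimal sketch of that rule, not the library's configuration loader: `resolve` is a hypothetical helper, and every key and value in the sample layers is made up for illustration.

```python
def resolve(*layers: dict) -> dict:
    """Merge layers left to right; later layers win. None means 'not set'."""
    merged = {}
    for layer in layers:
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged


# Hypothetical values for each source, in priority order (lowest first).
defaults = {"model": "gpt-4o-mini", "workspace": "./workspace"}
auto_file = {"workspace": "./data"}          # auto-detected config file
explicit_file = {"model": "gpt-4o"}          # file passed via config_path
env = {"model": None, "workspace": None}     # no environment variables set
params = {"workspace": "./override"}         # constructor/builder parameters

settings = resolve(defaults, auto_file, explicit_file, env, params)
print(settings)  # {'model': 'gpt-4o', 'workspace': './override'}
```

Treating `None` as "not set" keeps an unset source from erasing a value supplied by a lower-priority one.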
Architecture
Core Components
- Index Pipeline — Parses documents, builds tree, generates summaries
- Retrieval Pipeline — Analyzes query, navigates tree, returns results
- Pilot — LLM-powered navigator that guides retrieval decisions
- Metrics Hub — Unified observability for LLM calls, retrieval, and feedback
Examples
See the examples/ directory for more usage patterns.
Contributing
Contributions welcome! If you find this useful, please ⭐ the repo — it helps others discover it.
License
Apache License 2.0
File details
Details for the file vectorless-0.1.1.tar.gz.
File metadata
- Download URL: vectorless-0.1.1.tar.gz
- Upload date:
- Size: 353.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 31dee47c833857d243a07ac2c6831297c02fd968a2fab122a135aa8c0ebff888 |
| MD5 | 8ea54f33839a3f863c67bba160b2e52c |
| BLAKE2b-256 | 2bfea9a6adc58913f92c171cad83c45c2892f812a104a4cab8da6e25bfd3aaf4 |
File details
Details for the file vectorless-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: vectorless-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 4.7 MB
- Tags: CPython 3.10, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9fea7e727d1aecf7f016bbb440189cb408cb4514dcf63eeecc1fd828e1422ebc |
| MD5 | 29b6ce60240e9d635a4ff5d402ee733e |
| BLAKE2b-256 | 9d68d6186e3f2eadd39312231016d8ba3044bb9b3ce4d06ef330837908d2f761 |
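A downloaded file can be checked against the digests listed above with Python's standard `hashlib` module. This is a small sketch; `file_digests` is a hypothetical helper name, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digests(path) -> dict[str, str]:
    """Return hex digests of the file for comparison with the published hashes."""
    data = Path(path).read_bytes()
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "blake2b-256": hashlib.blake2b(data, digest_size=32).hexdigest(),
    }


def matches(path, expected_sha256: str) -> bool:
    """True if the file's SHA256 equals the expected hex digest."""
    return file_digests(path)["sha256"] == expected_sha256.strip().lower()
```

For example, `matches("vectorless-0.1.1.tar.gz", "<SHA256 from the table>")` should return `True` for an intact source distribution.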