Reasoning-native document intelligence engine for AI
Vectorless is a reasoning-native document engine, with its core written in Rust, designed as a foundational layer for AI applications that need structured access to documents. It does not use vector databases, embeddings, or similarity search. Instead, it reasons through your structured documents (PDFs, Markdown, reports, contracts) and retrieves only what is relevant.
How It Works
Quick Start
Rust
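[dependencies]
vectorless = "0.1"
# Required by the #[tokio::main] attribute used in the example
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }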
use vectorless::client::{EngineBuilder, IndexContext, QueryContext};

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    let engine = EngineBuilder::new()
        .with_key("sk-...")
        .with_model("gpt-4o")
        .with_endpoint("https://api.openai.com/v1")
        .build()
        .await?;

    // Index a document
    let result = engine.index(IndexContext::from_path("./report.pdf")).await?;
    let doc_id = result.doc_id().unwrap();

    // Query
    let result = engine.query(
        QueryContext::new("What is the total revenue?")
            .with_doc_ids(vec![doc_id.to_string()])
    ).await?;
    println!("{}", result.content);

    Ok(())
}
Python
pip install vectorless
import asyncio
from vectorless import Engine, IndexContext, QueryContext

async def main():
    engine = Engine(api_key="sk-...", model="gpt-4o", endpoint="https://api.openai.com/v1")

    # Index a document
    result = await engine.index(IndexContext.from_path("./report.pdf"))
    doc_id = result.doc_id

    # Query
    result = await engine.query(
        QueryContext("What is the total revenue?").with_doc_ids([doc_id])
    )
    print(result.single().content)

asyncio.run(main())
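The quick-start snippets hard-code the API key. Below is a minimal sketch of reading the same three settings from the environment instead. The helper `engine_settings` and the variable names `VECTORLESS_API_KEY`, `VECTORLESS_MODEL`, and `VECTORLESS_ENDPOINT` are assumptions made for illustration; Vectorless itself is not known to define them. The returned keys match the keyword arguments passed to `Engine(...)` above.

```python
import os


def engine_settings(env=None):
    """Collect Engine keyword arguments from environment variables.

    The variable names are hypothetical; choose whatever fits your deployment.
    """
    env = os.environ if env is None else env
    api_key = env.get("VECTORLESS_API_KEY")
    if not api_key:
        raise RuntimeError("VECTORLESS_API_KEY is not set")
    return {
        "api_key": api_key,
        # Defaults mirror the values used in the quick start.
        "model": env.get("VECTORLESS_MODEL", "gpt-4o"),
        "endpoint": env.get("VECTORLESS_ENDPOINT", "https://api.openai.com/v1"),
    }
```

With this in place, the engine could be created as `Engine(**engine_settings())`, which keeps the key out of source control.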
Core Concepts
Semantic Tree Index
When you index a document, Vectorless builds a tree structure that mirrors the document's hierarchy:
Annual Report 2024
├── Executive Summary
│   ├── Financial Highlights
│   └── Strategic Outlook
├── Financial Statements
│   ├── Revenue Analysis ← "What is the total revenue?" lands here
│   ├── Operating Expenses
│   └── Net Income
└── Risk Factors
    ├── Market Risks
    └── Regulatory Risks
Each node contains a summary generated by the LLM. During retrieval, the engine uses these summaries to reason about which path to follow — just like a human would scan a table of contents.
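The summary-guided descent described above can be sketched in a few lines. This is an illustration under stated assumptions, not Vectorless's implementation: the `Node` class, the `navigate` function, and the node summaries are hypothetical, and a simple word-overlap `score` stands in for the LLM's judgment about which branch to follow.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    summary: str
    children: list = field(default_factory=list)


def tokens(text):
    """Lowercase alphanumeric words in the text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, node):
    # Stand-in for the LLM call: count words shared by the query
    # and the node's title plus summary.
    return len(tokens(query) & tokens(node.title + " " + node.summary))


def navigate(root, query):
    """Walk from the root to a leaf, following the best-scoring child at each level."""
    path = [root.title]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(query, child))
        path.append(node.title)
    return path


# Made-up summaries for the example tree shown above.
report = Node("Annual Report 2024", "Company results for fiscal 2024", [
    Node("Executive Summary", "Highlights and outlook", [
        Node("Financial Highlights", "Headline figures for the year"),
        Node("Strategic Outlook", "Plans for the coming years"),
    ]),
    Node("Financial Statements", "Revenue, expenses, and income totals", [
        Node("Revenue Analysis", "Total revenue by segment and region"),
        Node("Operating Expenses", "Breakdown of operating costs"),
        Node("Net Income", "Profit after expenses and tax"),
    ]),
    Node("Risk Factors", "Market and regulatory risks", [
        Node("Market Risks", "Exposure to market conditions"),
        Node("Regulatory Risks", "Exposure to regulatory change"),
    ]),
])

print(navigate(report, "What is the total revenue?"))
# ['Annual Report 2024', 'Financial Statements', 'Revenue Analysis']
```

Only the children of the chosen node are examined at each level, so the number of summaries read grows with the depth of the tree rather than the size of the document.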
Cross-Document Graph
When multiple documents are indexed, Vectorless builds a relationship graph connecting them through shared keywords and concepts. This enables queries across your entire document collection.
# Query across all indexed documents
result = await engine.query(
    QueryContext("Compare revenue trends across all reports")
)
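To make the idea of a keyword-based relationship graph concrete, here is a small sketch that links documents sharing at least one keyword. The source does not describe how Vectorless extracts keywords or weights edges, so the function `build_keyword_graph`, its `min_shared` parameter, and the sample data are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_keyword_graph(doc_keywords, min_shared=1):
    """Link every pair of documents that shares at least `min_shared` keywords.

    Returns {doc_id: {other_doc_id: sorted list of shared keywords}}.
    """
    graph = defaultdict(dict)
    for a, b in combinations(sorted(doc_keywords), 2):
        shared = sorted(set(doc_keywords[a]) & set(doc_keywords[b]))
        if len(shared) >= min_shared:
            graph[a][b] = shared
            graph[b][a] = shared
    return dict(graph)


# Made-up keywords for three hypothetical documents.
keywords = {
    "report-2023": ["revenue", "net income", "market risk"],
    "report-2024": ["revenue", "net income", "regulation"],
    "supplier-contract": ["liability", "termination"],
}
graph = build_keyword_graph(keywords)
print(graph["report-2023"])
# {'report-2024': ['net income', 'revenue']}
```

A query that touches one report can then follow its edges to related documents, while unrelated documents such as the contract stay out of the search.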
Workspace Persistence
Indexed documents are stored in a workspace — there's no need to reprocess files between sessions:
engine = Engine(api_key="sk-...", model="gpt-4o", endpoint="https://api.openai.com/v1")
# List all indexed documents
docs = await engine.list()
for doc in docs:
    print(f"{doc.name} ({doc.format}) — {doc.page_count} pages")
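The reason nothing needs reprocessing is that index data outlives the process. A minimal sketch of that idea follows, assuming a single JSON file as the store. The on-disk format Vectorless actually uses is not described here, so `save_workspace`, `load_workspace`, the file name, and the sample record are hypothetical; the record fields mirror the ones printed in the listing above.

```python
import json
import tempfile
from pathlib import Path


def save_workspace(path, documents):
    """Write document records to a JSON file so a later session can reuse them."""
    Path(path).write_text(json.dumps(documents, indent=2), encoding="utf-8")


def load_workspace(path):
    """Return previously saved records, or an empty list for a fresh workspace."""
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    workspace_file = Path(tmp) / "workspace.json"

    # First session: nothing stored yet, so index and save.
    first_load = load_workspace(workspace_file)
    save_workspace(workspace_file, [{"name": "report.pdf", "format": "pdf", "page_count": 42}])

    # Later session: the record comes back without touching the original file.
    restored = load_workspace(workspace_file)

print(restored)
```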
What It's For
Vectorless is designed for applications that need precise document retrieval:
- Financial analysis — Extract specific figures from reports, compare across filings
- Legal research — Find relevant clauses, trace definitions across documents
- Technical documentation — Navigate large manuals, locate specific procedures
- Academic research — Cross-reference findings across papers
- Compliance — Audit trails with source references for every answer
Examples
See examples/ for complete usage patterns.
Contributing
Contributions welcome! If you find this useful, please ⭐ the repo — it helps others discover it.
License
Apache License 2.0