Hierarchical document intelligence without vectors

Vectorless

What is Vectorless?

Vectorless is a library for querying structured documents in natural language, without vector databases or embedding models. The core engine is written in Rust, with Python bindings.

Instead of chunking documents into vectors, Vectorless preserves each document's tree structure and navigates it with a hybrid of algorithm and LLM, much as a human reader uses a table of contents:

  • Pilot (LLM) handles "where to go"
  • Algorithm handles "how to walk"

How It Works

1. Index: Build a Navigable Tree

Technical Manual (root)
├── Chapter 1: Introduction
├── Chapter 2: Architecture
│   ├── 2.1 System Design
│   └── 2.2 Implementation
└── Chapter 3: API Reference

Each node gets an AI-generated summary, so navigation can work from short summaries rather than full section text.
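As a rough illustration of the indexing idea, the tree above can be modelled as nested nodes that each carry a summary slot. This is a minimal sketch, not the library's internal data model: the `Node` class and the `render` helper are hypothetical names introduced here, and the summaries are left empty where an LLM would normally fill them in.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical section node; not the library's own type."""
    title: str
    summary: str = ""  # would be generated by an LLM at index time
    children: List["Node"] = field(default_factory=list)


def render(node: Node, prefix: str = "", is_last: bool = True,
           is_root: bool = True) -> List[str]:
    """Return the tree as text lines with box-drawing connectors."""
    if is_root:
        lines = [node.title]
        child_prefix = ""
    else:
        connector = "└── " if is_last else "├── "
        lines = [prefix + connector + node.title]
        child_prefix = prefix + ("    " if is_last else "│   ")
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        lines.extend(render(child, child_prefix, last, is_root=False))
    return lines


manual = Node("Technical Manual (root)", children=[
    Node("Chapter 1: Introduction"),
    Node("Chapter 2: Architecture", children=[
        Node("2.1 System Design"),
        Node("2.2 Implementation"),
    ]),
    Node("Chapter 3: API Reference"),
])

print("\n".join(render(manual)))
```

Running the sketch prints the same outline shown above, which is the structure a navigator would walk at query time.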

2. Query: Navigate with LLM

When you ask "How do I reset the device?":

  1. Analyze — Understand query intent and complexity
  2. Navigate — LLM guides tree traversal (like reading a TOC)
  3. Retrieve — Return the exact section with context
  4. Verify — Check if more information is needed (backtracking)
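The Navigate, Retrieve, and Verify steps above can be sketched as a depth-first walk in which a "pilot" orders the branches and the algorithm does the walking and backtracking. This is only an illustration under stated assumptions: `Node`, `keyword_pilot`, `mentions_keywords`, and `navigate` are hypothetical names, the pilot is a keyword-overlap heuristic standing in for the LLM, the Analyze step is not modelled, and the sample manual is invented around the reset example used in this README.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical section node; not the library's own type."""
    title: str
    summary: str = ""
    content: str = ""
    children: List["Node"] = field(default_factory=list)


def keyword_pilot(query: str, candidates: List[Node]) -> List[Node]:
    """Stand-in for the LLM Pilot: order children by word overlap with the query."""
    words = set(query.lower().rstrip("?").split())

    def score(node: Node) -> int:
        return len(words & set(f"{node.title} {node.summary}".lower().split()))

    return sorted(candidates, key=score, reverse=True)


def mentions_keywords(query: str, node: Node) -> bool:
    """Stand-in for Verify: every longer query word appears in the section text."""
    keywords = [w for w in query.lower().rstrip("?").split() if len(w) > 3]
    return all(w in node.content.lower() for w in keywords)


def navigate(node: Node, query: str,
             pilot: Callable[[str, List[Node]], List[Node]],
             verify: Callable[[str, Node], bool],
             trace: List[str],
             path: Tuple[str, ...] = ()) -> Optional[Tuple[Tuple[str, ...], str]]:
    """Depth-first walk; returns (path, content) or None if the branch fails."""
    path = path + (node.title,)
    trace.append(node.title)  # record every visited node for traceability
    if not node.children:
        return (path, node.content) if verify(query, node) else None
    for child in pilot(query, node.children):
        found = navigate(child, query, pilot, verify, trace, path)
        if found is not None:
            return found
    return None  # nothing here: the caller backtracks to the next sibling


manual = Node("Technical Manual", children=[
    Node("Chapter 2", "Installing and powering on the device", children=[
        Node("First Boot", content="Plug in the device and press the power button."),
    ]),
    Node("Chapter 4", "Maintenance: cleaning, reset and troubleshooting", children=[
        Node("Section 4.2", "Reset and recovery", children=[
            Node("Reset Procedure", content=(
                "To reset the device, hold the power button for 10 seconds "
                "until the LED flashes blue, then release...")),
        ]),
    ]),
])

trace: List[str] = []
result = navigate(manual, "How do I reset the device?",
                  keyword_pilot, mentions_keywords, trace)
if result is not None:
    path, answer = result
    print("Answer:", answer)
    print("Source:", " > ".join(path[1:]))
print("Visited:", " -> ".join(trace))
```

In this toy run the heuristic pilot first tries Chapter 2, the verify check rejects it, and the walk backtracks into Chapter 4. That split is the point of the design: the pilot only ranks options, so a wrong guess costs one extra branch rather than a wrong answer.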

Traditional RAG vs Vectorless

Aspect             | Traditional RAG             | Vectorless
Infrastructure     | Vector DB + embedding model | LLM API only
Document structure | Lost in chunking            | Preserved
Context            | Fragment only               | Section + surrounding context
Setup time         | Hours to days               | Minutes
Best for           | Unstructured text           | Structured documents
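To make the "Context" row concrete, here is a small sketch contrasting a fixed-width chunk with a whole section looked up by heading. It is deliberately naive and rests on assumptions: real RAG pipelines usually chunk by tokens with overlap and retrieve by embedding similarity, the sample text is invented, and `fixed_chunks`, `find_chunk`, and `find_section` are hypothetical helpers, not part of Vectorless.

```python
from typing import Dict, List, Tuple

# Invented sample content, keyed by section heading.
sections: Dict[str, str] = {
    "4.1 Cleaning": "Wipe the case with a dry cloth.",
    "4.2 Reset Procedure": "Hold the power button for 10 seconds until the LED flashes blue.",
}


def fixed_chunks(text: str, size: int) -> List[str]:
    """Split text every `size` characters, ignoring section boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def find_chunk(chunks: List[str], phrase: str) -> str:
    """Return the first chunk containing the phrase (substring match)."""
    return next(c for c in chunks if phrase in c)


def find_section(doc: Dict[str, str], phrase: str) -> Tuple[str, str]:
    """Return the (heading, body) of the first section containing the phrase."""
    return next((title, body) for title, body in doc.items() if phrase in body)


flat = " ".join(sections.values())          # structure is discarded here
fragment = find_chunk(fixed_chunks(flat, 40), "power button")
title, body = find_section(sections, "power button")

print("Chunk:  ", repr(fragment))
print("Section:", f"{title}: {body}")
```

The chunk starts mid-sentence and carries no heading, while the section lookup returns the full instruction together with where it lives in the document.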

Example

Input:

Document: 100-page technical manual (PDF)
Query: "How do I reset the device?"

Output:

Answer: "To reset the device, hold the power button for 10 seconds 
until the LED flashes blue, then release..."

Source: Chapter 4 > Section 4.2 > Reset Procedure

When to Use

Good fit:

  • Technical documentation
  • Manuals and guides
  • Structured reports
  • Policy documents
  • Any document with clear hierarchy

Not ideal:

  • Unstructured text (tweets, chat logs)
  • Very short documents (< 1 page)
  • Pure Q&A datasets without structure

Quick Start

Python

Install:

pip install vectorless

Usage:

from vectorless import Engine, IndexContext

# Create engine (uses OPENAI_API_KEY env var)
engine = Engine(workspace="./data")

# Index a document
ctx = IndexContext.from_file("./report.pdf")
doc_id = engine.index(ctx)

# Query
result = engine.query(doc_id, "What is the total revenue?")
print(f"Answer: {result.content}")
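
Rust

Add to Cargo.toml (tokio is needed for the #[tokio::main] attribute used below):

[dependencies]
vectorless = "0.1"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

Copy the example configuration:

cp vectorless.example.toml ./vectorless.toml

Usage:
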
use vectorless::Engine;

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    let client = Engine::builder()
        .with_workspace("./workspace")
        .build()?;

    let doc_id = client.index("./document.pdf").await?;

    let result = client.query(&doc_id,
        "What are the system requirements?").await?;

    println!("Answer: {}", result.content);
    println!("Source: {}", result.path);

    Ok(())
}

Features

Feature              | Description
Zero Infrastructure  | No vector DB and no embedding model, only an LLM API
Multi-format Support | PDF, Markdown, DOCX, and HTML out of the box
Incremental Updates  | Add or remove documents without a full re-index
Traceable Results    | See the exact navigation path taken
Feedback Learning    | Improves from user feedback over time
Multi-turn Queries   | Handles complex questions by decomposing them

Architecture

Core Components

  • Index Pipeline — Parses documents, builds tree, generates summaries
  • Retrieval Pipeline — Analyzes query, navigates tree, returns results
  • Pilot — LLM-powered navigator that guides retrieval decisions
  • Metrics Hub — Unified observability for LLM calls, retrieval, and feedback

Examples

See the examples/ directory.

Contributing

Contributions welcome! If you find this useful, please ⭐ the repo — it helps others discover it.

License

Apache License 2.0
