Hierarchical document intelligence without vectors

Vectorless

What is Vectorless?

Vectorless is a library for querying structured documents in natural language, without vector databases or embedding models. The core engine is written in Rust, with Python bindings.

Instead of chunking documents into vectors, Vectorless preserves each document's tree structure and uses an LLM to navigate it, much as a human reader uses a table of contents.

How It Works

1. Index: Build a Navigable Tree

Technical Manual (root)
├── Chapter 1: Introduction
├── Chapter 2: Architecture
│   ├── 2.1 System Design
│   └── 2.2 Implementation
└── Chapter 3: API Reference

Each node gets an AI-generated summary, which the navigator can read to decide which branch to follow without scanning the full text.
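As a rough illustration of the indexed tree, the sketch below models the manual above with a plain dataclass. The `Node` class, its field names, and the `outline` helper are hypothetical and are not part of the Vectorless API; the summaries are left empty where a real index would store LLM-generated text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One section of a document (hypothetical structure, not the library's)."""
    title: str
    summary: str = ""  # a real index would store an LLM-generated summary here
    children: list[Node] = field(default_factory=list)


def outline(node: Node, depth: int = 0) -> list[str]:
    """Return the tree as indented title lines, visiting nodes depth-first."""
    lines = ["  " * depth + node.title]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


manual = Node("Technical Manual", children=[
    Node("Chapter 1: Introduction"),
    Node("Chapter 2: Architecture", children=[
        Node("2.1 System Design"),
        Node("2.2 Implementation"),
    ]),
    Node("Chapter 3: API Reference"),
])

print("\n".join(outline(manual)))
```

Keeping the children nested under their parent is what lets a later query step descend one level at a time instead of searching every section.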

2. Query: Navigate with LLM

When you ask "How do I reset the device?":

  1. Analyze — Understand query intent and complexity
  2. Navigate — LLM guides tree traversal
  3. Retrieve — Return the exact section with context
  4. Verify — Check if more information is needed
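The Navigate step can be pictured as a loop that asks a chooser which child to descend into until it reaches a leaf. The sketch below is only an approximation under stated assumptions: `Node`, `navigate`, and `keyword_chooser` are hypothetical names, and the chooser is a simple word-overlap heuristic standing in for the LLM call that Vectorless would make.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    title: str
    summary: str
    children: list[Node] = field(default_factory=list)


# A chooser receives the query and the candidate children and returns the
# index of the child to descend into. A real system would ask an LLM here.
Chooser = Callable[[str, list[Node]], int]


def words(text: str) -> set[str]:
    """Lowercase word set with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_chooser(query: str, candidates: list[Node]) -> int:
    """LLM stand-in: pick the child whose summary shares the most words."""
    scores = [len(words(query) & words(c.summary)) for c in candidates]
    return scores.index(max(scores))


def navigate(root: Node, query: str, choose: Chooser) -> list[str]:
    """Descend from the root to a leaf and return the titles visited."""
    path = [root.title]
    node = root
    while node.children:
        node = node.children[choose(query, node.children)]
        path.append(node.title)
    return path


manual = Node("Manual", "device manual", [
    Node("Chapter 1: Setup", "unpack and install the device"),
    Node("Chapter 4: Maintenance", "clean, reset and update the device", [
        Node("4.1 Cleaning", "how to clean the casing"),
        Node("4.2 Reset Procedure", "how to reset the device to factory state"),
    ]),
])

path = navigate(manual, "How do I reset the device?", keyword_chooser)
print(" > ".join(path))
```

Because the chooser is passed in as a function, the traversal logic stays the same whether the decision comes from a heuristic, as here, or from a model.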

Traditional RAG vs Vectorless

Aspect             | Traditional RAG             | Vectorless
-------------------|-----------------------------|------------------------------
Infrastructure     | Vector DB + Embedding Model | Just LLM API
Document Structure | Lost in chunking            | Preserved
Context            | Fragment only               | Section + surrounding context
Setup Time         | Hours to Days               | Minutes
Best For           | Unstructured text           | Structured documents
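To see why structure can be "lost in chunking", consider the simplest chunking strategy: cutting text into fixed-size windows. The sketch below is a generic illustration, not code from any particular RAG stack; `fixed_chunks` and the sample text are made up for the example.

```python
def fixed_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive windows of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


doc = (
    "4.2 Reset Procedure\n"
    "Hold the power button for 10 seconds until the LED flashes blue.\n"
)

chunks = fixed_chunks(doc, 40)

# Chunks that no longer carry the heading they belong to.
orphaned = [c for c in chunks if "Reset Procedure" not in c]

for chunk in chunks:
    print(repr(chunk))
```

Only the first window contains the section heading; the later ones hold instruction text with no indication of which procedure it describes. A tree index keeps that text attached to its section node.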

Example

Input:

Document: 100-page technical manual (PDF)
Query: "How do I reset the device?"

Output:

Answer: "To reset the device, hold the power button for 10 seconds 
until the LED flashes blue, then release..."

Source: Chapter 4 > Section 4.2 > Reset Procedure

When to Use

Good fit:

  • Technical documentation
  • Manuals and guides
  • Structured reports
  • Policy documents
  • Any document with clear hierarchy

Not ideal:

  • Unstructured text (tweets, chat logs)
  • Very short documents (< 1 page)
  • Pure Q&A datasets without structure

Quick Start

Python

pip install vectorless

from vectorless import Engine, IndexContext

# Create engine (uses OPENAI_API_KEY env var)
engine = Engine(workspace="./data")

# Index a document
ctx = IndexContext.from_file("./report.pdf")
doc_id = engine.index(ctx)

# Query
result = engine.query(doc_id, "What is the total revenue?")
print(f"Answer: {result.content}")

Rust

Add the crate to Cargo.toml:

[dependencies]
vectorless = "0.1"

Copy the example configuration:

cp vectorless.example.toml ./vectorless.toml

use vectorless::Engine;

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    // build() is awaited here to match the other Rust snippets in this README
    let client = Engine::builder()
        .with_workspace("./workspace")
        .build().await?;

    let doc_id = client.index("./document.pdf").await?;

    let result = client.query(&doc_id,
        "What are the system requirements?").await?;

    println!("Answer: {}", result.content);
    println!("Source: {}", result.path);

    Ok(())
}

Features

Feature              | Description
---------------------|---------------------------------------------------
Zero Infrastructure  | No vector DB, no embedding model, just an LLM API
Multi-format Support | PDF, Markdown, DOCX, HTML out of the box
Incremental Updates  | Add/remove documents without full re-index
Traceable Results    | See the exact navigation path taken
Feedback Learning    | Improves from user feedback over time
Multi-turn Queries   | Handles complex questions with decomposition

Configuration

Zero Configuration (Recommended)

Just set OPENAI_API_KEY and you're ready to go:

export OPENAI_API_KEY="sk-..."

Python

from vectorless import Engine

# Uses OPENAI_API_KEY from environment
engine = Engine(workspace="./data")

Rust

use vectorless::Engine;

let client = Engine::builder()
    .with_workspace("./workspace")
    .build().await?;

Environment Variables

Variable             | Description
---------------------|----------------------------------
OPENAI_API_KEY       | LLM API key
VECTORLESS_MODEL     | Default model (e.g., gpt-4o-mini)
VECTORLESS_ENDPOINT  | API endpoint URL
VECTORLESS_WORKSPACE | Workspace directory
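For scripts that want to check which of these variables are set before constructing an engine, a small helper like the one below may be useful. It is a sketch that only reads the environment: the `settings_from_env` function and the short setting names on the left of `ENV_VARS` are hypothetical, while the variable names on the right are the ones listed above.

```python
import os
from typing import Mapping, Optional

# Hypothetical setting names mapped to the documented variable names.
ENV_VARS = {
    "api_key": "OPENAI_API_KEY",
    "model": "VECTORLESS_MODEL",
    "endpoint": "VECTORLESS_ENDPOINT",
    "workspace": "VECTORLESS_WORKSPACE",
}


def settings_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, Optional[str]]:
    """Return each setting's value, or None when its variable is unset."""
    source = os.environ if env is None else env
    return {name: source.get(var) for name, var in ENV_VARS.items()}


# Passing a dict keeps the example independent of the real environment.
example = settings_from_env({
    "OPENAI_API_KEY": "sk-...",
    "VECTORLESS_MODEL": "gpt-4o-mini",
})
print(example["model"], example["workspace"])
```

Calling `settings_from_env()` with no argument reads from `os.environ` instead.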

Advanced Configuration

For fine-grained control, use a config file:

cp config.toml ./vectorless.toml

Python

from vectorless import Engine

# Use full configuration file
engine = Engine(config_path="./vectorless.toml")

# Or override specific settings
engine = Engine(
    config_path="./vectorless.toml",
    model="gpt-4o",  # Override model from config
)

Rust

use vectorless::Engine;

// Use full configuration file
let client = Engine::builder()
    .with_config_path("./vectorless.toml")
    .build().await?;

// Or override specific settings
let client = Engine::builder()
    .with_config_path("./vectorless.toml")
    .with_model("gpt-4o", None)  // Override model
    .build().await?;

Configuration Priority

Settings are resolved in the following order, with later sources overriding earlier ones:

  1. Default configuration
  2. Auto-detected config file (vectorless.toml, config.toml, .vectorless.toml)
  3. Explicit config file (config_path / with_config_path)
  4. Environment variables
  5. Constructor/builder parameters (highest priority)
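The priority list above amounts to a layered merge in which each later source overwrites keys set by earlier ones. The sketch below shows that idea with plain dictionaries; it is not how Vectorless implements configuration loading, and `resolve`, the layer names, and the sample values are all assumptions made for illustration.

```python
from typing import Any, Optional

Layer = Optional[dict[str, Any]]


def resolve(*layers: Layer) -> dict[str, Any]:
    """Merge layers left to right; later layers win, None values are skipped."""
    merged: dict[str, Any] = {}
    for layer in layers:
        if layer:
            merged.update({k: v for k, v in layer.items() if v is not None})
    return merged


# Illustrative values only; the real defaults are not stated in this README.
defaults = {"model": "gpt-4o-mini", "workspace": "./data"}
auto_detected = None                      # no vectorless.toml found
explicit_file = {"model": "gpt-4o"}       # from config_path
environment = {"workspace": "/tmp/ws"}    # from VECTORLESS_WORKSPACE
parameters = {"model": None}              # constructor argument left unset

settings = resolve(defaults, auto_detected, explicit_file, environment, parameters)
print(settings)
```

Skipping `None` values means an omitted constructor argument does not erase a value supplied by a lower-priority source.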

Architecture

Core Components

  • Index Pipeline — Parses documents, builds tree, generates summaries
  • Retrieval Pipeline — Analyzes query, navigates tree, returns results
  • Pilot — LLM-powered navigator that guides retrieval decisions
  • Metrics Hub — Unified observability for LLM calls, retrieval, and feedback

Examples

See the examples/ directory for more usage patterns.

Contributing

Contributions welcome! If you find this useful, please ⭐ the repo — it helps others discover it.

License

Apache License 2.0
