Reasoning-native document intelligence engine for AI

Vectorless

Document Engine for AI

Vectorless is a reasoning-native document engine, with a core written in Rust, designed as the foundational layer for AI applications that need structured access to documents. It does not use vector databases, embeddings, or similarity search. Instead, it reasons through your structured documents (PDFs, Markdown, reports, contracts) and retrieves only what is relevant. Nothing more, nothing less.

How It Works

[Figure: Vectorless workflow]
[Figure: Vectorless demo]

Quick Start

Rust

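# Cargo.toml
[dependencies]
vectorless = "0.1"
# Needed for the #[tokio::main] attribute used below
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

// src/main.rs
use vectorless::client::{EngineBuilder, IndexContext, QueryContext};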

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    let engine = EngineBuilder::new()
        .with_key("sk-...")
        .with_model("gpt-4o")
        .with_endpoint("https://api.openai.com/v1")
        .build()
        .await?;

    // Index a document
    let result = engine.index(IndexContext::from_path("./report.pdf")).await?;
    let doc_id = result.doc_id().unwrap();

    // Query
    let result = engine.query(
        QueryContext::new("What is the total revenue?")
            .with_doc_ids(vec![doc_id.to_string()])
    ).await?;
    println!("{}", result.content);

    Ok(())
}

Python

pip install vectorless

import asyncio
from vectorless import Engine, IndexContext, QueryContext

async def main():
    engine = Engine(api_key="sk-...", model="gpt-4o", endpoint="https://api.openai.com/v1")

    # Index a document
    result = await engine.index(IndexContext.from_path("./report.pdf"))
    doc_id = result.doc_id

    # Query
    result = await engine.query(
        QueryContext("What is the total revenue?").with_doc_ids([doc_id])
    )
    print(result.single().content)

asyncio.run(main())

Core Concepts

Semantic Tree Index

When you index a document, Vectorless builds a tree structure that mirrors the document's hierarchy:

Annual Report 2024
├── Executive Summary
│   ├── Financial Highlights
│   └── Strategic Outlook
├── Financial Statements
│   ├── Revenue Analysis        ← "What is the total revenue?" lands here
│   ├── Operating Expenses
│   └── Net Income
└── Risk Factors
    ├── Market Risks
    └── Regulatory Risks

Each node contains a summary generated by the LLM. During retrieval, the engine uses these summaries to reason about which path to follow, much as a human would scan a table of contents.
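The top-down navigation described above can be illustrated with a small, self-contained sketch. This is not Vectorless's implementation: `Node`, `relevance`, and `navigate` are hypothetical names, the summaries are invented for the example, and a simple word-overlap score stands in for the LLM's reasoning about which branch to follow.

```python
import re
from dataclasses import dataclass, field

# Hypothetical illustration only; none of these names come from the vectorless API.
STOPWORDS = {"what", "is", "the", "a", "of", "and", "in"}


@dataclass
class Node:
    """One section of a document: a title, a summary, and child sections."""
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query: str, node: Node) -> int:
    """Toy stand-in for LLM judgment: count of query words found in the title or summary."""
    return len((tokens(query) - STOPWORDS) & tokens(f"{node.title} {node.summary}"))


def navigate(root: Node, query: str) -> list[str]:
    """Walk from the root to a leaf, taking the most relevant child at each level."""
    path = [root.title]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: relevance(query, child))
        path.append(node.title)
    return path


report = Node("Annual Report 2024", "Company annual report", [
    Node("Executive Summary", "Overview of highlights and strategic outlook", [
        Node("Financial Highlights", "Key figures at a glance"),
        Node("Strategic Outlook", "Plans for next year"),
    ]),
    Node("Financial Statements", "Revenue, expenses, and net income for the year", [
        Node("Revenue Analysis", "Total revenue by segment and region"),
        Node("Operating Expenses", "Costs by category"),
        Node("Net Income", "Profit after expenses and tax"),
    ]),
    Node("Risk Factors", "Market and regulatory risks", [
        Node("Market Risks", "Exposure to demand and price changes"),
        Node("Regulatory Risks", "Exposure to changes in law"),
    ]),
])

path = navigate(report, "What is the total revenue?")
print(" > ".join(path))
```

Only the summaries along one path are examined at each level, so the work grows with the depth of the tree, not with the total length of the document.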

Cross-Document Graph

When multiple documents are indexed, Vectorless builds a relationship graph connecting them through shared keywords and concepts. This enables queries across your entire document collection.

# Query across all indexed documents (run inside an async function, with `engine` created as in Quick Start)
result = await engine.query(
    QueryContext("Compare revenue trends across all reports")
)
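As a rough illustration of linking documents through shared keywords, the sketch below builds an undirected graph in which two documents are connected when their keyword sets overlap. It is an assumption about the general idea, not a description of how Vectorless builds its graph: `build_graph`, `related`, the document IDs, and the keyword sets are all hypothetical.

```python
from itertools import combinations

# Hypothetical illustration only; none of these names come from the vectorless API.


def build_graph(doc_keywords: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Connect every pair of documents that shares at least one keyword.

    Returns a mapping from a sorted (doc_a, doc_b) pair to the keywords they share.
    """
    edges = {}
    for a, b in combinations(sorted(doc_keywords), 2):
        shared = doc_keywords[a] & doc_keywords[b]
        if shared:
            edges[(a, b)] = shared
    return edges


def related(doc_id: str, edges: dict[tuple[str, str], set[str]]) -> list[str]:
    """Documents directly linked to doc_id, in alphabetical order."""
    return sorted(b if a == doc_id else a for (a, b) in edges if doc_id in (a, b))


docs = {
    "report_2023": {"revenue", "growth", "europe"},
    "report_2024": {"revenue", "growth", "asia"},
    "lease_contract": {"tenant", "liability"},
}

graph = build_graph(docs)
print(graph)
print(related("report_2023", graph))
```

With a graph like this, a cross-document question about revenue could start from one report and follow edges to the others that share the relevant keywords, while unrelated documents such as the lease contract are never visited.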

Workspace Persistence

Indexed documents are stored in a workspace — there's no need to reprocess files between sessions:

engine = Engine(api_key="sk-...", model="gpt-4o", endpoint="https://api.openai.com/v1")

# List all indexed documents (run inside an async function, as in Quick Start)
docs = await engine.list()
for doc in docs:
    print(f"{doc.name} ({doc.format}) — {doc.page_count} pages")

What It's For

Vectorless is designed for applications that need precise document retrieval:

  • Financial analysis — Extract specific figures from reports, compare across filings
  • Legal research — Find relevant clauses, trace definitions across documents
  • Technical documentation — Navigate large manuals, locate specific procedures
  • Academic research — Cross-reference findings across papers
  • Compliance — Audit trails with source references for every answer

Examples

See examples/ for complete usage patterns.

Contributing

Contributions welcome! If you find this useful, please ⭐ the repo — it helps others discover it.

License

Apache License 2.0
