Reasoning-based Document Engine

Reason, don't vector · Structure, not chunks · Agents, not embeddings · Exact, not synthesized

Vectorless is a reasoning-native document engine written in Rust. It compiles documents into navigable trees, then dispatches multiple agents to find exactly what is relevant across your PDFs, Markdown files, reports, and contracts. No embeddings, no chunking, no approximate nearest-neighbor search. Every retrieval is a reasoning act.

Light up a star and shine with us! ⭐

Three Rules

Reason, don't vector. Retrieval is a reasoning act, not a similarity computation.
Model fails, we fail. No heuristic fallbacks, no silent degradation.
No thought, no answer. Only reasoned output counts as an answer.

Why Vectorless

Traditional RAG systems split documents into chunks, embed them into vectors, and retrieve by similarity. Vectorless takes a different approach: it preserves document structure as a navigable tree and lets agents reason through it.

Aspect	Embedding-Based RAG	Vectorless
Indexing	Chunk → embed → vector store	Parse → compile → document tree
Retrieval	Cosine similarity (approximate)	Multi-agent navigation (exact)
Structure	Destroyed by chunking	Preserved as first-class tree
Query handling	Keyword/similarity match	Intent classification + decomposition
Multi-hop reasoning	Not supported	Orchestrator replans dynamically
Output	Retrieved chunks	Original text passages, exact
Failure mode	Silent degradation	Explicit — no reasoning, no answer
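
To make the "Parse → compile → document tree" row concrete, here is a minimal sketch of turning Markdown headings into a section tree. It is not the Vectorless compiler: the `Node` class and `compile_markdown` function are hypothetical names used only for illustration, and the sketch ignores details such as `#` characters inside code blocks.

```python
from dataclasses import dataclass, field
import re


@dataclass
class Node:
    """One section: a heading, its own text lines, and its subsections."""
    title: str
    level: int
    text: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def compile_markdown(source: str) -> Node:
    """Build a section tree from ATX-style Markdown headings."""
    root = Node(title="(root)", level=0)
    stack = [root]  # path from the root to the section currently being filled
    for line in source.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # Close every open section at the same or a deeper level.
            while stack[-1].level >= level:
                stack.pop()
            node = Node(title=match.group(2).strip(), level=level)
            stack[-1].children.append(node)
            stack.append(node)
        elif line.strip():
            # Body text belongs to the innermost open section.
            stack[-1].text.append(line.strip())
    return root


doc = """# Annual Report
Intro paragraph.
## Financials
### Revenue
Revenue fell 4% year over year.
## Risk Factors
Demand may decline further.
"""

tree = compile_markdown(doc)
report = tree.children[0]
print([child.title for child in report.children])
```

Unlike fixed-size chunking, every passage here stays attached to the heading path it was written under, which is what lets a reader (or an agent) reach it by navigation.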

How It Works

Four-Artifact Index Architecture

When a document is indexed, the compile pipeline builds four artifacts:

Content Layer          Navigation Layer              Reasoning Index            Document Card
DocumentTree          NavigationIndex               ReasoningIndex            DocCard
(TreeNode)            (NavEntry, ChildRoute)        (topic_paths, hot_nodes)  (title, overview,
      │                      │                              │                 question hints)
      │                      │                              │                    │
 Agent reads           Agent reads every            Agent's targeted        Orchestrator reads
 only on cat           decision round               search tool (grep)      for multi-doc routing

Content Layer — The raw document tree. The agent only accesses this when reading specific paragraphs (cat).
Navigation Layer — Each non-leaf node stores an overview, question hints, and child routes (title + description). The agent reads this every round to decide where to go next.
Reasoning Index — Keyword-topic mappings with weights. Provides the agent's grep tool with structured keyword data for targeted search within a document.
DocCard — A compact document-level summary. The Orchestrator reads DocCards to decide which documents to navigate in multi-document queries, without loading full documents.

This separation means the agent makes routing decisions from lightweight metadata, not by scanning full content.
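
One way to picture the separation is as plain data classes. The sketch below reuses the artifact names from the diagram (`TreeNode`, `NavEntry`, `ChildRoute`, `DocCard`, `topic_paths`), but every field layout, the path-keyed dictionary, the `build_navigation` helper, and the weights are assumptions made for illustration, not the engine's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """Content Layer: raw section text, read only on `cat`."""
    title: str
    text: str = ""
    children: list["TreeNode"] = field(default_factory=list)


@dataclass
class ChildRoute:
    """Navigation Layer: title and description of one child section."""
    title: str
    description: str


@dataclass
class NavEntry:
    """Navigation Layer: what an agent reads at a non-leaf node."""
    overview: str
    question_hints: list[str]
    routes: list[ChildRoute]


@dataclass
class DocCard:
    """Document-level summary used for multi-document routing."""
    title: str
    overview: str
    question_hints: list[str]


def build_navigation(node: TreeNode, path: str = "/") -> dict[str, NavEntry]:
    """Return a NavEntry for every non-leaf node, keyed by its path."""
    index: dict[str, NavEntry] = {}
    if node.children:
        routes = [
            # Placeholder description: the child's opening text or its subsection titles.
            ChildRoute(c.title, c.text[:40] or ", ".join(g.title for g in c.children))
            for c in node.children
        ]
        index[path] = NavEntry(
            overview=f"{node.title}: {len(routes)} subsections",
            question_hints=[],  # left empty in this sketch
            routes=routes,
        )
        for c in node.children:
            index.update(build_navigation(c, f"{path}{c.title}/"))
    return index


tree = TreeNode("Report", children=[
    TreeNode("Financials", children=[
        TreeNode("Revenue", "Revenue fell 4% year over year."),
    ]),
    TreeNode("Risk Factors", "Demand may decline further."),
])

navigation = build_navigation(tree)
# Reasoning Index stand-in: keyword -> weighted paths (the weights are invented).
topic_paths = {"decline": [("/Risk Factors/", 0.9), ("/Financials/Revenue/", 0.6)]}
card = DocCard(tree.title, navigation["/"].overview, question_hints=[])

print(sorted(navigation))
print([route.title for route in navigation["/"].routes])
```

Only non-leaf nodes receive a `NavEntry`, so routing decisions in this sketch touch short titles and descriptions, while full passage text stays in the `TreeNode` objects.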

Agent-Based Retrieval

Engine.query("What drove the revenue decline?")
  │
  ├─ Query Understanding ── intent, concepts, strategy (LLM)
  │
  ├─ Orchestrator ── analyzes query, dispatches Workers
  │   │
  │   ├─ Worker 1 ── ls → cd "Financials" → ls → cd "Revenue" → cat
  │   └─ Worker 2 ── ls → cd "Risk Factors" → grep "decline" → cat
  │   │
  │   └─ evaluate ── insufficient? → replan → dispatch new paths → loop
  │
  └─ Fusion ── dedup, LLM-scored relevance, return with source attribution

Worker navigation commands:

Command	Action	Reads
`ls`	List child sections	Navigation Layer (ChildRoute)
`cd`	Enter a child section	Navigation Layer
`cat`	Read content at current node	Content Layer (DocumentTree)
`grep`	Search by keyword	Reasoning Index (topic_paths)
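
The four commands can be sketched as a small cursor over a nested dictionary. This is an illustration of the navigation idea only: the `Worker` class, the dictionary layout, and the `topic_paths` mapping below are hypothetical stand-ins, and in the real engine an LLM chooses which command to issue next.

```python
class Worker:
    """A cursor over a nested-dict document tree with ls/cd/cat/grep."""

    def __init__(self, tree, topic_paths):
        self.tree = tree                # stand-in for the Content and Navigation layers
        self.topic_paths = topic_paths  # stand-in for the Reasoning Index
        self.path = []                  # section titles from the root to the current node

    def _node(self):
        node = self.tree
        for title in self.path:
            node = node["children"][title]
        return node

    def ls(self):
        """List the titles of the child sections."""
        return list(self._node()["children"])

    def cd(self, title):
        """Enter a child section, failing loudly if it does not exist."""
        if title not in self._node()["children"]:
            raise KeyError(f"no child section named {title!r}")
        self.path.append(title)

    def cat(self):
        """Read the text stored at the current node."""
        return self._node()["text"]

    def grep(self, keyword):
        """Look up the section paths recorded for a keyword."""
        return self.topic_paths.get(keyword.lower(), [])


tree = {"text": "", "children": {
    "Financials": {"text": "", "children": {
        "Revenue": {"text": "Revenue fell 4% year over year.", "children": {}},
    }},
    "Risk Factors": {"text": "Demand may decline further.", "children": {}},
}}
topic_paths = {"decline": [["Risk Factors"]]}

worker = Worker(tree, topic_paths)
print(worker.ls())
worker.cd("Financials")
worker.cd("Revenue")
print(worker.cat())
print(worker.grep("decline"))
```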

The Orchestrator evaluates Worker results after each round. If evidence is insufficient, it replans — adjusting strategy, dispatching new paths, or deepening exploration. This continues until enough evidence is collected.
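
The evaluate-and-replan loop described above can be sketched as follows. Everything here is a toy under stated assumptions: `orchestrate`, `run_worker`, `is_sufficient`, and `replan` are hypothetical names, the sufficiency rule is a simple count standing in for the Orchestrator's evaluation, and the `max_rounds` cap is added only so the sketch always terminates.

```python
def orchestrate(initial_plan, run_worker, is_sufficient, replan, max_rounds=5):
    """Dispatch workers round by round until the evidence is judged sufficient."""
    evidence, visited, plan = [], set(), list(initial_plan)
    for round_no in range(1, max_rounds + 1):
        for path in plan:
            visited.add(tuple(path))
            evidence.extend(run_worker(path))
        if is_sufficient(evidence):
            return evidence, round_no
        # Not enough yet: ask for new paths that have not been explored.
        plan = replan(visited, evidence)
        if not plan:
            break
    # Fail explicitly instead of returning a partial, unreasoned answer.
    raise RuntimeError("evidence insufficient; refusing to answer")


# Toy document: section path -> passages found there.
SECTIONS = {
    ("Financials", "Revenue"): ["Revenue fell 4% year over year."],
    ("Risk Factors",): ["Demand may decline further."],
}


def run_worker(path):
    return SECTIONS.get(tuple(path), [])


def is_sufficient(evidence):
    # Toy rule: require two distinct passages.
    return len(set(evidence)) >= 2


def replan(visited, evidence):
    return [list(path) for path in SECTIONS if path not in visited]


evidence, rounds = orchestrate(
    [["Financials", "Revenue"]], run_worker, is_sufficient, replan
)
print(rounds, evidence)
```

Raising when the plan runs dry mirrors the "Model fails, we fail" rule: the caller gets an explicit error rather than a degraded result.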

Quick Start

pip install vectorless

import asyncio
from vectorless import Engine, IndexContext, QueryContext

async def main():
    engine = Engine(api_key="sk-...", model="gpt-4o", endpoint="https://api.openai.com/v1")

    # Index a document
    result = await engine.index(IndexContext.from_path("./report.pdf"))
    doc_id = result.doc_id

    # Query
    result = await engine.query(
        QueryContext("What is the total revenue?").with_doc_ids([doc_id])
    )
    print(result.single().content)

asyncio.run(main())

Key Features

Rust Core — The entire engine (indexing, retrieval, agent, storage) is implemented in Rust for performance and reliability. A Python SDK (built on PyO3 bindings) and a CLI are also provided.
Multi-Agent Retrieval — Every query is handled by multiple cooperating agents: an Orchestrator plans and evaluates, Workers navigate documents. Each retrieval is a reasoning act — not a similarity score, but a sequence of LLM decisions about where to look, what to read, and when to stop.
Zero Vectors — No embedding model, no vector store, no similarity search. This eliminates a class of failure modes: wrong chunk boundaries, stale embeddings, and similarity-score false positives.
Tree Navigation — Documents are compiled into hierarchical trees that preserve the original structure — headings, sections, paragraphs, lists. Workers navigate this tree the way a human would: scan the table of contents, jump to the relevant section, read the passage.
Document-Exact Output — Returns original text passages from the source document. No synthesis, no rewriting, no hallucinated content. What you get is what was written.
Multi-Document Orchestration — Query across multiple documents with a single call. The Orchestrator dispatches Workers, evaluates evidence, and fuses results. When one document is insufficient, it replans and expands the search scope.
Query Understanding — Every query passes through LLM-based intent classification, concept extraction, and strategy selection. Complex queries are decomposed into sub-queries. The system adapts its navigation strategy based on whether the query is factual, analytical, comparative, or navigational.
Checkpointable Pipeline — The 8-stage compile pipeline writes checkpoints at each stage. If indexing is interrupted (LLM rate limit, network failure), it resumes from the last completed stage — no wasted work.
Incremental Updates — Content fingerprinting detects changes at the node level. Re-indexing a modified document only recompiles the changed sections and their dependents.
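
As an illustration of node-level fingerprinting for incremental updates, the sketch below hashes each section together with its children's digests, so an edit changes the fingerprint of the edited node and of every ancestor. The hashing scheme, the `fingerprint` and `changed_paths` helpers, and the reading of "dependents" as ancestors are all assumptions; the engine's actual fingerprint format is not described here. The sketch also assumes sibling sections have distinct titles.

```python
import hashlib


def fingerprint(node, path="", out=None):
    """Return {path: digest}; a digest covers the node's text and its children's digests."""
    out = {} if out is None else out
    here = f"{path}/{node['title']}"
    digest = hashlib.sha256(node["text"].encode("utf-8"))
    for child in node["children"]:
        fingerprint(child, here, out)
        digest.update(out[f"{here}/{child['title']}"].encode("ascii"))
    out[here] = digest.hexdigest()
    return out


def changed_paths(old, new):
    """Paths that are new or whose digest differs from the previous index."""
    return sorted(path for path, digest in new.items() if old.get(path) != digest)


def section(title, text="", children=()):
    return {"title": title, "text": text, "children": list(children)}


v1 = section("Report", children=[
    section("Financials", children=[section("Revenue", "Revenue fell 4%.")]),
    section("Risk Factors", "Demand may decline."),
])
v2 = section("Report", children=[
    section("Financials", children=[section("Revenue", "Revenue fell 6%.")]),
    section("Risk Factors", "Demand may decline."),
])

changed = changed_paths(fingerprint(v1), fingerprint(v2))
print(changed)
```

In this example only the Revenue section and its ancestors are flagged; the untouched Risk Factors section keeps its digest and would not need recompiling.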

Supported Documents

PDF — Full text extraction with page metadata
Markdown — Structure-aware parsing (headings, lists, code blocks)

Resources

Documentation — Guides, architecture, API reference
Rust API Docs — Auto-generated crate documentation
PyPI — Python package
Crates.io — Rust crate
Examples — Complete usage patterns for Python and Rust

Contributing

Contributions welcome! If you find this useful, please ⭐ the repo — it helps others discover it.

License

Apache License 2.0

Release history

Version	Released
0.1.15	Apr 29, 2026
0.1.14	Apr 29, 2026
0.1.13	Apr 25, 2026
0.1.12	Apr 23, 2026
0.1.11	Apr 21, 2026 (this version)
0.1.10	Apr 20, 2026
0.1.9	Apr 20, 2026
0.1.8	Apr 16, 2026
0.1.7	Apr 15, 2026
0.1.6	Apr 15, 2026
0.1.5	Apr 13, 2026
0.1.4	Apr 13, 2026
0.1.3	Apr 12, 2026
0.1.2	Apr 12, 2026
0.1.1	Apr 8, 2026
0.1.0	Apr 7, 2026
0.0.1	Apr 2, 2026

Download files

Source Distribution

vectorless-0.1.11.tar.gz (311.2 kB)

Uploaded Apr 21, 2026 Source

Built Distribution

vectorless-0.1.11-cp310-cp310-manylinux_2_34_x86_64.whl (5.2 MB)

Uploaded Apr 21, 2026 CPython 3.10, manylinux: glibc 2.34+ x86-64

File details

Details for the file vectorless-0.1.11.tar.gz.

File metadata

Download URL: vectorless-0.1.11.tar.gz
Upload date: Apr 21, 2026
Size: 311.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.12.6

File hashes

Hashes for vectorless-0.1.11.tar.gz
Algorithm	Hash digest
SHA256	`1df2e9de40f392a203768b3aa2ca325bc205c638d11dd3323ffbc4fab924864b`
MD5	`46fbb36213fac828a96e7b56c5fca394`
BLAKE2b-256	`d8163ce19190fc71ca536da30b9964d8a135094a4e4e9db18fb185fbdddc69a1`

File details

Details for the file vectorless-0.1.11-cp310-cp310-manylinux_2_34_x86_64.whl.

File metadata

Download URL: vectorless-0.1.11-cp310-cp310-manylinux_2_34_x86_64.whl
Upload date: Apr 21, 2026
Size: 5.2 MB
Tags: CPython 3.10, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.12.6

File hashes

Hashes for vectorless-0.1.11-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`6bcd9f9376a9d59612b2471a424431d58b61b38bb3dd810a2473a32e2e87001e`
MD5	`de8c26f645b651f3f5f15dd4ca64d3c8`
BLAKE2b-256	`89a050fbb12d9bab5d00476d0ec97e494f5ba3e8dc093cdab329f05506c8a5f7`
