Lightning-fast RAG for AI agents. ONNX-powered, 4-layer fusion, MCP server. No PyTorch.

🦖 VelociRAG

Lightning-fast RAG for AI agents.

Four-layer retrieval fusion powered by ONNX Runtime. 54MB install, typically sub-200ms warm search, no PyTorch. MCP-ready.

Most RAG solutions either pull in roughly 750MB of PyTorch or limit you to single-layer vector search. VelociRAG combines four retrieval methods (vector similarity, BM25 keyword matching, knowledge graph traversal, and metadata filtering), merges their results with reciprocal rank fusion, and can optionally rerank them with a cross-encoder. Everything runs on ONNX Runtime with no GPU and no API keys. It ships with an MCP server for agent integration, a Unix socket daemon for warm queries, and a CLI.

🚀 Quick Start

MCP Server (Claude, Cursor, Windsurf)

pip install "velocirag[mcp]"
velocirag index ./my-docs --graph --metadata
velocirag mcp

Claude Code — add to .mcp.json in your project root:

{
  "mcpServers": {
    "velocirag": {
      "command": "velocirag",
      "args": ["mcp"],
      "env": { "VELOCIRAG_DB": "/path/to/data" }
    }
  }
}

Then open /mcp in Claude Code and enable the velocirag server. If using a virtualenv, use the full path to the binary (e.g. .venv/bin/velocirag).
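
When the server runs from a virtualenv, the config needs the full path to the binary. The following is a minimal sketch of producing the `.mcp.json` layout shown above from Python. `build_mcp_config` is a hypothetical helper name, not part of VelociRAG, and it uses only the standard library.

```python
import json
import shutil


def build_mcp_config(db_path, command=None):
    """Return a dict matching the .mcp.json layout shown above.

    If `command` is not given, use whatever `velocirag` resolves to on PATH
    (for example inside an activated virtualenv), falling back to the bare name.
    """
    resolved = command or shutil.which("velocirag") or "velocirag"
    return {
        "mcpServers": {
            "velocirag": {
                "command": str(resolved),
                "args": ["mcp"],
                "env": {"VELOCIRAG_DB": str(db_path)},
            }
        }
    }


# Print the config; save the output as .mcp.json in the project root to use it.
print(json.dumps(build_mcp_config("/path/to/data"), indent=2))
```

Running this inside the activated virtualenv should record the absolute path of that environment's `velocirag` binary, assuming it is on PATH.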

Claude Desktop — add to claude_desktop_config.json:

{
  "mcpServers": {
    "velocirag": {
      "command": "velocirag",
      "args": ["mcp", "--db", "/path/to/data"]
    }
  }
}

Cursor — add to .cursor/mcp.json:

{
  "mcpServers": {
    "velocirag": {
      "command": "velocirag",
      "args": ["mcp", "--db", "/path/to/data"]
    }
  }
}

Python API

from velocirag import Embedder, VectorStore, Searcher

embedder = Embedder()                        # ONNX embedding model
store = VectorStore('./my-db', embedder)     # index stored under ./my-db
store.add_directory('./my-docs')             # index the files in ./my-docs
searcher = Searcher(store, embedder)
results = searcher.search('query', limit=5)  # top 5 matches

CLI

pip install velocirag
velocirag index ./my-docs --graph --metadata
velocirag search "your query here"

Search Daemon (warm engine for CLI users)

velocirag serve --db ./my-data        # start daemon (background)
velocirag search "query"              # auto-routes through daemon
velocirag status                      # check daemon health
velocirag stop                        # stop daemon

The daemon keeps the ONNX model + FAISS index warm over a Unix socket. The first query loads the engine (~1s); subsequent queries return in ~180ms with full 4-layer fusion.
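
A script that depends on the warm daemon may want to check that something is listening before it issues queries. The sketch below only tests whether the Unix socket accepts a connection. It does not speak the daemon's protocol, which is not described here, and `daemon_is_listening` is a hypothetical helper. The default path is the documented default for `VELOCIRAG_SOCKET`; `velocirag status` remains the supported health check.

```python
import os
import socket

# Documented default for the VELOCIRAG_SOCKET environment variable.
DEFAULT_SOCKET = "/tmp/velocirag-daemon.sock"


def daemon_is_listening(path=None, timeout=0.5):
    """Return True if something accepts connections on the Unix socket."""
    path = path or os.environ.get("VELOCIRAG_SOCKET", DEFAULT_SOCKET)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        try:
            client.connect(path)
        except OSError:
            # Missing socket file, stale file, refused connection, or timeout.
            return False
    return True
```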

🎯 Why VelociRAG?

Feature	VelociRAG	LangChain	LlamaIndex	Chroma	mcp-local-rag
Search layers	4	2	2	1	2
Cross-encoder reranking	✅	❌	✅	❌	❌
Knowledge graph	✅	❌	✅	❌	❌
LLM required for search	❌	⚠️	⚠️	❌	❌
MCP server	✅	❌	❌	❌	✅
GPU required	❌	❌	❌	❌	❌
PyTorch required	❌	✅	✅	❌	❌
Install size	~54MB	~750MB+	~750MB+	~50MB	~30MB
Warm search latency	~180ms (full 4-layer)	n/a	n/a	~50ms	~200ms

🏗️ How It Works

The 4-layer pipeline:

Query → expand (acronyms, variants)
      → [Vector]   FAISS cosine similarity (384d, MiniLM-L6-v2 via ONNX)
      → [Keyword]  BM25 via SQLite FTS5
      → [Graph]    Knowledge graph traversal
      → [Metadata] Structured SQL filters (tags, status, project)
      → RRF Fusion → Cross-encoder rerank → Results
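
The fusion step in the pipeline above can be illustrated with a generic reciprocal rank fusion. This is a minimal sketch, not VelociRAG's implementation: the constant `k=60` is the value commonly used in the RRF literature, and whether VelociRAG uses the same constant or weights its layers differently is an assumption. The file names are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a.md", "b.md", "c.md"]   # hypothetical per-layer results
keyword_hits = ["b.md", "d.md"]
graph_hits = ["c.md", "b.md"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits, graph_hits])
print(fused)  # ['b.md', 'c.md', 'a.md', 'd.md']
```

A document found by several layers (`b.md`) outranks one that a single layer placed first (`a.md`), which is why RRF suits combining layers whose raw scores are not comparable.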

What each layer catches:

Query type	Vector	Keyword	Graph	Metadata
Conceptual ("improve error handling")	✅	—	—	—
Exact match ("ERR_CONNECTION_REFUSED")	—	✅	—	—
Connected concepts	—	—	✅	—
Filtered ("#python status:active")	—	—	—	✅
Combined ("React state management")	✅	✅	✅	✅
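
The exact-match row can be reproduced with SQLite FTS5 directly. The sketch below is an illustration using Python's built-in `sqlite3` module, assuming the local SQLite build includes FTS5. The table name, columns, and sample rows are hypothetical and say nothing about VelociRAG's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path, body)")
conn.executemany(
    "INSERT INTO docs (path, body) VALUES (?, ?)",
    [
        ("net.md", "Retry when the client reports ERR_CONNECTION_REFUSED."),
        ("errors.md", "Ideas for how to improve error handling in services."),
        ("react.md", "React state management with hooks and reducers."),
    ],
)


def keyword_search(query, limit=5):
    """Return (path, score) rows; FTS5's bm25() is lower-is-better."""
    # Double quotes make FTS5 treat the query as one phrase, not as operators.
    phrase = '"' + query.replace('"', '""') + '"'
    return conn.execute(
        "SELECT path, bm25(docs) FROM docs WHERE docs MATCH ? "
        "ORDER BY bm25(docs) LIMIT ?",
        (phrase, limit),
    ).fetchall()


hits = keyword_search("ERR_CONNECTION_REFUSED")
print([path for path, _ in hits])  # ['net.md']
```

A conceptual query such as "make failures easier to debug" shares no tokens with any row and returns nothing here, which is the gap the vector layer covers.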

✨ Features

ONNX Runtime: 184ms cold and 3ms warm embedding, in a 54MB install with no PyTorch and no GPU
Four-layer fusion: FAISS vector similarity + SQLite FTS5 (BM25) + knowledge graph + metadata filtering, merged via reciprocal rank fusion
Cross-encoder reranking: optional TinyBERT reranker with score blending (pip install "velocirag[reranker]")
MCP server: five tools (search, index, add_document, health, list_sources) for Claude, Cursor, Windsurf
Search daemon: Unix socket server keeps the ONNX model + FAISS index warm between queries
Knowledge graph: seven analyzers build entity, temporal, topic, and explicit-link edges from markdown, with optional GLiNER NER; 680 files build in 3.8s (--light-graph)
Smart chunking: header-aware splitting preserves document structure and parent context
Query expansion: acronym registry, casing/spacing variants, underscore-aware tokenization
Runs anywhere: CPU-only, 8GB RAM, no API keys, no external services
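
To make the header-aware chunking idea concrete, here is a small sketch that splits markdown at headers and tags each chunk with its chain of parent headers. It is an illustration of the general technique under simplifying assumptions (ATX `#` headers only, no special handling of fenced code blocks). `chunk_markdown` and the output keys are hypothetical, not VelociRAG's chunker.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text):
    """Split markdown at headers, tagging each chunk with its header path."""
    chunks = []
    path = []      # stack of (level, title) for the current header ancestry
    buffer = []    # body lines collected under the current header

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            context = " > ".join(title for _, title in path)
            chunks.append({"context": context, "text": body})
        buffer.clear()

    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level, title = len(match.group(1)), match.group(2).strip()
            # Drop headers at the same or deeper level before pushing this one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, title))
        else:
            buffer.append(line)
    flush()
    return chunks
```

For a document with `# Guide`, `## Install`, and `## Usage` sections, the chunks carry the contexts `Guide`, `Guide > Install`, and `Guide > Usage`, so a chunk retrieved on its own still says where it came from.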

🤖 MCP Server

VelociRAG exposes a Model Context Protocol (MCP) server for agent integration.

Available tools:

search — 4-layer fusion search with reranking
index — Add documents to the knowledge base
add_document — Insert single document
health — System diagnostics
list_sources — Show indexed document sources

The MCP server process stays alive between queries, so models load once and every subsequent search is warm. Works with any MCP-compatible client.

🐍 Python API

Full 4-layer unified search:

from velocirag import (
    Embedder, VectorStore, Searcher,
    GraphStore, MetadataStore, UnifiedSearch,
    GraphPipeline
)

# Build the full stack
embedder = Embedder()
store = VectorStore('./search-db', embedder)
graph_store = GraphStore('./search-db/graph.db')
metadata_store = MetadataStore('./search-db/metadata.db')

# Index with graph + metadata
store.add_directory('./docs')
pipeline = GraphPipeline(graph_store, embedder, metadata_store)
pipeline.build('./docs')

# Unified search across all layers
searcher = Searcher(store, embedder)
unified = UnifiedSearch(searcher, graph_store, metadata_store)
results = unified.search(
    'machine learning algorithms',
    limit=5,
    enrich_graph=True,
    filters={'tags': ['python'], 'status': 'active'}
)

Quick semantic search:

from velocirag import Embedder, VectorStore, Searcher

embedder = Embedder()
store = VectorStore('./db', embedder)
store.add_directory('./docs')
searcher = Searcher(store, embedder)
results = searcher.search('neural networks', limit=10)

💻 CLI Reference

# Index documents with all layers
velocirag index <path> [--graph] [--metadata] [--gliner] [--light-graph] [--force]

# Search across all layers (auto-routes through daemon if running)
velocirag search <query> [--limit N] [--threshold F] [--format text|json]

# Search daemon
velocirag serve [--db PATH] [-f]         # start daemon (-f for foreground)
velocirag stop                            # stop daemon
velocirag status                          # check daemon health

# Metadata queries
velocirag query [--tags TAG] [--status S] [--project P] [--recent N]

# System health and status
velocirag health [--format text|json]

# Start MCP server
velocirag mcp [--db PATH] [--transport stdio|sse]
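
For scripts that prefer the CLI over the Python API, the search command can be driven through `subprocess`. This sketch uses only the flags listed in the reference above. It assumes that `--format json` writes a single JSON document to stdout and makes no assumption about its schema; both helper names are hypothetical.

```python
import json
import subprocess


def build_search_argv(query, limit=None, threshold=None):
    """Assemble argv for `velocirag search` from the documented flags."""
    argv = ["velocirag", "search", query, "--format", "json"]
    if limit is not None:
        argv += ["--limit", str(limit)]
    if threshold is not None:
        argv += ["--threshold", str(threshold)]
    return argv


def search_json(query, **options):
    """Run the CLI and parse stdout as JSON (the schema is not assumed here)."""
    completed = subprocess.run(
        build_search_argv(query, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing the query as a single list element avoids shell quoting problems with spaces or special characters.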

📊 Performance

Benchmarks from a production deployment (3,416 documents, ONNX Runtime, v0.5.0):

Metric	Value
Embedding (warm)	3ms
Embedding (cold)	184ms
Full 4-layer search (warm)	76–350ms
Graph build (680 files, --light-graph)	3.8s
Graph build (7K files, --light-graph)	~90s (no OOM on 8GB)
Hit rate (100-query benchmark)	99/100
Install size	~54MB (no PyTorch)
RAM usage	<1GB with ONNX models
Graph	4,837 nodes, 11,443 edges
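
Numbers like these depend on hardware and corpus, so it is worth measuring locally. The helper below is a generic timing sketch that works with any callable; it is not the harness used for the table above, and `measure_latency_ms` is a hypothetical name.

```python
import statistics
import time


def measure_latency_ms(fn, *args, repeats=20, **kwargs):
    """Time one cold call and `repeats` warm calls of fn, in milliseconds."""
    start = time.perf_counter()
    fn(*args, **kwargs)  # the first call pays any model or index load cost
    cold_ms = (time.perf_counter() - start) * 1000.0

    warm = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        warm.append((time.perf_counter() - start) * 1000.0)

    return {
        "cold_ms": cold_ms,
        "warm_median_ms": statistics.median(warm),
        "warm_max_ms": max(warm),
    }
```

With a `searcher` built as in the Python API section, a call such as `measure_latency_ms(searcher.search, 'query', limit=5)` would report cold and warm timings for that index.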

⚙️ Configuration

Environment Variable	Default	Description
`VELOCIRAG_DB`	`./.velocirag`	Database directory
`VELOCIRAG_SOCKET`	`/tmp/velocirag-daemon.sock`	Daemon socket path
`NO_COLOR`	—	Disable colored output
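
A wrapper script can resolve the same settings by applying the defaults from the table. This is a sketch only: `resolve_settings` is a hypothetical helper, and treating any non-empty `NO_COLOR` value as "disable color" follows the common NO_COLOR convention, which is an assumption about how VelociRAG reads it.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "VELOCIRAG_DB": "./.velocirag",
    "VELOCIRAG_SOCKET": "/tmp/velocirag-daemon.sock",
}


def resolve_settings(environ=None):
    """Apply the documented defaults to whichever variables are unset."""
    environ = os.environ if environ is None else environ
    settings = {name: environ.get(name, default) for name, default in DEFAULTS.items()}
    # Assumed convention: any non-empty NO_COLOR value disables colored output.
    settings["color"] = not environ.get("NO_COLOR")
    return settings
```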

Dependencies:

Base: onnxruntime, tokenizers, huggingface-hub, faiss-cpu, numpy, click
Reranker: pip install "velocirag[reranker]" (adds sentence-transformers, which depends on PyTorch)
MCP: pip install "velocirag[mcp]" (adds fastmcp)
NER: pip install "velocirag[ner]" (adds GLiNER)
Graph: pip install "velocirag[graph]" (adds networkx, scikit-learn)

📄 License

MIT — Use it anywhere, build anything.

Need agent integration help? Check AGENTS.md for machine-readable project context.

Built for agents who think fast and remember faster.
