Vectora - Advanced AI Assistant with RAG and MCP capabilities
Project description
Vectora
Vectora is an open-source AI assistant (Apache 2.0) built for developers — local-first, self-hosted, and designed to run as a powerful sub-agent inside any MCP-compatible orchestrator (Claude Code, Claude Desktop, Paperclip, VS Code extensions).
At its core, Vectora solves the knowledge gap problem: LLMs don't know your codebase, your docs, or the latest versions of your stack. Vectora bridges that gap with RAG (Retrieval-Augmented Generation) — ingest your docs once, and every AI interaction becomes contextually aware.
Why Vectora?
- Supervisor + Specialized Agents: A router classifies every message and delegates to the right specialist — search agent for web/RAG, coder agent for files and terminal, direct agent for conversation and synthesis.
- RAG-native subgraph: Every query goes through a full retrieve → score → rerank → inject pipeline before hitting the LLM.
- 14 tools across 4 categories: Web search, vector search, file system, memory — each agent sees only the tools it needs.
- Cascading embeddings: Web search results are automatically queued for embedding into LanceDB (fire-and-forget), building your knowledge base as you chat.
- Sub-agent architecture: Runs as an MCP server. Claude Code delegates complex tasks to Vectora; Vectora reasons, routes, and responds.
- Persistent memory: Cross-session memory in SQLite. Vectora remembers your preferences, project context, and decisions.
- Zero infra: SQLite + LanceDB. No Docker required for local use.
- Multi-LLM: Google Gemini (free tier), Cohere (free tier), OpenAI, Anthropic, or Ollama (fully local).
Architecture
Supervisor + Workers
Every message enters through a single entry point and is routed by the Supervisor to the right specialized agent:
START
└─► supervisor (classify intent)
├─► direct ──► direct_tools (memory) ──► direct ──► END
├─► search ──► search_tools ──► process_retrieval ──► search ──► END
├─► coder ──► coder_tools (fs + memory) ──► coder ──► END
└─► rag_subgraph ──────────────────────────────────────► direct ──► END
| Agent | Responsibility | Tools |
|---|---|---|
| supervisor | Classifies intent via regex + LLM fallback, routes via Command(goto=...) |
— |
| direct | General conversation, synthesis after RAG, memory management | save_memory, get_memory, delete_memory |
| search | Web research, real-time info, builds knowledge base via cascading embeddings | web_search, fetch_url, vector_search |
| coder | File operations, terminal commands, code generation | file_read, file_edit, file_write, grep, list_dir, terminal |
RAG Subgraph
When the supervisor routes to rag, a dedicated subgraph runs the full retrieval pipeline before synthesis:
rag_retrieve (vector_search)
└─► rag_decide (score threshold)
├─► rag_inject (score ≥ 0.7 — high confidence, inject directly)
├─► rag_rerank (score 0.4–0.7 — rerank with Cohere before inject)
└─► rag_websearch (score < 0.4 — fall back to web + auto-embed results)
Results are injected as a SystemMessage into context before the direct agent synthesizes the final answer.
Cascading Embeddings
After any web_search or fetch_url call, process_retrieval automatically queues the results for embedding into LanceDB — fire-and-forget, no blocking. Your vector store grows passively as you use web search.
Prerequisites
Cohere — Required
Vectora uses Cohere for embeddings (embed-multilingual-v3.0) and reranking (rerank-multilingual-v3.0). It offers a generous free tier with first-class LangChain integration.
Get your key: https://dashboard.cohere.com/api-keys
Tavily — Required
Vectora uses Tavily for real-time web search and URL content extraction. It offers a generous free tier optimized for AI agents.
Get your key: https://app.tavily.com/
LLM Provider — Choose One
| Provider | Free Tier | Get Key |
|---|---|---|
| Google Gemini ✅ Recommended | Yes | aistudio.google.com |
| Cohere | Yes | dashboard.cohere.com |
| Ollama (local) | No cost | ollama.ai |
| OpenAI | Paid | platform.openai.com |
| Anthropic | Paid | console.anthropic.com |
Installation
Option 1: UV (Recommended)
# Install globally
uv tool install vectora-agent
# First-time setup (interactive wizard)
vectora setup
# Start chatting
vectora chat
Option 2: From Source
git clone https://github.com/brunosrz/vectora.git
cd vectora
# Install with all dependencies
uv sync
# Configure your keys
cp .env.example .env
# Edit .env with your GOOGLE_API_KEY and COHERE_API_KEY
# Run
uv run vectora chat
Option 3: Docker
# Copy and configure environment
cp .env.example .env
# Edit .env with your API keys
# Run the chat interface
docker compose run --rm vectora
# Or run as MCP server (multi-agent mode)
MCP_TRANSPORT=sse docker compose up -d
Running Modes
Chat Mode (Interactive TUI)
The primary interface — a terminal dashboard built with Rich.
vectora chat
Features: multi-turn conversation, session history, live tool feedback (colored panels), debug mode toggle, model switching.
MCP Server — Local (stdio)
Run Vectora as an MCP sub-agent for Claude Code or Claude Desktop.
vectora mcp-server
MCP Server — Remote (SSE, Multi-Agent)
Run Vectora as a shared hub for multiple Paperclip agents or orchestrators connecting simultaneously.
MCP_TRANSPORT=sse MCP_PORT=8000 vectora mcp-server
Each client passes its own thread_id — sessions are fully isolated.
Setup Wizard
Interactive configuration to set up API keys, choose LLM provider, and test connectivity.
vectora setup
Connecting to Claude Code / Claude Desktop
Add Vectora to your .mcp.json (in your project root):
{
"mcpServers": {
"Vectora-MCP": {
"command": "uv",
"args": ["run", "--project", "/absolute/path/to/vectora", "vectora-mcp"]
}
}
}
For a globally installed Vectora:
{
"mcpServers": {
"Vectora-MCP": {
"command": "vectora-mcp"
}
}
}
For Docker (SSE mode, multiple agents):
{
"mcpServers": {
"Vectora-MCP": {
"url": "http://localhost:8000/sse"
}
}
}
Chat Commands
| Command | Description |
|---|---|
/help |
Show quick help |
/list |
Show all commands |
/tools |
List available tools |
/model |
List or switch models |
/debug |
Toggle debug mode (shows tool calls and routing decisions) |
/new |
Start a new session |
/sessions |
List all sessions |
/session <id> |
Switch to a specific session |
/quit |
Exit |
Input shortcuts: Enter sends, Alt+Enter or Shift+Enter adds a line break.
Tools Reference
14 tools across 4 categories, distributed to the agent that needs them:
| Category | Tools | Agent |
|---|---|---|
| Web | web_search, fetch_url |
search |
| RAG | vector_search, embedding, ingest_docs |
search / RAG subgraph |
| Files | file_read, file_edit, file_write, grep, list_dir, terminal |
coder |
| Memory | save_memory, get_memory, delete_memory |
direct / coder |
| MCP | call_mcp_tool |
all |
Data & Persistence
All data is stored locally in ~/.vectora/:
~/.vectora/
├── .env # Your API keys
├── chat_config.json # Persistent chat settings
├── data/
│ ├── vectora.db # Sessions, memories, checkpoints (SQLite)
│ ├── embedding_queue.db # Async embedding queue (SQLite)
│ └── lancedb/ # Vector store for RAG
├── logs/
│ ├── vectora.jsonl # Structured JSON logs (rotating, 10 MB)
│ └── mcp.log # MCP server logs
└── exports/ # Session audit trails + debug dumps
Tech Stack
| Layer | Technology |
|---|---|
| Language | Python 3.14+ managed by uv |
| Agent Framework | LangChain + LangGraph |
| Agent Pattern | Supervisor + Specialized Workers (direct / search / coder) + RAG Subgraph |
| Vector Store | LanceDB — file-based, zero-config |
| Embeddings | Cohere — embed-multilingual-v3.0 + rerank-multilingual-v3.0 |
| Persistence | SQLite via aiosqlite + LangGraph Checkpointer |
| Context Protocol | MCP via FastMCP |
| Terminal UI | Rich + prompt-toolkit |
| Observability | LangSmith (optional) |
Configuration
All configuration goes in ~/.vectora/.env or a project-local .env:
# LLM Provider
LLM_PROVIDER=google-genai
GOOGLE_API_KEY=your_key_here
# Required: RAG embeddings + reranking
COHERE_API_KEY=your_key_here
# Required: Web search + URL extraction
TAVILY_API_KEY=your_key_here
# Optional: Tracing
LANGSMITH_TRACING=false
LANGSMITH_API_KEY=your_key_here
LANGSMITH_PROJECT=vectora
# Optional: Logging
LOG_LEVEL=INFO
# Feature flags
ENABLE_RAG=true
ENABLE_FILE_OPERATIONS=true
License
Apache 2.0. See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file vectora_agent-0.1.0rc1.tar.gz.
File metadata
- Download URL: vectora_agent-0.1.0rc1.tar.gz
- Upload date:
- Size: 356.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b9290b8941c3d3643c5e5e75256af744584bd27f654dfaa556f5146be31d5800
|
|
| MD5 |
252442f24b1ce0643da6b60d68401683
|
|
| BLAKE2b-256 |
5367892dadeb7df97e3dd4616ce7d9425504dc81d660cb9b03971d6e97221d41
|
Provenance
The following attestation bundles were made for vectora_agent-0.1.0rc1.tar.gz:
Publisher:
runner.yml on brunosrz/vectora
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
vectora_agent-0.1.0rc1.tar.gz -
Subject digest:
b9290b8941c3d3643c5e5e75256af744584bd27f654dfaa556f5146be31d5800 - Sigstore transparency entry: 1588994027
- Sigstore integration time:
-
Permalink:
brunosrz/vectora@5f35a3b826da9edd55c43b820148ecd385cf4462 -
Branch / Tag:
refs/tags/v0.1.0rc1 - Owner: https://github.com/brunosrz
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
runner.yml@5f35a3b826da9edd55c43b820148ecd385cf4462 -
Trigger Event:
push
-
Statement type:
File details
Details for the file vectora_agent-0.1.0rc1-py3-none-any.whl.
File metadata
- Download URL: vectora_agent-0.1.0rc1-py3-none-any.whl
- Upload date:
- Size: 161.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ecbf92e6b07a54b40ba9d2a6848c894f81ae1c20d18f7c1a4c75739a31509dc2
|
|
| MD5 |
6d0667ee7e11995f154ad858677e538a
|
|
| BLAKE2b-256 |
a29a01288fc48c7f8652d29109d086916706bad62518abaf3a4021efa53983d2
|
Provenance
The following attestation bundles were made for vectora_agent-0.1.0rc1-py3-none-any.whl:
Publisher:
runner.yml on brunosrz/vectora
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
vectora_agent-0.1.0rc1-py3-none-any.whl -
Subject digest:
ecbf92e6b07a54b40ba9d2a6848c894f81ae1c20d18f7c1a4c75739a31509dc2 - Sigstore transparency entry: 1588994058
- Sigstore integration time:
-
Permalink:
brunosrz/vectora@5f35a3b826da9edd55c43b820148ecd385cf4462 -
Branch / Tag:
refs/tags/v0.1.0rc1 - Owner: https://github.com/brunosrz
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
runner.yml@5f35a3b826da9edd55c43b820148ecd385cf4462 -
Trigger Event:
push
-
Statement type: