The AI-native file database and memory store. Built for LLM agents to read, search, and remember.
3 lines to give your AI agent a file database and long-term memory.
Read any file. Search any workspace. Remember everything.
```shell
pip install opendb[cli]
opendb index ./my_workspace
opendb serve-mcp
```
That's it. Your agent now has 7 MCP tools — read any file format, search across documents and code, and store/recall persistent memories. Works with every major agent framework out of the box.
## Works with Every Agent Framework

OpenDB speaks MCP — the universal standard supported by all major frameworks. Pick yours:
### Claude Code / Cursor / Windsurf

Add to your MCP config (`.mcp.json`, `mcp_servers` in settings, etc.):

```json
{
  "mcpServers": {
    "opendb": {
      "command": "opendb",
      "args": ["serve-mcp", "--workspace", "/path/to/workspace"]
    }
  }
}
```
### Claude Agent SDK (Anthropic)

```python
from claude_agent_sdk import query, ClaudeAgentOptions
from claude_agent_sdk.mcp import MCPServerStdio

async with MCPServerStdio("opendb", ["serve-mcp", "--workspace", "./docs"]) as opendb:
    options = ClaudeAgentOptions(
        model="claude-sonnet-4-6",
        mcp_servers={"opendb": opendb},
        allowed_tools=["mcp__opendb__*"],
    )
    async for msg in query(prompt="Summarize the Q4 report", options=options):
        print(msg.content)
```
### OpenAI Agents SDK

```python
from agents import Agent, Runner
from agents.mcp import MCPServerStdio

async with MCPServerStdio(name="opendb", params={
    "command": "opendb", "args": ["serve-mcp", "--workspace", "./docs"]
}) as opendb:
    agent = Agent(name="Analyst", model="gpt-4.1", mcp_servers=[opendb])
    result = await Runner.run(agent, "Find all revenue mentions in the PDF reports")
    print(result.final_output)
```
### LangChain / LangGraph

```python
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent

async with MultiServerMCPClient({
    "opendb": {"command": "opendb", "args": ["serve-mcp", "--workspace", "./docs"], "transport": "stdio"}
}) as client:
    agent = create_react_agent("anthropic:claude-sonnet-4-6", await client.get_tools())
    result = await agent.ainvoke({"messages": [("user", "What changed in the latest spec?")]})
```
### CrewAI

```python
from crewai import Agent, Task, Crew
from crewai.tools import MCPServerStdio

opendb = MCPServerStdio(command="opendb", args=["serve-mcp", "--workspace", "./docs"])
analyst = Agent(role="Document Analyst", goal="Analyze workspace files", mcps=[opendb])
task = Task(description="Summarize all PDF reports in the workspace", agent=analyst)
Crew(agents=[analyst], tasks=[task]).kickoff()
```
### AutoGen (Microsoft)

```python
from autogen_ext.tools.mcp import mcp_server_tools, StdioServerParams
from autogen_agentchat.agents import AssistantAgent

tools = await mcp_server_tools(StdioServerParams(command="opendb", args=["serve-mcp", "--workspace", "./docs"]))
agent = AssistantAgent(name="analyst", model_client=client, tools=tools)
await agent.run("Search for deployment-related memories")
```
### Google ADK

```python
from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams

agent = LlmAgent(
    model="gemini-2.5-flash",
    name="analyst",
    tools=[McpToolset(connection_params=StdioConnectionParams(command="opendb", args=["serve-mcp", "--workspace", "./docs"]))],
)
```
### Mastra (TypeScript)

```typescript
import { MCPClient } from "@mastra/mcp";
import { Agent } from "@mastra/core/agent";

const mcp = new MCPClient({
  servers: { opendb: { command: "opendb", args: ["serve-mcp", "--workspace", "./docs"] } },
});

const agent = new Agent({
  name: "Analyst",
  model: "openai/gpt-4.1",
  tools: await mcp.listTools(),
});
```
### Python (direct, no framework)

```python
from opendb import OpenDB

db = OpenDB.open("./my_workspace")
await db.init()
await db.index()

text = await db.read("report.pdf", pages="1-3")
results = await db.search("quarterly revenue")

await db.memory_store("User prefers concise answers")
memories = await db.memory_recall("user preferences")
await db.close()
```
## Why OpenDB?

Without OpenDB, agents write inline parsing code for every document:

```python
# Agent writes this every time — 500+ tokens, often fails
run_command("""python -c "
import PyMuPDF; doc = PyMuPDF.open('report.pdf')
for page in doc: print(page.get_text())
" """)
```

With OpenDB:

```python
read_file("report.pdf")  # 50 tokens, always works
```
Benchmarked across 4 LLMs on 24 document tasks:
| Metric | Without OpenDB | With OpenDB |
|---|---|---|
| Tokens used | 100% | 27-45% (55-73% saved) |
| Task speed | 100% | 36-58% faster |
| Answer quality | 2.4-3.2 / 5 | 3.4-3.9 / 5 |
| Success rate | 79% | 100% |
FTS vs RAG vector retrieval (25-325 documents):
| Scale | FTS Tokens Saved | FTS Quality | RAG Quality |
|---|---|---|---|
| 25 docs | 47% | 3.9/5 | 4.2/5 |
| 125 docs | 44% | 4.7/5 | 4.0/5 |
| 325 docs | 45% | 4.6/5 | 3.5/5 |
FTS quality improves with scale while RAG degrades from distractor noise. See `benchmark/REPORT.md` for methodology.
## MCP Tools

7 tools, auto-discovered by any MCP-compatible agent:

### opendb_info — Workspace overview

```
opendb_info()
→ Workspace: 47 files (ready: 45, processing: 1, failed: 1)
  By type: Python (.py) 20 | PDF 12 | Excel (.xlsx) 5 | ...
  Recently updated: config.yaml (2 min ago) | main.py (1 hr ago)
```
### opendb_read — Read any file

Code with line numbers, documents as plain text, spreadsheets as structured JSON.

```python
opendb_read(filename="main.py")                            # Code with line numbers
opendb_read(filename="report.pdf", pages="1-3")            # PDF pages
opendb_read(filename="report.pdf", grep="revenue+growth")  # Search within file
opendb_read(filename="budget.xlsx", format="json")         # Structured spreadsheet
opendb_read(filename="app.py", offset=50, limit=31)        # Lines 50-80
```
### opendb_search — Search across code and documents

Regex grep for code, full-text search for documents. Auto-detects mode.

```python
opendb_search(query="def main", path="/workspace", glob="*.py")   # Grep code
opendb_search(query="quarterly revenue")                          # FTS documents
opendb_search(query="TODO", path="/src", case_insensitive=True)   # Case insensitive
```
### opendb_glob — Find files

```python
opendb_glob(pattern="**/*.py", path="/workspace")
opendb_glob(pattern="src/**/*.{ts,tsx}", path="/workspace")
```
### opendb_memory_store — Store a memory

```python
opendb_memory_store(content="User prefers dark mode", memory_type="semantic")
opendb_memory_store(content="Deployed v2.1, rollback required", memory_type="episodic", tags=["deploy"])
opendb_memory_store(content="Always run tests before merging", memory_type="procedural")
opendb_memory_store(content="User is a senior engineer at Acme", pinned=True)
```

Three memory types: semantic (facts/knowledge), episodic (events/outcomes), procedural (workflows/rules).

Set `pinned=True` for critical facts — they get a 10x ranking boost and can be retrieved instantly with `pinned_only=True`.
### opendb_memory_recall — Search memories

Results ranked by relevance × recency. Pinned memories always surface first.

```python
opendb_memory_recall(query="user preferences")
opendb_memory_recall(query="deploy", memory_type="episodic")
opendb_memory_recall(pinned_only=True)  # Instant — no search needed, ideal for agent startup
```
### opendb_memory_forget — Delete memories

```python
opendb_memory_forget(memory_id="abc-123-def")
opendb_memory_forget(query="outdated preferences")
```
## Agent Memory

OpenDB doubles as a long-term memory store for AI agents — persistent across sessions, ranked by relevance and recency, with pinned priorities.
### Why not Markdown files?

| | Markdown files | OpenDB Memory |
|---|---|---|
| Search | Full-file scan, substring match | FTS5 BM25 index, O(log n) |
| Ranking | None — all matches are equal | Relevance × recency decay |
| Capacity | Claude Code: 200-line hard limit | No hard limit, indexed |
| CJK | Broken (no word segmentation) | jieba tokenization, native CJK |
| Staleness | Old = new, manual cleanup | 0.5^(age/30) auto-decay |
| Structure | Free text + frontmatter | tags[], metadata{}, memory_type, pinned |
| Agent cost | Tokens spent on file management | 3 API calls: store/recall/forget |
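The decay-based ranking can be sketched in a few lines. The formula (relevance × 0.5^(age/30), with a 10x boost for pinned memories) comes from this README; the `Memory` class and field names below are illustrative, not OpenDB's actual internals.

```python
from dataclasses import dataclass

HALF_LIFE_DAYS = 30  # the 0.5^(age/30) auto-decay half-life

@dataclass
class Memory:
    content: str
    relevance: float   # e.g. a BM25 score from the FTS index
    age_days: float
    pinned: bool = False

def score(m: Memory) -> float:
    """Relevance times recency decay, with a 10x boost for pinned memories."""
    recency = 0.5 ** (m.age_days / HALF_LIFE_DAYS)
    return m.relevance * recency * (10.0 if m.pinned else 1.0)

memories = [
    Memory("User prefers dark mode", relevance=2.0, age_days=60),
    Memory("User is a senior engineer at Acme", relevance=1.0, age_days=90, pinned=True),
]
ranked = sorted(memories, key=score, reverse=True)  # pinned memory surfaces first
```

Note how the pinned memory outranks a more relevant but unpinned one: 1.0 × 0.125 × 10 = 1.25 beats 2.0 × 0.25 = 0.5.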
### Why not vector databases?

FTS quality improves with scale while vector/RAG degrades. Vector similarity retrieves topically-similar noise; FTS retrieves exactly what the agent asked for.
### LongMemEval benchmark

Tested against LongMemEval (ICLR 2025) — 470 questions across 6 types:

| | OpenDB (FTS5) | MemPalace (ChromaDB) |
|---|---|---|
| R@5 | 100% (470/470) | 96.6% |
| Embedding model | None (keyword index) | all-MiniLM-L6-v2 |
| API calls | 0 | 0 |
| Median recall latency | 0.9 ms | — |
| Total benchmark time | 32 s | ~5 min |

All 6 question types score 100%. Reproduce: `python benchmark/longmemeval_bench.py`
## Supported Formats

| Format | Extensions | Features |
|---|---|---|
| PDF | `.pdf` | Pages, tables, OCR for scanned docs |
| Word | `.docx` | Page breaks, tables, headings |
| PowerPoint | `.pptx` | Slides, speaker notes, tables |
| Excel | `.xlsx` | Multiple sheets, structured JSON output |
| CSV | `.csv` | Auto-encoding detection, structured JSON |
| Code | `.py` `.js` `.ts` `.go` `.rs` `.java` ... | Line-numbered output |
| Text | `.txt` `.md` `.html` `.json` `.xml` | Paragraph chunking |
| Images | `.png` `.jpg` `.tiff` `.bmp` | OCR (English + Chinese) |
## Key Features

- **3-line setup** — `pip install`, `index`, `serve-mcp` — works with every agent framework
- **7 MCP tools** — `read`, `search`, `glob`, `info` for files + `memory_store`, `memory_recall`, `memory_forget` for memory
- **Agent memory** — FTS + time-decay ranking, pinned memories, 100% on LongMemEval; no vector DB needed
- **Dual-mode** — Embedded (SQLite, zero-config) or Server (PostgreSQL, shared access); same API
- **Real-time sync** — Directories are watched via OS-native events after indexing
- **Full-text search** — FTS5 / tsvector with jieba CJK tokenization
- **Structured output** — Spreadsheets as `{sheets: [{columns, rows}]}` for direct analysis
- **Fuzzy filename resolution** — Find files by exact name, partial match, path, or UUID
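The structured spreadsheet output is directly consumable. A minimal sketch of turning the `{sheets: [{columns, rows}]}` shape into per-row records; the `name` field and the sample payload here are illustrative assumptions, not real OpenDB output:

```python
# Illustrative payload in the {sheets: [{columns, rows}]} shape described above.
payload = {
    "sheets": [
        {"name": "Q4", "columns": ["item", "amount"],
         "rows": [["servers", 1200], ["licenses", 800]]}
    ]
}

def sheet_records(payload: dict) -> dict:
    """Map each sheet name to a list of column->value records."""
    return {
        sheet.get("name", f"sheet{i}"): [
            dict(zip(sheet["columns"], row)) for row in sheet["rows"]
        ]
        for i, sheet in enumerate(payload["sheets"])
    }

records = sheet_records(payload)
# records["Q4"][0] == {"item": "servers", "amount": 1200}
```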
## REST API

OpenDB also exposes a full HTTP API. Run with `opendb serve` (embedded) or `docker-compose up` (PostgreSQL).

| Endpoint | Method | Description |
|---|---|---|
| `/info` | GET | Workspace statistics |
| `/read/(unknown)` | GET | Read file (`?pages=`, `?lines=`, `?grep=`, `?format=json`) |
| `/search` | POST | Full-text search or regex grep |
| `/glob` | GET | Find files by glob pattern |
| `/index` | POST | Index a directory and start watching |
| `/files` | POST/GET | Upload or list files |
| `/memory` | POST/GET | Store or list memories |
| `/memory/recall` | POST | Search memories with ranking |
| `/memory/forget` | POST | Delete memories |
| `/health` | GET | Health check |
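A hypothetical stdlib-only client for two of the endpoints above. The paths and methods come from the table; the JSON field names (`query`, `limit`) are assumptions about the request body:

```python
import json
from urllib import request

BASE = "http://localhost:8000"  # the default OPENDB_URL

def search(query: str, limit: int = 10) -> dict:
    """POST /search: full-text search or regex grep (body fields assumed)."""
    payload = json.dumps({"query": query, "limit": limit}).encode()
    req = request.Request(f"{BASE}/search", data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)

def healthy() -> bool:
    """GET /health: liveness probe; False if the server is unreachable."""
    try:
        with request.urlopen(f"{BASE}/health", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False
```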
## Configuration

Environment variables (`FILEDB_` prefix):

| Variable | Default | Description |
|---|---|---|
| `FILEDB_BACKEND` | `postgres` | `postgres` or `sqlite` |
| `FILEDB_DATABASE_URL` | `postgresql://...` | PostgreSQL connection |
| `FILEDB_OCR_ENABLED` | `true` | Enable Tesseract OCR |
| `FILEDB_OCR_LANGUAGES` | `eng+chi_sim+chi_tra` | OCR languages |
| `FILEDB_MAX_FILE_SIZE` | `104857600` | Max file size (100 MB) |
| `FILEDB_INDEX_EXCLUDE_PATTERNS` | `[]` | Exclude patterns for indexing |
| `OPENDB_URL` | `http://localhost:8000` | MCP server → REST API URL |
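For example, to run the embedded backend with OCR disabled and a smaller size cap (variable names and defaults from the table above; the values chosen here are illustrative):

```shell
export FILEDB_BACKEND=sqlite
export FILEDB_OCR_ENABLED=false
export FILEDB_MAX_FILE_SIZE=52428800   # 50 MB instead of the 100 MB default
# then start the server:
# opendb serve
```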
## Development

```shell
pip install -e ".[dev]"
pytest
```
## License
File details
Details for the file open_db-1.3.0.tar.gz.
File metadata
- Download URL: open_db-1.3.0.tar.gz
- Upload date:
- Size: 96.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3d0ee3fce8d8fde1ba0f6e1e07f063223bb987d96555de8c27bbf54b2a472848` |
| MD5 | `a503254b0e0fce30bc2fff2b0089f990` |
| BLAKE2b-256 | `bdfe352814bea5ca18920ae3aaa11c61987c7229790457bee09ac00ca1a90223` |
File details
Details for the file open_db-1.3.0-py3-none-any.whl.
File metadata
- Download URL: open_db-1.3.0-py3-none-any.whl
- Upload date:
- Size: 97.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `6cd4d6061849cbc06ebe0ee1686bded74d2f927ee86a54581447b173a4015596` |
| MD5 | `4487a2dced74898a092fcf2303ee6628` |
| BLAKE2b-256 | `7d8ab9750258b7c9f927da80f423039f2282a3db82e9b1cee837e22ebe168ddd` |