
The AI-native file database and memory store. Built for LLM agents to read, search, and remember.


OpenDB

3 lines to give your AI agent a file database and long-term memory.
Read any file. Search any workspace. Remember everything.

PyPI version · Python 3.11+ · License: MIT · GitHub stars


pip install "opendb[cli]"
opendb index ./my_workspace
opendb serve-mcp

That's it. Your agent now has 7 MCP tools — read any file format, search across documents and code, and store/recall persistent memories. Works with every major agent framework out of the box.

Works with Every Agent Framework

OpenDB speaks MCP — the universal standard supported by all major frameworks. Pick yours:

Claude Code / Cursor / Windsurf

Add to your MCP config (.mcp.json, mcp_servers in settings, etc.):

{
  "mcpServers": {
    "opendb": {
      "command": "opendb",
      "args": ["serve-mcp", "--workspace", "/path/to/workspace"]
    }
  }
}
Claude Agent SDK (Anthropic)
from claude_agent_sdk import query, ClaudeAgentOptions
from claude_agent_sdk.mcp import MCPServerStdio

async with MCPServerStdio("opendb", ["serve-mcp", "--workspace", "./docs"]) as opendb:
    options = ClaudeAgentOptions(
        model="claude-sonnet-4-6",
        mcp_servers={"opendb": opendb},
        allowed_tools=["mcp__opendb__*"],
    )
    async for msg in query(prompt="Summarize the Q4 report", options=options):
        print(msg.content)
OpenAI Agents SDK
from agents import Agent, Runner
from agents.mcp import MCPServerStdio

async with MCPServerStdio(name="opendb", params={
    "command": "opendb", "args": ["serve-mcp", "--workspace", "./docs"]
}) as opendb:
    agent = Agent(name="Analyst", model="gpt-4.1", mcp_servers=[opendb])
    result = await Runner.run(agent, "Find all revenue mentions in the PDF reports")
    print(result.final_output)
LangChain / LangGraph
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent

async with MultiServerMCPClient({
    "opendb": {"command": "opendb", "args": ["serve-mcp", "--workspace", "./docs"], "transport": "stdio"}
}) as client:
    agent = create_react_agent("anthropic:claude-sonnet-4-6", await client.get_tools())
    result = await agent.ainvoke({"messages": [("user", "What changed in the latest spec?")]})
CrewAI
from crewai import Agent, Task, Crew
from crewai.tools import MCPServerStdio

opendb = MCPServerStdio(command="opendb", args=["serve-mcp", "--workspace", "./docs"])

analyst = Agent(role="Document Analyst", goal="Analyze workspace files", mcps=[opendb])
task = Task(description="Summarize all PDF reports in the workspace", agent=analyst)
Crew(agents=[analyst], tasks=[task]).kickoff()
AutoGen (Microsoft)
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.tools.mcp import mcp_server_tools, StdioServerParams
from autogen_agentchat.agents import AssistantAgent

client = OpenAIChatCompletionClient(model="gpt-4o")  # any AutoGen model client works
tools = await mcp_server_tools(StdioServerParams(command="opendb", args=["serve-mcp", "--workspace", "./docs"]))
agent = AssistantAgent(name="analyst", model_client=client, tools=tools)
await agent.run(task="Search for deployment-related memories")
Google ADK
from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams

agent = LlmAgent(
    model="gemini-2.5-flash",
    name="analyst",
    tools=[McpToolset(connection_params=StdioConnectionParams(command="opendb", args=["serve-mcp", "--workspace", "./docs"]))],
)
Mastra (TypeScript)
import { MCPClient } from "@mastra/mcp";
import { Agent } from "@mastra/core/agent";

const mcp = new MCPClient({
  servers: { opendb: { command: "opendb", args: ["serve-mcp", "--workspace", "./docs"] } },
});

const agent = new Agent({
  name: "Analyst",
  model: "openai/gpt-4.1",
  tools: await mcp.getTools(),
});
Python (direct, no framework)
from opendb import OpenDB

db = OpenDB.open("./my_workspace")
await db.init()
await db.index()

text    = await db.read("report.pdf", pages="1-3")
results = await db.search("quarterly revenue")
await db.memory_store("User prefers concise answers")
memories = await db.memory_recall("user preferences")

await db.close()

Why OpenDB?

Without OpenDB, agents write inline parsing code for every document:

# Agent writes this every time — 500+ tokens, often fails
run_command("""python -c "
import pymupdf; doc = pymupdf.open('report.pdf')
for page in doc: print(page.get_text())
" """)

With OpenDB:

read_file("report.pdf")  # 50 tokens, always works

Benchmarked across 4 LLMs on 24 document tasks:

| Metric | Without OpenDB | With OpenDB |
|---|---|---|
| Tokens used | 100% | 27-45% (55-73% saved) |
| Task speed | 100% | 36-58% faster |
| Answer quality | 2.4-3.2 / 5 | 3.4-3.9 / 5 |
| Success rate | 79% | 100% |

FTS vs RAG vector retrieval (25-325 documents):

| Scale | FTS Tokens Saved | FTS Quality | RAG Quality |
|---|---|---|---|
| 25 docs | 47% | 3.9/5 | 4.2/5 |
| 125 docs | 44% | 4.7/5 | 4.0/5 |
| 325 docs | 45% | 4.6/5 | 3.5/5 |

FTS quality improves with scale while RAG degrades from distractor noise. See benchmark/REPORT.md for methodology.

MCP Tools

7 tools, auto-discovered by any MCP-compatible agent:

opendb_info — Workspace overview

opendb_info()
→ Workspace: 47 files (ready: 45, processing: 1, failed: 1)
  By type:  Python (.py) 20 | PDF 12 | Excel (.xlsx) 5 | ...
  Recently updated:  config.yaml (2 min ago) | main.py (1 hr ago)

opendb_read — Read any file

Code with line numbers, documents as plain text, spreadsheets as structured JSON.

opendb_read(filename="main.py")                            # Code with line numbers
opendb_read(filename="report.pdf", pages="1-3")            # PDF pages
opendb_read(filename="report.pdf", grep="revenue+growth")  # Search within file
opendb_read(filename="budget.xlsx", format="json")          # Structured spreadsheet
opendb_read(filename="app.py", offset=50, limit=31)         # Lines 50-80

opendb_search — Search across code and documents

Regex grep for code, full-text search for documents. Auto-detects mode.

opendb_search(query="def main", path="/workspace", glob="*.py")   # Grep code
opendb_search(query="quarterly revenue")                           # FTS documents
opendb_search(query="TODO", path="/src", case_insensitive=True)    # Case insensitive

opendb_glob — Find files

opendb_glob(pattern="**/*.py", path="/workspace")
opendb_glob(pattern="src/**/*.{ts,tsx}", path="/workspace")

opendb_memory_store — Store a memory

opendb_memory_store(content="User prefers dark mode", memory_type="semantic")
opendb_memory_store(content="Deployed v2.1, rollback required", memory_type="episodic", tags=["deploy"])
opendb_memory_store(content="Always run tests before merging", memory_type="procedural")
opendb_memory_store(content="User is a senior engineer at Acme", pinned=true)

Three memory types: semantic (facts/knowledge), episodic (events/outcomes), procedural (workflows/rules).

Set pinned=true for critical facts — they get 10x ranking boost and can be retrieved instantly with pinned_only=true.

opendb_memory_recall — Search memories

Results ranked by relevance × recency. Pinned memories always surface first.

opendb_memory_recall(query="user preferences")
opendb_memory_recall(query="deploy", memory_type="episodic")
opendb_memory_recall(pinned_only=true)   # Instant — no search needed, ideal for agent startup

opendb_memory_forget — Delete memories

opendb_memory_forget(memory_id="abc-123-def")
opendb_memory_forget(query="outdated preferences")

Agent Memory

OpenDB doubles as a long-term memory store for AI agents — persistent across sessions, ranked by relevance and recency, with pinned priorities.

Why not Markdown files?

| | Markdown files | OpenDB Memory |
|---|---|---|
| Search | Full-file scan, substring match | FTS5 BM25 index, O(log n) |
| Ranking | None — all matches are equal | Relevance × recency decay |
| Capacity | Claude Code: 200-line hard limit | No hard limit, indexed |
| CJK | Broken (no word segmentation) | jieba tokenization, native CJK |
| Staleness | Old = new, manual cleanup | 0.5^(age/30) auto-decay |
| Structure | Free text + frontmatter | tags[], metadata{}, memory_type, pinned |
| Agent cost | Tokens spent on file management | 3 API calls: store/recall/forget |
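
The ranking scheme above can be sketched in a few lines. The 0.5^(age/30) half-life and the 10x pin boost come from this section; the function itself is an illustration, not OpenDB's actual implementation:

```python
# Illustrative sketch of memory ranking: relevance x recency decay,
# with a 10x boost for pinned memories. Constants come from the docs
# above; the scoring function is an assumption, not OpenDB's real code.
HALF_LIFE_DAYS = 30
PIN_BOOST = 10.0

def rank(relevance: float, age_days: float, pinned: bool = False) -> float:
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)  # 0.5^(age/30)
    return relevance * decay * (PIN_BOOST if pinned else 1.0)

rank(1.0, 30)               # one half-life old -> 0.5
rank(1.0, 30, pinned=True)  # pinned -> 5.0
```

A 30-day-old memory scores half what a fresh one does, while a pinned memory of the same age still outranks every unpinned match.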

Why not vector databases?

FTS quality improves with scale while vector/RAG quality degrades. Vector similarity retrieves topically similar noise; FTS retrieves exactly what the agent asked for.

LongMemEval benchmark

Tested against LongMemEval (ICLR 2025) — 470 questions across 6 types:

| | OpenDB (FTS5) | MemPalace (ChromaDB) |
|---|---|---|
| R@5 | 100% (470/470) | 96.6% |
| Embedding model | None (keyword index) | all-MiniLM-L6-v2 |
| API calls | 0 | 0 |
| Median recall latency | 0.9 ms | |
| Total benchmark time | 32 s | ~5 min |

All 6 question types score 100%. Reproduce: python benchmark/longmemeval_bench.py

Supported Formats

| Format | Extensions | Features |
|---|---|---|
| PDF | .pdf | Pages, tables, OCR for scanned docs |
| Word | .docx | Page breaks, tables, headings |
| PowerPoint | .pptx | Slides, speaker notes, tables |
| Excel | .xlsx | Multiple sheets, structured JSON output |
| CSV | .csv | Auto-encoding detection, structured JSON |
| Code | .py .js .ts .go .rs .java ... | Line-numbered output |
| Text | .txt .md .html .json .xml | Paragraph chunking |
| Images | .png .jpg .tiff .bmp | OCR (English + Chinese) |

Key Features

  • 3-line setup — pip install, index, serve-mcp — works with every agent framework
  • 7 MCP tools — read, search, glob, info for files + memory_store, memory_recall, memory_forget for memory
  • Agent memory — FTS + time-decay ranking, pinned memories, 100% on LongMemEval; no vector DB needed
  • Dual-mode — Embedded (SQLite, zero-config) or Server (PostgreSQL, shared access); same API
  • Real-time sync — Directories are watched via OS-native events after indexing
  • Full-text search — FTS5 / tsvector with jieba CJK tokenization
  • Structured output — Spreadsheets as {sheets: [{columns, rows}]} for direct analysis
  • Fuzzy filename resolution — Find files by exact name, partial match, path, or UUID
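
As a concrete illustration of the structured spreadsheet output, here is what the {sheets: [{columns, rows}]} shape might look like and how an agent could consume it. The sample values and the sheet name are invented; only the top-level shape comes from the feature list above:

```python
# Hypothetical structured output for a spreadsheet, following the
# {sheets: [{columns, rows}]} shape described above. All values invented.
result = {
    "sheets": [
        {
            "name": "Q4",
            "columns": ["region", "revenue"],
            "rows": [["EMEA", 1200], ["APAC", 950]],
        }
    ]
}

# An agent can work on the rows directly, with no cell parsing needed.
sheet = result["sheets"][0]
revenue_idx = sheet["columns"].index("revenue")
total = sum(row[revenue_idx] for row in sheet["rows"])  # 1200 + 950 = 2150
```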

REST API

OpenDB also exposes a full HTTP API. Run with opendb serve (embedded) or docker-compose up (PostgreSQL).

| Endpoint | Method | Description |
|---|---|---|
| /info | GET | Workspace statistics |
| /read/(unknown) | GET | Read file (?pages=, ?lines=, ?grep=, ?format=json) |
| /search | POST | Full-text search or regex grep |
| /glob | GET | Find files by glob pattern |
| /index | POST | Index a directory and start watching |
| /files | POST/GET | Upload or list files |
| /memory | POST/GET | Store or list memories |
| /memory/recall | POST | Search memories with ranking |
| /memory/forget | POST | Delete memories |
| /health | GET | Health check |
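
A minimal sketch of calling the search and memory endpoints, assuming they accept JSON bodies whose field names mirror the MCP tool arguments (the payload shapes are assumptions, not documented above):

```python
import json

def build_request(path: str, payload: dict) -> dict:
    """Assemble a POST request for the OpenDB REST API (sketch only)."""
    return {
        "url": f"http://localhost:8000{path}",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }

# Field names mirror the MCP tool arguments; this is an assumption.
search_req = build_request("/search", {"query": "quarterly revenue"})
recall_req = build_request("/memory/recall", {"query": "deploy", "memory_type": "episodic"})
```

Any HTTP client (curl, requests, fetch) can send these requests; the base URL matches the OPENDB_URL default in the Configuration section.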

Configuration

Environment variables (FILEDB_ prefix):

| Variable | Default | Description |
|---|---|---|
| FILEDB_BACKEND | postgres | postgres or sqlite |
| FILEDB_DATABASE_URL | postgresql://... | PostgreSQL connection |
| FILEDB_OCR_ENABLED | true | Enable Tesseract OCR |
| FILEDB_OCR_LANGUAGES | eng+chi_sim+chi_tra | OCR languages |
| FILEDB_MAX_FILE_SIZE | 104857600 | Max file size (100MB) |
| FILEDB_INDEX_EXCLUDE_PATTERNS | [] | Exclude patterns for indexing |
| OPENDB_URL | http://localhost:8000 | MCP server → REST API URL |
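
For example, to run the embedded SQLite backend with OCR disabled and a smaller file-size cap (variable names from the table above; the values are illustrative, not recommendations):

```shell
# Illustrative configuration: embedded SQLite backend, OCR off, 50 MB cap.
export FILEDB_BACKEND=sqlite
export FILEDB_OCR_ENABLED=false
export FILEDB_MAX_FILE_SIZE=52428800   # 50 MB
opendb serve
```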

Development

pip install -e ".[dev]"
pytest

License

MIT
