AI Memory Operating System — Graph-RAG, temporal truth maintenance, actionable schemas, selective encryption, sub-200ms hybrid retrieval.
OMem
The Memory Operating System for AI Agents
Persistent · Intelligent · Blazing Fast
Give your AI the memory it deserves — one that learns, forgets, and thinks.
Quick Start · Benchmarks · MCP / Claude Desktop · CLI · Docs
The Problem with AI Memory Today
Your agent is brilliant in the moment — but the second the conversation ends, it's gone. You've tried:
- 🗃 Vector databases — Dumb storage. No lifecycle. No importance. Returns noise.
- 📜 Long context windows — Expensive. Slow. Hits limits. Drowns your agent in irrelevant history.
- 💾 Conversation buffers — Grows forever. Can't handle multi-session continuity.
None of these are memory systems. They're storage systems.
OMem is Different
OMem is a Memory Operating System — a complete cognitive layer that mirrors how intelligent systems actually remember:
Store everything → Classify what matters → Retrieve what's relevant
Compress noise → Forget the useless → Resolve contradictions
It's not a database with a retrieval wrapper. It's a brain.
Benchmarks
Tested on Apple M-series. Dataset: 5,000 memories, 500 queries, `all-MiniLM-L6-v2` embedding model, shared identically across all systems for a fair comparison.
⚡ Head-to-Head Performance
| System | Setup time | Add (ops/s) | RAG (ops/s) | RAG p99 |
|---|---|---|---|---|
| OMem | 4.0 ms | 65 † | 292 | 20 ms |
| ChromaDB | 507 ms | 277 ‡ | 280 | 4 ms |
| LanceDB | 8 ms | 82,000 ‡ | 182 | 7 ms |
| Mem0 | 15,000+ ms | < 1 | 18 | 638 ms |
† Smart Ingestion: OMem's `add()` performs embed → auto-classify → dedup-check → entity-graph sync → async persist. ChromaDB and LanceDB store pre-computed vectors only. We do the heavy lifting so your agent doesn't have to.
‡ Raw storage: no classification, no deduplication, no graph linking.
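To make the † concrete, here is the dedup-check step as a self-contained toy (illustrative only: OMem compares embeddings in its Rust core, while this sketch stands in a string-similarity ratio):

import difflib

store = []  # toy memory store

def smart_add(text: str) -> bool:
    """One step of the smart path: reject near-duplicates before storing.
    The real add() also embeds, classifies, graph-links, and persists asynchronously."""
    for existing in store:
        # OMem uses embedding similarity here; a string ratio stands in for it
        if difflib.SequenceMatcher(None, existing, text).ratio() > 0.9:
            return False
    store.append(text)
    return True

smart_add("User prefers dark mode")    # stored
smart_add("User prefers dark  mode")   # rejected as a near-duplicate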
🏆 Why OMem Wins Where It Counts
| Metric | OMem vs Mem0 | OMem vs ChromaDB | OMem vs LanceDB |
|---|---|---|---|
| RAG throughput | 16× faster | 1.0× (parity) | 1.6× faster |
| p50 recall latency | 0.007 ms (OMem) | 3.5 ms (ChromaDB) | 5.3 ms (LanceDB) |
| Setup time | 3,750× faster | 127× faster | parity |
| Smart features | ✅ All 9 | ❌ 0/9 | ❌ 0/9 |
The critical insight: Mem0 is 16× slower because it runs an LLM extraction pipeline on every add. OMem replaces that with a Rust-native classification engine — zero LLM calls, zero API costs, zero network latency.
🧩 Feature Matrix
| Feature | OMem | ChromaDB | Mem0 | LanceDB |
|---|---|---|---|---|
| Auto-Classification | ✅ | ❌ | ❌ | ❌ |
| Causal Graphs | ✅ | ❌ | ❌ | ❌ |
| Hybrid RAG (vector + keyword + recency + importance) | ✅ | ❌ | ❌ | ❌ |
| Forgetting & Decay | ✅ | ❌ | ❌ | ❌ |
| Memory Compression | ✅ | ❌ | ❌ | ❌ |
| Conflict Detection & TMS | ✅ | ❌ | ❌ | ❌ |
| CLI Tools | ✅ | ❌ | ❌ | ❌ |
| Zero Config | ✅ | ✅ | ❌ | ✅ |
| MCP Server (Claude/Cursor) | ✅ | ❌ | ❌ | ❌ |
Quick Start
Installation
# Clone
git clone https://github.com/mohitkumarrajbadi/omem
cd omem
# Install
SETUPTOOLS_USE_DISTUTILS=stdlib pip install -e .
# Verify
omem health
macOS / Anaconda users: add to `~/.zshrc` once:
export KMP_DUPLICATE_LIB_OK=TRUE
export HF_HUB_OFFLINE=1
60-Second Example
from omem import OMem
brain = OMem()
# Add memories — type and importance are detected automatically
brain.add("User prefers dark mode and Python for all backend work")
brain.add("Critical bug: race condition in payment module causes duplicate charges", importance=0.95)
brain.add("Architecture decision: migrated from REST to GraphQL for better performance")
# Retrieve what's relevant — not everything
results = brain.recall("What bugs do we have?")
print(results[0].content)
# → "Critical bug: race condition in payment module..."
# Understand exactly why this memory was returned
for exp in brain.inspect("payment bugs"):
    print(exp.explain())
# → vector=0.91, keyword=0.85, recency=0.94, importance=1.5x boost
The Sleep Cycle — Let Your Agent Dream
# After hours of operation, consolidate redundant memories
brain.add("User clicked login button")
brain.add("User pressed sign-in")
brain.add("User tapped the login link")
result = brain.sleep()
# → compressed: 3 → 1 ("User repeatedly accessed login (3 instances)")
# → forgotten: 12 low-value memories removed
# → reflected: 4 new insights generated
How It Works
┌─────────────────────────────────────────────────────────┐
│ Your Agent / Claude / Cursor │
└──────────────────────────┬──────────────────────────────┘
│ MCP or Python SDK
▼
┌─────────────────────────────────────────────────────────┐
│ OMem Unified API │
│ add · recall · sleep · inspect · serve │
└────────────┬───────────────────────────┬────────────────┘
│ │
▼ ▼
┌─────────────────────┐ ┌────────────────────────────┐
│ Rust Core │ │ Brain Logic │
│ │ │ │
│ • SIMD scoring │ │ • Auto-classification │
│ • FAISS HNSW │ │ • Importance estimation │
│ • Hybrid ranking │ │ • Forgetting & decay │
│ • Write buffer │ │ • Reflection & compress │
│ • RW lock │ │ • Conflict TMS │
└─────────────────────┘ └────────────────────────────┘
│ │
└─────────────┬─────────────┘
▼
┌──────────────────────────┐
│ SQLite · PostgreSQL │
│ FAISS · Knowledge Graph │
└──────────────────────────┘
The Retrieval Pipeline
Every `recall()` call combines four weighted signals, plus a status multiplier, in a single SIMD pass:
Final Score = [ 0.50 × vector_similarity
              + 0.20 × keyword_overlap
              + 0.15 × recency_decay
              + 0.15 × importance_weight ] × status_multiplier
Then optionally expanded via Graph-RAG: top results are linked to related entities in the knowledge graph, surfacing connected memories that pure vector search would miss.
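For intuition, the same formula in plain Python (a sketch: the 7-day half-life and the keyword metric are assumptions, and OMem computes this in a single SIMD pass in Rust):

def hybrid_score(vector_sim, query_tokens, mem_tokens,
                 age_days, importance, status_multiplier=1.0):
    # keyword_overlap: fraction of query tokens found in the memory (stand-in for BM25)
    keyword = len(query_tokens & mem_tokens) / max(len(query_tokens), 1)
    # recency_decay: exponential half-life decay (half-life assumed to be 7 days)
    recency = 0.5 ** (age_days / 7.0)
    return (0.50 * vector_sim
            + 0.20 * keyword
            + 0.15 * recency
            + 0.15 * importance) * status_multiplier

# A fresh, important, semantically close memory scores near the top
print(hybrid_score(0.91, {"payment", "bugs"}, {"payment", "bug", "race"},
                   age_days=1.0, importance=0.95))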
Real-World Usage
Customer Support Agent
from omem import OMem
memory = OMem(namespace="support")
# Store rich customer context
memory.add("Customer John (john@acme.com) reported dashboard timeout on mobile Safari")
memory.add("Acme Corp is on Enterprise plan, SOC2 required by Q3")
# Later — retrieve with filters
context = memory.recall(
    "mobile issues Acme",
    context_type="bugs",    # boost bug-type memories
    time_range="recent",    # prioritize last 3 days
    k=5,
)
Multi-Agent System
# Each agent is fully isolated
researcher = OMem(namespace="researcher")
writer = OMem(namespace="writer")
researcher.add("Study shows 40% retention improvement with personalized onboarding")
# No cross-namespace leakage
writer.recall("retention") # → []
# Global search when needed
researcher.recall("retention", project_only=False) # → finds it
Conflict Detection
brain.add("Python version: 3.9")
brain.add("Python version: 3.11") # → auto-flagged as CONFLICTED
brain.resolve_conflict("Python version")
# → resolves in favor of most recent, deprecates the old one
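The truth-maintenance mechanics in miniature (a toy illustration, not OMem's implementation): contradictory assertions over the same key are flagged, and resolution lets the newest win.

from datetime import datetime, timezone

entries = []  # each: {"key", "value", "ts", "status"}

def assert_fact(key, value):
    e = {"key": key, "value": value,
         "ts": datetime.now(timezone.utc), "status": "ACTIVE"}
    for prior in entries:
        if prior["key"] == key and prior["value"] != value:
            prior["status"] = e["status"] = "CONFLICTED"  # flag the contradiction
    entries.append(e)

def resolve(key):
    conflicted = [e for e in entries if e["key"] == key and e["status"] == "CONFLICTED"]
    if conflicted:
        newest = max(conflicted, key=lambda e: e["ts"])
        for e in conflicted:
            # recency-wins policy, matching resolve_conflict() above
            e["status"] = "ACTIVE" if e is newest else "DEPRECATED"

assert_fact("Python version", "3.9")
assert_fact("Python version", "3.11")  # both flagged CONFLICTED
resolve("Python version")              # 3.11 → ACTIVE, 3.9 → DEPRECATED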
Integrations
Claude Desktop & Cursor (MCP Server) ⭐
omem serve # starts the MCP stdio server
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "omem": {
      "command": "omem",
      "args": ["serve"]
    }
  }
}
What your AI gets:
| Tool | What it does |
|---|---|
| `remember` | Store a fact, decision, or preference |
| `recall` | Semantic search with type and time filters |
| `reflect` | Generate high-level insights from memory |
| `maintain` | Compress, forget, and optimize memory |
| `resolve_conflict` | Detect and fix contradictions |
| `summarize_state` | Get a project architecture overview |
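Under the hood these are standard MCP tool calls over stdio. A request from Claude to the `remember` tool looks roughly like this (the JSON-RPC envelope follows the MCP spec; the `content` argument name is an assumption):

{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "remember",
    "arguments": { "content": "Acme Corp requires SOC2 compliance by Q3" }
  }
}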
Addressing a common concern:
"Won't injecting memory into every prompt bloat my context?"
No. OMem is a retrieval layer, not an injection layer. From 5,000 memories, it returns 3–5 targeted results (~200–500 tokens). That's 97% less context than a naive approach — while giving the agent exactly what it needs.
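Concretely, the assembly step might look like this (a sketch reusing the `brain` instance and result shape from the 60-second example):

# Splice a handful of targeted memories into the prompt, nothing else
question = "What bugs do we have?"
results = brain.recall(question, k=5)
context = "\n".join(f"- {m.content}" for m in results)
prompt = f"Relevant memories:\n{context}\n\nUser: {question}"
# ~200-500 tokens of context, no matter how large the memory store grows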
LangChain
from omem.integrations.langchain import OMemRetriever
from langchain.chains import RetrievalQA  # requires the langchain package

retriever = OMemRetriever(omem_instance=brain)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)  # llm: any LangChain-compatible LLM
CLI Reference
# Setup
omem init # initialize at ~/.omem/brain.db
omem health # system health check
# Write
omem add "content" -i 0.9 -n myproject -t DECISION
# Read
omem search "query" -k 10 -c architecture -t recent
omem list -n myproject -t DECISION -l 50
omem inspect "query" # debug retrieval scoring
omem stats && omem namespaces
# Maintenance
omem maintain --all # compress + reflect + forget + dream
# Import / Export
omem export -f json -o dump.json
omem load dump.json -n myproject
# Integrations
omem serve # MCP server for Claude / Cursor
omem dashboard --port 7900 # web memory dashboard
omem demo # end-to-end interactive walkthrough
omem benchmark --n 10000 # performance test
Architecture Details
Memory Types
OMem auto-classifies every memory on ingestion:
| Type | Examples |
|---|---|
| `SEMANTIC` | Facts, general knowledge |
| `DECISION` | Choices made, preferences |
| `CAUSAL` | Bug root causes, cause-effect chains |
| `PROCEDURAL` | How-to steps, workflows |
| `EPISODIC` | Events, experiences |
| `REFLECTION` | AI-generated insights |
| `ACTIVE` | Critical / urgent items |
| `WORKING` | Temporary, current-task context |
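As the FAQ notes, classification runs on lightweight heuristics rather than an LLM. Here is a toy flavor of rule-based typing (illustrative keywords only, not OMem's actual rules):

def toy_classify(text: str) -> str:
    t = text.lower()
    if any(w in t for w in ("bug", "because", "caused", "root cause")):
        return "CAUSAL"
    if any(w in t for w in ("decided", "decision", "prefer", "migrated")):
        return "DECISION"
    if any(w in t for w in ("how to", "step", "workflow")):
        return "PROCEDURAL"
    return "SEMANTIC"

print(toy_classify("Critical bug: race condition in payment module"))  # → CAUSAL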
Scoring Signals
- `vector_similarity` — semantic closeness to the query (FAISS HNSW)
- `keyword_overlap` — token-level BM25-style matching
- `recency_decay` — exponential half-life decay over time
- `importance_weight` — auto-scored + access-frequency boosted
- `status_multiplier` — CONFLICTED memories penalized, DEPRECATED skipped
Storage
| Backend | Use Case |
|---|---|
| SQLite (default) | Local, single-process, zero config |
| In-memory | Testing, ephemeral agents |
| PostgreSQL | Production, multi-process, distributed |
Configuration
brain = OMem(
    backend="sqlite",            # "sqlite" | "memory" | "postgres"
    db_path="~/.omem/brain.db",  # custom path
    model="all-MiniLM-L6-v2",    # embedding model
    embedding_provider="local",  # "local" | "openai"
)
Environment variables:
HF_HUB_OFFLINE=1 # disable HuggingFace Hub network checks (faster startup)
KMP_DUPLICATE_LIB_OK=TRUE # fix OpenMP conflict on macOS/Anaconda
TOKENIZERS_PARALLELISM=false # suppress tokenizer warning
Roadmap
| Status | Feature |
|---|---|
| ✅ Released | Hybrid RAG, Auto-classification, Forgetting, Compression, MCP Server |
| ✅ Released | Truth Maintenance System, Knowledge Graph, Graph-RAG |
| ✅ Released | PostgreSQL backend, CLI, Dashboard |
| 🔄 In Progress | LOCOMO benchmark validation, distributed mode |
| 📅 Planned | Custom embedding providers (OpenAI, Cohere), Memory versioning |
FAQ
Q: Does this run an LLM internally?
A: No. Classification and importance scoring use lightweight heuristics and a small (~90MB) embedding model. No LLM API calls, no external dependencies, no costs.
Q: How is this different from ChromaDB or Pinecone?
A: Those are vector storage systems. OMem is a memory operating system — with lifecycle (importance → decay → forget), deduplication, conflict detection, knowledge graphs, and a cognitive maintenance cycle.
Q: Will it bloat my agent's context window?
A: The opposite. OMem retrieves 3–5 relevant memories per query (~300 tokens) instead of injecting your entire history. See the Context FAQ.
Q: Is it production-ready?
A: v1.0.0 is stable for production workloads. The SQLite backend handles hundreds of thousands of memories. PostgreSQL backend available for multi-process deployments.
Q: What about privacy?
A: Everything runs 100% locally by default. Your memories never leave your machine. PostgreSQL backend is self-hosted.
Q: Do I need Rust installed?
A: Only if you want the SIMD-accelerated scoring path. The pure-Python path works out of the box and is still competitive.
Contributing
git clone https://github.com/mohitkumarrajbadi/omem
cd omem
python -m venv .venv && source .venv/bin/activate
SETUPTOOLS_USE_DISTUTILS=stdlib pip install -e ".[dev]"
pytest tests/ -v
python benchmarks/competitor.py # run head-to-head benchmarks
See DEVELOPER.md for architecture, CLI reference, and contribution guidelines.
License
MIT — see LICENSE
Built for the AI developer community
If OMem makes your agents smarter, give it a ⭐