
AI Memory Operating System — Graph-RAG, temporal truth maintenance, actionable schemas, selective encryption, sub-200ms hybrid retrieval.




OMem

The Memory Operating System for AI Agents

Persistent · Intelligent · Blazing Fast

Give your AI the memory it deserves — one that learns, forgets, and thinks.


Quick Start · Benchmarks · MCP / Claude Desktop · CLI · Docs


The Problem with AI Memory Today

Your agent is brilliant in the moment — but the second the conversation ends, it's gone. You've tried:

  • 🗃 Vector databases — Dumb storage. No lifecycle. No importance. Return noise.
  • 📜 Long context windows — Expensive. Slow. Hit limits. Drown your agent in irrelevant history.
  • 💾 Conversation buffers — Grow forever. Can't handle multi-session continuity.

None of these are memory systems. They're storage systems.


OMem is Different

OMem is a Memory Operating System — a complete cognitive layer that mirrors how intelligent systems actually remember:

Store everything  →  Classify what matters  →  Retrieve what's relevant
Compress noise    →  Forget the useless     →  Resolve contradictions

It's not a database with a retrieval wrapper. It's a brain.


Benchmarks

Tested on Apple M-series hardware. Dataset: 5,000 memories, 500 queries, all-MiniLM-L6-v2 embedding model — shared identically across all systems for a fair comparison.

⚡ Head-to-Head Performance

| System   | Setup      | Add (ops/s) | RAG (ops/s) | RAG p99 |
|----------|------------|-------------|-------------|---------|
| OMem     | 4.0 ms     | 65 †        | 292         | 20 ms   |
| ChromaDB | 507 ms     | 277 ‡       | 280         | 4 ms    |
| LanceDB  | 8 ms       | 82,000 ‡    | 182         | 7 ms    |
| Mem0     | 15,000+ ms | < 1         | 18          | 638 ms  |

† Smart Ingestion — OMem's add() performs: embed → auto-classify → dedup-check → entity-graph sync → async persist. ChromaDB and LanceDB store pre-computed vectors only. We do the heavy lifting so your agent doesn't have to.

‡ Raw storage — No classification, no deduplication, no graph linking.
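
Both sets of numbers can be reproduced locally with tooling already shipped in this repo (see the CLI Reference and Contributing sections):

omem benchmark --n 10000          # OMem-only performance run
python benchmarks/competitor.py   # head-to-head comparison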

🏆 Why OMem Wins Where It Counts

| Metric               | OMem vs Mem0  | OMem vs ChromaDB | OMem vs LanceDB |
|----------------------|---------------|------------------|-----------------|
| RAG throughput       | 16× faster    | 1.0× (parity)    | 1.6× faster     |
| Recall latency (p50) | 0.007 ms      | 3.5 ms           | 5.3 ms          |
| Setup time           | 3,750× faster | 127× faster      | 2× faster       |
| Smart features       | ✅ All 9      | ❌ 0/9           | ❌ 0/9          |

The critical insight: Mem0 is 16× slower because it runs an LLM extraction pipeline on every add. OMem replaces that with a Rust-native classification engine — zero LLM calls, zero API costs, zero network latency.

🧩 Feature Matrix

All nine smart features counted above are built into OMem and absent from ChromaDB and LanceDB (0/9 each):

  • Auto-Classification
  • Causal Graphs
  • Hybrid RAG (vector + keyword + recency + importance)
  • Forgetting & Decay
  • Memory Compression
  • Conflict Detection & TMS
  • CLI Tools
  • Zero Config
  • MCP Server (Claude/Cursor)

Quick Start

Installation

# Clone
git clone https://github.com/mohitkumarrajbadi/omem
cd omem

# Install
SETUPTOOLS_USE_DISTUTILS=stdlib pip install -e .

# Verify
omem health

macOS / Anaconda users — add to ~/.zshrc once:

export KMP_DUPLICATE_LIB_OK=TRUE
export HF_HUB_OFFLINE=1

60-Second Example

from omem import OMem

brain = OMem()

# Add memories — type and importance are detected automatically (or set explicitly)
brain.add("User prefers dark mode and Python for all backend work")
brain.add("Critical bug: race condition in payment module causes duplicate charges", importance=0.95)
brain.add("Architecture decision: migrated from REST to GraphQL for better performance")

# Retrieve what's relevant — not everything
results = brain.recall("What bugs do we have?")
print(results[0].content)
# → "Critical bug: race condition in payment module..."

# Understand exactly why this memory was returned
for exp in brain.inspect("payment bugs"):
    print(exp.explain())
# → vector=0.91, keyword=0.85, recency=0.94, importance=1.5x boost

The Sleep Cycle — Let Your Agent Dream

# After hours of operation, consolidate redundant memories
brain.add("User clicked login button")
brain.add("User pressed sign-in")
brain.add("User tapped the login link")

result = brain.sleep()
# → compressed: 3 → 1 ("User repeatedly accessed login (3 instances)")
# → forgotten: 12 low-value memories removed
# → reflected: 4 new insights generated

How It Works

┌─────────────────────────────────────────────────────────┐
│            Your Agent  /  Claude  /  Cursor             │
└──────────────────────────┬──────────────────────────────┘
                           │  MCP or Python SDK
                           ▼
┌─────────────────────────────────────────────────────────┐
│                    OMem Unified API                     │
│        add · recall · sleep · inspect · serve           │
└────────────┬───────────────────────────┬────────────────┘
             │                           │
             ▼                           ▼
┌─────────────────────┐     ┌────────────────────────────┐
│     Rust Core       │     │        Brain Logic         │
│                     │     │                            │
│  • SIMD scoring     │     │  • Auto-classification     │
│  • FAISS HNSW       │     │  • Importance estimation   │
│  • Hybrid ranking   │     │  • Forgetting & decay      │
│  • Write buffer     │     │  • Reflection & compress   │
│  • RW lock          │     │  • Conflict TMS            │
└─────────────────────┘     └────────────────────────────┘
             │                           │
             └─────────────┬─────────────┘
                           ▼
             ┌──────────────────────────┐
             │  SQLite · PostgreSQL     │
             │  FAISS · Knowledge Graph │
             └──────────────────────────┘

The Retrieval Pipeline

Every recall() call combines 4 signals in a single SIMD pass:

Final Score = (0.50 × vector_similarity)
            + (0.20 × keyword_overlap)
            + (0.15 × recency_decay)
            + (0.15 × importance_weight)
            × status_multiplier

Then optionally expanded via Graph-RAG: top results are linked to related entities in the knowledge graph, surfacing connected memories that pure vector search would miss.
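
For intuition, here is a small Python sketch of that blend. It mirrors the published weights, but the helper names, the example half-life, and the inputs are illustrative; the real scoring runs inside the Rust core:

# Illustrative sketch of the published scoring formula, not the Rust implementation.
WEIGHTS = {"vector": 0.50, "keyword": 0.20, "recency": 0.15, "importance": 0.15}

def hybrid_score(vector_sim, keyword_overlap, age_days, importance,
                 status_multiplier=1.0, half_life_days=30.0):
    """Blend the four retrieval signals, then apply the status multiplier."""
    recency = 0.5 ** (age_days / half_life_days)  # exponential half-life decay
    score = (WEIGHTS["vector"] * vector_sim
             + WEIGHTS["keyword"] * keyword_overlap
             + WEIGHTS["recency"] * recency
             + WEIGHTS["importance"] * importance)
    return score * status_multiplier

print(hybrid_score(0.91, 0.85, age_days=2, importance=0.95))   # ≈ 0.91, fresh and on-topic
print(hybrid_score(0.40, 0.10, age_days=90, importance=0.30))  # ≈ 0.28, stale and tangential

The status multiplier is where conflicted and deprecated memories get demoted (see Scoring Signals below).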


Real-World Usage

Customer Support Agent

from omem import OMem

memory = OMem(namespace="support")

# Store rich customer context
memory.add("Customer John (john@acme.com) reported dashboard timeout on mobile Safari")
memory.add("Acme Corp is on Enterprise plan, SOC2 required by Q3")

# Later — retrieve with filters
context = memory.recall(
    "mobile issues Acme",
    context_type="bugs",    # boost bug-type memories
    time_range="recent",    # prioritize last 3 days
    k=5
)

Multi-Agent System

# Each agent is fully isolated
researcher = OMem(namespace="researcher")
writer     = OMem(namespace="writer")

researcher.add("Study shows 40% retention improvement with personalized onboarding")

# No cross-namespace leakage
writer.recall("retention")       # → []

# Global search when needed
researcher.recall("retention", project_only=False)  # → finds it

Conflict Detection

brain.add("Python version: 3.9")
brain.add("Python version: 3.11")  # → auto-flagged as CONFLICTED

brain.resolve_conflict("Python version")
# → resolves in favor of most recent, deprecates the old one
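
Given that policy, a follow-up recall should surface only the surviving memory. A sketch of the expected behavior (output illustrative):

results = brain.recall("Python version")
print(results[0].content)
# → "Python version: 3.11"   (the deprecated 3.9 memory is skipped in ranking)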

Integrations

Claude Desktop & Cursor (MCP Server) ⭐

omem serve   # starts the MCP stdio server

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "omem": {
      "command": "omem",
      "args": ["serve"]
    }
  }
}

What your AI gets:

| Tool             | What it does                                |
|------------------|---------------------------------------------|
| remember         | Store a fact, decision, or preference       |
| recall           | Semantic search with type and time filters  |
| reflect          | Generate high-level insights from memory    |
| maintain         | Compress, forget, and optimize memory       |
| resolve_conflict | Detect and fix contradictions               |
| summarize_state  | Get a project architecture overview         |

Addressing a common concern:

"Won't injecting memory into every prompt bloat my context?"

No. OMem is a retrieval layer, not an injection layer. From 5,000 memories, it returns 3–5 targeted results (~200–500 tokens). That's 97% less context than a naive approach — while giving the agent exactly what it needs.
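
In code, the retrieval-layer pattern looks like this: a minimal sketch built on the documented recall() API (the prompt template itself is ours, not part of OMem):

user_query = "What did we decide about the payments API?"
memories = brain.recall(user_query, k=5)             # 3-5 targeted results
context = "\n".join(f"- {m.content}" for m in memories)
prompt = f"Relevant memory:\n{context}\n\nUser: {user_query}"
# ~200-500 tokens of curated context instead of the full history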

LangChain

from langchain.chains import RetrievalQA
from omem.integrations.langchain import OMemRetriever

retriever = OMemRetriever(omem_instance=brain)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)  # llm: any LangChain-compatible LLM

CLI Reference

# Setup
omem init                         # initialize at ~/.omem/brain.db
omem health                       # system health check

# Write
omem add "content" -i 0.9 -n myproject -t DECISION

# Read
omem search "query" -k 10 -c architecture -t recent
omem list -n myproject -t DECISION -l 50
omem inspect "query"              # debug retrieval scoring
omem stats && omem namespaces

# Maintenance
omem maintain --all               # compress + reflect + forget + dream

# Import / Export
omem export -f json -o dump.json
omem load dump.json -n myproject

# Integrations
omem serve                        # MCP server for Claude / Cursor
omem dashboard --port 7900        # web memory dashboard
omem demo                         # end-to-end interactive walkthrough
omem benchmark --n 10000          # performance test

Architecture Details

Memory Types

OMem auto-classifies every memory on ingestion:

| Type       | Examples                             |
|------------|--------------------------------------|
| SEMANTIC   | Facts, general knowledge             |
| DECISION   | Choices made, preferences            |
| CAUSAL     | Bug root causes, cause-effect chains |
| PROCEDURAL | How-to steps, workflows              |
| EPISODIC   | Events, experiences                  |
| REFLECTION | AI-generated insights                |
| ACTIVE     | Critical / urgent items              |
| WORKING    | Temporary, current-task context      |
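
For intuition, here's a toy sketch of the kind of lightweight, LLM-free heuristic classification OMem describes. The keywords and rules below are purely illustrative; the real engine is Rust-native:

def classify(text: str) -> str:
    """Toy keyword heuristic — illustrative only, not OMem's actual rules."""
    t = text.lower()
    if any(w in t for w in ("decision", "decided", "chose", "prefers")):
        return "DECISION"
    if any(w in t for w in ("because", "caused", "root cause", "race condition")):
        return "CAUSAL"
    if any(w in t for w in ("how to", "step 1", "workflow")):
        return "PROCEDURAL"
    return "SEMANTIC"

print(classify("Architecture decision: migrated from REST to GraphQL"))        # DECISION
print(classify("Race condition in payment module causes duplicate charges"))   # CAUSAL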

Scoring Signals

vector_similarity   — semantic closeness to query (FAISS HNSW)
keyword_overlap     — token-level BM25-style matching
recency_decay       — exponential half-life decay over time
importance_weight   — auto-scored + access-frequency boosted
status_multiplier   — CONFLICTED memories penalized, DEPRECATED skipped

Storage

| Backend          | Use Case                                |
|------------------|-----------------------------------------|
| SQLite (default) | Local, single-process, zero config      |
| In-memory        | Testing, ephemeral agents               |
| PostgreSQL       | Production, multi-process, distributed  |

Configuration

brain = OMem(
    backend="sqlite",             # "sqlite" | "memory" | "postgres"
    db_path="~/.omem/brain.db",   # custom path
    model="all-MiniLM-L6-v2",     # embedding model
    embedding_provider="local",   # "local" | "openai"
)

Environment variables:

HF_HUB_OFFLINE=1              # disable HuggingFace Hub network checks (faster startup)
KMP_DUPLICATE_LIB_OK=TRUE     # fix OpenMP conflict on macOS/Anaconda
TOKENIZERS_PARALLELISM=false  # suppress tokenizer warning
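
If you'd rather not edit your shell profile, the same variables can be set in code before the first import (standard os.environ usage, nothing OMem-specific):

import os

# Must run before omem (and its HuggingFace / OpenMP dependencies) are imported
os.environ.setdefault("HF_HUB_OFFLINE", "1")
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")

from omem import OMem
brain = OMem()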

Roadmap

| Status         | Feature                                                               |
|----------------|-----------------------------------------------------------------------|
| ✅ Released    | Hybrid RAG, Auto-classification, Forgetting, Compression, MCP Server  |
| ✅ Released    | Truth Maintenance System, Knowledge Graph, Graph-RAG                  |
| ✅ Released    | PostgreSQL backend, CLI, Dashboard                                    |
| 🔄 In Progress | LOCOMO benchmark validation, distributed mode                         |
| 📅 Planned     | Custom embedding providers (OpenAI, Cohere), Memory versioning        |

FAQ

Q: Does this run an LLM internally?
A: No. Classification and importance scoring use lightweight heuristics and a small (~90 MB) embedding model. No LLM API calls, no external services, no per-call costs.

Q: How is this different from ChromaDB or Pinecone?
A: Those are vector storage systems. OMem is a memory operating system — with lifecycle (importance → decay → forget), deduplication, conflict detection, knowledge graphs, and a cognitive maintenance cycle.

Q: Will it bloat my agent's context window?
A: The opposite. OMem retrieves 3–5 relevant memories per query (~300 tokens) instead of injecting your entire history. See the Context FAQ.

Q: Is it production-ready?
A: v1.0.0 is stable for production workloads. The SQLite backend handles hundreds of thousands of memories; the PostgreSQL backend is available for multi-process deployments.

Q: What about privacy?
A: Everything runs 100% locally by default. Your memories never leave your machine. PostgreSQL backend is self-hosted.

Q: Do I need Rust installed?
A: Only if you want the SIMD-accelerated scoring path. The pure-Python path works out of the box and is still competitive.


Contributing

git clone https://github.com/mohitkumarrajbadi/omem
cd omem
python -m venv .venv && source .venv/bin/activate
SETUPTOOLS_USE_DISTUTILS=stdlib pip install -e ".[dev]"
pytest tests/ -v
python benchmarks/competitor.py   # run head-to-head benchmarks

See DEVELOPER.md for architecture, CLI reference, and contribution guidelines.


License

MIT — see LICENSE


Built for the AI developer community

If OMem makes your agents smarter, give it a ⭐

Report Bug · Request Feature · Discussions
