AI Memory Operating System — Graph-RAG, temporal truth maintenance, actionable schemas, selective encryption, sub-200ms hybrid retrieval.
OMem
The Memory Operating System for AI Agents
Persistent · Intelligent · Blazing Fast
Give your AI the memory it deserves — one that learns, forgets, and thinks.
Quick Start · Benchmarks · MCP / Claude Desktop · CLI · Docs
The Problem with AI Memory Today
Your agent is brilliant in the moment — but the second the conversation ends, it's gone. You've tried:
- 🗃 Vector databases — Dumb storage. No lifecycle. No importance. Returns noise.
- 📜 Long context windows — Expensive. Slow. Hits limits. Drowns your agent in irrelevant history.
- 💾 Conversation buffers — Grows forever. Can't handle multi-session continuity.
None of these are memory systems. They're storage systems.
OMem is Different
OMem is a Memory Operating System — a complete cognitive layer that mirrors how intelligent systems actually remember:
Store everything → Classify what matters → Retrieve what's relevant
Compress noise → Forget the useless → Resolve contradictions
It's not a database with a retrieval wrapper. It's a brain.
Benchmarks
Tested on Apple M-series. Dataset: 5,000 memories, 500 queries, `all-MiniLM-L6-v2` embedding model, shared identically across all systems for a fair comparison.
⚡ Head-to-Head Performance
| System | Setup time | Add (ops/s) | RAG (ops/s) | RAG p99 |
|---|---|---|---|---|
| OMem | 4.0 ms | 65 † | 292 | 20 ms |
| ChromaDB | 507 ms | 277 ‡ | 280 | 4 ms |
| LanceDB | 8 ms | 82,000 ‡ | 182 | 7 ms |
| Mem0 | 15,000+ ms | < 1 | 18 | 638 ms |
† Smart Ingestion: OMem's `add()` performs embed → auto-classify → dedup-check → entity-graph sync → async persist. ChromaDB and LanceDB store pre-computed vectors only. We do the heavy lifting so your agent doesn't have to.
‡ Raw storage: no classification, no deduplication, no graph linking.
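To make the † concrete, here is the dedup-check step as a self-contained toy (illustrative only: OMem compares embeddings in its Rust core, while this sketch stands in a string-similarity ratio):

import difflib

store = []  # toy memory store

def smart_add(text: str) -> bool:
    """One step of the smart path: reject near-duplicates before storing.
    The real add() also embeds, classifies, graph-links, and persists asynchronously."""
    for existing in store:
        # OMem uses embedding similarity here; a string ratio stands in for it
        if difflib.SequenceMatcher(None, existing, text).ratio() > 0.9:
            return False
    store.append(text)
    return True

smart_add("User prefers dark mode")    # stored
smart_add("User prefers dark  mode")   # rejected as a near-duplicate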
🏆 Why OMem Wins Where It Counts
| Metric | OMem vs Mem0 | OMem vs ChromaDB | OMem vs LanceDB |
|---|---|---|---|
| RAG throughput | 16× faster | 1.0× (parity) | 1.6× faster |
| p50 recall latency | 0.007 ms (OMem) | 3.5 ms (ChromaDB) | 5.3 ms (LanceDB) |
| Setup time | 3,750× faster | 127× faster | parity |
| Smart features | ✅ All 9 | ❌ 0/9 | ❌ 0/9 |
The critical insight: Mem0 is 16× slower because it runs an LLM extraction pipeline on every add. OMem replaces that with a Rust-native classification engine — zero LLM calls, zero API costs, zero network latency.
🧩 Feature Matrix
| Feature | OMem | ChromaDB | Mem0 | LanceDB |
|---|---|---|---|---|
| Auto-Classification | ✅ | ❌ | ❌ | ❌ |
| Causal Graphs | ✅ | ❌ | ❌ | ❌ |
| Hybrid RAG (vector + keyword + recency + importance) | ✅ | ❌ | ❌ | ❌ |
| Forgetting & Decay | ✅ | ❌ | ❌ | ❌ |
| Memory Compression | ✅ | ❌ | ❌ | ❌ |
| Conflict Detection & TMS | ✅ | ❌ | ❌ | ❌ |
| CLI Tools | ✅ | ❌ | ❌ | ❌ |
| Zero Config | ✅ | ✅ | ❌ | ✅ |
| MCP Server (Claude/Cursor) | ✅ | ❌ | ❌ | ❌ |
Quick Start
Installation
# Clone
git clone https://github.com/mohitkumarrajbadi/omem
cd omem
# Install
SETUPTOOLS_USE_DISTUTILS=stdlib pip install -e .
# Verify
omem health
macOS / Anaconda users: add to `~/.zshrc` once:
export KMP_DUPLICATE_LIB_OK=TRUE
export HF_HUB_OFFLINE=1
60-Second Example
from omem import OMem
brain = OMem()
# Add memories — type and importance are detected automatically
brain.add("User prefers dark mode and Python for all backend work")
brain.add("Critical bug: race condition in payment module causes duplicate charges", importance=0.95)
brain.add("Architecture decision: migrated from REST to GraphQL for better performance")
# Retrieve what's relevant — not everything
results = brain.recall("What bugs do we have?")
print(results[0].content)
# → "Critical bug: race condition in payment module..."
# Understand exactly why this memory was returned
for exp in brain.inspect("payment bugs"):
    print(exp.explain())
# → vector=0.91, keyword=0.85, recency=0.94, importance=1.5x boost
The Sleep Cycle — Let Your Agent Dream
# After hours of operation, consolidate redundant memories
brain.add("User clicked login button")
brain.add("User pressed sign-in")
brain.add("User tapped the login link")
result = brain.sleep()
# → compressed: 3 → 1 ("User repeatedly accessed login (3 instances)")
# → forgotten: 12 low-value memories removed
# → reflected: 4 new insights generated
How It Works
┌─────────────────────────────────────────────────────────┐
│ Your Agent / Claude / Cursor │
└──────────────────────────┬──────────────────────────────┘
│ MCP or Python SDK
▼
┌─────────────────────────────────────────────────────────┐
│ OMem Unified API │
│ add · recall · sleep · inspect · serve │
└────────────┬───────────────────────────┬────────────────┘
│ │
▼ ▼
┌─────────────────────┐ ┌────────────────────────────┐
│ Rust Core │ │ Brain Logic │
│ │ │ │
│ • SIMD scoring │ │ • Auto-classification │
│ • FAISS HNSW │ │ • Importance estimation │
│ • Hybrid ranking │ │ • Forgetting & decay │
│ • Write buffer │ │ • Reflection & compress │
│ • RW lock │ │ • Conflict TMS │
└─────────────────────┘ └────────────────────────────┘
│ │
└─────────────┬─────────────┘
▼
┌──────────────────────────┐
│ SQLite · PostgreSQL │
│ FAISS · Knowledge Graph │
└──────────────────────────┘
The Retrieval Pipeline
Every `recall()` call combines four weighted signals, plus a status multiplier, in a single SIMD pass:
Final Score = [ 0.50 × vector_similarity
              + 0.20 × keyword_overlap
              + 0.15 × recency_decay
              + 0.15 × importance_weight ] × status_multiplier
Then optionally expanded via Graph-RAG: top results are linked to related entities in the knowledge graph, surfacing connected memories that pure vector search would miss.
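For intuition, the same formula in plain Python (a sketch: the 7-day half-life and the keyword metric are assumptions, and OMem computes this in a single SIMD pass in Rust):

def hybrid_score(vector_sim, query_tokens, mem_tokens,
                 age_days, importance, status_multiplier=1.0):
    # keyword_overlap: fraction of query tokens found in the memory (stand-in for BM25)
    keyword = len(query_tokens & mem_tokens) / max(len(query_tokens), 1)
    # recency_decay: exponential half-life decay (half-life assumed to be 7 days)
    recency = 0.5 ** (age_days / 7.0)
    return (0.50 * vector_sim
            + 0.20 * keyword
            + 0.15 * recency
            + 0.15 * importance) * status_multiplier

# A fresh, important, semantically close memory scores near the top
print(hybrid_score(0.91, {"payment", "bugs"}, {"payment", "bug", "race"},
                   age_days=1.0, importance=0.95))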
Real-World Usage
Customer Support Agent
from omem import OMem
memory = OMem(namespace="support")
# Store rich customer context
memory.add("Customer John (john@acme.com) reported dashboard timeout on mobile Safari")
memory.add("Acme Corp is on Enterprise plan, SOC2 required by Q3")
# Later — retrieve with filters
context = memory.recall(
    "mobile issues Acme",
    context_type="bugs",    # boost bug-type memories
    time_range="recent",    # prioritize last 3 days
    k=5,
)
Multi-Agent System
# Each agent is fully isolated
researcher = OMem(namespace="researcher")
writer = OMem(namespace="writer")
researcher.add("Study shows 40% retention improvement with personalized onboarding")
# No cross-namespace leakage
writer.recall("retention") # → []
# Global search when needed
researcher.recall("retention", project_only=False) # → finds it
Conflict Detection
brain.add("Python version: 3.9")
brain.add("Python version: 3.11") # → auto-flagged as CONFLICTED
brain.resolve_conflict("Python version")
# → resolves in favor of most recent, deprecates the old one
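The truth-maintenance mechanics in miniature (a toy illustration, not OMem's implementation): contradictory assertions over the same key are flagged, and resolution lets the newest win.

from datetime import datetime, timezone

entries = []  # each: {"key", "value", "ts", "status"}

def assert_fact(key, value):
    e = {"key": key, "value": value,
         "ts": datetime.now(timezone.utc), "status": "ACTIVE"}
    for prior in entries:
        if prior["key"] == key and prior["value"] != value:
            prior["status"] = e["status"] = "CONFLICTED"  # flag the contradiction
    entries.append(e)

def resolve(key):
    conflicted = [e for e in entries if e["key"] == key and e["status"] == "CONFLICTED"]
    if conflicted:
        newest = max(conflicted, key=lambda e: e["ts"])
        for e in conflicted:
            # recency-wins policy, matching resolve_conflict() above
            e["status"] = "ACTIVE" if e is newest else "DEPRECATED"

assert_fact("Python version", "3.9")
assert_fact("Python version", "3.11")  # both flagged CONFLICTED
resolve("Python version")              # 3.11 → ACTIVE, 3.9 → DEPRECATED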
Integrations
Claude Desktop & Cursor (MCP Server) ⭐
omem serve # starts the MCP stdio server
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "omem": {
      "command": "omem",
      "args": ["serve"]
    }
  }
}
What your AI gets:
| Tool | What it does |
|---|---|
| `remember` | Store a fact, decision, or preference |
| `recall` | Semantic search with type and time filters |
| `reflect` | Generate high-level insights from memory |
| `maintain` | Compress, forget, and optimize memory |
| `resolve_conflict` | Detect and fix contradictions |
| `summarize_state` | Get a project architecture overview |
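Under the hood these are standard MCP tool calls over stdio. A request from Claude to the `remember` tool looks roughly like this (the JSON-RPC envelope follows the MCP spec; the `content` argument name is an assumption):

{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "remember",
    "arguments": { "content": "Acme Corp requires SOC2 compliance by Q3" }
  }
}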
Addressing a common concern:
"Won't injecting memory into every prompt bloat my context?"
No. OMem is a retrieval layer, not an injection layer. From 5,000 memories, it returns 3–5 targeted results (~200–500 tokens). That's 97% less context than a naive approach — while giving the agent exactly what it needs.
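Concretely, the assembly step might look like this (a sketch reusing the `brain` instance and result shape from the 60-second example):

# Splice a handful of targeted memories into the prompt, nothing else
question = "What bugs do we have?"
results = brain.recall(question, k=5)
context = "\n".join(f"- {m.content}" for m in results)
prompt = f"Relevant memories:\n{context}\n\nUser: {question}"
# ~200-500 tokens of context, no matter how large the memory store grows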
LangChain
from omem.integrations.langchain import OMemRetriever
from langchain.chains import RetrievalQA  # requires the langchain package

retriever = OMemRetriever(omem_instance=brain)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)  # llm: any LangChain-compatible LLM
CLI Reference
# Setup
omem init # initialize at ~/.omem/brain.db
omem health # system health check
# Write
omem add "content" -i 0.9 -n myproject -t DECISION
# Read
omem search "query" -k 10 -c architecture -t recent
omem list -n myproject -t DECISION -l 50
omem inspect "query" # debug retrieval scoring
omem stats && omem namespaces
# Maintenance
omem maintain --all # compress + reflect + forget + dream
# Import / Export
omem export -f json -o dump.json
omem load dump.json -n myproject
# Integrations
omem serve # MCP server for Claude / Cursor
omem dashboard --port 7900 # web memory dashboard
omem demo # end-to-end interactive walkthrough
omem benchmark --n 10000 # performance test
Architecture Details
Memory Types
OMem auto-classifies every memory on ingestion:
| Type | Examples |
|---|---|
| `SEMANTIC` | Facts, general knowledge |
| `DECISION` | Choices made, preferences |
| `CAUSAL` | Bug root causes, cause-effect chains |
| `PROCEDURAL` | How-to steps, workflows |
| `EPISODIC` | Events, experiences |
| `REFLECTION` | AI-generated insights |
| `ACTIVE` | Critical / urgent items |
| `WORKING` | Temporary, current-task context |
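As the FAQ notes, classification runs on lightweight heuristics rather than an LLM. Here is a toy flavor of rule-based typing (illustrative keywords only, not OMem's actual rules):

def toy_classify(text: str) -> str:
    t = text.lower()
    if any(w in t for w in ("bug", "because", "caused", "root cause")):
        return "CAUSAL"
    if any(w in t for w in ("decided", "decision", "prefer", "migrated")):
        return "DECISION"
    if any(w in t for w in ("how to", "step", "workflow")):
        return "PROCEDURAL"
    return "SEMANTIC"

print(toy_classify("Critical bug: race condition in payment module"))  # → CAUSAL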
Scoring Signals
- `vector_similarity` — semantic closeness to the query (FAISS HNSW)
- `keyword_overlap` — token-level BM25-style matching
- `recency_decay` — exponential half-life decay over time
- `importance_weight` — auto-scored + access-frequency boosted
- `status_multiplier` — CONFLICTED memories penalized, DEPRECATED skipped
Storage
| Backend | Use Case |
|---|---|
| SQLite (default) | Local, single-process, zero config |
| In-memory | Testing, ephemeral agents |
| PostgreSQL | Production, multi-process, distributed |
Configuration
brain = OMem(
    backend="sqlite",            # "sqlite" | "memory" | "postgres"
    db_path="~/.omem/brain.db",  # custom path
    model="all-MiniLM-L6-v2",    # embedding model
    embedding_provider="local",  # "local" | "openai"
)
Environment variables:
HF_HUB_OFFLINE=1 # disable HuggingFace Hub network checks (faster startup)
KMP_DUPLICATE_LIB_OK=TRUE # fix OpenMP conflict on macOS/Anaconda
TOKENIZERS_PARALLELISM=false # suppress tokenizer warning
Roadmap
| Status | Feature |
|---|---|
| ✅ Released | Hybrid RAG, Auto-classification, Forgetting, Compression, MCP Server |
| ✅ Released | Truth Maintenance System, Knowledge Graph, Graph-RAG |
| ✅ Released | PostgreSQL backend, CLI, Dashboard |
| 🔄 In Progress | LOCOMO benchmark validation, distributed mode |
| 📅 Planned | Custom embedding providers (OpenAI, Cohere), Memory versioning |
FAQ
Q: Does this run an LLM internally?
A: No. Classification and importance scoring use lightweight heuristics and a small (~90MB) embedding model. No LLM API calls, no external dependencies, no costs.
Q: How is this different from ChromaDB or Pinecone?
A: Those are vector storage systems. OMem is a memory operating system — with lifecycle (importance → decay → forget), deduplication, conflict detection, knowledge graphs, and a cognitive maintenance cycle.
Q: Will it bloat my agent's context window?
A: The opposite. OMem retrieves 3–5 relevant memories per query (~300 tokens) instead of injecting your entire history. See the Context FAQ.
Q: Is it production-ready?
A: v1.0.0 is stable for production workloads. The SQLite backend handles hundreds of thousands of memories. PostgreSQL backend available for multi-process deployments.
Q: What about privacy?
A: Everything runs 100% locally by default. Your memories never leave your machine. PostgreSQL backend is self-hosted.
Q: Do I need Rust installed?
A: Only if you want the SIMD-accelerated scoring path. The pure-Python path works out of the box and is still competitive.
Contributing
git clone https://github.com/mohitkumarrajbadi/omem
cd omem
python -m venv .venv && source .venv/bin/activate
SETUPTOOLS_USE_DISTUTILS=stdlib pip install -e ".[dev]"
pytest tests/ -v
python benchmarks/competitor.py # run head-to-head benchmarks
See DEVELOPER.md for architecture, CLI reference, and contribution guidelines.
License
MIT — see LICENSE
Built for the AI developer community
If OMem makes your agents smarter, give it a ⭐