Advanced LLM-native memory orchestration with dual-agent synthesis, conflict-aware evolution, multi-tenant vector DB backends, and async SDK/API tooling.

OmniMemory · The Living Brain for Autonomous Agents

Don't just store data. Synthesize memories. OmniMemory transforms static embeddings into a self-evolving cognitive substrate.

Quick Start · CLI · SDK · Production · Environment Variables · Architecture · REST API

Why OmniMemory?

Traditional RAG is a filing cabinet: you put documents in, you take documents out. OmniMemory is a living brain.

It doesn't just "store" messages. It employs a Dual-Agent Synthesis engine to interpret conversations, extract behavioral patterns, and resolve contradictions automatically. When a new memory conflicts with an old one, OmniMemory doesn't just append—it updates, deletes, or consolidates the knowledge graph, just like human memory.

Feature	Traditional Vector RAG	OmniMemory (SECMSA)
Input Handling	Naive chunking & embedding	Dual-Agent Synthesis (Episodic + Summarizer)
Conflict Resolution	None (contradictions coexist)	Self-Evolving (Update/Delete/Skip operations)
Retrieval Logic	Cosine similarity only	Composite Scoring (Relevance × [1 + Recency + Importance])
Context Awareness	Static text chunks	Structured Memory Notes (Behavior, Learnings, Guidance)
Multi-Tenancy	Often manual filtering	Native Isolation (App / User / Session tiers)

Core Features

1. Dual-Agent Synthesis

Two specialized agents work in parallel to process every interaction:

Episodic Agent: Analyzes behavior. "User prefers concise answers," "User struggles with async concepts."
Summarizer Agent: Analyzes narrative. "Project X is delayed," "Deployed v2.0 to prod."

2. Self-Evolving Memory

Memories aren't static. The system automatically detects conflicts between new and existing information.

UPDATE: Merges fragmented details into a single, comprehensive note.
DELETE: Removes outdated or contradicted information.
SKIP: Ignores redundant inputs to keep the index clean.
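
The three operations can be pictured as edits to a small note store. The sketch below is illustrative only: `Op`, `apply_operation`, and the `note`/`status`/`next_id` fields are names assumed for this example, not OmniMemory's internal API, and the choice of operation (which the agents make) is taken as given.

```python
from enum import Enum


class Op(Enum):
    UPDATE = "update"
    DELETE = "delete"
    SKIP = "skip"


def apply_operation(store, op, old_id, new_id=None, new_note=None):
    """Apply an already-decided operation to a dict-based note store (hypothetical)."""
    if op is Op.SKIP:
        return  # redundant input: leave the store untouched
    if op is Op.DELETE:
        store[old_id]["status"] = "deleted"  # outdated or contradicted
        return
    if op is Op.UPDATE:
        # Retire the old note and point it at the consolidated replacement.
        store[old_id]["status"] = "updated"
        store[old_id]["next_id"] = new_id
        store[new_id] = {"note": new_note, "status": "active", "next_id": None}
        return
    raise ValueError(f"unknown operation: {op}")


store = {"m1": {"note": "User prefers light mode", "status": "active", "next_id": None}}
apply_operation(store, Op.SKIP, "m1")
apply_operation(store, Op.UPDATE, "m1", new_id="m2", new_note="User switched to dark mode")
print(store["m1"]["status"], "->", store["m2"]["note"])
```

Keeping the retired note and linking it forward, rather than overwriting it, is one way to preserve a history that can be audited later.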

3. Composite Scoring

We don't just return the "nearest neighbor." We return the most useful memory.

Score = Relevance * (1 + Recency_Boost + Importance_Boost)

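Because the boosts multiply relevance rather than add to it, a memory with no relevance can never surface on recency or importance alone, while recent and critical memories get the nudge they need to outrank comparably relevant ones.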
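
A minimal sketch of the formula follows. Only the outer shape, `Relevance * (1 + Recency_Boost + Importance_Boost)`, comes from this README; the boost definitions (exponential decay with a 30-day half-life, a linear importance term, weights of 0.2) and the `ScoredMemory` type are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ScoredMemory:
    text: str
    relevance: float   # cosine similarity in [0, 1]
    age_days: float    # time since the memory was written
    importance: float  # assumed to be normalised to [0, 1]


def composite_score(m, recency_weight=0.2, importance_weight=0.2, half_life_days=30.0):
    """Score = Relevance * (1 + Recency_Boost + Importance_Boost)."""
    # Assumed boost shapes: exponential decay for recency, linear for importance.
    recency_boost = recency_weight * math.exp(-math.log(2) * m.age_days / half_life_days)
    importance_boost = importance_weight * m.importance
    return m.relevance * (1.0 + recency_boost + importance_boost)


memories = [
    ScoredMemory("old but on-topic", relevance=0.50, age_days=365, importance=0.1),
    ScoredMemory("fresh and important", relevance=0.45, age_days=0, importance=1.0),
    ScoredMemory("fresh but off-topic", relevance=0.0, age_days=0, importance=1.0),
]
ranked = sorted(memories, key=composite_score, reverse=True)
for m in ranked:
    print(f"{composite_score(m):.3f}  {m.text}")
```

Under these assumed weights, the fresh, important memory (0.45 relevance) overtakes the older one (0.50 relevance), while the off-topic memory stays at zero no matter how recent or important it is.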

4. Enterprise Multi-Tenancy

Built for SaaS from day one.

App Level: Physical isolation (separate collections).
User Level: Logical isolation (metadata filtering).
Session Level: Conversation grouping.
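
The three tiers can be sketched with an in-memory stand-in for the vector DB. The `collections` dict and `scoped_records` helper are hypothetical; they only illustrate the idea that the app tier selects a collection and the user and session tiers narrow it with metadata filters.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a vector DB: one list of records per collection.
collections = {
    "app-a": [
        {"user_id": "u1", "session_id": "s1", "note": "prefers dark mode"},
        {"user_id": "u1", "session_id": "s2", "note": "uses Python"},
        {"user_id": "u2", "session_id": "s3", "note": "uses Go"},
    ],
    "app-b": [
        {"user_id": "u1", "session_id": "s9", "note": "belongs to another app"},
    ],
}


def scoped_records(app_id: str, user_id: Optional[str] = None,
                   session_id: Optional[str] = None) -> list:
    # App tier: only this app's collection is ever opened.
    records = collections.get(app_id, [])
    # User and session tiers: metadata equality filters inside that collection.
    if user_id is not None:
        records = [r for r in records if r["user_id"] == user_id]
    if session_id is not None:
        records = [r for r in records if r["session_id"] == session_id]
    return records


print([r["note"] for r in scoped_records("app-a", user_id="u1")])
print([r["note"] for r in scoped_records("app-a", user_id="u1", session_id="s2")])
```

Note that user `u1` in `app-b` is never visible from `app-a`, because the lookup never touches another app's collection.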

Supported Backends

Switch providers by changing OMNI_MEMORY_PROVIDER. No code changes required.

Provider	Env Value	Best For
Qdrant	`qdrant-remote`	Production default. High performance, rich filtering
ChromaDB	`chromadb-remote`	Simple deployments, local development
PostgreSQL	`postgresql`	Teams already using Postgres (via pgvector)
MongoDB	`mongodb`	Atlas users needing vector search + document store

When to Use: API vs SDK

Use the REST API Server (Recommended for Production)

Why: Language-agnostic. Works with any programming language (Node.js, Go, Rust, Java, PHP, etc.)

Best For:

✅ Production deployments
✅ Multi-language teams
✅ Microservices architectures
✅ Need built-in metrics, health checks, connection pooling

Use the Python SDK (Dev/Prototyping)

Why: Direct Python integration for rapid testing

Best For:

✅ Python-only agents
✅ Local development and testing
✅ Prototyping memory operations

Quick Start

TL;DR: Want to see it in action immediately?
# Run the complete Customer Support Agent example
python examples/complete_sdk_example.py

1. Install

uv add omnimemory
# or
pip install omnimemory

2. Configure

Create a .env file (templates in examples/env/):

# LLM & Embeddings
LLM_API_KEY=sk-...
LLM_PROVIDER=openai
EMBEDDING_API_KEY=sk-...
EMBEDDING_PROVIDER=openai

# Vector DB (Choose one: qdrant-remote, chromadb-remote, postgresql, mongodb)
OMNI_MEMORY_PROVIDER=qdrant-remote
QDRANT_HOST=localhost
QDRANT_PORT=6333

3. Run (Choose Your Backend)

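Start the vector DB and API server. Set OMNI_MEMORY_PROVIDER in .env to the value that matches the backend you start (see Supported Backends):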

Qdrant (Production Default - High Performance)

docker compose -f docker-compose.local.yml --profile qdrant up -d
uv run uvicorn omnimemory.api.server:app --host 0.0.0.0 --port 8001 --reload

ChromaDB (Simple Deployments)

docker compose -f docker-compose.local.yml --profile chromadb up -d
uv run uvicorn omnimemory.api.server:app --host 0.0.0.0 --port 8001 --reload

PostgreSQL (Existing Postgres Users)

docker compose -f docker-compose.local.yml --profile pgvector up -d
uv run uvicorn omnimemory.api.server:app --host 0.0.0.0 --port 8001 --reload

MongoDB (Configure MongoDB Atlas separately)

# Set OMNI_MEMORY_PROVIDER=mongodb and MONGO_URI in .env first
uv run uvicorn omnimemory.api.server:app --host 0.0.0.0 --port 8001 --reload

4. Use (Python SDK)

from omnimemory.sdk import OmniMemorySDK
from omnimemory.core.schemas import UserMessages, Message
import asyncio

async def main():
    sdk = OmniMemorySDK()
    
    # CRITICAL: Initialize connection pools
    if not await sdk.warm_up():
        print("Failed to warm up SDK")
        return

    # Add a memory (returns a background task ID)
    response = await sdk.add_memory(UserMessages(
        app_id="my-app",
        user_id="user-123",
        messages=[
            Message(role="user", content="I'm building a Python web scraper."),
            Message(role="assistant", content="I can help with libraries like BeautifulSoup.")
        ] * 5  # 2 messages x 5 = 10, matching the default OMNIMEMORY_DEFAULT_MAX_MESSAGES
    ))
    
    task_id = response["task_id"]
    print(f"Memory processing started. Task ID: {task_id}")
    
    # Fire-and-Forget: Memory processes in background
    # No need to poll - check logs if debugging needed
    await asyncio.sleep(3)  # Give it time to process

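    # Query memory (semantic search)
    results = await sdk.query_memory(
        app_id="my-app",
        user_id="user-123",
        # session_id is optional; omitted here because the memory above was stored without one
        n_results=1,  # Optional
        similarity_threshold=0.3,  # Optional; good matches typically score 0.35-0.55
        query="What is the user working on?"
    )

    # Background processing may not have finished, so guard against an empty result
    if results:
        print(results[0]['memory_note'])
        # Output: "User is developing a Python-based web scraper..."
    else:
        print("No memories matched yet; retry once processing completes")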

asyncio.run(main())

5. Quick Test Examples

Want to see it in action? We provide complete, real-world examples for both SDK and API usage.

Run the SDK Example:

# Demonstrates full customer support workflow with memory batching
python examples/complete_sdk_example.py

Run the API Example:

# Requires running server (uv run uvicorn omnimemory.api.server:app --port 8001)
python examples/complete_api_example.py

Production Features

Fully Asynchronous: add_memory returns a task ID immediately and memory synthesis runs in fire-and-forget background tasks, so the latency the caller sees does not grow with synthesis time. No polling needed; check the logs when debugging.

Connection Pooling: Intelligent pool management with configurable size (default 10). Initializes with 50% of max connections for optimal startup performance, then scales on-demand to handle concurrent workloads.

Metrics & Observability: Prometheus-compatible metrics at http://localhost:9001/metrics (enable with OMNIMEMORY_ENABLE_METRICS_SERVER=true).

Multi-Tenancy: 3-tier isolation (app/user/session) for SaaS deployments. Complete data separation.

89.58% Test Coverage: Production-grade reliability with comprehensive test suite.

Language Agnostic: REST API works with any language. Python SDK provided for convenience.

Agent Memory SDK

For Agents That Need to Answer Questions Using Stored Memories

The AgentMemorySDK provides a complete "query memory + generate answer" loop. It retrieves relevant memories and calls your LLM with context to generate grounded responses.

import asyncio

from omnimemory import AgentMemorySDK

async def main():
    agent_sdk = AgentMemorySDK()

    # answer_query is awaited, so it must run inside an async function
    response = await agent_sdk.answer_query(
        app_id="my-app-id-1234",
        query="What does the user prefer?",
        user_id="user-123456", # Optional
        session_id="session-123456", # Optional
        n_results=5, # Optional
        similarity_threshold=0.3 # Optional; good matches typically score 0.35-0.55
    )

    print(response["answer"])  # LLM-generated answer
    print(f"Based on {len(response['memories'])} memories")

asyncio.run(main())

Use When: Your agent needs to answer user questions using stored memories.

Not For: Storing new memories (use OmniMemorySDK.add_memory or add_agent_memory for that).

CLI Tool

OmniMemory includes a powerful command-line interface for quick operations and testing:

# Install
uv add omnimemory

# Get help
omnimemory --help

# Start daemon for background operations
omnimemory daemon start

# Add memory
omnimemory memory add \
  --app-id "myapp-1234567890" \
  --user-id "user-1234567890" \
  --message "user:I prefer dark mode" \
  --message "assistant:Noted, I'll remember that"

# Query memory
omnimemory memory query \
  --app-id "myapp-1234567890" \
  --query "user preferences"

# Check system health
omnimemory health

# View comprehensive feature guide
omnimemory info

# Daemon management
omnimemory daemon status
omnimemory daemon stop

Available Commands:

omnimemory memory - Memory operations (add, query, get, delete)
omnimemory memory batch - Batch message operations
omnimemory daemon - Background daemon management
omnimemory agent - Agent-specific operations
omnimemory health - System health diagnostics
omnimemory info - Feature overview

For detailed CLI documentation, run omnimemory --help.

SDK Usage Guide

Note: This guide covers the Python SDK. For the HTTP REST API, see API_SPECIFICATION.md.

Initialization

CRITICAL: You must warm up the connection pools before making requests.

Why: Initializes vector DB connections for low latency on first request.

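import asyncio
from omnimemory.sdk import OmniMemorySDK

async def main():
    sdk = OmniMemorySDK()
    success = await sdk.warm_up()  # coroutine: must be awaited inside an async function
    if not success:
        print("Failed to initialize connections")
        return
    # Later snippets in this guide assume they run here, inside an async function

asyncio.run(main())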

Core Memory Operations

1. Add Memory (`add_memory`)

Use Case: Primary engine for conversation analysis with Dual-Agent Synthesis.

Why: Needs a flow of conversation (default 10 messages) to understand context. The Episodic and Summarizer agents extract behavioral patterns and resolve conflicts. Single messages won't work.

Parameters:

user_message: UserMessages - Contains app_id, user_id, session_id (optional), messages (list of Message objects)
messages must have exactly OMNIMEMORY_DEFAULT_MAX_MESSAGES (default 10)

Returns: Task ID immediately (async processing)

from omnimemory.core.schemas import UserMessages, Message

response = await sdk.add_memory(UserMessages(
    app_id="my-app-id-1234",
    user_id="user-123456",
    session_id="session-789",  # Optional
    messages=[
        Message(role="user", content="I prefer dark mode"),
        Message(role="assistant", content="Noted, I'll remember that")
        # ... total 10 messages required
    ]
))
task_id = response["task_id"]
print(f"Processing in background: {task_id}")

2. Add Agent Memory (`add_agent_memory`)

Use Case: Agent Tool for quick saves.

Why: When your agent learns new info or user says "save this," the agent calls this directly. Accepts both structured and unstructured messages. Bypasses conflict resolution for speed.

Best Practice: Add to agent system prompt as a tool.

Parameters:

agent_request: AgentMemoryRequest - Contains app_id, user_id, session_id (optional), messages (string or list)

Returns: Task ID immediately

from omnimemory.core.schemas import AgentMemoryRequest

# Unstructured (string)
response = await sdk.add_agent_memory(AgentMemoryRequest(
    app_id="my-app-id-1234",
    user_id="user-123456",
    messages="User completed premium signup and selected annual plan"
))

# Structured (list)
response = await sdk.add_agent_memory(AgentMemoryRequest(
    app_id="my-app-id-1234",
    user_id="user-123456",
    messages=[
        {"role": "user", "content": "What's my email?"},
        {"role": "assistant", "content": "It's user@example.com"}
    ]
))

3. Query Memory (`query_memory`)

Use Case: Retrieve memories using semantic search and composite scoring.

How: Uses Relevance × (1 + Recency + Importance) scoring.

Parameters:

app_id: str (required)
query: str (required) - Natural language query
user_id: str (optional) - Filter by user
session_id: str (optional) - Filter by session
n_results: int (optional, default from env) - Max results to return
similarity_threshold: float (optional, default from env) - Min similarity (0.0-1.0). Overrides OMNIMEMORY_RECALL_THRESHOLD env var.

Returns: List of memory dictionaries

Query Best Practices:

💡 TIP: Specific queries yield better results than generic ones.

Query Type	Example	Expected Score	Quality
❌ Too Generic	"what is machine learning"	0.20-0.30	Poor - too broad
⚠️ Somewhat Generic	"neural networks"	0.30-0.40	Fair - lacks context
✅ Specific	"how to implement neural networks with backpropagation"	0.40-0.55	Good - targeted
✅ Very Specific	"troubleshooting slow loss decrease in neural network training"	0.50-0.65	Excellent - precise

Understanding Similarity Scores:

0.20-0.35: Weakly related content (consider lowering threshold if needed)
0.35-0.55: Semantically related (typical for good matches)
0.55-0.75: Strong match (rare, usually requires similar phrasing)
0.75-1.00: Near-identical content (very rare in practice)

Note: Composite scoring boosts relevant memories with recency/importance, so a 0.45 similarity can become 0.60+ composite score.

results = await sdk.query_memory(
    app_id="my-app-id-1234",
    query="What does the user like?",
    user_id="user-123456",          # Optional
    session_id="session-789",       # Optional
    n_results=10,                   # Optional (default 5)
    similarity_threshold=0.35       # Optional (overrides env default 0.3); scores of 0.75+ are very rare
)

for memory in results:
    print(memory["document"])
    print(f"Score: {memory['composite_score']}")

4. Get Memory (`get_memory`)

Use Case: Retrieve a single memory by its ID.

Why: When you have a memory ID from a previous operation and need full content.

Parameters:

memory_id: str (required)
app_id: str (required)

Returns: Memory dict or None

memory = await sdk.get_memory(
    memory_id="uuid-1234-5678",
    app_id="my-app-id-1234"
)
if memory:
    print(memory["document"])

5. Delete Memory (`delete_memory`)

Use Case: Manual memory deletion (GDPR, cleanup).

Why: User requests deletion or you need to remove test data.

Parameters:

app_id: str (required)
doc_id: str (required) - Document ID to delete

Returns: Boolean (success/failure)

success = await sdk.delete_memory(
    app_id="my-app-id-1234",
    doc_id="uuid-1234-5678"
)
if success:
    print("Memory deleted")

Summarization

Summarize Conversation (`summarize_conversation`)

Use Case: Context Window Management.

Why: When working memory is full, generate a summary, save it, delete old messages to free tokens.

Accepts: Both structured and unstructured messages.

Two Modes:

Sync Mode (No `callback_url`)

Returns: Summary immediately
Processing: Fast (use_fast_path=True)
Use When: Real-time responses needed, short contexts

from omnimemory.core.schemas import ConversationSummaryRequest

summary = await sdk.summarize_conversation(ConversationSummaryRequest(
    app_id="my-app-id-1234",
    user_id="user-123456",
    messages=[...]  # Structured or unstructured
))
print(summary["content"])
print(summary["delivery"])  # "sync"

Webhook Mode (With `callback_url`)

Returns: Task ID immediately
Processing: Full structured summary (use_fast_path=False)
Delivery: POSTs result to your webhook URL
Retry: 3 attempts with exponential backoff
Use When: Long conversations, background processing, need auto-replacement

Parameters:

summary_request: ConversationSummaryRequest
- app_id: str (required)
- user_id: str (required)
- session_id: str (optional)
- messages: str | list (required)
- callback_url: str (optional) - If provided, enables webhook mode
- callback_headers: dict (optional) - Custom headers for webhook (e.g., auth)

response = await sdk.summarize_conversation(ConversationSummaryRequest(
    app_id="my-app-id-1234",
    user_id="user-123456",
    messages=[...],  # Long conversation
    callback_url="https://api.myapp.com/webhooks/summary",
    callback_headers={"Authorization": "Bearer token123"}
))
print(response["task_id"])
print(response["status"])  # "accepted"
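
The webhook retry policy (3 attempts with exponential backoff) can be sketched as follows. The base delay of one second, the doubling factor, and the `deliver_with_retry` helper are assumptions for illustration; the README does not state the actual delays. The HTTP call is injected as `post` so the sketch runs without a network.

```python
import time


def deliver_with_retry(post, payload, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call post(payload) until it succeeds or attempts run out.

    Waits base_delay, then 2x, 4x, ... between attempts (assumed schedule).
    """
    for attempt in range(attempts):
        try:
            post(payload)
            return True
        except Exception:
            if attempt == attempts - 1:
                return False  # final attempt failed: give up without sleeping
            sleep(base_delay * (2 ** attempt))
    return False


calls = {"n": 0}
delays = []


def flaky_post(payload):
    # Hypothetical webhook that fails twice, then accepts the delivery.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("webhook unavailable")


ok = deliver_with_retry(flaky_post, {"status": "completed"}, sleep=delays.append)
print(ok, calls["n"], delays)
```

On the receiving side, a webhook handler should tolerate duplicate deliveries, since a retry can follow a request that succeeded but timed out before the response arrived.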

Batching

Memory Batcher (`memory_batcher_add_message`)

Use Case: Streaming chat loops.

Why: Automatically buffers messages and calls add_memory when limit is reached. No manual counting.

How: Non-blocking. Monitors message count per (app_id, user_id, session_id) tuple. When it hits OMNIMEMORY_DEFAULT_MAX_MESSAGES (default 10), auto-flushes.

Parameters:

app_id: str (required)
user_id: str (required)
session_id: str (optional)
role: str (required) - "user", "assistant", "system"
content: str (required)

# In your chat loop
for message in stream:
    await sdk.memory_batcher_add_message(
        app_id="my-app-id-1234",
        user_id="user-123456",
        role=message.role,
        content=message.content
    )
    # SDK handles auto-flush at 10 messages
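
The buffering behaviour can be sketched as below. `MessageBatcher` is a hypothetical, synchronous stand-in written for this example (the SDK method is async and its internals are not shown in this README); it only illustrates per-key buffering and flushing at the message limit.

```python
from collections import defaultdict


class MessageBatcher:
    """Buffers messages per (app_id, user_id, session_id) and flushes at max_messages."""

    def __init__(self, flush, max_messages=10):
        self._flush = flush
        self._max = max_messages
        self._buffers = defaultdict(list)

    def add(self, app_id, user_id, role, content, session_id=None):
        key = (app_id, user_id, session_id)
        buffer = self._buffers[key]
        buffer.append({"role": role, "content": content})
        if len(buffer) >= self._max:
            # Hand the full batch to the flush callback and start a fresh buffer.
            self._flush(key, self._buffers.pop(key))
            return True
        return False

    def pending(self, app_id, user_id, session_id=None):
        return len(self._buffers.get((app_id, user_id, session_id), []))


flushed = []
batcher = MessageBatcher(flush=lambda key, batch: flushed.append((key, len(batch))))

for i in range(25):
    role = "user" if i % 2 == 0 else "assistant"
    batcher.add("my-app", "user-1", role, f"message {i}")
batcher.add("my-app", "user-2", "user", "a different user's buffer")

print(flushed)
print(batcher.pending("my-app", "user-1"), batcher.pending("my-app", "user-2"))
```

Messages left in a buffer when a conversation ends are not flushed by this sketch; a real chat loop would need to decide what to do with such a remainder.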

Evolution & Auditing

1. Traverse Evolution Chain (`traverse_memory_evolution_chain`)

Use Case: See how a memory evolved over time.

Why: Memories update, delete, merge. This traces the full history.

How: Follows next_id pointers from original to final memory.

Parameters:

app_id: str (required)
memory_id: str (required) - Starting memory ID

Returns: List of memories in chronological order

chain = await sdk.traverse_memory_evolution_chain(
    app_id="my-app-id-1234",
    memory_id="original-uuid-1234"
)
print(f"Chain contains {len(chain)} versions ({max(len(chain) - 1, 0)} evolutions)")
for memory in chain:
    print(f"{memory['metadata']['status']} - {memory['document'][:50]}")
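
Following `next_id` pointers amounts to walking a linked list. The sketch below assumes, for illustration, that `next_id` lives in each memory's `metadata` and that memories are held in a dict keyed by ID; the `traverse_chain` helper and the sample `status` values are hypothetical.

```python
def traverse_chain(store, start_id):
    """Follow next_id pointers from start_id; returns memories oldest-first."""
    chain, seen = [], set()
    current = start_id
    while current is not None and current not in seen:
        memory = store.get(current)
        if memory is None:
            break  # dangling pointer: stop rather than fail
        seen.add(current)  # guards against accidental cycles
        chain.append(memory)
        current = memory["metadata"].get("next_id")
    return chain


store = {
    "a": {"document": "User prefers light mode",
          "metadata": {"status": "updated", "next_id": "b"}},
    "b": {"document": "User prefers dark mode",
          "metadata": {"status": "updated", "next_id": "c"}},
    "c": {"document": "User prefers dark mode in the IDE only",
          "metadata": {"status": "active", "next_id": None}},
}
chain = traverse_chain(store, "a")
for memory in chain:
    print(f"{memory['metadata']['status']} - {memory['document'][:50]}")
```

A chain of three memories represents two evolutions: the starting memory plus one entry per change.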

2. Generate Evolution Graph (`generate_evolution_graph`)

Use Case: Visualize evolution chain.

Formats: mermaid, dot, html

Parameters:

chain: List[Dict] (required) - Output from traverse_memory_evolution_chain
format: str (required) - "mermaid", "dot", or "html"

chain = await sdk.traverse_memory_evolution_chain(...)

# Mermaid (for docs)
mermaid = sdk.generate_evolution_graph(chain, format="mermaid")
print(mermaid)

# HTML (for browser visualization)
html = sdk.generate_evolution_graph(chain, format="html")
with open("evolution.html", "w") as f:
    f.write(html)

3. Generate Evolution Report (`generate_evolution_report`)

Use Case: Detailed analysis of memory changes.

Formats: markdown, text, json

Parameters:

chain: List[Dict] (required)
format: str (required) - "markdown", "text", or "json"

report = sdk.generate_evolution_report(chain, format="markdown")
print(report)  # Includes stats, timeline, insights

System & Monitoring

Connection Pool Stats (`get_connection_pool_stats`)

Use Case: Production monitoring and debugging.

Why: If queries are slow, check if you're hitting connection limits.

Returns: Dict with pool metrics

stats = await sdk.get_connection_pool_stats()
print(f"Active: {stats['active_handlers']}/{stats['max_connections']}")
print(f"Available: {stats['available_handlers']}")
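
To make these numbers easier to interpret, here is a minimal sketch of a pool that starts with half of its maximum connections and grows on demand, as described under Production Features. `SketchPool` is hypothetical; it reuses the stat key names shown above, but their exact meaning in the SDK is an assumption.

```python
import asyncio


class SketchPool:
    """Hypothetical pool: pre-creates half of max_connections, grows on demand."""

    def __init__(self, factory, max_connections=10):
        self._factory = factory
        self.max_connections = max_connections
        self._idle = []
        self._active = 0
        self._slots = asyncio.Semaphore(max_connections)

    def warm_up(self):
        for _ in range(self.max_connections // 2):
            self._idle.append(self._factory())

    async def acquire(self):
        await self._slots.acquire()  # waits once max_connections are in use
        self._active += 1
        return self._idle.pop() if self._idle else self._factory()

    def release(self, conn):
        self._active -= 1
        self._idle.append(conn)
        self._slots.release()

    def stats(self):
        return {
            "active_handlers": self._active,
            "available_handlers": len(self._idle),
            "max_connections": self.max_connections,
        }


async def demo():
    pool = SketchPool(factory=object, max_connections=10)
    pool.warm_up()
    after_warm_up = pool.stats()
    conns = [await pool.acquire() for _ in range(8)]  # 5 warm + 3 created on demand
    under_load = pool.stats()
    for conn in conns:
        pool.release(conn)
    return after_warm_up, under_load, pool.stats()


after_warm_up, under_load, idle_again = asyncio.run(demo())
print(after_warm_up, under_load, idle_again)
```

If the active count sits at the maximum for long periods, callers are queueing for a connection, which is the situation the stats call is meant to reveal.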

Configuration & Tuning

Tune these hyperparameters in your .env file to optimize for your specific use case.

Parameter	Default	Description	Tuning Guide
`OMNIMEMORY_RECALL_THRESHOLD`	`0.3`	Minimum cosine similarity for initial retrieval from Vector DB.	Lower to `0.2-0.25` for broader recall. Note: Typical good matches score 0.35-0.55, not 0.7+. Specific queries perform better than generic ones.
`OMNIMEMORY_COMPOSITE_SCORE_THRESHOLD`	`0.5`	Minimum final score (Relevance × Boosts) to return a memory.	Lower to `0.35-0.4` for more results. Composite scoring boosts base similarity with recency/importance, so a 0.45 similarity can become 0.60+ composite score.
`OMNIMEMORY_LINK_THRESHOLD`	`0.7`	Similarity required to "link" memories for conflict resolution.	Lower to `0.6` to trigger evolution/updates more often. Raise to `0.8` to reduce "noise" and only link very similar topics.
`OMNIMEMORY_DEFAULT_MAX_MESSAGES`	`10`	Number of messages required for `add_memory`.	Match this to your LLM's context window preference. Too low = poor synthesis; Too high = context bloat.
`OMNIMEMORY_VECTOR_DB_MAX_CONNECTIONS`	`10`	Max concurrent DB connections. Pool initializes with 50% of this value, then scales on-demand.	Reduce to `3-5` for low-resource environments (e.g., local dev). Increase to `20-30` for high-throughput production. Initial pool will be half of this value.
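
The interaction between the recall and composite thresholds can be sketched as a two-stage filter. The environment variable names and defaults come from the table above; the `load_thresholds` and `two_stage_filter` helpers, and the idea of passing a precomputed boost sum, are assumptions made for this example.

```python
import os


def load_thresholds(env=None):
    """Read the two retrieval thresholds, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "recall": float(env.get("OMNIMEMORY_RECALL_THRESHOLD", "0.3")),
        "composite": float(env.get("OMNIMEMORY_COMPOSITE_SCORE_THRESHOLD", "0.5")),
    }


def two_stage_filter(candidates, thresholds):
    """candidates: iterable of (text, similarity, boost_sum) tuples."""
    kept = []
    for text, similarity, boost_sum in candidates:
        if similarity < thresholds["recall"]:
            continue  # stage 1: dropped before scoring
        score = similarity * (1.0 + boost_sum)
        if score >= thresholds["composite"]:
            kept.append((text, score))  # stage 2: passed the final cut
    return [text for text, _ in sorted(kept, key=lambda item: item[1], reverse=True)]


candidates = [
    ("strong", 0.45, 0.4),  # boosted well above the composite threshold
    ("weak", 0.25, 0.4),    # below the recall threshold, never scored
    ("stale", 0.40, 0.0),   # recalled, but gets no boost
]

defaults = two_stage_filter(candidates, load_thresholds(env={}))
relaxed = two_stage_filter(
    candidates,
    load_thresholds(env={"OMNIMEMORY_COMPOSITE_SCORE_THRESHOLD": "0.4"}),
)
print(defaults, relaxed)
```

Lowering only the composite threshold admits unboosted memories that were already recalled; it does not bring back candidates that failed the recall threshold.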

Environment Variables

Required Variables

Variable	Description	Example
`LLM_API_KEY`	LLM provider API key	`sk-...`
`LLM_PROVIDER`	LLM provider name	`openai`, `anthropic`, `mistral`
`EMBEDDING_API_KEY`	Embedding provider API key	`sk-...`
`EMBEDDING_PROVIDER`	Embedding provider	`openai`
`OMNI_MEMORY_PROVIDER`	Vector DB backend	`qdrant-remote`, `chromadb-remote`, `postgresql`, `mongodb`

LLM Configuration

Variable	Default	Description
`LLM_MODEL`	-	Model name (e.g., `gpt-4`, `claude-3-opus`)
`LLM_TEMPERATURE`	`0.4`	Creativity (0.0-2.0)
`LLM_MAX_TOKENS`	`3000`	Max response tokens
`LLM_TOP_P`	`0.9`	Nucleus sampling

Embedding Configuration

Variable	Default	Description
`EMBEDDING_MODEL`	-	Embedding model name
`EMBEDDING_DIMENSIONS`	-	Vector dimensions
`EMBEDDING_ENCODING_FORMAT`	`base64`	Response encoding format
`EMBEDDING_TIMEOUT`	`600`	Request timeout (seconds)

Vector Database

Qdrant:

QDRANT_HOST - Qdrant server host
QDRANT_PORT - Qdrant port (default 6333)

ChromaDB:

CHROMA_HOST - ChromaDB server host
CHROMA_PORT - ChromaDB port (default 8000)
CHROMA_AUTH_TOKEN - Authentication token
CHROMA_CLIENT_TYPE - Client type (remote for server)

PostgreSQL:

POSTGRES_URI - Full connection string (e.g., postgresql://user:pass@host:5432/db)

MongoDB:

MONGO_URI - MongoDB Atlas connection string

Observability

Variable	Default	Description
`OMNIMEMORY_ENABLE_METRICS_SERVER`	`false`	Enable Prometheus metrics endpoint
`OMNIMEMORY_METRICS_PORT`	`9001`	Metrics HTTP server port
`LOG_LEVEL`	`INFO`	Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`)
`LOG_DIR`	`./logs`	Log file directory path

Production Deployment

Step 1: Prepare Environment Variables

Create a .env file with all required configuration:

# LLM Configuration (Required)
LLM_API_KEY=your-api-key-here
LLM_PROVIDER=openai
LLM_MODEL=gpt-4
LLM_TEMPERATURE=0.4
LLM_MAX_TOKENS=3000

# Embedding Configuration (Required)
EMBEDDING_API_KEY=your-api-key-here
EMBEDDING_PROVIDER=openai
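EMBEDDING_MODEL=text-embedding-3-large
# text-embedding-3-large returns 3072 dimensions by default; 1536 is only valid if the shortened
# size is requested via the OpenAI dimensions parameter (text-embedding-3-small is 1536 natively)
EMBEDDING_DIMENSIONS=1536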

# Vector Database (Choose one)
OMNI_MEMORY_PROVIDER=qdrant-remote
QDRANT_HOST=your-qdrant-host.com
QDRANT_PORT=6333

# OmniMemory Hyperparameters (Optional - tune for your use case)
OMNIMEMORY_DEFAULT_MAX_MESSAGES=10
OMNIMEMORY_RECALL_THRESHOLD=0.3
OMNIMEMORY_COMPOSITE_SCORE_THRESHOLD=0.4
OMNIMEMORY_LINK_THRESHOLD=0.8
OMNIMEMORY_VECTOR_DB_MAX_CONNECTIONS=10

# Metrics & Observability (Optional)
OMNIMEMORY_ENABLE_METRICS_SERVER=true
OMNIMEMORY_METRICS_PORT=9001
LOG_LEVEL=INFO

Step 2: Deploy with Docker Compose

# Start with your chosen backend
docker compose -f docker-compose.local.yml --profile qdrant up -d

Step 3: Production Hardening

⚠️ CRITICAL SECURITY WARNING: The provided docker-compose.local.yml is designed for local development only. For production deployments, you MUST implement the following security measures:

1. Enable HTTPS

Add a reverse proxy (nginx, Traefik, or Caddy) with SSL certificates:

# Add to your docker-compose
services:
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - /etc/letsencrypt:/etc/letsencrypt:ro
    depends_on:
      - api-qdrant

2. Implement Authentication

Use API keys in request headers
Implement OAuth 2.0 or JWT tokens
Configure authentication middleware in nginx/Traefik

3. Use Secrets Management

# Don't use .env files in production
# Use Docker secrets or cloud provider secrets management
# (docker secret requires Swarm mode: run `docker swarm init` first)
docker secret create llm_api_key ./llm_key.txt
docker secret create embedding_api_key ./embedding_key.txt

4. Network Security

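# Configure firewall to only expose necessary ports:
# - 443 (HTTPS) - public facing
# - 6333 (Qdrant) - internal network only
# - 9001 (Metrics) - internal network only

# Example: UFW firewall rules
sudo ufw default deny incoming
sudo ufw allow 22/tcp    # keep SSH reachable before enabling the firewall
sudo ufw allow 443/tcp
sudo ufw enable

# Note: ports published by Docker bypass UFW rules, so leave 6333 and 9001
# unpublished or bind them to 127.0.0.1 in your compose file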

5. Enable Monitoring

# Start with monitoring profile for Prometheus + Grafana
docker compose -f docker-compose.local.yml \
  --profile qdrant \
  --profile monitoring up -d

# Access Grafana at http://localhost:3000 (default: admin/admin)
# Access Prometheus at http://localhost:9090
# Configure alerts and dashboards for production monitoring

6. Backup & Disaster Recovery

Configure automated backups for your vector database
Test recovery procedures regularly
Use persistent volumes for data

Development & Testing

Running Tests

# Install development dependencies
uv sync

# Run all tests
uv run pytest

# Run with coverage report
uv run pytest --cov=src/omnimemory --cov-report=html

# View coverage report (macOS; use xdg-open on Linux)
open htmlcov/index.html

Current Test Coverage: 89.58%

Running Locally

# Start vector database
docker compose -f docker-compose.local.yml --profile qdrant up -d

# Run API server in development mode
uv run uvicorn omnimemory.api.server:app --host 0.0.0.0 --port 8001 --reload

# Or use the provided script
python run_api_server.py

Architecture

OmniMemory implements the Self-Evolving Composite Memory Synthesis Architecture (SECMSA).

For comprehensive architecture documentation:

ARCHITECTURE.md - Deep dive into SECMSA, mathematical foundations, scoring algorithms, conflict resolution, and design decisions
C4_ARCHITECTURE.md - Visual system architecture with PlantUML diagrams:
- Level 1: System Context
- Level 2: Container Diagram
- Level 3: Component Diagram
- Level 4: Code Structure
- Sequence diagrams for memory creation and retrieval flows

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Clone: git clone https://github.com/omnirexflora-labs/omnimemory
Sync: uv sync --group dev
Test: uv run pytest

License

omnimemory 0.0.1, released Dec 1, 2025
