A self-consolidating memory layer for AI agents with schema-first design, intelligent merging, and hybrid search capabilities

🧠 Ontomem: The Self-Consolidating Memory Layer

Give your AI agent a "coherent" memory, not just "fragmented" retrieval.

Python 3.11+ License: MIT

Traditional RAG (Retrieval-Augmented Generation) systems retrieve text fragments. Ontomem instead maintains structured entities defined by Pydantic schemas, and its merging algorithms automatically consolidate fragmented observations into complete knowledge-graph nodes.

It doesn't just store data—it continuously "digests" and "organizes" it.


✨ Why Ontomem?

🧩 Schema-First & Type-Safe

Built on Pydantic. All memories are strongly-typed objects. Say goodbye to {"unknown": "dict"} hell and embrace IDE autocomplete and type checking.

🔄 Auto-Consolidation

When you insert different pieces of information about the same entity (same ID) multiple times, Ontomem doesn't create duplicates. It intelligently merges them into a Golden Record using configurable strategies (field overrides, list merging, or LLM-powered intelligent fusion).
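The core idea can be pictured as an upsert into a dictionary keyed by the extracted ID. The sketch below is hypothetical and dependency-free: plain dicts stand in for Pydantic models, and upsert and non_null_overwrite are illustrative names for a hand-written merge rule, not Ontomem's implementation or API.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def upsert(
    store: Dict[Any, Record],
    item: Record,
    key_extractor: Callable[[Record], Any],
    merge_fn: Callable[[Record, Record], Record],
) -> None:
    """Insert item, or fold it into the existing record that shares its key."""
    key = key_extractor(item)
    if key in store:
        # Same ID seen before: consolidate rather than store a duplicate.
        store[key] = merge_fn(store[key], item)
    else:
        store[key] = item


def non_null_overwrite(old: Record, new: Record) -> Record:
    """Hypothetical merge rule: non-null new values replace old ones."""
    merged = dict(old)
    merged.update({k: v for k, v in new.items() if v is not None})
    return merged


def by_id(record: Record) -> Any:
    return record["id"]


store: Dict[Any, Record] = {}
upsert(store, {"id": "u1", "name": "Ada", "city": None}, by_id, non_null_overwrite)
upsert(store, {"id": "u1", "name": None, "city": "London"}, by_id, non_null_overwrite)

print(len(store))   # 1
print(store["u1"])  # {'id': 'u1', 'name': 'Ada', 'city': 'London'}
```

Two partial observations of the same ID end up as one record that carries the best-known value for each field.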

🔍 Hybrid Search

  • Key-Value Lookup: O(1) exact entity access
  • Vector Search: Built-in FAISS indexing for semantic similarity search, automatically synced

💾 Stateful & Persistent

Save your complete memory state (structured data + vector indices) to disk and restore it in seconds on next startup.


🚀 Quick Start: Building a "Self-Improving" Experience Library

Imagine an AI coding agent that debugs issues. Without memory, it repeats the same trial-and-error process every time. With Ontomem, it builds a persistent "Debugging Playbook" that evolves with each new problem encountered.

1. Define Your Experience Schema

from pydantic import BaseModel
from typing import List, Optional

class BugFixExperience(BaseModel):
    """A living record of debugging knowledge."""
    error_signature: str            # Key: e.g., "ModuleNotFoundError: pandas"
    root_causes: List[str]          # Different reasons this error can occur
    solutions: List[str]            # Multiple working solutions discovered
    prevention_tips: str            # Synthesized understanding of how to avoid it
    last_updated: Optional[str] = None

2. Initialize with LLM-Powered Merging

We use the MergeStrategy.LLM.BALANCED strategy so that Ontomem does more than list solutions: it synthesizes them into coherent, actionable guidance.

from ontomem import OMem, MergeStrategy
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

experience_memory = OMem(
    memory_schema=BugFixExperience,
    key_extractor=lambda x: x.error_signature,
    llm_client=ChatOpenAI(model="gpt-4o"),
    embedder=OpenAIEmbeddings(),
    merge_strategy=MergeStrategy.LLM.BALANCED
)
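Conceptually, an LLM-driven merge hands both versions of a record to the model and expects one consolidated record back that still satisfies the schema. The following is a hypothetical sketch of that round trip, not Ontomem's internal prompt or code: fake_llm is a deterministic stub standing in for a chat model, and the BugFix dataclass is a trimmed stand-in for the Pydantic schema so the example runs with the standard library alone.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Callable, List


@dataclass
class BugFix:
    """Trimmed, stdlib-only stand-in for the BugFixExperience schema."""
    error_signature: str
    solutions: List[str]


def llm_merge(old: BugFix, new: BugFix, llm: Callable[[str], str]) -> BugFix:
    """Ask a model to fuse two records, then parse its JSON reply back into the schema."""
    names = [f.name for f in fields(BugFix)]
    prompt = (
        f"Merge these two records about the same entity into one JSON object with fields {names}.\n"
        f"OLD: {json.dumps(asdict(old))}\n"
        f"NEW: {json.dumps(asdict(new))}"
    )
    reply = json.loads(llm(prompt))
    # Reject replies that drop or invent fields (Pydantic validation would cover this).
    if set(reply) != set(names):
        raise ValueError(f"unexpected fields in LLM reply: {sorted(reply)}")
    return BugFix(**reply)


def fake_llm(prompt: str) -> str:
    """Deterministic stub for a chat model: unions the two solution lists."""
    _, old_line, new_line = prompt.splitlines()
    old = json.loads(old_line.removeprefix("OLD: "))
    new = json.loads(new_line.removeprefix("NEW: "))
    solutions = old["solutions"] + [s for s in new["solutions"] if s not in old["solutions"]]
    return json.dumps({"error_signature": old["error_signature"], "solutions": solutions})


sig = "ModuleNotFoundError: No module named 'pandas'"
merged = llm_merge(
    BugFix(sig, ["Run: pip install pandas"]),
    BugFix(sig, ["Run: apt-get install python3-pandas"]),
    fake_llm,
)
print(merged.solutions)
# ['Run: pip install pandas', 'Run: apt-get install python3-pandas']
```

A real model would rewrite and synthesize text rather than just concatenate lists; validating the reply against the schema is what keeps the merged record type-safe.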

3. The Agent Learns Over Time

Day 1: The First Encounter

The agent encounters ModuleNotFoundError for pandas and fixes it with pip install.

# Experience 1: Initial observation
experience_memory.add(BugFixExperience(
    error_signature="ModuleNotFoundError: No module named 'pandas'",
    root_causes=["Missing library in environment"],
    solutions=["Run: pip install pandas"],
    prevention_tips="Always check requirements.txt before running code."
))

Day 2: New Context, Different Fix

The agent encounters the same error in a Docker container where pip fails, but apt-get install python3-pandas works.

# Experience 2: Different context, same error
experience_memory.add(BugFixExperience(
    error_signature="ModuleNotFoundError: No module named 'pandas'",
    root_causes=["Package not in system Python", "Binary incompatibility with pip"],
    solutions=["Run: apt-get install python3-pandas", "Use system package manager in containers"],
    prevention_tips="In containerized environments, prefer system packages for compiled dependencies."
))

Day 3: Agent Seeks Wisdom

When a new agent instance encounters the same error, it queries the evolved knowledge base:

# Retrieve consolidated wisdom (get returns None if the key is unknown)
guidance = experience_memory.get("ModuleNotFoundError: No module named 'pandas'")
if guidance is None:
    raise LookupError("No experience recorded for this error yet")

print("Root Causes:")
for cause in guidance.root_causes:
    print(f"  - {cause}")
# Example output (merged by the LLM, so exact wording and order may vary):
#   - Missing library in environment
#   - Package not in system Python
#   - Binary incompatibility with pip

print("\nSolutions:")
for i, solution in enumerate(guidance.solutions, 1):
    print(f"  {i}. {solution}")
# Example output:
#   1. Run: pip install pandas (standard approach)
#   2. Run: apt-get install python3-pandas (for system Python)
#   3. Use system package manager in containers

print("\nPrevention Tips:")
print(guidance.prevention_tips)
# Example output:
#   Check requirements.txt before running code.
#   In containers, prefer system packages for compiled dependencies.
#   Consider using virtual environments to isolate dependencies.

Day 4: Semantic Search for Similar Problems

The agent doesn't remember the exact error, but can search by concept:

# Semantic search: Find solutions for import-related issues
similar_issues = experience_memory.search(
    "Python module import failures dependency missing",
    k=5
)

print(f"Found {len(similar_issues)} related debugging experiences")

The agent went from "trial and error" to "informed decision-making". No boilerplate. No manual consolidation. Just add experiences and let Ontomem synthesize wisdom.


🔍 Semantic Search

Build an index and search by natural language. In this example, memory holds researcher profiles with name and research_interests fields:

# Build vector index
memory.build_index()

# Semantic search
results = memory.search("Find researchers working on transformer models and attention mechanisms")

for researcher in results:
    print(f"- {researcher.name}: {researcher.research_interests}")

🛠️ Merge Strategies

Choose how to handle conflicts:

| Strategy | Behavior | Use Case |
|---|---|---|
| FIELD_MERGE | Non-null overwrites, lists append | Simple attribute collection |
| KEEP_NEW | Latest data wins | Status updates (current role, last seen) |
| KEEP_OLD | First observation stays | Historical records (first publication year) |
| LLM.BALANCED | LLM-driven semantic merging | Complex synthesis, contradiction resolution |

# Example: LLM intelligently merges conflicting bios
memory = OMem(
    ...,
    merge_strategy=MergeStrategy.LLM.BALANCED
)
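The three deterministic rows of the table can be expressed as small merge functions over the old and new versions of a record. This is a hypothetical sketch over plain dicts: the Strategy enum mirrors the table's names but is not ontomem.MergeStrategy, KEEP_NEW is assumed to replace the whole record, and FIELD_MERGE here simply concatenates lists (whether Ontomem de-duplicates appended items is not stated).

```python
from enum import Enum
from typing import Any, Dict

Record = Dict[str, Any]


class Strategy(Enum):
    FIELD_MERGE = "field_merge"
    KEEP_NEW = "keep_new"
    KEEP_OLD = "keep_old"


def merge(old: Record, new: Record, strategy: Strategy) -> Record:
    if strategy is Strategy.KEEP_NEW:
        return dict(new)  # latest observation wins wholesale
    if strategy is Strategy.KEEP_OLD:
        return dict(old)  # first observation is preserved
    # FIELD_MERGE: non-null values overwrite, lists are appended.
    merged = dict(old)
    for field, value in new.items():
        if value is None:
            continue
        current = merged.get(field)
        if isinstance(current, list) and isinstance(value, list):
            merged[field] = current + value  # new list object; old is not mutated
        else:
            merged[field] = value
    return merged


old = {"name": "Ada", "role": "analyst", "papers": ["A"], "email": "ada@old.example"}
new = {"name": "Ada", "role": "lead", "papers": ["B"], "email": None}

print(merge(old, new, Strategy.FIELD_MERGE))
# {'name': 'Ada', 'role': 'lead', 'papers': ['A', 'B'], 'email': 'ada@old.example'}
print(merge(old, new, Strategy.KEEP_OLD)["role"])  # analyst
```

The LLM.BALANCED row has no deterministic equivalent, which is why it is the option for resolving contradictions rather than just combining fields.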

💾 Save & Load

Snapshot your entire memory state:

# Save (structured data → memory.json, vectors → FAISS indices)
memory.dump("./researcher_knowledge")

# Later, restore instantly
new_memory = OMem(...)
new_memory.load("./researcher_knowledge")
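Snapshotting the structured half of the state can be as simple as a JSON round trip. The sketch below is hypothetical and covers only the records: the file name memory.json follows the comment above, while the vector index (stored by Ontomem as FAISS indices) is left out and would have to be saved or rebuilt separately. dump_records and load_records are illustrative names, and the example writes to a temporary directory so it can run anywhere.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

Record = Dict[str, Any]


def dump_records(records: Dict[str, Record], folder: Path) -> Path:
    """Write all records to <folder>/memory.json, creating the folder if needed."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "memory.json"
    path.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def load_records(folder: Path) -> Dict[str, Record]:
    """Read the records back; rebuilding any vector index is left to the caller."""
    return json.loads((folder / "memory.json").read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    snapshot = {"yann_lecun_001": {"name": "Yann LeCun", "research_interests": ["CNNs"]}}
    dump_records(snapshot, Path(tmp) / "researcher_knowledge")
    restored = load_records(Path(tmp) / "researcher_knowledge")
    print(restored == snapshot)  # True
```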

📊 Ontomem vs Traditional Approaches

| Feature | Traditional Vector DB | Ontomem 🧠 |
|---|---|---|
| Storage Unit | Text chunks | Structured Objects |
| Deduplication | Manual or via embeddings | Native, ID-based |
| Updates | Append-only (creates dupes) | Auto-merge (upsert) |
| Query Results | Similar text fragments | Complete entities |
| Type Safety | ❌ None | Pydantic |
| Indexing | Manual sync needed | Auto-synced |

🎯 Use Cases

🤖 AI Research Assistant

Consolidate researcher profiles, papers, and citations from multiple sources.

👤 Personal Knowledge Graph

Build a living profile of contacts, their preferences, skills, and interaction history from conversations.

🏢 Enterprise Data Hub

Unify customer/employee records from CRM, email, support tickets, and social media.

🧠 AI Agent Long-Term Memory

An autonomous agent accumulates experiences and observations—Ontomem keeps them organized and searchable.


🔧 Installation

pip install ontomem

Or with uv:

uv add ontomem

Requirements:

  • Python 3.11+
  • LangChain (for LLM integration)
  • Pydantic (for schema definition)
  • FAISS (for vector search)

📚 API Reference

Core Methods

add(items: Union[T, List[T]]) → None

Add item(s) to memory. An item whose key matches an existing entity is merged into it with the configured merge strategy instead of being stored as a duplicate.

memory.add(ResearcherProfile(...))
memory.add([item1, item2, item3])

get(key: Any) → Optional[T]

Retrieve an entity by its unique key.

researcher = memory.get("yann_lecun_001")

build_index(force: bool = False) → None

Build or rebuild the vector index for semantic search.

memory.build_index()  # Build the index if it is missing or out of date
memory.build_index(force=True)  # Force rebuild
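The force flag suggests the usual lazy-rebuild pattern: writes mark the index as stale, and build_index() only does work when needed unless forced. The LazyIndex class below is a hypothetical sketch of that pattern, with a counter in place of a real embedding and FAISS build; it does not reflect how Ontomem actually tracks index state.

```python
from typing import Dict


class LazyIndex:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.dirty = False  # True when data changed since the last build
        self.builds = 0     # how many times the (expensive) build actually ran

    def add(self, key: str, text: str) -> None:
        self.data[key] = text
        self.dirty = True

    def build_index(self, force: bool = False) -> None:
        if not (self.dirty or force):
            return  # index already matches the data; skip the rebuild
        self.builds += 1  # a real implementation would re-embed and rebuild here
        self.dirty = False


idx = LazyIndex()
idx.add("a", "first record")
idx.build_index()            # builds: data changed
idx.build_index()            # no-op: nothing changed
idx.build_index(force=True)  # rebuilds anyway
print(idx.builds)  # 2
```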

search(query: str, k: int = 5) → List[T]

Semantic search over all entities.

results = memory.search("transformers and attention", k=10)

dump(folder_path: Union[str, Path]) → None

Save memory state (data + index) to disk.

memory.dump("./my_memory")

load(folder_path: Union[str, Path]) → None

Load memory state from disk.

memory.load("./my_memory")

remove(key: Any) → bool

Remove an entity by key.

success = memory.remove("yann_lecun_001")

clear() → None

Clear all entities and indices.

memory.clear()

Properties

keys: List[Any]

All unique keys in memory.

items: List[T]

All entity instances.

size: int

Number of entities.


🤝 Contributing

We're building the next generation of AI memory standards. PRs and issues welcome!


📝 License

MIT License - See LICENSE file for details.


Built with ❤️ for AI developers who believe memory is more than just search.

Download files

Source Distribution

ontomem-0.1.0.tar.gz (221.0 kB)

Uploaded Source

Built Distribution

ontomem-0.1.0-py3-none-any.whl (27.4 kB)

Uploaded Python 3

File details

Details for the file ontomem-0.1.0.tar.gz.

File metadata

  • Download URL: ontomem-0.1.0.tar.gz
  • Upload date:
  • Size: 221.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17

File hashes

Hashes for ontomem-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ab9abe0bd29442710063c29945a89cc7b5e13f105667c5999a95b5c16ac5bc8 |
| MD5 | 7e9af2c0a647fe775425c6d4589e83cc |
| BLAKE2b-256 | 9e8dabb907ddad342cabe0ea8d54c5d54c2761336fc7638c35c42301adf8e442 |

File details

Details for the file ontomem-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ontomem-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 27.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17

File hashes

Hashes for ontomem-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | adcff690a0e1e4bbb574859bb0d62e1c805d33f8d31bca05404200b5455dd21c |
| MD5 | e42308474649516fd0b8db4c6a66def5 |
| BLAKE2b-256 | 0c61b9f4563c01030b0e1a953fea3abb5f1f8bd31705b5dcd8e6c8e755074023 |
