A unified memory layer for LLM applications

Project description

Unimem: Memory-Augmented AI System

unimem is a research-grade, memory-augmented generation layer for LLM applications. It is built on standard open-source components (FastAPI, PostgreSQL + pgvector, SQLAlchemy, sentence-transformers, and Ollama running llama2) and augments each prompt with context retrieved strictly per user. This addresses the "amnesia" problem of base LLMs, which retain nothing between sessions.

Features

Semantic Retrieval API (POST /add, POST /chat, GET /memory/{user_id}, GET /explain)
Multi-Tenant State Isolation (every query is scoped strictly to the caller's user_id, so memories never cross users)
Context Tagging (memories are tagged automatically, e.g. food:pizza, to keep overlapping semantic scopes apart)
Security Hardening (built-in rate limiting against DDoS-style request floods, plus blocking of "ignore previous..." prompt-injection jailbreaks)
High-Dimensional Embedding Space (pgvector with cosine distance over all-MiniLM-L6-v2 embeddings)
Dynamic Composite Ranking (weights configurable via MemoryConfig; default $0.6 \cdot \text{similarity} + 0.3 \cdot \text{recency} + 0.1 \cdot \text{frequency}$)
Explainability: the Similarity, Recency, and Frequency components of each score are exposed separately through the API or via debug=True.
Deduplication Engine: closely related user facts are merged, which keeps the embedding cluster compact and records how often a fact was repeated.
Smart Response Caching: simple queries are answered from a local cache without calling the LLM (reported at 0.00ms; compatible with use_llm=False).
Graceful Failover: if the LLM call times out, the system falls back to returning the structured raw retrieval results.
Interactive Testing CLI: chatbot.py provides ANSI coloring, hot-swappable user scopes, and interactive debug rankings.
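
The default ranking formula in the list above can be sketched as follows. This is a minimal illustration, not unimem's actual code: the source gives only the weights, so `RankingWeights`, the exponential recency decay, its one-day half-life, and the log-scaled frequency term are all assumptions chosen to map each component into [0, 1].

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RankingWeights:
    """Hypothetical stand-in for the ranking weights held by MemoryConfig."""
    similarity: float = 0.6
    recency: float = 0.3
    frequency: float = 0.1


def recency_score(age_seconds: float, half_life_seconds: float = 86_400.0) -> float:
    """ASSUMPTION: exponential decay, 1.0 for a new memory, 0.5 after one half-life."""
    return 0.5 ** (max(age_seconds, 0.0) / half_life_seconds)


def frequency_score(hits: int, cap: int = 10) -> float:
    """ASSUMPTION: log-scaled repetition count, clamped to [0, 1]."""
    return min(math.log1p(max(hits, 0)) / math.log1p(cap), 1.0)


def composite_score(
    similarity: float,
    age_seconds: float,
    hits: int,
    weights: RankingWeights = RankingWeights(),
) -> float:
    """Weighted sum: 0.6 * similarity + 0.3 * recency + 0.1 * frequency by default."""
    return (
        weights.similarity * similarity
        + weights.recency * recency_score(age_seconds)
        + weights.frequency * frequency_score(hits)
    )
```

Because every component is normalized to [0, 1] and the default weights sum to 1, the composite score also stays in [0, 1], which makes scores comparable across users and queries.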

System Architecture

+----------------+      [HTTP API / FastAPI]      +------------------+
|   End User     |  <-------------------------->  |  MemoryClient    |
+----------------+                                +--------+---------+
                                                           |
                      +------------------------------------+-----------------------------------+
                      |                                    |                                   |
             [MemoryService]                      [RetrievalService]                     [LLMService]
             - Ingestion                          - Semantic Search                      - Prompt Gen
             - Deduplication                      - Recency Scaling                      - Fallback
             - Deletion                           - Composite Ranking                    - Ollama Call
                      |                                    |                                   |
                      +------------------+-----------------+                                   |
                                         |                                                     |
+----------------------+        +--------v---------+                               +-----------v----------+
| sentence-transformers|  <---> | PostgreSQL       |        [Local Ollama] <------>|  LocalLLMClient      |
| all-MiniLM-L6-v2     |        | with pgvector    |                               +----------------------+
+----------------------+        +------------------+

Quick start

1) Initialize the Vector Database

docker compose up -d

Default connection (override with DATABASE_URL): postgresql://mem0:mem0@localhost:5432/unimem
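
The override behaviour described above amounts to the following. It is a sketch only: how unimem itself reads DATABASE_URL is not shown in this document, and `resolve_database_url` is a hypothetical helper name.

```python
import os

# Default taken from the line above; override it by exporting DATABASE_URL.
DEFAULT_DATABASE_URL = "postgresql://mem0:mem0@localhost:5432/unimem"


def resolve_database_url(env=None) -> str:
    """Return DATABASE_URL if it is set and non-empty, else the default.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced by a plain dict in tests.
    """
    env = os.environ if env is None else env
    return env.get("DATABASE_URL") or DEFAULT_DATABASE_URL
```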

2) Install Local Dependencies

pip install unimem

3) Model Caching

ollama pull llama2

4) Deployment Modes

Run standard API:

uvicorn unimem.api.app:app --reload --host 0.0.0.0 --port 8000
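
Once the server is running, the POST /add endpoint can be called from any HTTP client. The sketch below uses only the standard library. The request schema is not documented here, so the JSON field names `text` and `user_id`, the assumption that the response body is JSON, and the helper names are all illustrative guesses; check the server's generated OpenAPI docs for the real schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above


def build_add_request(text: str, user_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /add request.

    ASSUMPTION: the field names "text" and "user_id" are hypothetical.
    """
    body = json.dumps({"text": text, "user_id": user_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/add",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API up, `send(build_add_request("I like pizza", "dev_1"))` would store one memory for user dev_1, provided the assumed schema matches.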

Access via Library (MemoryClient): MemoryClient is the orchestration layer that ties together RetrievalService, MemoryService, and LLMService.

from unimem.db.session import init_engine, get_session_factory
from unimem.db.bootstrap import ensure_pgvector_extension, create_all_tables
from unimem.core.memory_client import MemoryClient
from unimem.config.config import MemoryConfig

init_engine()
ensure_pgvector_extension()
create_all_tables()

db = get_session_factory()()
try:
    client = MemoryClient(db, config=MemoryConfig(top_k=5))

    # Store a fact; near-duplicates of existing memories are merged
    client.add("I specialize in Python systems architecture.", user_id="dev_1")

    # Retrieve the top-ranked memories and generate an answer via Ollama
    print(client.chat("What do I specialize in?", user_id="dev_1"))

    # Query the semantic search directly
    print(client.search("Architecture", user_id="dev_1"))
finally:
    db.close()
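
The deduplication performed on add can be pictured roughly as below. This is a sketch under stated assumptions, not the library's implementation: `StoredMemory`, `add_or_merge`, the in-memory list standing in for the pgvector table, and the 0.9 cosine-similarity threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredMemory:
    """Hypothetical in-memory record; unimem stores rows in PostgreSQL."""
    text: str
    embedding: list
    frequency: int = 1


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def add_or_merge(store, text, embedding, threshold: float = 0.9) -> StoredMemory:
    """Merge into the closest existing memory if it is similar enough.

    ASSUMPTION: a merge only bumps the repetition counter; the 0.9
    threshold is illustrative.
    """
    best = max(store, key=lambda m: cosine_similarity(m.embedding, embedding), default=None)
    if best is not None and cosine_similarity(best.embedding, embedding) >= threshold:
        best.frequency += 1  # log the repetition instead of storing a duplicate
        return best
    memory = StoredMemory(text=text, embedding=list(embedding))
    store.append(memory)
    return memory
```

The repetition counter kept on a merge is what a frequency term in the ranking could later draw on.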

Logging and Configurations

MemoryConfig is the configuration object: it sets retrieval parameters such as top_k, the ranking weights, and vector thresholds. Structured diagnostics (ranking details, merges, updates, and failover notifications) are written to stdout through the logging setup in logger.py.
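
A stdout logging setup of this kind can be built with the standard library alone. The sketch below is not the contents of logger.py: the logger name, format string, and example message fields are assumptions.

```python
import logging
import sys


def get_stdout_logger(name: str = "unimem.example", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes one formatted line per event to stdout.

    Hypothetical helper; the real logger.py may differ.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    logger.propagate = False  # keep events from being printed twice via the root logger
    return logger


log = get_stdout_logger()
# Example diagnostic for a merge event (field names are illustrative).
log.info("merge user_id=%s kept_id=%s similarity=%.2f", "dev_1", 42, 0.93)
```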

Project Evolution & Version History

Our primary objective was to turn the original unimem concept into a professional, production-ready, memory-augmented generation layer for local LLMs served through Ollama.

Here are the key upgrades we successfully engineered:

Clean Service Architecture Refactor: We overhauled the codebase structure, splitting the monolithic logic into distinct, modular service layers: MemoryService (ingestion/deduplication), RetrievalService (semantic search/ranking), and LLMService (prompt generation/fallback logic).
Dynamic Composite Ranking & Deduplication: We moved vector search beyond plain similarity. Retrieved memories are now scored with a weighted sum of similarity, recency, and frequency. We also integrated a deduplication engine that merges repetitive facts to keep each user's vector space clean.
Configuration & Diagnostics: We added a robust configuration object (MemoryConfig) that allows you to easily dictate vector dimensions and retrieval thresholds. We also set up custom logging hooks for clean stdout diagnostics.
Resilient LLM Integration: We improved the prompt engineering for more natural personalization. We also built in graceful failover: if the local Ollama LLM times out or is offline, the backend degrades to returning the raw semantic payloads instead of crashing.
Interactive UI (chatbot.py): We rewrote the testing CLI, outfitting it with ANSI colors and built-in commands (switch user <id>, show memory, clear memory, debug on).
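
The failover behaviour described under Resilient LLM Integration follows a common pattern, sketched here. The function name, the prompt wording, the response shape, and the choice of exceptions are assumptions for illustration; the `generate` callable stands in for whatever client actually talks to Ollama.

```python
from typing import Callable, Sequence


def answer_with_fallback(
    question: str,
    memories: Sequence[str],
    generate: Callable[[str], str],
) -> dict:
    """Try the LLM; on timeout or connection failure return the raw memories.

    Hypothetical sketch: `generate` is any callable that sends a prompt to
    the local model and returns its text.
    """
    prompt = (
        "Known facts about the user:\n"
        + "\n".join(f"- {m}" for m in memories)
        + f"\n\nQuestion: {question}"
    )
    try:
        return {"source": "llm", "answer": generate(prompt), "memories": list(memories)}
    except (TimeoutError, ConnectionError) as exc:
        # Degrade gracefully: no generated answer, but the retrieval results survive.
        return {
            "source": "fallback",
            "answer": None,
            "memories": list(memories),
            "error": type(exc).__name__,
        }
```

Returning the same dictionary shape in both branches lets callers render a result without special-casing the offline path.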

Project Version Types

Based on the repository's history and structure, you will notice three distinct "versions" or states of the project within the filesystem:

1. `package#UNIMOM` (The Original Prototype)

What it is: The earliest working proof-of-concept.
Characteristics: Contains a very basic implementation of the storage logic and the original legacy chatbot.py loop. It acts as the initial "rough draft" that proved we could connect PostgreSQL + pgvector + Ollama, though it lacks the advanced deduplication and clean architectures.

2. `unimem_release_v1` (The First Packaged Release)

What it is: The milestone 1 stable baseline.
Characteristics: This directory captures the point at which we bundled the system into a publishable Python package. It contains the dist folder and the wheel files, so it can be installed with pip install unimem and used as a standard library. It predates some of our latest architectural cleanup.

3. Root Workspace (The Current Production Build)

What it is: The latest, fully upgraded, research-grade architecture.
Characteristics: This is the culmination of all the upgrades. It features the service layers (unimem/core, unimem/services, unimem/retrieval), the dynamic composite ranking, a docker-compose.yml for the Postgres instance, and the colorized command-line chatbot, which you can run to inspect how memories are retrieved and ranked.

License

Use and modify freely.

Project details

This version

0.2.0

Apr 17, 2026

Download files

Source Distribution

unimem-0.2.0.tar.gz (25.0 kB view details)

Uploaded Apr 17, 2026 Source

Built Distribution

unimem-0.2.0-py3-none-any.whl (29.0 kB view details)

Uploaded Apr 17, 2026 Python 3

File details

Details for the file unimem-0.2.0.tar.gz.

File metadata

Download URL: unimem-0.2.0.tar.gz
Upload date: Apr 17, 2026
Size: 25.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.4

File hashes

Hashes for unimem-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`9dad02fb974041efc3b703e6175398a1ab1111282d56268adf5e1a12ada606cc`
MD5	`3264379b8190d7c10140e648051758b2`
BLAKE2b-256	`f4fa539299e90bdfe6c16b743e188efee2f363327f30460baa5f1eab1a2436ce`

File details

Details for the file unimem-0.2.0-py3-none-any.whl.

File metadata

Download URL: unimem-0.2.0-py3-none-any.whl
Upload date: Apr 17, 2026
Size: 29.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.4

File hashes

Hashes for unimem-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5c727a9dfab712254dc1c5a17e3afd5f6c9601c9feab8bb273382f33eecee6b7`
MD5	`9e79d3ca53279ff001d677115f941be5`
BLAKE2b-256	`a93fef7140797d99b21b4e6df78debf913e8a87a77f4dab44396790c05f34e22`
