hmlr · PyPI

Hierarchical Memory with Lattice Retrieval - AI agent memory system

HMLR — Hierarchical Memory Lookup & Routing

A state-aware, long-term memory architecture for AI agents with verified multi-hop, temporal, and cross-topic reasoning guarantees.

HMLR replaces brute-force context windows and fragile vector-only RAG with a structured, state-aware memory system capable of:

resolving conflicting facts across time,

enforcing persistent user and policy constraints across topics, and

performing true multi-hop reasoning over long-forgotten information — while operating entirely on mini-class LLMs.

HMLR is the first publicly benchmarked, open-source memory architecture to achieve perfect (1.00) Faithfulness and perfect (1.00) Context Recall across adversarial multi-hop, temporal-conflict, and cross-topic invariance benchmarks using only a mini-tier model (gpt-4.1-mini).

Benchmark Achievements

HMLR has been validated on the hardest known memory tests:

Hydra of Nine Heads: Hard Mode
9 aliases · 8 revoked policies · critical rule buried in a 2,300-token wall · facts spread over "30 days"
→ NON-COMPLIANT (correct)
→ 1.00 faithfulness / 1.00 recall
Full reproducible test harness in repo; run it at your convenience

Vegetarian Constraint Trap (immutable user preference vs override)
User says "strict vegetarian" → later "actually not" → system must preserve the original constraint
→ Correctly refuses meat forever
Full test harness in repo; run it at your convenience
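
The Vegetarian Constraint Trap can be pictured as a store in which some preferences are locked once they are set. The sketch below illustrates that idea only; it is not HMLR's actual implementation, and `ConstraintStore`, its `set`/`get` methods, and the `immutable` flag are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ConstraintStore:
    """Hypothetical store: a constraint marked immutable ignores later overrides."""
    values: dict = field(default_factory=dict)
    locked: set = field(default_factory=set)

    def set(self, key: str, value: str, immutable: bool = False) -> bool:
        """Return True if the write was applied, False if it was rejected."""
        if key in self.locked:
            return False  # an immutable constraint already exists for this key
        self.values[key] = value
        if immutable:
            self.locked.add(key)
        return True

    def get(self, key: str):
        return self.values.get(key)


store = ConstraintStore()
store.set("diet", "strict vegetarian", immutable=True)
applied = store.set("diet", "no restriction")  # the later "actually not"
print(applied, store.get("diet"))  # False strict vegetarian
```

The design choice shown is that the override is rejected at write time, so nothing downstream ever sees the conflicting value.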

Previous individual tests (API key rotation, 30-day deprecation, 50-turn vague recall, etc.) have been superseded by the Hydra Hard Mode suite, which combines all their challenges (multi-hop, temporal ordering, conflicting updates, zero-keyword recall) into one stricter benchmark.

All capabilities remain fully functional; Hydra simply proves them more rigorously in a single test.

Hydra9 Hard Mode and Why It's Brutal

This isn't a conversation; it's 21 isolated messages sent over "30 days."

Each turn is processed in a fresh session:

You type one message
Close the chat
Open a new one days later
Type the next

No prior turns are ever visible again.

On the final query, the system sees nothing from the previous 20 turns in active context.

It must answer entirely from long-term memory:

Reconstruct a 9-alias encryption algorithm
Track all policy revisions and revocations across timestamps
Identify the one surviving rule
Correctly apply it to Project Cerberus (4.85M records/day vs 400k limit)
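
The last three steps above (track revisions, find the surviving rule, apply it) can be sketched in a few lines. This is a minimal illustration, assuming the surviving rule is the most recently issued policy that has not been revoked; it is not the test harness or HMLR's logic. `PolicyVersion`, `surviving_policy`, `check_compliance`, the dates, and the revoked limits are all hypothetical. Only the 4.85M records/day figure and the 400k limit come from the benchmark description.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PolicyVersion:
    """Hypothetical record of one policy revision."""
    name: str
    issued: date
    daily_record_limit: int
    revoked: bool = False


def surviving_policy(versions):
    """Return the most recently issued policy that has not been revoked."""
    active = [v for v in versions if not v.revoked]
    if not active:
        raise ValueError("no active policy")
    return max(active, key=lambda v: v.issued)


def check_compliance(records_per_day, policy):
    """Compare a daily volume against the policy's limit."""
    if records_per_day <= policy.daily_record_limit:
        return "COMPLIANT"
    return "NON-COMPLIANT"


# Illustrative data: names, dates, and the revoked limits are made up.
versions = [
    PolicyVersion("v1", date(2025, 1, 1), 10_000_000, revoked=True),
    PolicyVersion("v2", date(2025, 1, 10), 5_000_000, revoked=True),
    PolicyVersion("v3", date(2025, 1, 20), 400_000),
]
rule = surviving_policy(versions)
verdict = check_compliance(4_850_000, rule)
print(rule.name, verdict)  # v3 NON-COMPLIANT
```

The arithmetic itself is simple (4,850,000 is more than twelve times 400,000); the hard part in the benchmark is recovering which rule still applies.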

Passing means:

Exact answer: NON-COMPLIANT
Full reasoning: list all aliases, policy versions, sources, and why the final rule wins

To the author's knowledge, no publicly documented system has passed this in true cold-start mode.

HMLR does. Every time.

The full test harness is available to run yourself.

New memory test coming soon: Million-token haystack

The haystack will include:

Hydra Hard Mode
Simple recall Hard Mode
Poison Pill hallucination testing
User constraint enforcement testing
Real-world document testing (a huge document of 75-100k tokens with global rules, local constraints, updates, and temporal conflicts scattered throughout)
A new hard mode test that makes the original Hydra9 Hard Mode test look trivial by comparison

The Battery Test

Goal: stress all failure modes at once:

multi-hop linking
temporal reasoning (ordering + intervals)
policy revocation and “current rule”
entity alias drift
hot-memory updates that shouldn’t hijack unrelated questions
recency bias defense
zero-ambiguity scoring (explicit ground truth)

Core design:

You run a sequence of independent questions back-to-back against the same 1M-token memory, where:

each question targets a different deep thread buried in memory, and
each has a single correct answer that is explicitly stated somewhere in memory.

The sequence is constructed so that some recent turns contain highly tempting distractors, but the correct answers come from older, correct, explicit statements.

Fail condition: any single wrong answer fails that run. This makes the test “mean” in the right way: not ambiguous, just unforgiving.
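
The fail condition amounts to all-or-nothing scoring against explicit ground truth. The sketch below shows one way to express it, assuming answers are compared by case-insensitive exact match; the planned test may score differently. `normalize`, `battery_passes`, and the sample questions are hypothetical.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return " ".join(answer.lower().split())


def battery_passes(answers: dict, ground_truth: dict) -> bool:
    """All-or-nothing scoring: True only if every question is answered correctly."""
    return all(
        normalize(answers.get(question, "")) == normalize(expected)
        for question, expected in ground_truth.items()
    )


# Hypothetical questions and answers, for illustration only.
ground_truth = {"q1": "NON-COMPLIANT", "q2": "strict vegetarian"}
perfect_run = {"q1": "non-compliant", "q2": "Strict  vegetarian"}
one_miss = {"q1": "NON-COMPLIANT", "q2": "no restriction"}

print(battery_passes(perfect_run, ground_truth))  # True
print(battery_passes(one_miss, ground_truth))     # False
```

A missing answer counts as wrong here, so skipping a question also fails the run.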

flowchart TD
    Start([User Query]) --> Entry[process_user_message]
    
    %% Ingestion
    Entry --> ChunkEngine[ChunkEngine: Chunk & Embed]
    
    %% Parallel Fan-Out
    ChunkEngine --> ParallelStart{Launch Parallel Tasks}
    
    %% Task 1: Scribe (User Profile)
    ParallelStart -->|Task 1: Fire-and-Forget| Scribe[Scribe Agent]
    Scribe -->|Update Profile| UserProfile[(User Profile JSON)]
    
    %% Task 2: Fact Extraction
    ParallelStart -->|Task 2: Async| FactScrubber[FactScrubber]
    FactScrubber -->|Extract Key-Value| FactStore[(Fact Store SQL)]
    
    %% Task 3: Retrieval (Key 1)
    ParallelStart -->|Task 3: Retrieval| Crawler[LatticeCrawler]
    Crawler -->|Key 1: Vector Search| Candidates[Raw Candidates]
    
    %% Task 4: Governor (The Brain)
    %% Governor waits for Candidates to be ready
    Candidates --> Governor[Governor: Router & Filter]
    ParallelStart -->|Task 4: Main Logic| Governor
    
    %% Governor Internal Logic
    Governor -->|Key 2: Context Filter| ValidatedMems[Truly Relevant Memories]
    Governor -->|Routing Logic| Decision{Routing Decision}
    
    Decision -->|Active Topic| ResumeBlock[Resume Bridge Block]
    Decision -->|New Topic| CreateBlock[Create Bridge Block]
    
    %% Hydration (Assembly)
    ResumeBlock --> Hydrator[ContextHydrator]
    CreateBlock --> Hydrator
    
    %% All Context Sources Converge
    ValidatedMems --> Hydrator
    FactStore --> Hydrator
    UserProfile --> Hydrator
    
    %% Generation
    Hydrator --> FinalPrompt[Final LLM Prompt]
    FinalPrompt --> MainLLM[Response Generation]
    MainLLM --> End([End])
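
The fan-out in the flowchart can be sketched with `asyncio`. The code below is a toy illustration of the control flow only: the function names mirror the diagram's node labels, but every body is a stand-in (word overlap instead of vector search, a plain dict instead of the SQL fact store), and none of it is taken from HMLR's source. One deliberate difference: the diagram marks the Scribe as fire-and-forget, while this sketch waits for it so the result is deterministic.

```python
import asyncio


async def chunk_and_embed(message):
    # Stand-in for ChunkEngine: split into sentence chunks (embedding omitted).
    return message.split(". ")


async def scribe(chunks, profile):
    # Stand-in for the Scribe agent: records something about the user.
    profile["last_message_chunks"] = len(chunks)


async def extract_facts(chunks, fact_store):
    # Stand-in for FactScrubber: store "key is value" chunks as key-value pairs.
    for chunk in chunks:
        if " is " in chunk:
            key, value = chunk.split(" is ", 1)
            fact_store[key.strip()] = value.strip()


async def retrieve_candidates(chunks, memory):
    # Stand-in for LatticeCrawler (Key 1): word overlap instead of vector search.
    words = {w.lower() for c in chunks for w in c.split()}
    return [m for m in memory if words & set(m.lower().split())]


async def govern(candidates_task, active_topic):
    # Stand-in for the Governor: wait for candidates, filter them (Key 2),
    # then decide whether to resume or create a bridge block.
    candidates = await candidates_task
    validated = [c for c in candidates if len(c.split()) > 2]
    block = "resume" if active_topic else "create"
    return validated, block


async def process_user_message(message, memory, profile, fact_store, active_topic=None):
    chunks = await chunk_and_embed(message)
    scribe_task = asyncio.create_task(scribe(chunks, profile))                  # Task 1
    facts_task = asyncio.create_task(extract_facts(chunks, fact_store))         # Task 2
    candidates_task = asyncio.create_task(retrieve_candidates(chunks, memory))  # Task 3
    validated, block = await govern(candidates_task, active_topic)              # Task 4
    # This sketch waits for Tasks 1 and 2 before assembling the context.
    await asyncio.gather(scribe_task, facts_task)
    # Stand-in for ContextHydrator: merge every source into one context dict.
    return {
        "query": message,
        "block": block,
        "memories": validated,
        "facts": dict(fact_store),
        "profile": dict(profile),
    }


context = asyncio.run(process_user_message(
    "My name is Alice. I love pizza",
    memory=["Alice said her favorite food is pizza", "Unrelated note"],
    profile={},
    fact_store={},
))
print(context["block"], context["memories"])
```

With no active topic, the sketch chooses to create a new bridge block and keeps only the memory that shares words with the message.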

Why HMLR Is Unusual (Even Among Research Systems)

Most memory or RAG systems optimize for one or two of the following:

retrieval recall,

latency,

or token compression.

Very few demonstrate all of the following simultaneously:

✔ Perfect faithfulness

✔ Perfect recall

✔ Temporal conflict resolution

✔ Cross-topic identity & rule persistence

✔ Multi-hop policy reasoning

✔ Binary constrained answers under adversarial prompting

✔ Zero-keyword semantic recall

HMLR v1 demonstrates all seven.

Scope of the Claim (Important)

This project does not claim that no proprietary system on Earth can achieve similar results. Large foundation model providers may possess internal memory systems with comparable capabilities.

However:

To the author’s knowledge, no other publicly documented, open-source memory architecture has demonstrated these guarantees under formal RAGAS evaluation on adversarial temporal and policy-governed scenarios, especially using a mini-class model.

All experiments in this repository are:

reproducible,

auditable,

and fully inspectable.

What HMLR Enables

Persistent “forever chat” memory without token bloat

Governance-grade policy enforcement for agent systems

Secure long-term secret storage and retrieval

Cross-episode agent reasoning

State-aware simulation and world modeling

Cost-efficient mini-model orchestration with pro-level behavior

Quick Start

Installation

Install from PyPI:

pip install hmlr

Or install from source:

git clone https://github.com/Sean-V-Dev/HMLR-Agentic-AI-Memory-System.git
cd HMLR-Agentic-AI-Memory-System
pip install -e .

Basic Usage

First, set your OpenAI API key:

export OPENAI_API_KEY="your-openai-api-key"

Then run a simple conversation:

import asyncio
import os

from hmlr import HMLRClient

async def main():
    # Initialize the client, reading the key from the environment variable set above
    client = HMLRClient(
        api_key=os.environ["OPENAI_API_KEY"],
        db_path="memory.db",
        model="gpt-4.1-mini"  # ONLY tested model!
    )

    # Chat with persistent memory
    response = await client.chat("My name is Alice and I love pizza")
    print(response)

    # HMLR remembers across messages
    response = await client.chat("What's my favorite food?")
    print(response)  # Will recall "pizza"

asyncio.run(main())

CRITICAL: HMLR is ONLY tested with gpt-4.1-mini. Other models are NOT guaranteed.

Development Setup (Recommended)

For contributors and advanced users:

# Clone repository
git clone https://github.com/Sean-V-Dev/HMLR-Agentic-AI-Memory-System.git
cd HMLR-Agentic-AI-Memory-System

# Install in development mode with all dependencies
pip install -e ".[dev]"

# Verify installation
python -c "import hmlr; print('✅ HMLR ready for development!')"

# Run the full test suite (recommended before making changes)
pytest tests/ -v --tb=short

Documentation

Installation Guide - Detailed setup instructions
Quick Start - Usage examples and best practices
Model Compatibility - ⚠️ CRITICAL model warnings
Examples - Working code samples
Contributing Guide - How to adjust individual settings

Prerequisites (for development)

Python 3.10+
OpenAI API key (for GPT-4.1-mini)

Running Tests (from source)

# Clone and install
git clone https://github.com/Sean-V-Dev/HMLR-Agentic-AI-Memory-System.git
cd HMLR-Agentic-AI-Memory-System
pip install -e ".[dev]"

# Quick verification (runs in < 30 seconds)
python test_local_install.py

# Try the interactive example (requires OPENAI_API_KEY)
python examples/simple_usage.py

# Run all RAGAS benchmarks (comprehensive, ~15-20 minutes total)
pytest tests/ -v --tb=short

# Or run individual tests:
pytest tests/ragas_test_7b_vegetarian.py -v -s  # User constraints test
pytest tests/test_12_hydra_e2e.py -v -s        # Industry benchmark

Note: Tests take 1-3 minutes each. The -v -s flags show live execution. Ignore RAGAS logging errors at the end if assertions pass.

Release history

0.1.2 (this version): Dec 29, 2025
0.1.1: Dec 9, 2025
0.1.0: Dec 9, 2025

Download files

Source Distribution: hmlr-0.1.2.tar.gz (162.0 kB), uploaded Dec 29, 2025
Built Distribution: hmlr-0.1.2-py3-none-any.whl (158.6 kB), uploaded Dec 29, 2025, Python 3

File details

Details for the file hmlr-0.1.2.tar.gz.

File metadata

Download URL: hmlr-0.1.2.tar.gz
Upload date: Dec 29, 2025
Size: 162.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for hmlr-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`d7b96fbff7679694591343b47216223c5023f5dbda8bebec13207d7512c65d65`
MD5	`f7b96d001d2b4a8d10a5be41e39290ff`
BLAKE2b-256	`8b286a177e497d282a615a89fe100ab198a5a66fcfa5c95630566b30da497ce4`


File details

Details for the file hmlr-0.1.2-py3-none-any.whl.

File metadata

Download URL: hmlr-0.1.2-py3-none-any.whl
Upload date: Dec 29, 2025
Size: 158.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for hmlr-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`47080e9e582715fe29159a03fcd2029b73e5dbf2e235d5560a6585348e47f4aa`
MD5	`2b9364f125c2a1428bd531a23ef69248`
BLAKE2b-256	`db3aea7b6143017e5863c3c23d61bacfb2deb41e7d26ad2815b36e34a6e40e03`

