Neuro-inspired memory system for LLMs (server + Python SDK)

cognitive-memory-layer

Python SDK for the Cognitive Memory Layer — neuro-inspired memory for AI applications. Store, retrieve, and reason over memories with sync/async clients or an in-process embedded engine.

The Cognitive Memory Layer (CML) gives LLMs a neuro-inspired memory system: episodic and semantic storage, consolidation, and active forgetting. It fits into agents, RAG pipelines, and personalized apps as a persistent, queryable memory backend. This SDK provides sync and async HTTP clients for a CML server, plus an optional in-process embedded engine (lite mode: SQLite and local embeddings, no server). You get write/read/turn, sessions, admin and batch operations, and a helper for OpenAI chat.

Who it's for: Developers building AI applications that need persistent, queryable memory — chatbots, agents, evaluation pipelines, and personalized assistants.

What you can do:

  • Power agent loops with retrieved context and store observations in memory.
  • Add memory to RAG pipelines so retrieval is informed by prior interactions.
  • Personalize by user or session with namespaces and session-scoped context.
  • Run benchmarks with eval mode and temporal fidelity (historical timestamps). For bulk evaluation, the server supports LLM_INTERNAL__* and the eval script supports --ingestion-workers; see configuration.
  • Run embedded without a server for development, demos, or single-machine apps.


What's new (1.4.x): session-scoped write route support in SessionScope/AsyncSessionScope, new dashboard/admin helpers (dashboard_facts, dashboard_invalidate_fact, dashboard_export_memories, graph_overview, admin_consolidate, admin_forget), embedded provider parity fixes, and wrapper parity updates for user_timezone/timestamp. See CHANGELOG.


Installation

pip install cognitive-memory-layer

Embedded mode (run the CML engine in-process, no server). In lite mode, only the episodic (vector) store is used; the neocortical (graph/semantic) store is disabled, so there is no knowledge graph or semantic consolidation. Best for development, demos, or single-machine apps.

pip install cognitive-memory-layer[embedded]

When working from the monorepo, the server and SDK are built from the repository root (a single pyproject.toml). Install in editable mode with the optional extras you need:

# From repo root: install SDK only
pip install -e .

# From repo root: install server + SDK
pip install -e ".[server,dev]"

# From repo root: install SDK with embedded mode (in-process engine)
pip install -e ".[embedded]"

Quick start

Sync client — Connect to a CML server, write a memory, read by query, and run a turn with a session; use result.context for LLM injection and result.memories (or result.constraints when the server returns them) for structured access.

from cml import CognitiveMemoryLayer

with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    memory.write("User prefers vegetarian food.")
    result = memory.read("What does the user eat?")
    print(result.context)  # Formatted for LLM injection
    for m in result.memories:
        print(m.text, m.relevance)
    turn = memory.turn(user_message="What should I eat tonight?", session_id="session-001")
    print(turn.memory_context)

Async client — Same flow as sync; use async with and await for all operations.

import asyncio
from cml import AsyncCognitiveMemoryLayer

async def main():
    async with AsyncCognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
        await memory.write("User prefers dark mode.")
        result = await memory.read("user preferences")
        print(result.context)

asyncio.run(main())

Embedded mode — No server: SQLite plus local embeddings (lite mode). Use db_path for persistence.

import asyncio
from cml import EmbeddedCognitiveMemoryLayer

async def main():
    async with EmbeddedCognitiveMemoryLayer() as memory:
        await memory.write("User prefers vegetarian food.")
        result = await memory.read("dietary preferences")
        print(result.context)

asyncio.run(main())
# Persistent: EmbeddedCognitiveMemoryLayer(db_path="./my_memories.db")

Get context for injection — Use get_context(query) when you only need a formatted string for the LLM:

with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    context = memory.get_context("user preferences")
    # Inject context into your system prompt or RAG pipeline

Session-scoped flow — Use memory.session(name="...") to scope writes and reads to a session:

with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    with memory.session(name="session-001") as session:
        session.write("User asked about Italian food.")
        session.read("What did I ask earlier?")
        session.turn(user_message="Any good places nearby?", assistant_response="...")

SessionScope.write()/AsyncSessionScope.write() call /session/{session_id}/write, and SessionScope.read()/AsyncSessionScope.read() call /session/{session_id}/read, so session wrappers stay path-scoped on both write and read.

More usage:

  • Timezone-aware retrieval: read(..., user_timezone="America/New_York") or turn(..., user_timezone="America/New_York").
  • Batch operations: batch_write([{"content": "..."}, ...]) and batch_read(["query1", "query2"]) for multiple writes or reads in one call.
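A combined sketch of both options (parameter names are taken from this page; not runnable without a CML server):

```python
from cml import CognitiveMemoryLayer

with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    # "today"/"yesterday" in the query resolve in the user's timezone.
    result = memory.read("what did I plan today?", user_timezone="America/New_York")

    # Multiple writes and reads in single round trips.
    memory.batch_write([
        {"content": "User likes hiking."},
        {"content": "User is allergic to peanuts."},
    ])
    answers = memory.batch_read(["hobbies", "allergies"])
```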


Configuration

Client: Environment variables (use .env or set directly): CML_API_KEY, CML_BASE_URL, CML_TENANT_ID, CML_TIMEOUT, CML_MAX_RETRIES, CML_ADMIN_API_KEY, CML_VERIFY_SSL. Use CMLConfig for validated, reusable config. See Configuration.

Constructor:

memory = CognitiveMemoryLayer(
    api_key="sk-...",
    base_url="http://localhost:8000",
    tenant_id="my-tenant",
)

Or pass a config object: from cml import CMLConfig then CognitiveMemoryLayer(config=config).
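A minimal sketch of the config-object path (field names assumed to mirror the constructor arguments and CML_* environment variables):

```python
from cml import CMLConfig, CognitiveMemoryLayer

# Build a validated, reusable config once and share it across clients.
config = CMLConfig(
    api_key="sk-...",
    base_url="http://localhost:8000",
    tenant_id="my-tenant",
)

with CognitiveMemoryLayer(config=config) as memory:
    print(memory.health())
```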

Embedded: Use EmbeddedConfig (or constructor args). Options: storage_mode (lite | standard | full; only lite is implemented), tenant_id, database, embedding, llm, auto_consolidate, auto_forget. Embedding and LLM are read from .env when not set: EMBEDDING_INTERNAL__PROVIDER, EMBEDDING_INTERNAL__MODEL, EMBEDDING_INTERNAL__DIMENSIONS, EMBEDDING_INTERNAL__BASE_URL, LLM_INTERNAL__MODEL, LLM_INTERNAL__BASE_URL. Lite mode uses SQLite and local embeddings; pass db_path for a persistent database. Full details in Configuration.
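A hedged sketch of an embedded setup (the EmbeddedConfig import location and the exact constructor shape are assumptions; see Configuration for the authoritative fields):

```python
import asyncio
from cml import EmbeddedCognitiveMemoryLayer
from cml import EmbeddedConfig  # import location is an assumption

config = EmbeddedConfig(
    storage_mode="lite",    # only lite is implemented
    tenant_id="my-tenant",
    auto_consolidate=True,
    auto_forget=False,
)

async def main():
    # db_path makes the SQLite store persistent across runs.
    async with EmbeddedCognitiveMemoryLayer(config=config, db_path="./my_memories.db") as memory:
        await memory.write("User prefers vegetarian food.")

asyncio.run(main())
```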


Features

Modes:

  • Client: Sync and async HTTP clients for a running CML server; context managers.
  • Embedded: In-process engine (lite mode: SQLite + local embeddings); no server. Embedded read() passes memory_types, since, and until to the orchestrator.

Memory API: write, read, read_stream, read_safe, turn, update, forget, stats, get_context, create_session, get_session_context, delete_all, remember (alias for write), search (alias for read), health.

Options:

  • user_timezone on read(), get_context(), search(), and turn() for timezone-aware "today"/"yesterday".
  • timestamp on write(), turn(), and remember() for event time.
  • eval_mode on write()/remember() for benchmark responses.

Write supports context_tags, session_id, memory_type, namespace, metadata, and agent_id. Read supports memory_types, since, until, and response_format (packet | list | llm_context).

Response shape: ReadResponse has memories, facts, preferences, episodes, constraints (when the server has constraint extraction), and context (formatted string for LLM injection).
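Putting the write and read options together (the tag, memory_type, and date values are illustrative, not canonical):

```python
from cml import CognitiveMemoryLayer

with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    memory.write(
        "User booked a trip to Rome in May.",
        context_tags=["travel"],
        session_id="session-001",
        memory_type="episodic",        # illustrative value
        metadata={"source": "chat"},
    )
    result = memory.read(
        "travel plans",
        memory_types=["episodic"],     # illustrative value
        since="2025-01-01",            # date format is an assumption
        response_format="packet",
    )
    print(result.context)
    for c in result.constraints:       # empty unless the server extracts constraints
        print(c)
```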

Server compatibility: The server supports delete_all (admin API key), read filters and user_timezone, response formats, write metadata and memory_type, and session-scoped context; the client sends read filters and user_timezone only when the server supports them. The server can use LLM-based extraction (constraints, facts, salience, importance) when FEATURES__USE_LLM_* flags are enabled; see Usage Documentation § Configuration Reference.

Session and namespace: memory.session(name=...) (SessionScope) scopes writes/reads/turns to a session via session-scoped routes. with_namespace(namespace) returns a NamespacedClient (and async AsyncNamespacedClient) that injects namespace into write, update, and batch_write, and forwards user_timezone/timestamp on read/turn helpers.
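A short sketch of namespace scoping (the namespace value is illustrative):

```python
from cml import CognitiveMemoryLayer

with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    # NamespacedClient injects the namespace into write/update/batch_write.
    alice = memory.with_namespace("user-alice")
    alice.write("Prefers window seats.")
    result = alice.read("seating preferences", user_timezone="America/New_York")
    print(result.context)
```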

Admin & batch: batch_write, batch_read, consolidate, run_forgetting, reconsolidate, admin_consolidate, admin_forget, with_namespace, iter_memories, list_tenants, get_events, component_health.

Dashboard admin (requires CML_ADMIN_API_KEY): dashboard_overview, dashboard_memories, dashboard_memory_detail, dashboard_facts, dashboard_invalidate_fact, dashboard_export_memories, dashboard_timeline, get_sessions (active sessions from Redis), get_rate_limits (rate-limit usage per API key), get_request_stats (hourly request volume), get_graph_stats, graph_overview, explore_graph, search_graph, dashboard_neo4j_config, get_config/update_config, get_labile_status, test_retrieval, get_jobs, bulk_memory_action, reset_database.

Embedded extras: EmbeddedConfig for storage_mode, embedding/LLM, auto_consolidate, auto_forget. Export/import: export_memories, import_memories (and async export_memories_async, import_memories_async) for migration between embedded and server.
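A migration sketch from embedded to server (the payload shape and exact call signatures are assumptions based on the names above):

```python
import asyncio
from cml import CognitiveMemoryLayer, EmbeddedCognitiveMemoryLayer

async def migrate():
    # Dump everything from the embedded SQLite store...
    async with EmbeddedCognitiveMemoryLayer(db_path="./my_memories.db") as embedded:
        dump = await embedded.export_memories_async()
    # ...then load it into a server-backed client.
    with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as server:
        server.import_memories(dump)

asyncio.run(migrate())
```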

OpenAI integration: CMLOpenAIHelper(memory_client, openai_client) for memory-augmented chat. Set OPENAI_MODEL or LLM_INTERNAL__MODEL in .env.

from openai import OpenAI
from cml import CognitiveMemoryLayer
from cml.integrations import CMLOpenAIHelper

memory = CognitiveMemoryLayer(api_key="...", base_url="...")
helper = CMLOpenAIHelper(memory, OpenAI())
response = helper.chat("What should I eat tonight?", session_id="s1")

Developer: read_safe (returns empty on connection/timeout), memory.session(name=...), configure_logging("DEBUG"), typed models (py.typed). Typed exceptions: AuthenticationError, AuthorizationError, ValidationError, RateLimitError, NotFoundError, ServerError, CMLConnectionError, CMLTimeoutError. The MemoryProvider protocol is available for custom backends. See API Reference.
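For example (the exception import location is an assumption; the classes are listed above):

```python
from cml import CognitiveMemoryLayer, configure_logging
from cml import RateLimitError, CMLConnectionError  # import location is an assumption

configure_logging("DEBUG")

with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    try:
        result = memory.read("user preferences")
    except RateLimitError:
        result = None  # back off and retry in real code
    except CMLConnectionError:
        result = None
    # Or let the SDK swallow connection/timeout errors and return an empty result:
    safe = memory.read_safe("user preferences")
```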

Temporal fidelity: Optional timestamp in write(), turn(), and remember() enables historical data replay for benchmarks, migration, and testing. See Temporal Fidelity.
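A replay sketch (whether timestamp accepts a datetime, an ISO-8601 string, or both is an assumption):

```python
from datetime import datetime, timezone
from cml import CognitiveMemoryLayer

with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    # Store the event at its historical time rather than "now".
    memory.write(
        "User moved to Berlin.",
        timestamp=datetime(2023, 5, 1, tzinfo=timezone.utc),
    )
```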

Eval mode: eval_mode=True in write() or remember() returns eval_outcome and eval_reason (stored/skipped and write-gate reason) for benchmark scripts. See API Reference — Eval mode.
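A benchmark-script sketch (attribute access on the write response is an assumption):

```python
from cml import CognitiveMemoryLayer

with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    outcome = memory.write("User's cat is named Miso.", eval_mode=True)
    # eval_outcome is stored/skipped; eval_reason explains the write-gate decision.
    print(outcome.eval_outcome, outcome.eval_reason)
```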


Documentation

GitHub repository — source, issues, server setup

CHANGELOG


Testing

The SDK has 323 tests (unit, integration, embedded, e2e). From the repository root:

# Run all SDK tests
pytest packages/py-cml/tests -v

# Unit only
pytest packages/py-cml/tests/unit -v

# Integration (requires CML API; set CML_BASE_URL, CML_API_KEY)
pytest packages/py-cml/tests/integration -v

# Embedded (requires embedding/LLM from .env or skip)
pytest packages/py-cml/tests/embedded -v

# E2E (requires CML API)
pytest packages/py-cml/tests/e2e -v

Some integration, embedded, and e2e tests skip when the CML server or embedding model is unavailable. See the root tests/README.md for skipped-test details.


License

GPL-3.0-or-later. See LICENSE.


Optional Modules (Eval and Modeling)

Install optional modules depending on your workflow:

# Evaluation utilities (`cml.eval`, `cml-eval`)
pip install "cognitive-memory-layer[eval]"

# Custom model prep/training (`cml.modeling`, `cml-models`)
pip install "cognitive-memory-layer[modeling]"

# Both modules
pip install "cognitive-memory-layer[eval,modeling]"

Each extra installs only its own dependencies. Running cml-eval or cml-models without the corresponding extra produces a clear error message with install instructions.

Evaluation CLI — run LoCoMo-Plus benchmarks, validate outputs, and generate comparison reports:

cml-eval run-full --repo-root .              # Full pipeline (Docker + ingest + QA + judge)
cml-eval run-locomo --limit-samples 10       # Quick test with 10 samples
cml-eval validate --outputs-dir evaluation/outputs
cml-eval report --summary evaluation/outputs/locomo_plus_qa_cml_judge_summary.json
cml-eval compare --summary evaluation/outputs/locomo_plus_qa_cml_judge_summary.json

Modeling CLI — prepare training data and train custom TF-IDF models:

cml-models prepare --config packages/models/model_pipeline.toml
cml-models train --config packages/models/model_pipeline.toml --strict
cml-models train --config packages/models/model_pipeline.toml --allow-skips
cml-models pipeline --config packages/models/model_pipeline.toml -- --strict

cml-models train is strict-by-default (TrainConfig.strict=True). Deferred token tasks and missing task coverage fail fast unless --allow-skips is set.

Python API — both modules expose typed dataclass configs for programmatic use:

from cml.eval import LocomoEvalConfig, run_locomo_plus
from cml.modeling import PrepareConfig, TrainConfig, run_pipeline

See Evaluation Module and Modeling Module for full CLI flags, Python API reference, and dataclass field documentation.
