cognitive-memory-layer
Python SDK for the Cognitive Memory Layer — a neuro-inspired memory system for LLMs (server + Python SDK). Store, retrieve, and reason over memories with sync/async clients or an in-process embedded engine.
The Cognitive Memory Layer (CML) gives LLMs a neuro-inspired memory system: episodic and semantic storage, consolidation, and active forgetting. It fits into agents, RAG pipelines, and personalized apps as a persistent, queryable memory backend. This SDK provides sync and async HTTP clients for a CML server, plus an optional in-process embedded engine (lite mode: SQLite and local embeddings, no server). You get write/read/turn, sessions, admin and batch operations, and a helper for OpenAI chat.
Who it's for: Developers building AI applications that need persistent, queryable memory — chatbots, agents, evaluation pipelines, and personalized assistants.
What you can do:
- Power agent loops with retrieved context and store observations in memory.
- Add memory to RAG pipelines so retrieval is informed by prior interactions.
- Personalize by user or session with namespaces and session-scoped context.
- Run benchmarks with eval mode and temporal fidelity (historical timestamps). For bulk evaluation, the server supports `LLM_INTERNAL__*` settings and the eval script supports `--ingestion-workers`; see Configuration.
- Run embedded without a server for development, demos, or single-machine apps.
What's new (1.4.x): session-scoped write route support in SessionScope/AsyncSessionScope, new dashboard/admin helpers (dashboard_facts, dashboard_invalidate_fact, dashboard_export_memories, graph_overview, admin_consolidate, admin_forget), embedded provider parity fixes, and wrapper parity updates for user_timezone/timestamp. See CHANGELOG.
Installation
```bash
pip install cognitive-memory-layer
```
Embedded mode (run the CML engine in-process, no server). In lite mode, only the episodic (vector) store is used; the neocortical (graph/semantic) store is disabled, so there is no knowledge graph or semantic consolidation. Best for development, demos, or single-machine apps.
```bash
pip install "cognitive-memory-layer[embedded]"
```
From the monorepo, the server and SDK are built from the repository root (single pyproject.toml). Install in editable mode with optional extras:
```bash
# From repo root: install SDK only
pip install -e .

# From repo root: install server + SDK
pip install -e ".[server,dev]"

# From repo root: install SDK with embedded mode (in-process engine)
pip install -e ".[embedded]"
```
Quick start
Sync client — Connect to a CML server, write a memory, read by query, and run a turn with a session; use result.context for LLM injection and result.memories (or result.constraints when the server returns them) for structured access.
```python
from cml import CognitiveMemoryLayer

with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    memory.write("User prefers vegetarian food.")

    result = memory.read("What does the user eat?")
    print(result.context)  # Formatted for LLM injection
    for m in result.memories:
        print(m.text, m.relevance)

    turn = memory.turn(user_message="What should I eat tonight?", session_id="session-001")
    print(turn.memory_context)
```
Async client — Same flow as sync; use async with and await for all operations.
```python
import asyncio

from cml import AsyncCognitiveMemoryLayer

async def main():
    async with AsyncCognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
        await memory.write("User prefers dark mode.")
        result = await memory.read("user preferences")
        print(result.context)

asyncio.run(main())
```
Embedded mode — No server: SQLite plus local embeddings (lite mode). Use db_path for persistence.
```python
import asyncio

from cml import EmbeddedCognitiveMemoryLayer

async def main():
    async with EmbeddedCognitiveMemoryLayer() as memory:
        await memory.write("User prefers vegetarian food.")
        result = await memory.read("dietary preferences")
        print(result.context)

asyncio.run(main())

# Persistent: EmbeddedCognitiveMemoryLayer(db_path="./my_memories.db")
```
Get context for injection — Use get_context(query) when you only need a formatted string for the LLM:
```python
with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    context = memory.get_context("user preferences")
    # Inject context into your system prompt or RAG pipeline
```
Session-scoped flow — Use memory.session(name="...") to scope writes and reads to a session:
```python
with CognitiveMemoryLayer(api_key="sk-...", base_url="http://localhost:8000") as memory:
    with memory.session(name="session-001") as session:
        session.write("User asked about Italian food.")
        session.read("What did I ask earlier?")
        session.turn(user_message="Any good places nearby?", assistant_response="...")
```
SessionScope.write()/AsyncSessionScope.write() call /session/{session_id}/write, and SessionScope.read()/AsyncSessionScope.read() call /session/{session_id}/read, so session wrappers stay path-scoped on both write and read.
More usage: Timezone-aware retrieval with read(..., user_timezone="America/New_York") or turn(..., user_timezone="America/New_York"). Batch operations: batch_write([{"content": "..."}, ...]) and batch_read(["query1", "query2"]) for multiple writes or reads.
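The batch helpers take plain payloads. A minimal sketch of the shapes (the `context_tags` field is an assumption carried over from the single-write options; the client calls are commented out because they require a running CML server):

```python
# Payload shapes for batch_write/batch_read, as described above.
writes = [
    {"content": "User prefers vegetarian food."},
    {"content": "User is allergic to peanuts."},
]
queries = ["dietary preferences", "allergies"]

# With a connected client:
# memory.batch_write(writes)
# results = memory.batch_read(queries)
```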
Configuration
Client: Environment variables (use .env or set directly): CML_API_KEY, CML_BASE_URL, CML_TENANT_ID, CML_TIMEOUT, CML_MAX_RETRIES, CML_ADMIN_API_KEY, CML_VERIFY_SSL. Use CMLConfig for validated, reusable config. See Configuration.
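A `.env` using the variables above might look like this (values are illustrative, not defaults):

```
CML_API_KEY=sk-...
CML_BASE_URL=http://localhost:8000
CML_TENANT_ID=my-tenant
CML_TIMEOUT=30
CML_MAX_RETRIES=3
CML_VERIFY_SSL=true
```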
Constructor:
```python
memory = CognitiveMemoryLayer(
    api_key="sk-...",
    base_url="http://localhost:8000",
    tenant_id="my-tenant",
)
```
Or pass a config object: from cml import CMLConfig then CognitiveMemoryLayer(config=config).
Embedded: Use EmbeddedConfig (or constructor args). Options: storage_mode (lite | standard | full; only lite is implemented), tenant_id, database, embedding, llm, auto_consolidate, auto_forget. Embedding and LLM are read from .env when not set: EMBEDDING_INTERNAL__PROVIDER, EMBEDDING_INTERNAL__MODEL, EMBEDDING_INTERNAL__DIMENSIONS, EMBEDDING_INTERNAL__BASE_URL, LLM_INTERNAL__MODEL, LLM_INTERNAL__BASE_URL. Lite mode uses SQLite and local embeddings; pass db_path for a persistent database. Full details in Configuration.
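For example, a `.env` for embedded mode (angle-bracket values are placeholders for your own provider/model; the dimension shown is illustrative):

```
EMBEDDING_INTERNAL__PROVIDER=<provider>
EMBEDDING_INTERNAL__MODEL=<embedding-model>
EMBEDDING_INTERNAL__DIMENSIONS=768
EMBEDDING_INTERNAL__BASE_URL=<embedding-endpoint-url>
LLM_INTERNAL__MODEL=<llm-model>
LLM_INTERNAL__BASE_URL=<llm-endpoint-url>
```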
Features
| Mode | Description |
|---|---|
| Client | Sync and async HTTP clients for a running CML server; context managers |
| Embedded | In-process engine (lite mode: SQLite + local embeddings); no server. Embedded read() passes memory_types, since, and until to the orchestrator. |
Memory API: write, read, read_stream, read_safe, turn, update, forget, stats, get_context, create_session, get_session_context, delete_all, remember (alias for write), search (alias for read), health. Options: user_timezone on read(), get_context(), search(), and turn() for timezone-aware "today"/"yesterday"; timestamp on write(), turn(), and remember() for event time; eval_mode on write()/remember() for benchmark responses. Write supports context_tags, session_id, memory_type, namespace, metadata, agent_id. Read supports memory_types, since, until, response_format (packet | list | llm_context).
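As a sketch of the write options listed above (the helper name and values are illustrative; argument names come from the option list):

```python
def log_user_note(memory, note: str):
    """Write one memory with tags, a session, a namespace, and metadata (sketch)."""
    return memory.write(
        note,
        context_tags=["food"],
        session_id="session-001",
        namespace="user-42",
        metadata={"source": "chat"},
    )
```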
Response shape: ReadResponse has memories, facts, preferences, episodes, constraints (when the server has constraint extraction), and context (formatted string for LLM injection).
Server compatibility: The server supports delete_all (admin API key), read filters and user_timezone, response formats, write metadata and memory_type, and session-scoped context. Read filters and user_timezone are sent when the server supports them. The server can use LLM-based extraction (constraints, facts, salience, importance) when FEATURES__USE_LLM_* flags are enabled; see Usage Documentation § Configuration Reference.
Session and namespace: memory.session(name=...) (SessionScope) scopes writes/reads/turns to a session via session-scoped routes. with_namespace(namespace) returns a NamespacedClient (and async AsyncNamespacedClient) that injects namespace into write, update, and batch_write, and forwards user_timezone/timestamp on read/turn helpers.
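Per-user scoping with with_namespace can be sketched like this (helper name, namespace format, and client argument are illustrative; a real run needs a connected client):

```python
def personalize_for_user(memory, user_id: str, note: str) -> str:
    """Scope a write and a read to one user's namespace (sketch)."""
    scoped = memory.with_namespace(f"user-{user_id}")  # NamespacedClient
    scoped.write(note)
    return scoped.read(note).context
```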
Admin & batch: batch_write, batch_read, consolidate, run_forgetting, reconsolidate, admin_consolidate, admin_forget, with_namespace, iter_memories, list_tenants, get_events, component_health. Dashboard admin (require CML_ADMIN_API_KEY): dashboard_overview, dashboard_memories, dashboard_memory_detail, dashboard_facts, dashboard_invalidate_fact, dashboard_export_memories, dashboard_timeline, get_sessions (active sessions from Redis), get_rate_limits (rate-limit usage per API key), get_request_stats (hourly request volume), get_graph_stats, graph_overview, explore_graph, search_graph, dashboard_neo4j_config, get_config/update_config, get_labile_status, test_retrieval, get_jobs, bulk_memory_action, reset_database.
Embedded extras: EmbeddedConfig for storage_mode, embedding/LLM, auto_consolidate, auto_forget. Export/import: export_memories, import_memories (and async export_memories_async, import_memories_async) for migration between embedded and server.
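A migration between embedded and server might look like the sketch below, assuming export_memories returns a payload that import_memories accepts (the helper name and that assumption are not from the SDK docs):

```python
def migrate_embedded_to_server(embedded, client):
    """Sketch: export memories from an embedded engine, import into a server client."""
    exported = embedded.export_memories()
    client.import_memories(exported)
    return exported
```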
OpenAI integration: CMLOpenAIHelper(memory_client, openai_client) for memory-augmented chat. Set OPENAI_MODEL or LLM_INTERNAL__MODEL in .env.
```python
from openai import OpenAI

from cml import CognitiveMemoryLayer
from cml.integrations import CMLOpenAIHelper

memory = CognitiveMemoryLayer(api_key="...", base_url="...")
helper = CMLOpenAIHelper(memory, OpenAI())
response = helper.chat("What should I eat tonight?", session_id="s1")
```
Developer: read_safe (returns empty on connection/timeout), memory.session(name=...), configure_logging("DEBUG"), typed models (py.typed). Typed exceptions: AuthenticationError, AuthorizationError, ValidationError, RateLimitError, NotFoundError, ServerError, CMLConnectionError, CMLTimeoutError. The MemoryProvider protocol is available for custom backends. See API Reference.
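Because read_safe returns an empty result instead of raising on connection/timeout errors, it suits best-effort loops. A sketch (the helper name and the exact shape of an "empty" result are assumptions):

```python
def context_or_empty(memory, query: str) -> str:
    """Sketch: read_safe does not raise on transport errors, so this is loop-safe."""
    result = memory.read_safe(query)
    return result.context if result and result.context else ""
```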
Temporal fidelity: Optional timestamp in write(), turn(), and remember() enables historical data replay for benchmarks, migration, and testing. See Temporal Fidelity.
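For example, replaying historical data with the timestamp option could be sketched as (helper name and sample values are illustrative):

```python
from datetime import datetime, timezone

def replay_event(memory, text: str, when: datetime):
    """Sketch: store a memory with a historical event time (temporal fidelity)."""
    return memory.write(text, timestamp=when)

# e.g. replay_event(memory, "User moved to Paris.", datetime(2023, 5, 1, tzinfo=timezone.utc))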
Eval mode: eval_mode=True in write() or remember() returns eval_outcome and eval_reason (stored/skipped and write-gate reason) for benchmark scripts. See API Reference — Eval mode.
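A benchmark script can surface the write-gate decision with eval_mode, along these lines (helper name is illustrative; eval_outcome/eval_reason are the fields named above):

```python
def gated_write(memory, text: str):
    """Sketch: eval_mode=True returns the write-gate outcome for benchmark scripts."""
    resp = memory.write(text, eval_mode=True)
    return resp.eval_outcome, resp.eval_reason
```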
Documentation
- Getting Started
- API Reference
- Configuration
- Examples
- Temporal Fidelity
- Evaluation Module — `cml-eval` CLI and Python API
- Modeling Module — `cml-models` CLI and Python API
- Security policy
- GitHub repository — source, issues, server setup
Testing
The SDK has 323 tests (unit, integration, embedded, e2e). From the repository root:
```bash
# Run all SDK tests
pytest packages/py-cml/tests -v

# Unit only
pytest packages/py-cml/tests/unit -v

# Integration (requires CML API; set CML_BASE_URL, CML_API_KEY)
pytest packages/py-cml/tests/integration -v

# Embedded (requires embedding/LLM from .env; otherwise skipped)
pytest packages/py-cml/tests/embedded -v

# E2E (requires CML API)
pytest packages/py-cml/tests/e2e -v
```
Some integration, embedded, and e2e tests skip when the CML server or embedding model is unavailable. See the root tests/README.md for skipped-test details.
License
GPL-3.0-or-later. See LICENSE.
Optional Modules (Eval and Modeling)
Install optional modules depending on your workflow:
```bash
# Evaluation utilities (cml.eval, cml-eval CLI)
pip install "cognitive-memory-layer[eval]"

# Custom model prep/training (cml.modeling, cml-models CLI)
pip install "cognitive-memory-layer[modeling]"

# Both modules
pip install "cognitive-memory-layer[eval,modeling]"
```
Each extra installs only its own dependencies. Running cml-eval or cml-models without the corresponding extra produces a clear error message with install instructions.
Evaluation CLI — run LoCoMo-Plus benchmarks, validate outputs, and generate comparison reports:
```bash
cml-eval run-full --repo-root .         # Full pipeline (Docker + ingest + QA + judge)
cml-eval run-locomo --limit-samples 10  # Quick test with 10 samples
cml-eval validate --outputs-dir evaluation/outputs
cml-eval report --summary evaluation/outputs/locomo_plus_qa_cml_judge_summary.json
cml-eval compare --summary evaluation/outputs/locomo_plus_qa_cml_judge_summary.json
```
Modeling CLI — prepare training data and train custom TF-IDF models:
```bash
cml-models prepare --config packages/models/model_pipeline.toml
cml-models train --config packages/models/model_pipeline.toml --strict
cml-models train --config packages/models/model_pipeline.toml --allow-skips
cml-models pipeline --config packages/models/model_pipeline.toml -- --strict
```
cml-models train is strict-by-default (TrainConfig.strict=True). Deferred token tasks and missing task coverage fail fast unless --allow-skips is set.
Python API — both modules expose typed dataclass configs for programmatic use:
```python
from cml.eval import LocomoEvalConfig, run_locomo_plus
from cml.modeling import PrepareConfig, TrainConfig, run_pipeline
```
See Evaluation Module and Modeling Module for full CLI flags, Python API reference, and dataclass field documentation.