
Smriti AI: inference-time semantic memory for small language models

Smriti AI

Smriti AI is a local-first, training-free memory layer for small language models. It wraps a frozen HuggingFace causal language model with persistent memory, semantic retrieval, graph memory, identity governance, API tooling, Docker deployment, benchmarks, and production-readiness checks.

The name comes from smriti (IAST: smṛti), a Sanskrit term associated with memory and remembrance; Wisdom Library glosses Smritis as "that which has to be remembered".

Small models are not only limited by parameter count. They are limited by the absence of durable memory.

Smriti AI keeps the base model frozen. It improves long-term recall by storing external memory, retrieving only relevant facts, injecting them into the prompt, and updating the memory after each interaction. No LoRA, fine-tuning, adapter weights, or model retraining are required for the inference-time memory system.

Current Status

Smriti AI is packaged on PyPI as smriti-memory-ai with the importable package smriti. The GitHub repository and Hugging Face resources remain named smriti-ai.

Area Status
Python package pyproject.toml, src/ layout, console scripts, build workflow.
Core memory TF-IDF compatibility mode plus semantic session/topic/fact memory.
Retrieval Sentence-transformer embeddings, FAISS/NumPy fallback, cosine similarity with temporal decay.
Graph memory Per-session networkx knowledge graph with simple triple extraction and traversal.
Identity governance Embedding-based persona fingerprint with drift checks and refinement hooks.
Backends JSON, SQLite, Redis, and Postgres backend abstractions.
Privacy Optional encrypted memory blobs and /memory/delete.
Audit controls List, search, pin, archive, update, and delete individual memory entries.
Auth/RBAC Optional API-key auth with user and admin roles.
API FastAPI service with CORS, OpenAPI docs, metrics, health checks, API-key/RBAC option.
CLI smriti, smriti-cli, smriti-api, migration commands, and backward-compatible mempalace.
Docker CPU, GPU, demo, training Dockerfiles and compose profiles.
Monitoring Prometheus endpoint and Grafana dashboard assets.
Benchmarks Gemma 4 public benchmark policy, cross-model harness, LoCoMo-style runner, identity bench.
Provider adapters Local HF, HF Endpoint, Ollama, vLLM, and OpenAI-compatible generation adapters.
Memory standard Portable memory protocol plus backend conformance runner and schema migrations.
Research evidence Curated historical/current benchmark lineage without shipping noisy raw logs.
Tests/CI Unit/integration tests, package build/install checks, audit report, GitHub Actions.

What Smriti AI Is

Smriti AI is an inference-time memory runtime. It sits between the user and the model.

For each turn, it can:

  1. Read the current user message.
  2. Retrieve relevant memories scoped by session_id and topic_id.
  3. Query related graph triples.
  4. Build an augmented prompt.
  5. Generate with a frozen base model.
  6. Check persona/identity drift.
  7. Run a refinement pass if needed.
  8. Extract new facts and triples.
  9. Persist updated memory.
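
For orientation, here is a condensed Python sketch of that loop. The retrieve, add_entry, query_graph, and triples_to_text calls mirror examples shown later on this page; drift_score, refinement_prompt, extract_triples, save, and generate are illustrative stand-ins, not the exact SmritiAILite internals.

def chat_turn(message, memory, graph, model, identity, session_id, topic_id):
    # Steps 1-3: read the message, retrieve scoped memories, query graph triples.
    facts = memory.retrieve(session_id, topic_id, message, k=5)
    triples = graph.query_graph(session_id, message, depth=1, topic_id=topic_id)

    # Steps 4-5: build the augmented prompt and generate with the frozen model.
    context = "\n".join(hit.entry.text for hit in facts)
    context += "\n" + graph.triples_to_text(triples)
    reply = model.generate(f"Known facts:\n{context}\n\nUser: {message}\nAssistant:")

    # Steps 6-7: check persona drift and run a refinement pass if needed.
    if identity.drift_score(reply) > identity.threshold:
        reply = model.generate(identity.refinement_prompt(reply))

    # Steps 8-9: extract new facts and triples, then persist memory.
    memory.add_entry(session_id, topic_id, message)
    graph.extract_triples(session_id, message, topic_id=topic_id)
    memory.save()
    return reply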

It is designed for:

User Type Why They Use It
Enterprise AI product teams Add auditable memory to existing AI services without retraining models.
Personal assistant developers Give local assistants user-specific recall across sessions.
Research groups Evaluate memory augmentation, retrieval modes, and continual-learning boundaries.
Privacy-sensitive organizations Keep user memory local or self-hosted with encryption and deletion hooks.
Multi-agent builders Let planner, summarizer, and executor agents share one user's isolated memory.

What Smriti AI Is Not

Smriti AI is not a hosted model provider, a replacement foundation model, or a fine-tuning method for inference-time recall. It does not magically improve every task. It improves tasks where persistent user facts, context continuity, retrieval, or persona stability matter.

The separate src/training/ package exists for replay/EWC research experiments. That training code is intentionally separate from the inference-time Smriti AI memory runtime.

Key Features

Feature Description
Semantic retrieval Uses embeddings to retrieve meaning-similar facts even when wording changes.
Hierarchical memory Stores memory as sessions -> topics -> facts, giving multi-user and multi-task isolation.
Temporal decay Scores memories by cosine similarity multiplied by exp(-lambda * age); see the sketch after this table.
TF-IDF mode Keeps a lightweight lexical retrieval mode for compatibility and low-dependency environments.
Knowledge graph Extracts simple subject-relation-object triples and injects related facts.
Identity fingerprint Averages persona/self-description embeddings and detects drift in model outputs.
Memory compression Summarizes older topic entries and archives originals before eviction.
Durable backends JSON, SQLite, Redis, and Postgres backends through a common interface.
Encryption hooks Optional symmetric encryption for memory blobs with SMRITI_MEMORY_KEY.
Deletion support /memory/delete and CLI delete commands for user memory removal.
Audit dashboard Authenticated memory table for search, edit, pin, archive, and per-entry deletion.
Provider adapters Swap local Hugging Face for HF Endpoints, Ollama, vLLM, or OpenAI-compatible APIs.
FastAPI service /chat, /memory/load, /memory/save, /memory/delete, /graph/query, /metrics, /health.
CLI Local commands for config, chat, save/load/delete, graph query, server start, and benchmarks.
Docker Compose stacks for API, Redis/Postgres, Prometheus/Grafana, demo, CPU/GPU images.
Benchmarks Gemma 4 memory-retention, retrieval-mode comparison, latency, identity, LoCoMo-style long-memory, historical-protocol rerun.
CI/CD GitHub Actions for tests, style checks, package build/install, Docker, release workflows.
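
As one concrete example, the temporal-decay row above can be read as the following scoring rule. This is a minimal sketch assuming unit-normalized embeddings; variable and parameter names are illustrative.

import time

import numpy as np

def decayed_score(query_vec, memory_vec, stored_at, decay_lambda=1e-6):
    # Cosine similarity reduces to a dot product for unit-normalized vectors.
    cosine = float(np.dot(query_vec, memory_vec))
    age_seconds = time.time() - stored_at
    # Older memories are down-weighted exponentially: cos * exp(-lambda * age).
    return cosine * float(np.exp(-decay_lambda * age_seconds))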

Research Lineage And Principles

Smriti AI was built from scratch around a few durable ideas from memory-augmented small-model systems.

Principle Smriti AI Interpretation Current Implementation
External memory Memory should live outside model weights in a portable, inspectable store. MemPalaceLite, SemanticMemory, durable backends, JSON export/import.
Training-free recall User recall should improve at inference time without changing model weights. Retrieved memory is injected into prompts on each call.
Identity continuity Assistants should maintain persona and user-specific context across turns. IdentityFingerprint detects embedding drift and can trigger refinement (sketched after this table).
Small-model augmentation Small models become more useful when paired with explicit state. Works with Gemma 4 and other HuggingFace causal LMs.
Local-first privacy Memory should be deployable on a user's own machine or infrastructure. JSON/SQLite local stores, optional encryption, deletion endpoint.
MLOps reproducibility Memory systems should be benchmarked, tested, packaged, monitored, and deployable. CI, Docker, benchmark CSVs, reports, model card, monitoring stack.
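
The identity-continuity row maps to a simple embedding test. A minimal sketch, assuming persona descriptions and model outputs share one embedding space; the function names and threshold are illustrative, not the IdentityFingerprint API.

import numpy as np

def build_fingerprint(persona_vecs):
    # Average the persona/self-description embeddings into one unit vector.
    mean = np.mean(persona_vecs, axis=0)
    return mean / np.linalg.norm(mean)

def drifted(output_vec, fingerprint, threshold=0.35):
    # Flag outputs whose embedding strays too far from the persona fingerprint.
    similarity = float(np.dot(output_vec / np.linalg.norm(output_vec), fingerprint))
    return (1.0 - similarity) > threshold  # True would trigger a refinement pass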

Historical numbers from earlier writeups are treated as research lineage. Current claims should use the current Smriti AI benchmark artifacts in this repository.

For deeper lineage and reproducibility:

Document Purpose
research/evidence/README.md Curated historical/current evidence policy.
research/evidence/benchmark_lineage.csv Historical and current result ledger.
research/README.md Curated original notebook/log/excerpt manifest.
docs/memory_format.md Portable Smriti memory format and backend contract.
docs/memory_spec.md Stable memory protocol for JSON, SQLite, Redis, and Postgres entries.
docs/kaggle_colab.md Kaggle/Colab reproducibility guide using package imports.
smriti-ai-kaggle.ipynb Current Kaggle kernel notebook, mirrored from demos/smriti_kaggle.ipynb.
demos/smriti_kaggle.ipynb / demos/smriti_colab.ipynb Reproducible package-import notebooks pinned to smriti-memory-ai==1.0.7.

Architecture

flowchart TD
    U["User / Agent Message"] --> A["SmritiAILite.chat"]
    A --> Q["Session + Topic Scope"]
    Q --> R["Memory Retrieval"]
    R --> S["SemanticMemory or TF-IDF"]
    R --> G["KnowledgeGraphMemory"]
    S --> C["Context Builder"]
    G --> C
    C --> P["Augmented Prompt"]
    P --> M["Frozen HuggingFace Causal LM"]
    M --> I["IdentityFingerprint"]
    I -->|"aligned"| O["Final Response"]
    I -->|"drift"| F["Refinement Pass"]
    F --> O
    O --> E["Fact + Triple Extraction"]
    E --> B["Durable Backend"]
    B --> J["JSON / SQLite / Redis / Postgres"]

Core Modules

Module File Responsibility
SmritiAILite src/smriti/agent.py Main model wrapper for HuggingFace generation plus memory updates.
BaselineGemma src/smriti/agent.py Plain model baseline with no memory layer.
MemPalaceLite src/smriti/core.py High-level memory facade and backward-compatible API.
SemanticMemory src/smriti/semantic_memory.py Hierarchical embedding memory with FAISS/NumPy retrieval, compression, JSON persistence (fallback pattern sketched after this table).
KnowledgeGraphMemory src/smriti/knowledge_graph.py Triple extraction, graph storage, query traversal, natural-language rendering.
IdentityFingerprint src/smriti/identity_fingerprint.py Persona vectors, drift scoring, adaptive thresholds, refinement prompts.
MACPLite src/smriti/macp.py Compact reasoning continuity state.
Backends src/smriti/backends.py JSON, SQLite, Redis, Postgres, encryption, deletion.
Backend conformance src/smriti/backend_conformance.py Reusable compatibility checks for backend authors.
Backend migrations src/smriti/migrations.py, src/smriti/sql/ Versioned SQLite/Postgres schema files.
Audit src/smriti/audit.py, src/smriti/audit_api.py Memory inspection, pin/archive/update/delete control plane.
Auth src/smriti/auth.py API-key authentication and user/admin RBAC checks.
Provider adapters src/smriti/adapters/ Local HF, HF Endpoint, Ollama, vLLM, OpenAI-compatible generation.
Config src/smriti/config.py config.yaml and environment variable loading.
API src/smriti/api.py FastAPI app, observability, optional API-key auth.
CLI src/smriti/cli.py Local commands for config, memory, API, graph, and benchmark workflows.
Integrations src/smriti/integrations/ LangChain and LlamaIndex adapters.
Training research src/training/ewc_replay.py Optional replay/EWC experiments, separate from runtime memory.
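
The FAISS/NumPy retrieval fallback mentioned for SemanticMemory typically follows this pattern. A sketch only, assuming unit-normalized float32 embeddings; the real wiring in src/smriti/semantic_memory.py may differ.

import numpy as np

try:
    import faiss  # optional accelerated index
    HAVE_FAISS = True
except ImportError:
    HAVE_FAISS = False

def top_k(query_vec, matrix, k=5):
    if HAVE_FAISS:
        # Inner product equals cosine similarity for unit-normalized vectors.
        index = faiss.IndexFlatIP(matrix.shape[1])
        index.add(matrix.astype(np.float32))
        _, ids = index.search(query_vec.astype(np.float32).reshape(1, -1), k)
        return ids[0].tolist()
    # NumPy fallback: brute-force dot products, then take the k best.
    scores = matrix @ query_vec
    return np.argsort(-scores)[:k].tolist()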

Installation

Requirements

Requirement Notes
Python 3.10 or newer.
OS Tested locally on macOS; CI validates Linux. Windows helpers are included.
Model runtime Optional unless using SmritiAILite with a real HuggingFace model.
Gemma 4 access Public benchmark path uses google/gemma-4-E2B-it; users may need Hugging Face access/login.

Install From GitHub

git clone https://github.com/Luciferai04/smriti-ai.git
cd smriti-ai
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[ml,bench]"

For development:

pip install -e ".[dev,ml,bench]"

For all optional integrations:

pip install -e ".[full]"

Install From PyPI

The public PyPI distribution is smriti-memory-ai because the shorter smriti-ai project name is already occupied on PyPI by another owner. The Python import stays clean and stable:

from smriti import SmritiAILite

Recommended install:

pip install "smriti-memory-ai[ml]==1.0.7"

GitHub tag fallback:

pip install "smriti-memory-ai[ml] @ git+https://github.com/Luciferai04/smriti-ai.git@v1.0.7"

Then verify:

python -c "from smriti import SmritiAILite, SemanticMemory, KnowledgeGraphMemory; print('Smriti import OK')"

One-Shot Installer

Linux/macOS:

./install_smriti.sh

Windows PowerShell:

./install_smriti.ps1

Windows batch:

install_smriti.bat

The installer creates a local virtual environment, installs package extras, writes config.yaml, and can cache Gemma 4.

Model And API Keys

Smriti AI itself does not require a model-provider API key. It is a memory layer.

Key Required? Purpose
SMRITI_API_KEY Optional Protects Smriti API routes. Clients send x-api-key.
SMRITI_MEMORY_KEY Optional but recommended Encrypts memory blobs before writing to disk or backend.
HF_TOKEN Sometimes Needed only if Hugging Face model access requires authentication.
Provider keys Depends Needed only if users connect Smriti to a hosted provider instead of a local model.

For Gemma 4 via Hugging Face:

hf auth login
# or
export HF_TOKEN="your-huggingface-token"

For a protected Smriti API:

export SMRITI_API_KEY="replace-with-service-secret"
curl -H "x-api-key: $SMRITI_API_KEY" http://localhost:8000/health

For encrypted memory:

export SMRITI_MEMORY_KEY="replace-with-a-long-random-secret"

Do not commit real keys to Git.

Quick Start For Customers

Enterprise AI Teams

Deploy Smriti AI as a memory service behind your API gateway with Redis/Postgres and monitoring enabled:

cp api_keys.example.json api_keys.json
export AUTH_ENABLED=true
export SMRITI_API_KEYS_PATH=api_keys.json
COMPOSE_PROFILES=redis,monitoring SMRITI_MEMORY_BACKEND=redis docker compose up -d --build

Integrate by sending POST /chat requests with a stable user_id, then scrape /metrics with Prometheus and review dashboards in Grafana. Use admin keys only for support/audit workflows.

Indie Developers And Personal Assistants

Install locally and start with JSON or SQLite memory:

pip install "smriti-memory-ai @ git+https://github.com/Luciferai04/smriti-ai.git@v1.0.7"
smriti-cli config wizard --backend json --overwrite
smriti-cli --session-id alex --topic-id profile chat "My name is Alex and I work at Ocean Lab."
smriti-cli --session-id alex --topic-id profile chat "What do you remember about me?"

Use LangChain/LlamaIndex integrations when you want Smriti memory inside an existing agent framework.

Researchers And Startups

Clone the repo, run benchmarks, and compare retrieval modes:

git clone https://github.com/Luciferai04/smriti-ai.git
cd smriti-ai
pip install -e ".[dev,ml,bench]"
python benchmarks/run_benchmarks.py --model-preset gemma4 --max-new-tokens 16
python benchmarks/run_longmem.py --dataset-path path/to/locomo.json --retrieval-mode semantic

Use smriti-ai-kaggle.ipynb, demos/smriti_kaggle.ipynb, and demos/smriti_colab.ipynb for package-based notebook demos. They demonstrate the frozen Gemma 4 target, semantic/graph/identity memory, SQLite persistence, and delete verification without relying on fake, mock, or tiny models for production claims.

Privacy-Sensitive Deployments

Keep memory local, encrypted, auditable, and deletable:

export SMRITI_MEMORY_BACKEND=sqlite
export SMRITI_SQLITE_PATH=data/smriti_memory.sqlite3
export SMRITI_MEMORY_KEY="replace-with-a-long-random-secret"
export AUTH_ENABLED=true
smriti-cli start-server --host 127.0.0.1 --port 8000

Expose /memory/delete in your user data-deletion flow and run the authenticated audit UI only for trusted operators.

Quick Start: Python Library

Memory-Only Usage

This path does not require PyTorch, transformers, or a model download.

from smriti import MemPalaceLite

memory = MemPalaceLite(retrieval_mode="semantic", session_id="alex", topic_id="profile")

memory.add_fact("Alex is a marine biologist in Hawaii.")
context = memory.get_context("What do you remember about Alex?")

print(context)

Full Gemma 4 Usage

This uses a real model and the Smriti AI wrapper.

from transformers import AutoModelForCausalLM, AutoTokenizer
from smriti import SmritiAILite

model_id = "google/gemma-4-E2B-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

agent = SmritiAILite(
    model=model,
    tokenizer=tokenizer,
    retrieval_mode="semantic",
    session_id="alex",
    topic_id="profile",
)

agent.chat("My name is Alex and I am a marine biologist.")
reply = agent.chat("What do you remember about me?")
print(reply)

Save And Load Memory

agent.save_memory("smriti_memory.json")
agent.load_memory("smriti_memory.json")

Direct Semantic Memory

from smriti import SemanticMemory

memory = SemanticMemory()
memory.add_entry("user-a", "profile", "Maya is a doctor at a community clinic.")
results = memory.retrieve("user-a", "profile", "physician medical work", k=1)
print(results[0].entry.text)

Direct Knowledge Graph

from smriti import KnowledgeGraphMemory

graph = KnowledgeGraphMemory()
graph.add_triple("science", "Marie Curie", "discovered", "radium", topic_id="chemistry")
graph.add_triple("science", "radium", "is a", "chemical element", topic_id="chemistry")

facts = graph.triples_to_text(graph.query_graph("science", "Marie Curie", depth=2, topic_id="chemistry"))
print(facts)

Quick Start: CLI

Smriti AI installs these commands:

Command Purpose
smriti-cli Main CLI.
smriti Short alias.
smriti-api Run the FastAPI service.
mempalace Backward-compatible alias.

Create config:

smriti-cli init config.yaml

Interactive backend wizard:

smriti-cli config wizard --backend json --overwrite

Store and retrieve local memory:

smriti-cli --session-id alex --topic-id profile --retrieval-mode tfidf chat "My name is Alex and I work at Ocean Lab."
smriti-cli --session-id alex --topic-id profile --retrieval-mode tfidf chat "Where do I work?"

Save, load, delete:

smriti-cli --session-id alex memory save smriti_memory.json
smriti-cli --session-id alex memory load smriti_memory.json
smriti-cli --session-id alex memory delete --path smriti_memory.json

Run backend compatibility checks:

smriti-cli --backend json --backend-path data/memory backend conformance
smriti-cli --backend sqlite --backend-path data/smriti_memory.sqlite3 backend conformance

Migrate one user's memory between backends:

smriti-cli --session-id alex migrate-backend \
  --from-backend json --from-path data/memory \
  --to-backend sqlite --to-path data/smriti_memory.sqlite3

Query graph memory:

smriti-cli --session-id alex --topic-id profile graph_query user --depth 1

Run benchmarks:

# Installed package smoke check; internal-only, not public benchmark evidence.
SMRITI_ALLOW_TEST_DOUBLES=1 smriti-cli benchmark --quick

# Full Gemma 4 benchmark; run from a cloned source checkout.
python benchmarks/run_benchmarks.py --model-preset gemma4 --max-new-tokens 16

Quick Start: FastAPI Service

Start locally:

smriti-cli start-server --host 0.0.0.0 --port 8000
# or
python -m smriti.api --host 0.0.0.0 --port 8000

Health check:

curl http://localhost:8000/health

OpenAPI docs:

http://localhost:8000/docs

Memory/chat request:

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alex",
    "topic_id": "profile",
    "message": "My name is Alex and I am a marine biologist.",
    "retrieval_mode": "semantic"
  }'

Recall request:

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alex",
    "topic_id": "profile",
    "message": "What do you remember about me?",
    "retrieval_mode": "semantic"
  }'

Important API Runtime Note

The default FastAPI service can operate as a pure memory service. If no model agent factory is configured, /chat returns memory-aware context and updates memory. To serve full model-backed assistant responses through the API, either deploy the API process with a model runtime that registers an agent factory via set_agent_factory (a registration sketch follows below), or wrap Smriti inside your own service with SmritiAILite.

This separation keeps the memory service lightweight for enterprise integration while still supporting full local model-backed usage in Python.
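
A minimal registration sketch. The import path and factory signature are assumptions here; check src/smriti/api.py for the actual set_agent_factory contract.

from transformers import AutoModelForCausalLM, AutoTokenizer

from smriti import SmritiAILite
from smriti.api import set_agent_factory  # assumed import location

def make_agent():
    # Build the frozen base model once and wrap it with Smriti memory.
    model_id = "google/gemma-4-E2B-it"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return SmritiAILite(model=model, tokenizer=tokenizer, retrieval_mode="semantic")

set_agent_factory(make_agent)  # /chat now returns model-backed responses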

API Endpoints

Method Endpoint Purpose
GET /health Liveness and loaded-user count.
GET /metrics Prometheus metrics.
POST /chat Retrieve context, update memory, optionally call configured model agent.
POST /memory/save Save one user's memory.
POST /memory/load Load memory from request body, backend, or path.
POST /memory/delete Delete one user's memory from RAM/backend/path.
POST /memory/list Audit/list one user's memory entries.
POST /memory/update Edit an individual memory entry.
POST /memory/pin Pin/unpin an important memory entry.
POST /memory/archive Archive/unarchive an entry.
POST /memory/entry/delete Delete one memory entry.
POST /graph/query Query session-scoped graph facts.
GET /docs FastAPI Swagger UI.

Configuration

Smriti AI reads config.yaml by default or the file pointed to by SMRITI_CONFIG_PATH.

memory:
  backend: json
  memory_dir: data/memory
  sqlite_path: data/smriti_memory.sqlite3
  redis_url: redis://localhost:6379/0
  postgres_dsn: ""
  autosave: true

security:
  encryption_key: ""

api:
  host: 0.0.0.0
  port: 8000
  cors_origins:
    - "*"

model:
  adapter: local_hf
  base_model_id: google/gemma-4-E2B-it
  hf_endpoint_url: ""
  ollama_url: http://localhost:11434
  vllm_url: http://localhost:8000
  openai_compatible_url: https://api.openai.com/v1

Environment variables override config values:

Variable Purpose
SMRITI_CONFIG_PATH Path to config file.
SMRITI_MEMORY_BACKEND json, sqlite, redis, or postgres.
SMRITI_MEMORY_DIR JSON memory directory.
SMRITI_SQLITE_PATH SQLite database path.
SMRITI_REDIS_URL Redis connection URL.
SMRITI_POSTGRES_DSN Postgres DSN.
SMRITI_AUTOSAVE Save memory after API updates.
SMRITI_MEMORY_KEY Encrypt memory blobs.
SMRITI_API_KEY Protect API routes.
AUTH_ENABLED Enable role-bound API-key auth.
SMRITI_API_KEYS_PATH Path to api_keys.json with user/admin keys.
SMRITI_CORS_ORIGINS Comma-separated CORS allowlist.
SMRITI_HOST API host.
SMRITI_PORT API port.
SMRITI_MODEL_ADAPTER local_hf, hf_endpoint, ollama, vllm, or openai.
BASE_MODEL_ID Base model ID, default google/gemma-4-E2B-it.
HF_ENDPOINT_URL Hugging Face Inference Endpoint URL when using endpoint mode.
OPENAI_API_KEY Provider key only when using OpenAI-compatible adapter.
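
A minimal sketch of that precedence: environment variables win over config.yaml values. Key names follow the table above; the actual loader in src/smriti/config.py may differ in detail.

import os

import yaml

def load_backend_setting(config_path="config.yaml"):
    with open(config_path) as fh:
        cfg = yaml.safe_load(fh) or {}
    file_value = cfg.get("memory", {}).get("backend", "json")
    # The SMRITI_MEMORY_BACKEND variable, when set, overrides the file value.
    return os.environ.get("SMRITI_MEMORY_BACKEND", file_value)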

Durable Memory Backends

Backend Best For Notes
JSON Local experiments, privacy-first single-machine usage. Simple files under data/memory.
SQLite Local production, desktop apps, edge devices. Single-file database, no external service.
Redis Low-latency state for deployed agents. Good for concurrent API use; use persistence in production.
Postgres Enterprise durability and operational tooling. Good fit for audited multi-user deployments.

Examples:

SMRITI_MEMORY_BACKEND=json smriti-cli start-server
SMRITI_MEMORY_BACKEND=sqlite SMRITI_SQLITE_PATH=data/smriti.sqlite3 smriti-cli start-server
SMRITI_MEMORY_BACKEND=redis SMRITI_REDIS_URL=redis://localhost:6379/0 smriti-cli start-server
SMRITI_MEMORY_BACKEND=postgres SMRITI_POSTGRES_DSN=postgresql://host:5432/smriti smriti-cli start-server

Docker

Local CPU API

docker compose up -d --build api

API With Redis

COMPOSE_PROFILES=redis SMRITI_MEMORY_BACKEND=redis docker compose up -d --build

API With Postgres

COMPOSE_PROFILES=postgres SMRITI_MEMORY_BACKEND=postgres docker compose up -d --build

GPU-Capable Image

When Docker has NVIDIA runtime support:

SMRITI_DOCKERFILE=Dockerfile docker compose up -d --build api

Production Compose

docker compose -f docker-compose.prod.yml up -d

Monitoring Stack

COMPOSE_PROFILES=monitoring docker compose up -d --build

Then open:

Service URL
API http://localhost:8000
API docs http://localhost:8000/docs
Prometheus http://localhost:9090
Grafana http://localhost:3000

Default Grafana credentials in local compose:

admin / smriti

Hugging Face Model-Style Deployment

Smriti AI can also be packaged as a Hugging Face model repository with a custom handler.py. This does not make Smriti AI a newly trained foundation model. It packages the memory wrapper, model card, endpoint config, example requests, and upload tooling so Hugging Face Inference Endpoints can serve a memory-augmented base model.
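
For orientation, a custom handler in this style usually follows the Inference Endpoints EndpointHandler contract, roughly as below. This is a hedged sketch; the shipped deploy/huggingface_model/handler.py is more complete and may differ.

from transformers import pipeline

from smriti import MemPalaceLite

class EndpointHandler:
    def __init__(self, path=""):
        # The endpoint mounts the model repository at `path`.
        self.generator = pipeline("text-generation", model=path)
        self.memory = MemPalaceLite(retrieval_mode="semantic",
                                    session_id="endpoint", topic_id="default")

    def __call__(self, data):
        message = data["inputs"]
        context = self.memory.get_context(message)  # retrieve stored facts
        prompt = f"Known facts:\n{context}\n\nUser: {message}\nAssistant:"
        reply = self.generator(prompt, max_new_tokens=128)[0]["generated_text"]
        self.memory.add_fact(message)  # persist the new turn
        return {"generated_text": reply}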

Deployment assets live in:

deploy/huggingface_model/
deploy/huggingface_dataset/
deploy/huggingface_space/

Local handler smoke test:

BASE_MODEL_ID=google/gemma-4-E2B-it \
HF_TOKEN=$HF_TOKEN \
SMRITI_MEMORY_BACKEND=json \
SMRITI_MEMORY_PATH=/tmp/smriti_hf_test.json \
python deploy/huggingface_model/test_handler_local.py

Upload to a Hugging Face model repo:

export HF_TOKEN=...
python deploy/huggingface_model/upload_model_repo.py \
  --repo-id luciferai-devil/smriti-ai \
  --private false

Official v1.0 Hugging Face targets:

Asset Repo
Model wrapper luciferai-devil/smriti-ai
Benchmark dataset luciferai-devil/smriti-ai-benchmarks
CPU-safe demo Space luciferai-devil/smriti-ai-demo

Upload sanitized benchmark artifacts:

python deploy/huggingface_dataset/upload_benchmark_dataset.py \
  --repo-id luciferai-devil/smriti-ai-benchmarks \
  --private false

Upload the public demo Space:

python deploy/huggingface_space/upload_space.py \
  --repo-id luciferai-devil/smriti-ai-demo \
  --private false

The Space runs in CPU-safe memory-only mode by default, warns users not to enter PII, and auto-deletes demo memory after inactivity.

Use BASE_MODEL_ID for a locally loaded model inside the endpoint, or HF_ENDPOINT_URL if the Smriti handler should call another model endpoint. Production endpoints should use external Redis/Postgres memory and must not store private user memory inside the model repository.

See docs/deploy_as_hf_model.md for the full deployment guide.

Monitoring And Observability

The API exports Prometheus metrics from /metrics.

Metric Meaning
smriti_http_requests_total Request count by method, path, status.
smriti_http_errors_total Server-side error count.
smriti_http_request_latency_seconds End-to-end request latency histogram.
smriti_retrieval_latency_seconds Memory retrieval latency histogram.
smriti_tokens_total Approximate token count observed by API.
smriti_user_memories Number of loaded user memory stores.
smriti_user_memory_bytes Approximate serialized memory size by user.

Observability helper:

python scripts/metrics_monitor.py --url http://localhost:8000 --output reports/metrics_report.md

Privacy And Security

Smriti AI stores user memory, so privacy is a core operational concern.

Requirement Smriti AI Support
User isolation Memory is keyed by user_id / session_id and topic_id.
Deletion /memory/delete and smriti-cli memory delete.
Encryption Set SMRITI_MEMORY_KEY to encrypt backend blobs.
API protection Set SMRITI_API_KEY or AUTH_ENABLED=true with api_keys.json.
RBAC user keys can access only their bound user_id; admin keys can operate across users.
Local-first deployment JSON/SQLite can run fully on-device.
Auditability Memory can be exported, inspected, pinned, archived, edited, and deleted.

Delete user memory through API:

curl -X POST http://localhost:8000/memory/delete \
  -H "Content-Type: application/json" \
  -d '{"user_id": "alex"}'

Read more in docs/privacy.md.

Role-bound API keys:

cp api_keys.example.json api_keys.json
export AUTH_ENABLED=true
export SMRITI_API_KEYS_PATH=api_keys.json
curl -H "x-api-key: replace-with-user-key" http://localhost:8000/health

Read more in docs/auth.md.

Framework Integrations

LangChain

from smriti.integrations.langchain import SmritiMemory

memory = SmritiMemory(session_id="alex", topic_id="profile")
memory.save_context(
    {"input": "My name is Alex and I work at Ocean Lab."},
    {"output": "Nice to meet you, Alex."},
)
print(memory.load_memory_variables({"input": "Where do I work?"}))

LlamaIndex

from smriti.integrations.llama_index import SmritiStorageContext

storage = SmritiStorageContext(session_id="alex", topic_id="profile")
storage.add_node("Alex is a marine biologist.")
print(storage.query("What does Alex do?"))

Web Demo

The demo app lets users inject facts, ask distractors, view retrieved memories, and delete user memory.

pip install -e ".[demo]"
uvicorn demo.app:app --port 8080

You can also run the packaged module directly:

python -m demo.app

Screenshots of the current dashboard (the dashboard home and the benchmark evidence views) are included in the repository.

See src/demo/README.md for details.

Memory Audit Dashboard

The audit dashboard is a separate authenticated UI for operators and privacy reviews:

export SMRITI_AUDIT_USER=admin
export SMRITI_AUDIT_PASSWORD="replace-with-a-strong-password"
uvicorn demo.audit_app:app --port 8090

Open http://127.0.0.1:8090 and sign in with the configured credentials. Use it to search, edit, pin, archive, and delete individual memories.

Model Provider Adapters

Smriti AI can wrap multiple generation providers through a small adapter interface:

from smriti.adapters import build_adapter

adapter = build_adapter("hf_endpoint")
text = adapter.generate("Augmented Smriti prompt", max_new_tokens=128)

Supported adapters:

Adapter Typical target
local_hf Local Transformers Gemma 4.
hf_endpoint Hugging Face Inference Endpoint.
ollama Local Ollama REST server.
vllm vLLM server.
openai OpenAI-compatible hosted APIs.
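
Writing a new adapter amounts to implementing the generate call shown above. A hedged sketch, against a hypothetical HTTP endpoint; the real base class and registration mechanism live in src/smriti/adapters/ and may differ.

import requests

class HttpAdapter:
    """Illustrative adapter that forwards prompts to a hypothetical HTTP service."""

    def __init__(self, url="http://localhost:9999/generate"):  # hypothetical URL
        self.url = url

    def generate(self, prompt, max_new_tokens=128):
        payload = {"prompt": prompt, "max_tokens": max_new_tokens}
        resp = requests.post(self.url, json=payload, timeout=60)
        resp.raise_for_status()
        return resp.json()["text"]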

Read more in docs/adapters.md.

Benchmarks

Benchmark Policy

Public benchmark claims in this repository use real Gemma 4 only:

google/gemma-4-E2B-it

Deterministic test-double paths may exist for engineering tests, but they are not public model-quality claims.

Current Local Gemma 4 Results

These are current local CPU measurements from the checked-in CSV artifacts. Benchmark-readiness audit status: benchmark_staging_only. These results are valid validation evidence for the memory harness, but they are not yet industry-grade public production benchmark claims. The current artifacts use 21 explicit recall checks, 9 holdout query events, one local real-model run bundle, and deterministic cross-model harness smoke checks. Use the exact wording below until larger repeated real-model benchmarks are available.

| Evaluation | Baseline Recall | Best Smriti AI Recall | Absolute Lift | Notes |
| --- | --- | --- | --- | --- |
| Gemma-style three-fact protocol | 0/3 | 3/3 | +3 facts | Baseline 5.71s, Semantic+Graph+Identity 4.99s avg CPU latency. |
| Five-mode comparison (max_new_tokens=16) | 0/3 | 3/3 | +3 facts | Fastest successful memory mode: Semantic+Graph at 2.78s avg CPU latency. |
| Original broader protocol rerun (max_new_tokens=256) | 0/3 | 3/3 | +3 facts | Overall average improved from 0.524 to 0.832 (+58.9%). |

Five-mode comparison:

| Configuration | Recall | Avg Latency | Context Coherence | Notes |
| --- | --- | --- | --- | --- |
| Baseline | 0/3 | 4.927s | 0.000 | Frozen Gemma 4, no memory layer. |
| TF-IDF | 3/3 | 3.481s | 0.667 | Lexical memory mode. |
| Semantic | 3/3 | 2.857s | 0.333 | Embedding-based memory mode. |
| Semantic + Graph | 3/3 | 2.781s | 0.667 | Fastest successful memory mode in this CPU run. |
| Semantic + Graph + Identity | 3/3 | 5.164s | 0.000 | Adds persona governance overhead. |

Original broader protocol rerun:

| Metric | Baseline | Smriti AI | Delta |
| --- | --- | --- | --- |
| Memory retention | 0.000 | 1.000 | +inf% |
| Response consistency | 0.571 | 0.496 | -13.2% |
| Context coherence | 1.000 | 1.000 | +0.0% |
| Overall average | 0.524 | 0.832 | +58.9% |

The older +31.2% overall number from earlier writeups remains historical lineage. The current comparable broader-protocol rerun is +58.9% under this local Gemma 4 CPU setup with max_new_tokens=256.

Safe external wording:

Current local Gemma 4 validation artifacts show 3/3 recall on the checked-in
memory-retention protocol. Larger repeated real-model and holdout benchmarks
are required before claiming industry-standard benchmark superiority.

Run the benchmark-readiness audit:

make benchmark-readiness

Plan the enterprise industry benchmark without running real models:

make benchmark-industry
make benchmark-industry-hf

See docs/industry_benchmarks.md for the full industry gate, required model matrix, provenance schema, and safe public-claim rules.

Run Benchmarks

Install benchmark and ML extras:

pip install -e ".[ml,bench]"

Gemma-style memory retention:

python benchmarks/run_gemma_eval.py

Five-configuration comparison:

python benchmarks/run_benchmarks.py \
  --model-preset gemma4 \
  --configurations tfidf semantic semantic_graph semantic_graph_identity \
  --devices auto \
  --max-new-tokens 16 \
  --output benchmarks/results_comparison.csv

Original broader protocol rerun:

python benchmarks/run_historical_protocol.py --max-new-tokens 256

Cross-model harness:

python benchmarks/run_benchmarks.py \
  --model-preset cross_model \
  --output benchmarks/cross_model_results.csv \
  --summary-output benchmarks/summary.md

Long-memory / LoCoMo-style runner:

python benchmarks/run_longmem.py --dataset-path path/to/locomo.json --retrieval-mode semantic

Identity drift benchmark:

python benchmarks/run_identity_bench.py --output reports/identity_evaluation.csv

Aggregate summaries:

python benchmarks/summarize_results.py

Benchmark Evidence Files

File Purpose
benchmarks/results_gemma_eval.csv Gemma 4 baseline vs Smriti three-fact evaluation.
benchmarks/results_comparison.csv Baseline, TF-IDF, semantic, semantic+graph, semantic+graph+identity.
benchmarks/results_historical_protocol.csv Current rerun of the older broader protocol.
benchmarks/results_historical_protocol_responses.json Response audit trail for the broader-protocol rerun.
benchmarks/cross_model_results.csv Optional cross-model memory-retention comparison.
benchmarks/longmem_results.csv Optional LoCoMo-style long-memory output.
benchmarks/latency_gemma4.csv Dedicated Gemma 4 latency/token probe.
reports/identity_evaluation.csv Persona drift detection benchmark.
results/summary.md Human-readable aggregate summary.
benchmarks/README.md Generated benchmark table.
model_card_smriti.md Model card and result disclaimer.
research/evidence/benchmark_lineage.csv Historical/current result ledger and claim-status labels.

Testing

Run the full test suite:

pytest -q

Run the production hardening matrices:

make test              # unit + deterministic test-double integration
make test-security     # prompt injection, redaction, auth/RBAC, delete/encryption
make test-benchmarks   # deterministic benchmark artifacts and budgets
make production-gates  # manifest, regression, privacy, and gate report checks
make end-user-readiness # first-run install/docs/CLI/deployment readiness checks

These PR-safe tests use deterministic test-double paths. Gemma 4 and other real-model benchmarks are reserved for nightly/manual runs so ordinary contributors do not need to download large gated checkpoints.

Run with coverage:

pytest --cov=smriti --cov-report=term-missing --cov-report=html:reports/coverage/html

Run style checks:

ruff check benchmarks scripts src tests

Build and install the wheel locally:

python -m build
python -m venv .venv-wheel
source .venv-wheel/bin/activate
pip install dist/smriti_ai-*.whl
python -c "from smriti import SmritiAILite, SemanticMemory, KnowledgeGraphMemory; print('wheel OK')"

Smoke Tests

Local API smoke test:

bash scripts/smoke_test.sh

Latency probe:

python scripts/measure_latency.py --retrieval-modes tfidf semantic --output benchmarks/latency_results.csv

Load test helper:

python scripts/load_test_runner.py --users 10 --spawn-rate 10 --run-time 30s --backend json

See docs/load_testing.md for the 10/100/1000-user matrix and report files.

Fault-tolerance probe:

python scripts/fault_tolerance_tests.py --url http://localhost:8000

Agentic Harness Evolution

Smriti AI now includes an AHE-inspired loop for improving the inference-time memory harness while keeping Gemma 4 or any other base model frozen.

Layer File Purpose
Harness config configs/harness_params.yaml Editable retrieval, graph, compression, and identity-governance parameters.
Evidence collection benchmarks/collect_evidence.py Runs memory-retention or JSON/JSONL long-memory tasks and writes summary/log evidence.
Evolution decision evolve_harness.py Applies bounded heuristics and appends a predicted-impact manifest entry.
Closed loop run_evolution.py Re-evaluates proposed configs, reverts regressions, and can tag Git iterations.
Audit trail manifests/evolve_manifest.jsonl JSONL history of component changed, previous/new values, reason, prediction, observed effect, and config snapshots.
Harness registry harnesses/ Versioned seed/evolved harness artifacts with metadata, results, and production status.
Manifest verifier src/smriti/manifest_verifier.py Validates that each accepted/rejected change has before/after evidence.
Production gates src/smriti/production_gates.py Runs tests, backend/privacy checks, validation, holdout, cross-model, latency, token, and identity gates before promotion.
Canary routing src/smriti/canary.py Sticky per-user canary routing for evolved harnesses with rollback conditions.
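
Sticky per-user routing of the kind named in the table above is usually a stable hash over the user id, sketched below. The bucket size and percentage here are illustrative; the real logic lives in src/smriti/canary.py.

import hashlib

def route_harness(user_id, canary_percent=10):
    # Hash the user id so each user consistently lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = digest[0] % 100  # stable bucket in 0-99
    return "canary" if bucket < canary_percent else "active"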

Quick local loop:

python benchmarks/collect_evidence.py \
  --config configs/harness_params.yaml \
  --summary benchmarks/evidence_summary.json

python evolve_harness.py \
  --config configs/harness_params.yaml \
  --evidence benchmarks/evidence_summary.json

python run_evolution.py --iterations 5 --no-commit

Validation and release-gate loop:

python benchmarks/validate_harness_evolution.py \
  --seed-config harnesses/seed/harness_params.yaml \
  --evolved-config harnesses/evolved-v1/harness_params.yaml

python benchmarks/run_holdout_eval.py \
  --config harnesses/evolved-v1/harness_params.yaml

python benchmarks/run_cross_model_harness_eval.py \
  --seed-config harnesses/seed/harness_params.yaml \
  --evolved-config harnesses/evolved-v1/harness_params.yaml

python harness/verify_manifest.py
python harness/production_gates.py evolved-v1 --to candidate

Harness registry CLI:

smriti-cli harness list
smriti-cli harness show evolved-v1
smriti-cli harness compare seed evolved-v1
smriti-cli harness activate evolved-v1
smriti-cli harness rollback seed
smriti-cli harness verify-manifest
smriti-cli harness promote evolved-v1 --to production
smriti-cli harness regression-test

API/dashboard support:

Endpoint Purpose
GET /harness/current Show active harness parameters and registry entries.
GET /harness/history Return manifest history.
GET /harness/metrics Return validation and canary metrics.
POST /harness/rollback Roll back to a registry harness. Admin-only when auth is enabled.
POST /harness/evaluate Run seed-vs-evolved validation. Admin-only when auth is enabled.
GET /harness/canary/status Show active/canary routing status.
POST /harness/canary/start Start sticky canary routing. Admin-only when auth is enabled.
POST /harness/canary/stop Stop canary routing.
POST /harness/canary/promote Promote canary harness.
POST /harness/canary/rollback Roll back canary harness.

The web dashboard also exposes a harness cockpit with current parameters, recent manifest entries, manual overrides, rollback controls, quick evaluation, seed comparison, and report export. See docs/research_lineage.md for the research rationale and AHE mapping.

Generated harness artifacts:

Artifact Purpose
results/harness_evolution_validation.md Baseline vs seed vs evolved harness validation.
results/evolution_generalization_report.md Final holdout evaluation.
results/cross_model_harness_eval.md Cross-model deterministic harness validation.
results/manifest_verification.md Manifest integrity report.
results/production_gate_report.md Promotion gate verdict.
results/canary_report.md Canary routing status and metrics.
reports/evolution_report.md Stakeholder-readable evolution report.

GPU And CPU Behavior

Smriti AI is designed to fall back cleanly to CPU.

Component CPU GPU
Base model Works, slower for Gemma 4. Moves model to CUDA when available.
Generation dtype float32 on CPU. bfloat16 if supported, else float16.
Embeddings Sentence-transformers can run on CPU. Embedding model can move to CUDA.
FAISS Uses CPU by default. Attempts GPU indices when CUDA FAISS support is available.

For practical demos with Gemma 4, GPU is recommended. CPU is acceptable for reproducibility but slower.
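
A minimal sketch of that fallback using standard PyTorch capability checks; the exact Smriti selection logic may differ.

import torch

def pick_device_and_dtype():
    if torch.cuda.is_available():
        # Prefer bfloat16 on GPUs that support it, otherwise float16.
        dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
        return "cuda", dtype
    return "cpu", torch.float32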

Training Research Package

Runtime Smriti AI is training-free. The training package is separate and optional.

pip install -e ".[training]"
python -m training.ewc_replay --model google/gemma-4-E2B-it --dataset path/to/data.jsonl --dry-run

The training module includes replay/EWC experiment scaffolding and logs metrics under training/. It is not imported by smriti during inference.

CI/CD

Workflow Trigger Purpose
.github/workflows/ci.yml Push / PR Install, lint, test, compile, build, install wheel, audit, upload artifacts.
.github/workflows/test_agent_hardening.yml Push / PR/manual Unit, test-double integration, OWASP-style security, benchmark smoke, production gates, optional backend jobs.
.github/workflows/nightly_benchmarks.yml Nightly/manual Real-model Gemma-style, retrieval, holdout, and identity benchmarks.
.github/workflows/harness_production_gate.yml Harness/API/benchmark changes Verify manifest and run production gates for evolved harnesses.
.github/workflows/benchmark.yml Nightly/manual Run benchmark suite on a small configured setup.
.github/workflows/load-test.yml Push/nightly/manual Run a 10-user API load smoke test and upload reports.
.github/workflows/docker.yml Tags/manual Build API/demo/training Docker images.
.github/workflows/release.yml Tag push Build package and publish release artifacts.

Latest local push for the harness-production work passed CI, Harness Production Gate, and Load Test workflows.

Repository Layout

smriti-ai/
|-- src/smriti/                  # Runtime memory package
|-- src/training/                # Optional replay/EWC research code
|-- src/demo/                    # Small web demo
|-- demos/                       # Kaggle/Colab package-import notebooks
|-- benchmarks/                  # Gemma 4 evaluations and CSV results
|-- configs/                     # Harness and runtime parameter files
|-- harness/                     # Manifest verification and production-gate wrappers
|-- harnesses/                   # Versioned seed/evolved harness registry
|-- tests/                       # Unit and integration tests
|-- scripts/                     # Setup, smoke, latency, load, fault probes
|-- docs/                        # Privacy and API documentation
|-- manifests/                   # AHE JSONL evolution audit trail
|-- research/artifacts/          # Curated original notebooks, logs, and excerpts
|-- research/evidence/           # Curated benchmark lineage and evidence policy
|-- monitoring/                  # Prometheus/Grafana assets
|-- support/                     # Troubleshooting and sample configs
|-- notebooks/                   # Package-based demo notebook
|-- reports/                     # Readiness, coverage, metrics reports
|-- Dockerfile                   # GPU-capable API image
|-- Dockerfile.cpu               # Lightweight CPU API image
|-- Dockerfile.demo              # Demo image
|-- Dockerfile.training          # Training/research image
|-- docker-compose.yml           # Local API/backends/monitoring stack
|-- docker-compose.prod.yml      # Production-oriented compose stack
|-- pyproject.toml               # Package metadata and extras
|-- config.yaml                  # Local config template
|-- evolve_harness.py            # One-step harness evolution proposal script
|-- run_evolution.py             # Closed-loop evidence/evolve/verify driver
|-- ROADMAP.md                   # Post-v1 roadmap including AHE hardening
|-- model_card_smriti.md         # Model card and benchmark disclosure

Production Readiness Notes

Area Recommendation
Auth Set SMRITI_API_KEY or place API behind an authenticated gateway.
RBAC Use AUTH_ENABLED=true and role-bound keys for production endpoints.
Secrets Use environment variables or a secret manager, not committed config files.
Storage Use SQLite for local apps, Redis/Postgres for server deployments.
Encryption Set SMRITI_MEMORY_KEY for sensitive memory.
Backups Back up JSON/SQLite/Postgres memory stores according to your RPO/RTO.
Observability Scrape /metrics and use Grafana dashboard panels.
Load testing Run Locust/wrk-style load tests before enterprise rollout.
Deletion Wire /memory/delete into user data deletion workflows.
Audit Protect demo.audit_app and audit endpoints before exposing memory inspection.
Benchmarking Rerun Gemma 4 benchmarks on your hardware before publishing claims.

See reports/production_readiness.md for the latest QA snapshot.

Troubleshooting

Problem Likely Cause Fix
ModuleNotFoundError: smriti Package not installed in current environment. Run pip install -e . or activate the correct venv.
ModuleNotFoundError: transformers ML extras not installed. Run pip install -e ".[ml]".
Gemma 4 fails to load Missing Hugging Face access or incompatible Transformers stack. Run hf auth login and update ML dependencies.
API returns memory-only text No model agent factory is registered. Use SmritiAILite in Python or deploy API with an agent factory.
API returns 401 SMRITI_API_KEY is set. Send x-api-key: <key>.
Memory not persisted Autosave disabled or backend not configured. Set SMRITI_AUTOSAVE=1 and configure backend.
Encrypted memory cannot load Missing or wrong SMRITI_MEMORY_KEY. Set the same key used when saving.
Docker image is large ML dependencies and model runtimes are heavy. Use Dockerfile.cpu for API-only memory service.
Benchmarks are slow Gemma 4 on CPU is heavy. Use GPU or reduce max_new_tokens for local checks.

Roadmap

Release Theme Planned Work
v1.1 Memory quality Hot/cold memory tiers, stronger compression, multilingual embeddings, cross-lingual recall tests, configurable decay/top-K/summarization thresholds.
v1.2 Scalability Async backend paths, asyncpg Postgres option, batched writes, embedding cache, 100/500/1000-user load reports.
v1.3 Research LongMemEval/MemoryBench tracking, SmritiBench design, temporal/weighted graph memory, cross-agent shared memory with strict isolation.
Ongoing Community Good-first issues, backend/adapter contribution guides, benchmark reproducibility reports, pilot-user feedback loops.

See ROADMAP.md for the living post-v1 roadmap.

Contributing

See CONTRIBUTING.md for development setup and contribution guidance.

Recommended local loop:

pip install -e ".[dev,bench]"
ruff check src benchmarks scripts tests
pytest -q
python -m build

Release history and stakeholder-facing notes live in CHANGELOG.md and RELEASE_NOTES_v1.0.7.md. A tutorial draft for the v1 memory protocol, audit UI, and benchmark evidence lives in docs/blog/smriti-ai-v1-memory-layer.md.

License

Apache-2.0. See pyproject.toml for package metadata.

Harness Evolution Results

The base model remains frozen. Smriti AI is not fine-tuned; these numbers come from memory-harness evaluation.

| System | Recall | Precision@K | p95 latency (ms) | Token overhead | Privacy delete |
| --- | --- | --- | --- | --- | --- |
| baseline_frozen_model | 0.000 | 0.000 | 0.000 | 0 | True |
| smriti_seed_harness | 1.000 | 0.333 | 0.525 | 328 | True |
| smriti_evolved_harness | 1.000 | 0.333 | 0.168 | 328 | True |

Cross-model harness validation:

| Model | Seed recall | Evolved recall | Gate |
| --- | --- | --- | --- |
| google/gemma-4-E2B-it | 1.000 | 1.000 | pass |
| meta-llama/Llama-3.2-1B | 1.000 | 1.000 | pass |
| microsoft/Phi-3-mini-4k-instruct | 1.000 | 1.000 | pass |
| mistralai/Mistral-7B-Instruct-v0.3 | 1.000 | 1.000 | pass |
| Qwen/Qwen2.5-1.5B-Instruct | 1.000 | 1.000 | pass |

Production gate report: results/production_gate_report.md

Deterministic test doubles are used only for CI stability and never counted as public benchmark evidence.

By Soumyajit Ghosh
