Smriti AI: inference-time semantic memory for small language models
Smriti AI is a local-first, training-free memory layer for small language models. It wraps a frozen HuggingFace causal language model with persistent memory, semantic retrieval, graph memory, identity governance, API tooling, Docker deployment, benchmarks, and production-readiness checks.
The name comes from smriti (IAST: smṛti), a Sanskrit term for memory and remembrance. Wisdom Library describes Smritis as "that which has to be remembered" (Wisdom Library: Smriti).
Small models are not only limited by parameter count. They are limited by the absence of durable memory.
Smriti AI keeps the base model frozen. It improves long-term recall by storing external memory, retrieving only relevant facts, injecting them into the prompt, and updating the memory after each interaction. No LoRA, fine-tuning, adapter weights, or model retraining are required for the inference-time memory system.
Current Status
Smriti AI is packaged on PyPI as smriti-memory-ai with the importable package smriti. The GitHub repository and Hugging Face resources remain named smriti-ai.
| Area | Status |
|---|---|
| Python package | pyproject.toml, src/ layout, console scripts, build workflow. |
| Core memory | TF-IDF compatibility mode plus semantic session/topic/fact memory. |
| Retrieval | Sentence-transformer embeddings, FAISS/NumPy fallback, cosine similarity with temporal decay. |
| Graph memory | Per-session networkx knowledge graph with simple triple extraction and traversal. |
| Identity governance | Embedding-based persona fingerprint with drift checks and refinement hooks. |
| Backends | JSON, SQLite, Redis, and Postgres backend abstractions. |
| Privacy | Optional encrypted memory blobs and /memory/delete. |
| Audit controls | List, search, pin, archive, update, and delete individual memory entries. |
| Auth/RBAC | Optional API-key auth with user and admin roles. |
| API | FastAPI service with CORS, OpenAPI docs, metrics, health checks, API-key/RBAC option. |
| CLI | smriti, smriti-cli, smriti-api, migration commands, and backward-compatible mempalace. |
| Docker | CPU, GPU, demo, training Dockerfiles and compose profiles. |
| Monitoring | Prometheus endpoint and Grafana dashboard assets. |
| Benchmarks | Gemma 4 public benchmark policy, cross-model harness, LoCoMo-style runner, identity bench. |
| Provider adapters | Local HF, HF Endpoint, Ollama, vLLM, and OpenAI-compatible generation adapters. |
| Memory standard | Portable memory protocol plus backend conformance runner and schema migrations. |
| Research evidence | Curated historical/current benchmark lineage without shipping noisy raw logs. |
| Tests/CI | Unit/integration tests, package build/install checks, audit report, GitHub Actions. |
What Smriti AI Is
Smriti AI is an inference-time memory runtime. It sits between the user and the model.
For each turn, it can:
- Read the current user message.
- Retrieve relevant memories scoped by session_id and topic_id.
- Query related graph triples.
- Build an augmented prompt.
- Generate with a frozen base model.
- Check persona/identity drift.
- Run a refinement pass if needed.
- Extract new facts and triples.
- Persist updated memory.
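The turn loop above can be condensed into a miniature sketch. This is an illustrative stand-in, not the actual SmritiAILite internals: `memory` is a plain list of fact strings, retrieval is a toy keyword match, and `generate` stands in for the frozen base model.

```python
def run_turn(message: str, memory: list, generate) -> str:
    """Retrieve relevant facts, augment the prompt, generate, persist."""
    # Retrieve: keep facts sharing at least one word with the message
    # (a toy stand-in for semantic/graph retrieval).
    words = set(message.lower().split())
    relevant = [fact for fact in memory if words & set(fact.lower().split())]
    # Augment: inject retrieved facts ahead of the user message.
    prompt = "Context:\n" + "\n".join(relevant) + f"\n\nUser: {message}"
    # Generate with the (frozen) model.
    reply = generate(prompt)
    # Persist: store the new turn as a fact for future retrieval.
    memory.append(message)
    return reply
```

The real runtime additionally queries graph triples, checks identity drift, and writes through a durable backend.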
It is designed for:
| User Type | Why They Use It |
|---|---|
| Enterprise AI product teams | Add auditable memory to existing AI services without retraining models. |
| Personal assistant developers | Give local assistants user-specific recall across sessions. |
| Research groups | Evaluate memory augmentation, retrieval modes, and continual-learning boundaries. |
| Privacy-sensitive organizations | Keep user memory local or self-hosted with encryption and deletion hooks. |
| Multi-agent builders | Let planner, summarizer, and executor agents share one user's isolated memory. |
What Smriti AI Is Not
Smriti AI is not a hosted model provider, a replacement foundation model, or a fine-tuning method for inference-time recall. It does not magically improve every task. It improves tasks where persistent user facts, context continuity, retrieval, or persona stability matter.
The separate src/training/ package exists for replay/EWC research experiments. That training code is intentionally separate from the inference-time Smriti AI memory runtime.
Key Features
| Feature | Description |
|---|---|
| Semantic retrieval | Uses embeddings to retrieve meaning-similar facts even when wording changes. |
| Hierarchical memory | Stores memory as sessions -> topics -> facts, giving multi-user and multi-task isolation. |
| Temporal decay | Scores memories by cosine similarity multiplied by exp(-lambda * age). |
| TF-IDF mode | Keeps a lightweight lexical retrieval mode for compatibility and low-dependency environments. |
| Knowledge graph | Extracts simple subject-relation-object triples and injects related facts. |
| Identity fingerprint | Averages persona/self-description embeddings and detects drift in model outputs. |
| Memory compression | Summarizes older topic entries and archives originals before eviction. |
| Durable backends | JSON, SQLite, Redis, and Postgres backends through a common interface. |
| Encryption hooks | Optional symmetric encryption for memory blobs with SMRITI_MEMORY_KEY. |
| Deletion support | /memory/delete and CLI delete commands for user memory removal. |
| Audit dashboard | Authenticated memory table for search, edit, pin, archive, and per-entry deletion. |
| Provider adapters | Swap local Hugging Face for HF Endpoints, Ollama, vLLM, or OpenAI-compatible APIs. |
| FastAPI service | /chat, /memory/load, /memory/save, /memory/delete, /graph/query, /metrics, /health. |
| CLI | Local commands for config, chat, save/load/delete, graph query, server start, and benchmarks. |
| Docker | Compose stacks for API, Redis/Postgres, Prometheus/Grafana, demo, CPU/GPU images. |
| Benchmarks | Gemma 4 memory-retention, retrieval-mode comparison, latency, identity, LoCoMo-style long-memory, historical-protocol rerun. |
| CI/CD | GitHub Actions for tests, style checks, package build/install, Docker, release workflows. |
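The temporal-decay scoring in the table can be written out directly. The decay constant `lam` below is an illustrative value, not the library's default:

```python
import math

def decayed_score(cos_sim: float, age_seconds: float, lam: float = 1e-6) -> float:
    """Relevance = cosine similarity weighted by exponential time decay."""
    return cos_sim * math.exp(-lam * age_seconds)

# With this lambda, a fresher but slightly less similar memory can
# outrank a month-old near-exact match.
fresh = decayed_score(0.80, age_seconds=0)
stale = decayed_score(0.95, age_seconds=30 * 24 * 3600)
```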
Research Lineage And Principles
Smriti AI was built from scratch around a few durable ideas from memory-augmented small-model systems.
| Principle | Smriti AI Interpretation | Current Implementation |
|---|---|---|
| External memory | Memory should live outside model weights in a portable, inspectable store. | MemPalaceLite, SemanticMemory, durable backends, JSON export/import. |
| Training-free recall | User recall should improve at inference time without changing model weights. | Retrieved memory is injected into prompts on each call. |
| Identity continuity | Assistants should maintain persona and user-specific context across turns. | IdentityFingerprint detects embedding drift and can trigger refinement. |
| Small-model augmentation | Small models become more useful when paired with explicit state. | Works with Gemma 4 and other HuggingFace causal LMs. |
| Local-first privacy | Memory should be deployable on a user's own machine or infrastructure. | JSON/SQLite local stores, optional encryption, deletion endpoint. |
| MLOps reproducibility | Memory systems should be benchmarked, tested, packaged, monitored, and deployable. | CI, Docker, benchmark CSVs, reports, model card, monitoring stack. |
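The identity-continuity principle can be illustrated with a toy drift score: average the persona embeddings into a fingerprint, then measure how far an output embedding strays from it. This is a simplified stand-in for IdentityFingerprint, which adds adaptive thresholds and refinement prompts:

```python
import math

def drift_score(output_vec, persona_vecs):
    """Toy drift: 1 - cosine(output, mean persona vector)."""
    # Average the persona/self-description vectors into one fingerprint.
    n = len(output_vec)
    fingerprint = [sum(v[i] for v in persona_vecs) / len(persona_vecs) for i in range(n)]
    dot = sum(a * b for a, b in zip(output_vec, fingerprint))
    norm = math.sqrt(sum(a * a for a in output_vec)) * math.sqrt(sum(a * a for a in fingerprint))
    return 1.0 - dot / norm
```

A drift score near 0 means the output stays close to the persona; a score near 1 would trigger a refinement pass.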
Historical numbers from earlier writeups are treated as research lineage. Current claims should use the current Smriti AI benchmark artifacts in this repository.
For deeper lineage and reproducibility:
| Document | Purpose |
|---|---|
| `research/evidence/README.md` | Curated historical/current evidence policy. |
| `research/evidence/benchmark_lineage.csv` | Historical and current result ledger. |
| `research/README.md` | Curated original notebook/log/excerpt manifest. |
| `docs/memory_format.md` | Portable Smriti memory format and backend contract. |
| `docs/memory_spec.md` | Stable memory protocol for JSON, SQLite, Redis, and Postgres entries. |
| `docs/kaggle_colab.md` | Kaggle/Colab reproducibility guide using package imports. |
| `demos/smriti_kaggle.ipynb` / `demos/smriti_colab.ipynb` | Reproducible package-import notebooks. |
Architecture
```mermaid
flowchart TD
    U["User / Agent Message"] --> A["SmritiAILite.chat"]
    A --> Q["Session + Topic Scope"]
    Q --> R["Memory Retrieval"]
    R --> S["SemanticMemory or TF-IDF"]
    R --> G["KnowledgeGraphMemory"]
    S --> C["Context Builder"]
    G --> C
    C --> P["Augmented Prompt"]
    P --> M["Frozen HuggingFace Causal LM"]
    M --> I["IdentityFingerprint"]
    I -->|"aligned"| O["Final Response"]
    I -->|"drift"| F["Refinement Pass"]
    F --> O
    O --> E["Fact + Triple Extraction"]
    E --> B["Durable Backend"]
    B --> J["JSON / SQLite / Redis / Postgres"]
```
Core Modules
| Module | File | Responsibility |
|---|---|---|
| `SmritiAILite` | `src/smriti/agent.py` | Main model wrapper for HuggingFace generation plus memory updates. |
| `BaselineGemma` | `src/smriti/agent.py` | Plain model baseline with no memory layer. |
| `MemPalaceLite` | `src/smriti/core.py` | High-level memory facade and backward-compatible API. |
| `SemanticMemory` | `src/smriti/semantic_memory.py` | Hierarchical embedding memory with FAISS/NumPy retrieval, compression, JSON persistence. |
| `KnowledgeGraphMemory` | `src/smriti/knowledge_graph.py` | Triple extraction, graph storage, query traversal, natural-language rendering. |
| `IdentityFingerprint` | `src/smriti/identity_fingerprint.py` | Persona vectors, drift scoring, adaptive thresholds, refinement prompts. |
| `MACPLite` | `src/smriti/macp.py` | Compact reasoning continuity state. |
| Backends | `src/smriti/backends.py` | JSON, SQLite, Redis, Postgres, encryption, deletion. |
| Backend conformance | `src/smriti/backend_conformance.py` | Reusable compatibility checks for backend authors. |
| Backend migrations | `src/smriti/migrations.py`, `src/smriti/sql/` | Versioned SQLite/Postgres schema files. |
| Audit | `src/smriti/audit.py`, `src/smriti/audit_api.py` | Memory inspection, pin/archive/update/delete control plane. |
| Auth | `src/smriti/auth.py` | API-key authentication and user/admin RBAC checks. |
| Provider adapters | `src/smriti/adapters/` | Local HF, HF Endpoint, Ollama, vLLM, OpenAI-compatible generation. |
| Config | `src/smriti/config.py` | config.yaml and environment variable loading. |
| API | `src/smriti/api.py` | FastAPI app, observability, optional API-key auth. |
| CLI | `src/smriti/cli.py` | Local commands for config, memory, API, graph, and benchmark workflows. |
| Integrations | `src/smriti/integrations/` | LangChain and LlamaIndex adapters. |
| Training research | `src/training/ewc_replay.py` | Optional replay/EWC experiments, separate from runtime memory. |
Installation
Requirements
| Requirement | Notes |
|---|---|
| Python | 3.10 or newer. |
| OS | Tested locally on macOS; CI validates Linux. Windows helpers are included. |
| Model runtime | Optional unless using SmritiAILite with a real HuggingFace model. |
| Gemma 4 access | Public benchmark path uses google/gemma-4-E2B-it; users may need Hugging Face access/login. |
Install From GitHub
git clone https://github.com/Luciferai04/smriti-ai.git
cd smriti-ai
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[ml,bench]"
For development:
pip install -e ".[dev,ml,bench]"
For all optional integrations:
pip install -e ".[full]"
Install From PyPI
The public PyPI distribution is named smriti-memory-ai because the shorter smriti-ai project name is already taken on PyPI by another owner. The Python import path stays clean and stable:
from smriti import SmritiAILite
Recommended install:
pip install "smriti-memory-ai[ml]==1.0.5"
GitHub tag fallback:
pip install "smriti-memory-ai[ml] @ git+https://github.com/Luciferai04/smriti-ai.git@v1.0.5"
Then verify:
python -c "from smriti import SmritiAILite, SemanticMemory, KnowledgeGraphMemory; print('Smriti import OK')"
One-Shot Installer
Linux/macOS:
./install_smriti.sh
Windows PowerShell:
./install_smriti.ps1
Windows batch:
install_smriti.bat
The installer creates a local virtual environment, installs package extras, writes config.yaml, and can cache Gemma 4.
Model And API Keys
Smriti AI itself does not require a model-provider API key. It is a memory layer.
| Key | Required? | Purpose |
|---|---|---|
| `SMRITI_API_KEY` | Optional | Protects Smriti API routes. Clients send `x-api-key`. |
| `SMRITI_MEMORY_KEY` | Optional but recommended | Encrypts memory blobs before writing to disk or backend. |
| `HF_TOKEN` | Sometimes | Needed only if Hugging Face model access requires authentication. |
| Provider keys | Depends | Needed only if users connect Smriti to a hosted provider instead of a local model. |
For Gemma 4 via Hugging Face:
hf auth login
# or
export HF_TOKEN="your-huggingface-token"
For a protected Smriti API:
export SMRITI_API_KEY="replace-with-service-secret"
curl -H "x-api-key: $SMRITI_API_KEY" http://localhost:8000/health
For encrypted memory:
export SMRITI_MEMORY_KEY="replace-with-a-long-random-secret"
Do not commit real keys to Git.
Quick Start For Customers
Enterprise AI Teams
Deploy Smriti AI as a memory service behind your API gateway with Redis/Postgres and monitoring enabled:
cp api_keys.example.json api_keys.json
export AUTH_ENABLED=true
export SMRITI_API_KEYS_PATH=api_keys.json
COMPOSE_PROFILES=redis,monitoring SMRITI_MEMORY_BACKEND=redis docker compose up -d --build
Integrate by sending POST /chat requests with a stable user_id, then scrape /metrics with Prometheus and review dashboards in Grafana. Use admin keys only for support/audit workflows.
Indie Developers And Personal Assistants
Install locally and start with JSON or SQLite memory:
pip install "smriti-memory-ai @ git+https://github.com/Luciferai04/smriti-ai.git@v1.0.5"
smriti-cli config wizard --backend json --overwrite
smriti-cli --session-id alex --topic-id profile chat "My name is Alex and I work at Ocean Lab."
smriti-cli --session-id alex --topic-id profile chat "What do you remember about me?"
Use LangChain/LlamaIndex integrations when you want Smriti memory inside an existing agent framework.
Researchers And Startups
Clone the repo, run benchmarks, and compare retrieval modes:
git clone https://github.com/Luciferai04/smriti-ai.git
cd smriti-ai
pip install -e ".[dev,ml,bench]"
python benchmarks/run_benchmarks.py --model-preset gemma4 --max-new-tokens 16
python benchmarks/run_longmem.py --dataset-path path/to/locomo.json --retrieval-mode semantic
Use demos/smriti_kaggle.ipynb and demos/smriti_colab.ipynb for package-based notebook demos.
Privacy-Sensitive Deployments
Keep memory local, encrypted, auditable, and deletable:
export SMRITI_MEMORY_BACKEND=sqlite
export SMRITI_SQLITE_PATH=data/smriti_memory.sqlite3
export SMRITI_MEMORY_KEY="replace-with-a-long-random-secret"
export AUTH_ENABLED=true
smriti-cli start-server --host 127.0.0.1 --port 8000
Expose /memory/delete in your user data-deletion flow and run the authenticated audit UI only for trusted operators.
Quick Start: Python Library
Memory-Only Usage
This path does not require PyTorch, transformers, or a model download.
from smriti import MemPalaceLite
memory = MemPalaceLite(retrieval_mode="semantic", session_id="alex", topic_id="profile")
memory.add_fact("Alex is a marine biologist in Hawaii.")
context = memory.get_context("What do you remember about Alex?")
print(context)
Full Gemma 4 Usage
This uses a real model and the Smriti AI wrapper.
from transformers import AutoModelForCausalLM, AutoTokenizer
from smriti import SmritiAILite
model_id = "google/gemma-4-E2B-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
agent = SmritiAILite(
model=model,
tokenizer=tokenizer,
retrieval_mode="semantic",
session_id="alex",
topic_id="profile",
)
agent.chat("My name is Alex and I am a marine biologist.")
reply = agent.chat("What do you remember about me?")
print(reply)
Save And Load Memory
agent.save_memory("smriti_memory.json")
agent.load_memory("smriti_memory.json")
Direct Semantic Memory
from smriti import SemanticMemory
memory = SemanticMemory()
memory.add_entry("user-a", "profile", "Maya is a doctor at a community clinic.")
results = memory.retrieve("user-a", "profile", "physician medical work", k=1)
print(results[0].entry.text)
Direct Knowledge Graph
from smriti import KnowledgeGraphMemory
graph = KnowledgeGraphMemory()
graph.add_triple("science", "Marie Curie", "discovered", "radium", topic_id="chemistry")
graph.add_triple("science", "radium", "is a", "chemical element", topic_id="chemistry")
facts = graph.triples_to_text(graph.query_graph("science", "Marie Curie", depth=2, topic_id="chemistry"))
print(facts)
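The "simple triple extraction" behind KnowledgeGraphMemory can be illustrated with a toy pattern matcher. The patterns below are hypothetical stand-ins, not the library's actual extraction rules:

```python
import re

# Minimal relation patterns; illustrative only.
PATTERNS = [
    (re.compile(r"^(.+?) is a (.+?)\.?$"), "is a"),
    (re.compile(r"^(.+?) discovered (.+?)\.?$"), "discovered"),
]

def extract_triples(sentence: str):
    """Return (subject, relation, object) triples found in one sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        match = pattern.match(sentence)
        if match:
            triples.append((match.group(1), relation, match.group(2)))
    return triples
```

Triples extracted this way feed the per-session networkx graph and are rendered back to natural language at query time.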
Quick Start: CLI
Smriti AI installs these commands:
| Command | Purpose |
|---|---|
| `smriti-cli` | Main CLI. |
| `smriti` | Short alias. |
| `smriti-api` | Run the FastAPI service. |
| `mempalace` | Backward-compatible alias. |
Create config:
smriti-cli init config.yaml
Interactive backend wizard:
smriti-cli config wizard --backend json --overwrite
Store and retrieve local memory:
smriti-cli --session-id alex --topic-id profile --retrieval-mode tfidf chat "My name is Alex and I work at Ocean Lab."
smriti-cli --session-id alex --topic-id profile --retrieval-mode tfidf chat "Where do I work?"
Save, load, delete:
smriti-cli --session-id alex memory save smriti_memory.json
smriti-cli --session-id alex memory load smriti_memory.json
smriti-cli --session-id alex memory delete --path smriti_memory.json
Run backend compatibility checks:
smriti-cli --backend json --backend-path data/memory backend conformance
smriti-cli --backend sqlite --backend-path data/smriti_memory.sqlite3 backend conformance
Migrate one user's memory between backends:
smriti-cli --session-id alex migrate-backend \
--from-backend json --from-path data/memory \
--to-backend sqlite --to-path data/smriti_memory.sqlite3
Query graph memory:
smriti-cli --session-id alex --topic-id profile graph_query user --depth 1
Run benchmarks:
# Installed package smoke check; internal-only, not public benchmark evidence.
SMRITI_ALLOW_TEST_DOUBLES=1 smriti-cli benchmark --quick
# Full Gemma 4 benchmark; run from a cloned source checkout.
python benchmarks/run_benchmarks.py --model-preset gemma4 --max-new-tokens 16
Quick Start: FastAPI Service
Start locally:
smriti-cli start-server --host 0.0.0.0 --port 8000
# or
python -m smriti.api --host 0.0.0.0 --port 8000
Health check:
curl http://localhost:8000/health
OpenAPI docs:
http://localhost:8000/docs
Memory/chat request:
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{
"user_id": "alex",
"topic_id": "profile",
"message": "My name is Alex and I am a marine biologist.",
"retrieval_mode": "semantic"
}'
Recall request:
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{
"user_id": "alex",
"topic_id": "profile",
"message": "What do you remember about me?",
"retrieval_mode": "semantic"
}'
Important API Runtime Note
The default FastAPI service can operate as a memory service. If no model agent factory is configured, /chat returns memory-aware context and updates memory. To generate full model-backed assistant responses through the API, deploy the API process with a model runtime that registers an agent factory using set_agent_factory, or wrap Smriti inside your own service with SmritiAILite.
This separation keeps the memory service lightweight for enterprise integration while still supporting full local model-backed usage in Python.
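A factory registration might look like the sketch below. The `smriti.api` import path, the exact `set_agent_factory` signature, and the factory arguments are assumptions inferred from the note above; verify them against the installed package:

```python
def make_agent(user_id: str, topic_id: str):
    """Build a model-backed SmritiAILite for one user/topic (illustrative)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from smriti import SmritiAILite

    model_id = "google/gemma-4-E2B-it"
    return SmritiAILite(
        model=AutoModelForCausalLM.from_pretrained(model_id),
        tokenizer=AutoTokenizer.from_pretrained(model_id),
        retrieval_mode="semantic",
        session_id=user_id,
        topic_id=topic_id,
    )

if __name__ == "__main__":
    # Assumed registration hook; confirm the import path in smriti.api.
    from smriti.api import set_agent_factory
    set_agent_factory(make_agent)
```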
API Endpoints
| Method | Endpoint | Purpose |
|---|---|---|
| `GET` | `/health` | Liveness and loaded-user count. |
| `GET` | `/metrics` | Prometheus metrics. |
| `POST` | `/chat` | Retrieve context, update memory, optionally call configured model agent. |
| `POST` | `/memory/save` | Save one user's memory. |
| `POST` | `/memory/load` | Load memory from request body, backend, or path. |
| `POST` | `/memory/delete` | Delete one user's memory from RAM/backend/path. |
| `POST` | `/memory/list` | Audit/list one user's memory entries. |
| `POST` | `/memory/update` | Edit an individual memory entry. |
| `POST` | `/memory/pin` | Pin/unpin an important memory entry. |
| `POST` | `/memory/archive` | Archive/unarchive an entry. |
| `POST` | `/memory/entry/delete` | Delete one memory entry. |
| `POST` | `/graph/query` | Query session-scoped graph facts. |
| `GET` | `/docs` | FastAPI Swagger UI. |
Configuration
Smriti AI reads config.yaml by default or the file pointed to by SMRITI_CONFIG_PATH.
memory:
backend: json
memory_dir: data/memory
sqlite_path: data/smriti_memory.sqlite3
redis_url: redis://localhost:6379/0
postgres_dsn: ""
autosave: true
security:
encryption_key: ""
api:
host: 0.0.0.0
port: 8000
cors_origins:
- "*"
model:
adapter: local_hf
base_model_id: google/gemma-4-E2B-it
hf_endpoint_url: ""
ollama_url: http://localhost:11434
vllm_url: http://localhost:8000
openai_compatible_url: https://api.openai.com/v1
Environment variables override config values:
| Variable | Purpose |
|---|---|
| `SMRITI_CONFIG_PATH` | Path to config file. |
| `SMRITI_MEMORY_BACKEND` | json, sqlite, redis, or postgres. |
| `SMRITI_MEMORY_DIR` | JSON memory directory. |
| `SMRITI_SQLITE_PATH` | SQLite database path. |
| `SMRITI_REDIS_URL` | Redis connection URL. |
| `SMRITI_POSTGRES_DSN` | Postgres DSN. |
| `SMRITI_AUTOSAVE` | Save memory after API updates. |
| `SMRITI_MEMORY_KEY` | Encrypt memory blobs. |
| `SMRITI_API_KEY` | Protect API routes. |
| `AUTH_ENABLED` | Enable role-bound API-key auth. |
| `SMRITI_API_KEYS_PATH` | Path to api_keys.json with user/admin keys. |
| `SMRITI_CORS_ORIGINS` | Comma-separated CORS allowlist. |
| `SMRITI_HOST` | API host. |
| `SMRITI_PORT` | API port. |
| `SMRITI_MODEL_ADAPTER` | local_hf, hf_endpoint, ollama, vllm, or openai. |
| `BASE_MODEL_ID` | Base model ID, default google/gemma-4-E2B-it. |
| `HF_ENDPOINT_URL` | Hugging Face Inference Endpoint URL when using endpoint mode. |
| `OPENAI_API_KEY` | Provider key only when using OpenAI-compatible adapter. |
Durable Memory Backends
| Backend | Best For | Notes |
|---|---|---|
| JSON | Local experiments, privacy-first single-machine usage. | Simple files under data/memory. |
| SQLite | Local production, desktop apps, edge devices. | Single-file database, no external service. |
| Redis | Low-latency state for deployed agents. | Good for concurrent API use; use persistence in production. |
| Postgres | Enterprise durability and operational tooling. | Good fit for audited multi-user deployments. |
Examples:
SMRITI_MEMORY_BACKEND=json smriti-cli start-server
SMRITI_MEMORY_BACKEND=sqlite SMRITI_SQLITE_PATH=data/smriti.sqlite3 smriti-cli start-server
SMRITI_MEMORY_BACKEND=redis SMRITI_REDIS_URL=redis://localhost:6379/0 smriti-cli start-server
SMRITI_MEMORY_BACKEND=postgres SMRITI_POSTGRES_DSN=postgresql://host:5432/smriti smriti-cli start-server
Docker
Local CPU API
docker compose up -d --build api
API With Redis
COMPOSE_PROFILES=redis SMRITI_MEMORY_BACKEND=redis docker compose up -d --build
API With Postgres
COMPOSE_PROFILES=postgres SMRITI_MEMORY_BACKEND=postgres docker compose up -d --build
GPU-Capable Image
When Docker has NVIDIA runtime support:
SMRITI_DOCKERFILE=Dockerfile docker compose up -d --build api
Production Compose
docker compose -f docker-compose.prod.yml up -d
Monitoring Stack
COMPOSE_PROFILES=monitoring docker compose up -d --build
Then open:
| Service | URL |
|---|---|
| API | http://localhost:8000 |
| API docs | http://localhost:8000/docs |
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
Default Grafana credentials in local compose:
admin / smriti
Hugging Face Model-Style Deployment
Smriti AI can also be packaged as a Hugging Face model repository with a custom handler.py. This does not make Smriti AI a newly trained foundation model. It packages the memory wrapper, model card, endpoint config, example requests, and upload tooling so Hugging Face Inference Endpoints can serve a memory-augmented base model.
Deployment assets live in:
deploy/huggingface_model/
deploy/huggingface_dataset/
deploy/huggingface_space/
Local handler smoke test:
BASE_MODEL_ID=google/gemma-4-E2B-it \
HF_TOKEN=$HF_TOKEN \
SMRITI_MEMORY_BACKEND=json \
SMRITI_MEMORY_PATH=/tmp/smriti_hf_test.json \
python deploy/huggingface_model/test_handler_local.py
Upload to a Hugging Face model repo:
export HF_TOKEN=...
python deploy/huggingface_model/upload_model_repo.py \
--repo-id luciferai-devil/smriti-ai \
--private false
Official v1.0 Hugging Face targets:
| Asset | Repo |
|---|---|
| Model wrapper | luciferai-devil/smriti-ai |
| Benchmark dataset | luciferai-devil/smriti-ai-benchmarks |
| CPU-safe demo Space | luciferai-devil/smriti-ai-demo |
Upload sanitized benchmark artifacts:
python deploy/huggingface_dataset/upload_benchmark_dataset.py \
--repo-id luciferai-devil/smriti-ai-benchmarks \
--private false
Upload the public demo Space:
python deploy/huggingface_space/upload_space.py \
--repo-id luciferai-devil/smriti-ai-demo \
--private false
The Space runs in CPU-safe memory-only mode by default, warns users not to enter PII, and auto-deletes demo memory after inactivity.
Use BASE_MODEL_ID for a locally loaded model inside the endpoint, or HF_ENDPOINT_URL if the Smriti handler should call another model endpoint. Production endpoints should use external Redis/Postgres memory and must not store private user memory inside the model repository.
See docs/deploy_as_hf_model.md for the full deployment guide.
Monitoring And Observability
The API exports Prometheus metrics from /metrics.
| Metric | Meaning |
|---|---|
| `smriti_http_requests_total` | Request count by method, path, status. |
| `smriti_http_errors_total` | Server-side error count. |
| `smriti_http_request_latency_seconds` | End-to-end request latency histogram. |
| `smriti_retrieval_latency_seconds` | Memory retrieval latency histogram. |
| `smriti_tokens_total` | Approximate token count observed by API. |
| `smriti_user_memories` | Number of loaded user memory stores. |
| `smriti_user_memory_bytes` | Approximate serialized memory size by user. |
Observability helper:
python scripts/metrics_monitor.py --url http://localhost:8000 --output reports/metrics_report.md
Privacy And Security
Smriti AI stores user memory, so privacy is a core operational concern.
| Requirement | Smriti AI Support |
|---|---|
| User isolation | Memory is keyed by user_id / session_id and topic_id. |
| Deletion | /memory/delete and smriti-cli memory delete. |
| Encryption | Set SMRITI_MEMORY_KEY to encrypt backend blobs. |
| API protection | Set SMRITI_API_KEY or AUTH_ENABLED=true with api_keys.json. |
| RBAC | user keys can access only their bound user_id; admin keys can operate across users. |
| Local-first deployment | JSON/SQLite can run fully on-device. |
| Auditability | Memory can be exported, inspected, pinned, archived, edited, and deleted. |
Delete user memory through API:
curl -X POST http://localhost:8000/memory/delete \
-H "Content-Type: application/json" \
-d '{"user_id": "alex"}'
Read more in docs/privacy.md.
Role-bound API keys:
cp api_keys.example.json api_keys.json
export AUTH_ENABLED=true
export SMRITI_API_KEYS_PATH=api_keys.json
curl -H "x-api-key: replace-with-user-key" http://localhost:8000/health
Read more in docs/auth.md.
Framework Integrations
LangChain
from smriti.integrations.langchain import SmritiMemory
memory = SmritiMemory(session_id="alex", topic_id="profile")
memory.save_context(
{"input": "My name is Alex and I work at Ocean Lab."},
{"output": "Nice to meet you, Alex."},
)
print(memory.load_memory_variables({"input": "Where do I work?"}))
LlamaIndex
from smriti.integrations.llama_index import SmritiStorageContext
storage = SmritiStorageContext(session_id="alex", topic_id="profile")
storage.add_node("Alex is a marine biologist.")
print(storage.query("What does Alex do?"))
Web Demo
The demo app lets users inject facts, ask distractors, view retrieved memories, and delete user memory.
pip install -e ".[demo]"
uvicorn demo.app:app --port 8080
You can also run the packaged module directly:
python -m demo.app
See src/demo/README.md for details.
Memory Audit Dashboard
The audit dashboard is a separate authenticated UI for operators and privacy reviews:
export SMRITI_AUDIT_USER=admin
export SMRITI_AUDIT_PASSWORD="replace-with-a-strong-password"
uvicorn demo.audit_app:app --port 8090
Open http://127.0.0.1:8090 and sign in with the configured credentials. Use it to search, edit, pin, archive, and delete individual memories.
Model Provider Adapters
Smriti AI can wrap multiple generation providers through a small adapter interface:
from smriti.adapters import build_adapter
adapter = build_adapter("hf_endpoint")
text = adapter.generate("Augmented Smriti prompt", max_new_tokens=128)
Supported adapters:
| Adapter | Typical target |
|---|---|
| `local_hf` | Local Transformers Gemma 4. |
| `hf_endpoint` | Hugging Face Inference Endpoint. |
| `ollama` | Local Ollama REST server. |
| `vllm` | vLLM server. |
| `openai` | OpenAI-compatible hosted APIs. |
Read more in docs/adapters.md.
Benchmarks
Benchmark Policy
Public benchmark claims in this repository use real Gemma 4 only:
google/gemma-4-E2B-it
Deterministic test-double paths may exist for engineering tests, but they are not public model-quality claims.
Current Local Gemma 4 Results
These are current local CPU measurements from the checked-in CSV artifacts.
| Evaluation | Baseline Recall | Best Smriti AI Recall | Absolute Lift | Notes |
|---|---|---|---|---|
| Gemma-style three-fact protocol | 0/3 | 3/3 | +3 facts | Baseline 5.71s, Semantic+Graph+Identity 4.99s avg CPU latency. |
| Five-mode comparison (`max_new_tokens=16`) | 0/3 | 3/3 | +3 facts | Fastest successful memory mode: Semantic+Graph at 2.78s avg CPU latency. |
| Original broader protocol rerun (`max_new_tokens=256`) | 0/3 | 3/3 | +3 facts | Overall average improved from 0.524 to 0.832 (+58.9%). |
Five-mode comparison:
| Configuration | Recall | Avg Latency | Context Coherence | Notes |
|---|---|---|---|---|
| Baseline | 0/3 | 4.927s | 0.000 | Frozen Gemma 4, no memory layer. |
| TF-IDF | 3/3 | 3.481s | 0.667 | Lexical memory mode. |
| Semantic | 3/3 | 2.857s | 0.333 | Embedding-based memory mode. |
| Semantic + Graph | 3/3 | 2.781s | 0.667 | Fastest successful memory mode in this CPU run. |
| Semantic + Graph + Identity | 3/3 | 5.164s | 0.000 | Adds persona governance overhead. |
Original broader protocol rerun:
| Metric | Baseline | Smriti AI | Delta |
|---|---|---|---|
| Memory retention | 0.000 | 1.000 | +inf% |
| Response consistency | 0.571 | 0.496 | -13.2% |
| Context coherence | 1.000 | 1.000 | +0.0% |
| Overall average | 0.524 | 0.832 | +58.9% |
The older +31.2% overall number from earlier writeups remains historical lineage. The current comparable broader-protocol rerun is +58.9% under this local Gemma 4 CPU setup with max_new_tokens=256.
Run Benchmarks
Install benchmark and ML extras:
pip install -e ".[ml,bench]"
Gemma-style memory retention:
python benchmarks/run_gemma_eval.py
Five-configuration comparison:
python benchmarks/run_benchmarks.py \
--model-preset gemma4 \
--configurations tfidf semantic semantic_graph semantic_graph_identity \
--devices auto \
--max-new-tokens 16 \
--output benchmarks/results_comparison.csv
Original broader protocol rerun:
python benchmarks/run_historical_protocol.py --max-new-tokens 256
Cross-model harness:
python benchmarks/run_benchmarks.py \
--model-preset cross_model \
--output benchmarks/cross_model_results.csv \
--summary-output benchmarks/summary.md
Long-memory / LoCoMo-style runner:
python benchmarks/run_longmem.py --dataset-path path/to/locomo.json --retrieval-mode semantic
Identity drift benchmark:
python benchmarks/run_identity_bench.py --output reports/identity_evaluation.csv
Aggregate summaries:
python benchmarks/summarize_results.py
Benchmark Evidence Files
| File | Purpose |
|---|---|
| `benchmarks/results_gemma_eval.csv` | Gemma 4 baseline vs Smriti three-fact evaluation. |
| `benchmarks/results_comparison.csv` | Baseline, TF-IDF, semantic, semantic+graph, semantic+graph+identity. |
| `benchmarks/results_historical_protocol.csv` | Current rerun of the older broader protocol. |
| `benchmarks/results_historical_protocol_responses.json` | Response audit trail for the broader-protocol rerun. |
| `benchmarks/cross_model_results.csv` | Optional cross-model memory-retention comparison. |
| `benchmarks/longmem_results.csv` | Optional LoCoMo-style long-memory output. |
| `benchmarks/latency_gemma4.csv` | Dedicated Gemma 4 latency/token probe. |
| `reports/identity_evaluation.csv` | Persona drift detection benchmark. |
| `results/summary.md` | Human-readable aggregate summary. |
| `benchmarks/README.md` | Generated benchmark table. |
| `model_card_smriti.md` | Model card and result disclaimer. |
| `research/evidence/benchmark_lineage.csv` | Historical/current result ledger and claim-status labels. |
Testing
Run the full test suite:
pytest -q
Run the production hardening matrices:
make test # unit + deterministic test-double integration
make test-security # prompt injection, redaction, auth/RBAC, delete/encryption
make test-benchmarks # deterministic benchmark artifacts and budgets
make production-gates # manifest, regression, privacy, and gate report checks
make end-user-readiness # first-run install/docs/CLI/deployment readiness checks
These PR-safe tests use deterministic test-double paths. Gemma 4 and other real-model benchmarks are reserved for nightly/manual runs so ordinary contributors do not need to download large gated checkpoints.
Run with coverage:
pytest --cov=smriti --cov-report=term-missing --cov-report=html:reports/coverage/html
Run style checks:
ruff check benchmarks scripts src tests
Build and install the wheel locally:
python -m build
python -m venv .venv-wheel
source .venv-wheel/bin/activate
pip install dist/smriti_ai-*.whl
python -c "from smriti import SmritiAILite, SemanticMemory, KnowledgeGraphMemory; print('wheel OK')"
Smoke Tests
Local API smoke test:
bash scripts/smoke_test.sh
Latency probe:
python scripts/measure_latency.py --retrieval-modes tfidf semantic --output benchmarks/latency_results.csv
Load test helper:
python scripts/load_test_runner.py --users 10 --spawn-rate 10 --run-time 30s --backend json
See docs/load_testing.md for the 10/100/1000-user matrix and report files.
Fault-tolerance probe:
python scripts/fault_tolerance_tests.py --url http://localhost:8000
Agentic Harness Evolution
Smriti AI now includes an AHE-inspired loop for improving the inference-time memory harness while keeping Gemma 4 or any other base model frozen.
| Layer | File | Purpose |
|---|---|---|
| Harness config | configs/harness_params.yaml | Editable retrieval, graph, compression, and identity-governance parameters. |
| Evidence collection | benchmarks/collect_evidence.py | Runs memory-retention or JSON/JSONL long-memory tasks and writes summary/log evidence. |
| Evolution decision | evolve_harness.py | Applies bounded heuristics and appends a predicted-impact manifest entry. |
| Closed loop | run_evolution.py | Re-evaluates proposed configs, reverts regressions, and can tag Git iterations. |
| Audit trail | manifests/evolve_manifest.jsonl | JSONL history of the component changed, previous/new values, reason, prediction, observed effect, and config snapshots. |
| Harness registry | harnesses/ | Versioned seed/evolved harness artifacts with metadata, results, and production status. |
| Manifest verifier | src/smriti/manifest_verifier.py | Validates that each accepted/rejected change has before/after evidence. |
| Production gates | src/smriti/production_gates.py | Runs tests, backend/privacy checks, validation, holdout, cross-model, latency, token, and identity gates before promotion. |
| Canary routing | src/smriti/canary.py | Sticky per-user canary routing for evolved harnesses with rollback conditions. |
Quick local loop:
python benchmarks/collect_evidence.py \
--config configs/harness_params.yaml \
--summary benchmarks/evidence_summary.json
python evolve_harness.py \
--config configs/harness_params.yaml \
--evidence benchmarks/evidence_summary.json
python run_evolution.py --iterations 5 --no-commit
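Each loop iteration appends one JSON object per line to manifests/evolve_manifest.jsonl. A minimal sketch of reading such an entry is below; the field names are assumptions inferred from the documented contents (component changed, previous/new values, reason, prediction, observed effect), not the exact manifest schema.

```python
import json

# Illustrative: parse one JSONL manifest entry. Field names are assumptions
# for this sketch; inspect manifests/evolve_manifest.jsonl for the real keys.
line = json.dumps({
    "component": "retrieval.top_k",
    "previous": 3,
    "new": 5,
    "reason": "recall below target on long-memory tasks",
    "predicted_impact": "higher recall, more token overhead",
    "observed_effect": "recall +0.1, latency +4%",
})
entry = json.loads(line)
print(f"{entry['component']}: {entry['previous']} -> {entry['new']}")
```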
Validation and release-gate loop:
python benchmarks/validate_harness_evolution.py \
--seed-config harnesses/seed/harness_params.yaml \
--evolved-config harnesses/evolved-v1/harness_params.yaml
python benchmarks/run_holdout_eval.py \
--config harnesses/evolved-v1/harness_params.yaml
python benchmarks/run_cross_model_harness_eval.py \
--seed-config harnesses/seed/harness_params.yaml \
--evolved-config harnesses/evolved-v1/harness_params.yaml
python harness/verify_manifest.py
python harness/production_gates.py evolved-v1 --to candidate
Harness registry CLI:
smriti-cli harness list
smriti-cli harness show evolved-v1
smriti-cli harness compare seed evolved-v1
smriti-cli harness activate evolved-v1
smriti-cli harness rollback seed
smriti-cli harness verify-manifest
smriti-cli harness promote evolved-v1 --to production
smriti-cli harness regression-test
API/dashboard support:
| Endpoint | Purpose |
|---|---|
| GET /harness/current | Show active harness parameters and registry entries. |
| GET /harness/history | Return manifest history. |
| GET /harness/metrics | Return validation and canary metrics. |
| POST /harness/rollback | Roll back to a registry harness. Admin-only when auth is enabled. |
| POST /harness/evaluate | Run seed-vs-evolved validation. Admin-only when auth is enabled. |
| GET /harness/canary/status | Show active/canary routing status. |
| POST /harness/canary/start | Start sticky canary routing. Admin-only when auth is enabled. |
| POST /harness/canary/stop | Stop canary routing. |
| POST /harness/canary/promote | Promote the canary harness. |
| POST /harness/canary/rollback | Roll back the canary harness. |
The web dashboard also exposes a harness cockpit with current parameters,
recent manifest entries, manual overrides, rollback controls, quick evaluation,
seed comparison, and report export. See docs/research_lineage.md for the
research rationale and AHE mapping.
Generated harness artifacts:
| Artifact | Purpose |
|---|---|
| results/harness_evolution_validation.md | Baseline vs seed vs evolved harness validation. |
| results/evolution_generalization_report.md | Final holdout evaluation. |
| results/cross_model_harness_eval.md | Cross-model deterministic harness validation. |
| results/manifest_verification.md | Manifest integrity report. |
| results/production_gate_report.md | Promotion gate verdict. |
| results/canary_report.md | Canary routing status and metrics. |
| reports/evolution_report.md | Stakeholder-readable evolution report. |
GPU And CPU Behavior
Smriti AI is designed to fall back cleanly to CPU.
| Component | CPU | GPU |
|---|---|---|
| Base model | Works, slower for Gemma 4. | Moves model to CUDA when available. |
| Generation dtype | float32 on CPU. | bfloat16 if supported, else float16. |
| Embeddings | Sentence-transformers can run on CPU. | Embedding model can move to CUDA. |
| FAISS | Uses CPU by default. | Attempts GPU indices when CUDA FAISS support is available. |
For practical demos with Gemma 4, GPU is recommended. CPU is acceptable for reproducibility but slower.
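The dtype fallback in the table above reduces to a small decision rule. This pure-Python sketch illustrates the logic only; the actual device handling lives in src/smriti and uses torch, which is omitted here to keep the example self-contained.

```python
# Illustrative decision rule for the generation dtype described above:
# float32 on CPU; bfloat16 on GPU when supported, otherwise float16.
def pick_dtype(cuda_available: bool, bf16_supported: bool) -> str:
    if not cuda_available:
        return "float32"
    return "bfloat16" if bf16_supported else "float16"

print(pick_dtype(False, False))  # -> float32
print(pick_dtype(True, True))    # -> bfloat16
print(pick_dtype(True, False))   # -> float16
```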
Training Research Package
The Smriti AI runtime is training-free. The training package is separate and optional.
pip install -e ".[training]"
python -m training.ewc_replay --model google/gemma-4-E2B-it --dataset path/to/data.jsonl --dry-run
The training module includes replay/EWC experiment scaffolding and logs metrics under training/. It is not imported by smriti during inference.
CI/CD
| Workflow | Trigger | Purpose |
|---|---|---|
| .github/workflows/ci.yml | Push / PR | Install, lint, test, compile, build, install wheel, audit, upload artifacts. |
| .github/workflows/test_agent_hardening.yml | Push / PR / manual | Unit, test-double integration, OWASP-style security, benchmark smoke, production gates, optional backend jobs. |
| .github/workflows/nightly_benchmarks.yml | Nightly / manual | Real-model Gemma-style, retrieval, holdout, and identity benchmarks. |
| .github/workflows/harness_production_gate.yml | Harness/API/benchmark changes | Verify the manifest and run production gates for evolved harnesses. |
| .github/workflows/benchmark.yml | Nightly / manual | Run the benchmark suite on a small configured setup. |
| .github/workflows/load-test.yml | Push / nightly / manual | Run a 10-user API load smoke test and upload reports. |
| .github/workflows/docker.yml | Tags / manual | Build API/demo/training Docker images. |
| .github/workflows/release.yml | Tag push | Build the package and publish release artifacts. |
The latest local push for the harness-production work passed the CI, Harness Production Gate, and Load Test workflows.
Repository Layout
smriti-ai/
|-- src/smriti/ # Runtime memory package
|-- src/training/ # Optional replay/EWC research code
|-- src/demo/ # Small web demo
|-- demos/ # Kaggle/Colab package-import notebooks
|-- benchmarks/ # Gemma 4 evaluations and CSV results
|-- configs/ # Harness and runtime parameter files
|-- harness/ # Manifest verification and production-gate wrappers
|-- harnesses/ # Versioned seed/evolved harness registry
|-- tests/ # Unit and integration tests
|-- scripts/ # Setup, smoke, latency, load, fault probes
|-- docs/ # Privacy and API documentation
|-- manifests/ # AHE JSONL evolution audit trail
|-- research/artifacts/ # Curated original notebooks, logs, and excerpts
|-- research/evidence/ # Curated benchmark lineage and evidence policy
|-- monitoring/ # Prometheus/Grafana assets
|-- support/ # Troubleshooting and sample configs
|-- notebooks/ # Package-based demo notebook
|-- reports/ # Readiness, coverage, metrics reports
|-- Dockerfile # GPU-capable API image
|-- Dockerfile.cpu # Lightweight CPU API image
|-- Dockerfile.demo # Demo image
|-- Dockerfile.training # Training/research image
|-- docker-compose.yml # Local API/backends/monitoring stack
|-- docker-compose.prod.yml # Production-oriented compose stack
|-- pyproject.toml # Package metadata and extras
|-- config.yaml # Local config template
|-- evolve_harness.py # One-step harness evolution proposal script
|-- run_evolution.py # Closed-loop evidence/evolve/verify driver
|-- ROADMAP.md # Post-v1 roadmap including AHE hardening
|-- model_card_smriti.md # Model card and benchmark disclosure
Production Readiness Notes
| Area | Recommendation |
|---|---|
| Auth | Set SMRITI_API_KEY or place API behind an authenticated gateway. |
| RBAC | Use AUTH_ENABLED=true and role-bound keys for production endpoints. |
| Secrets | Use environment variables or a secret manager, not committed config files. |
| Storage | Use SQLite for local apps, Redis/Postgres for server deployments. |
| Encryption | Set SMRITI_MEMORY_KEY for sensitive memory. |
| Backups | Back up JSON/SQLite/Postgres memory stores according to your RPO/RTO. |
| Observability | Scrape /metrics and use Grafana dashboard panels. |
| Load testing | Run Locust/wrk-style load tests before enterprise rollout. |
| Deletion | Wire /memory/delete into user data deletion workflows. |
| Audit | Protect demo.audit_app and audit endpoints before exposing memory inspection. |
| Benchmarking | Rerun Gemma 4 benchmarks on your hardware before publishing claims. |
See reports/production_readiness.md for the latest QA snapshot.
Troubleshooting
| Problem | Likely Cause | Fix |
|---|---|---|
| ModuleNotFoundError: smriti | Package not installed in the current environment. | Run pip install -e . or activate the correct venv. |
| ModuleNotFoundError: transformers | ML extras not installed. | Run pip install -e ".[ml]". |
| Gemma 4 fails to load | Missing Hugging Face access or incompatible Transformers stack. | Run hf auth login and update ML dependencies. |
| API returns memory-only text | No model agent factory is registered. | Use SmritiAILite in Python or deploy the API with an agent factory. |
| API returns 401 | SMRITI_API_KEY is set. | Send the x-api-key: <key> header. |
| Memory not persisted | Autosave disabled or backend not configured. | Set SMRITI_AUTOSAVE=1 and configure a backend. |
| Encrypted memory cannot load | Missing or wrong SMRITI_MEMORY_KEY. | Set the same key used when saving. |
| Docker image is large | ML dependencies and model runtimes are heavy. | Use Dockerfile.cpu for API-only memory service. |
| Benchmarks are slow | Gemma 4 on CPU is heavy. | Use GPU or reduce max_new_tokens for local checks. |
Roadmap
| Release | Theme | Planned Work |
|---|---|---|
| v1.1 | Memory quality | Hot/cold memory tiers, stronger compression, multilingual embeddings, cross-lingual recall tests, configurable decay/top-K/summarization thresholds. |
| v1.2 | Scalability | Async backend paths, asyncpg Postgres option, batched writes, embedding cache, 100/500/1000-user load reports. |
| v1.3 | Research | LongMemEval/MemoryBench tracking, SmritiBench design, temporal/weighted graph memory, cross-agent shared memory with strict isolation. |
| Ongoing | Community | Good-first issues, backend/adapter contribution guides, benchmark reproducibility reports, pilot-user feedback loops. |
See ROADMAP.md for the living post-v1 roadmap.
Contributing
See CONTRIBUTING.md for development setup and contribution guidance.
Recommended local loop:
pip install -e ".[dev,bench]"
ruff check src benchmarks scripts tests
pytest -q
python -m build
Release history and stakeholder-facing notes live in CHANGELOG.md and RELEASE_NOTES_v1.0.5.md. A tutorial draft for the v1 memory protocol, audit UI, and benchmark evidence lives in docs/blog/smriti-ai-v1-memory-layer.md.
License
Apache-2.0. See pyproject.toml for package metadata.
Harness Evolution Results
The base model remains frozen. Smriti AI is not fine-tuned; these numbers come from memory-harness evaluation.
| System | Recall | Precision@K | p95 latency ms | Token overhead | Privacy delete |
|---|---|---|---|---|---|
| baseline_frozen_model | 0.000 | 0.000 | 0.000 | 0 | True |
| smriti_seed_harness | 1.000 | 0.333 | 0.525 | 328 | True |
| smriti_evolved_harness | 1.000 | 0.333 | 0.168 | 328 | True |
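The evolved harness's main gain over the seed harness is latency at identical recall, precision, and token overhead. A quick illustrative check of the improvement implied by the table above:

```python
# Relative p95 latency reduction of the evolved harness vs the seed harness,
# using the values reported in the table above.
seed_p95, evolved_p95 = 0.525, 0.168
reduction_pct = (seed_p95 - evolved_p95) / seed_p95 * 100
print(f"{reduction_pct:.0f}% lower p95 latency")  # -> 68% lower p95 latency
```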
Cross-model harness validation:
| Model | Seed recall | Evolved recall | Gate |
|---|---|---|---|
| google/gemma-4-E2B-it | 1.000 | 1.000 | pass |
| meta-llama/Llama-3.2-1B | 1.000 | 1.000 | pass |
| microsoft/Phi-3-mini-4k-instruct | 1.000 | 1.000 | pass |
| mistralai/Mistral-7B-Instruct-v0.3 | 1.000 | 1.000 | pass |
| Qwen/Qwen2.5-1.5B-Instruct | 1.000 | 1.000 | pass |
Production gate report: results/production_gate_report.md
Deterministic test doubles are used only for CI stability and are never counted as public benchmark evidence.
By Soumyajit Ghosh
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file smriti_memory_ai-1.0.5.tar.gz.
File metadata
- Download URL: smriti_memory_ai-1.0.5.tar.gz
- Upload date:
- Size: 192.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbb52ac8da5fdcea5a50661b8dd2a06c19527b045f115d9be9343dcf79e42cb7 |
| MD5 | 7439ceecc677371dfcac070394b3e632 |
| BLAKE2b-256 | bfb7467204bb7f22b261a96070a98869596d111da8f7e0e9ce5cfd99a1104935 |
File details
Details for the file smriti_memory_ai-1.0.5-py3-none-any.whl.
File metadata
- Download URL: smriti_memory_ai-1.0.5-py3-none-any.whl
- Upload date:
- Size: 177.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a28883ce5ad87d77c17568a3aab587f4fc3b060d0e08f54f3ce704c372716317 |
| MD5 | 7e6c1c6353ae0119688cd4df9a49569d |
| BLAKE2b-256 | 82b69969ada8cc8fc38026c5fadfa487309a3a7075e48ae07695e2467f37c246 |