Smriti AI: inference-time semantic memory for small language models
Smriti AI is a local-first, training-free memory layer for small language models. It wraps a frozen HuggingFace causal language model with persistent memory, semantic retrieval, graph memory, identity governance, API tooling, Docker deployment, benchmarks, and production-readiness checks.
The name comes from smriti (IAST: smṛti), a Sanskrit term for memory and remembrance. Wisdom Library describes Smritis as "that which has to be remembered" (Wisdom Library: Smriti).
Small models are not only limited by parameter count. They are limited by the absence of durable memory.
Smriti AI keeps the base model frozen. It improves long-term recall by storing external memory, retrieving only relevant facts, injecting them into the prompt, and updating the memory after each interaction. No LoRA, fine-tuning, adapter weights, or model retraining are required for the inference-time memory system.
Current Status
Smriti AI is packaged on PyPI as smriti-memory-ai with the importable package smriti. The GitHub repository and Hugging Face resources remain named smriti-ai.
| Area | Status |
|---|---|
| Python package | pyproject.toml, src/ layout, console scripts, build workflow. |
| Core memory | TF-IDF compatibility mode plus semantic session/topic/fact memory. |
| Retrieval | Sentence-transformer embeddings, FAISS/NumPy fallback, cosine similarity with temporal decay. |
| Graph memory | Per-session networkx knowledge graph with simple triple extraction and traversal. |
| Identity governance | Embedding-based persona fingerprint with drift checks and refinement hooks. |
| Backends | JSON, SQLite, Redis, and Postgres backend abstractions. |
| Privacy | Optional encrypted memory blobs and /memory/delete. |
| Audit controls | List, search, pin, archive, update, and delete individual memory entries. |
| Auth/RBAC | Optional API-key auth with user and admin roles. |
| API | FastAPI service with CORS, OpenAPI docs, metrics, health checks, API-key/RBAC option. |
| CLI | smriti, smriti-cli, smriti-api, migration commands, and backward-compatible mempalace. |
| Docker | CPU, GPU, demo, training Dockerfiles and compose profiles. |
| Monitoring | Prometheus endpoint and Grafana dashboard assets. |
| Benchmarks | Gemma 4 public benchmark policy, cross-model harness, LoCoMo-style runner, identity bench. |
| Provider adapters | Local HF, HF Endpoint, Ollama, vLLM, and OpenAI-compatible generation adapters. |
| Memory standard | Portable memory protocol plus backend conformance runner and schema migrations. |
| Research evidence | Curated historical/current benchmark lineage without shipping noisy raw logs. |
| Tests/CI | Unit/integration tests, package build/install checks, audit report, GitHub Actions. |
What Smriti AI Is
Smriti AI is an inference-time memory runtime. It sits between the user and the model.
For each turn, it can:
- Read the current user message.
- Retrieve relevant memories scoped by session_id and topic_id.
- Query related graph triples.
- Build an augmented prompt.
- Generate with a frozen base model.
- Check persona/identity drift.
- Run a refinement pass if needed.
- Extract new facts and triples.
- Persist updated memory.
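The turn loop above can be condensed into a miniature sketch. This is an illustrative stand-in, not the actual SmritiAILite internals: `memory` is a plain list of fact strings, retrieval is a toy keyword match, and `generate` stands in for the frozen base model.

```python
def run_turn(message: str, memory: list, generate) -> str:
    """Retrieve relevant facts, augment the prompt, generate, persist."""
    # Retrieve: keep facts sharing at least one word with the message
    # (a toy stand-in for semantic/graph retrieval).
    words = set(message.lower().split())
    relevant = [fact for fact in memory if words & set(fact.lower().split())]
    # Augment: inject retrieved facts ahead of the user message.
    prompt = "Context:\n" + "\n".join(relevant) + f"\n\nUser: {message}"
    # Generate with the (frozen) model.
    reply = generate(prompt)
    # Persist: store the new turn as a fact for future retrieval.
    memory.append(message)
    return reply
```

The real runtime additionally queries graph triples, checks identity drift, and writes through a durable backend.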
It is designed for:
| User Type | Why They Use It |
|---|---|
| Enterprise AI product teams | Add auditable memory to existing AI services without retraining models. |
| Personal assistant developers | Give local assistants user-specific recall across sessions. |
| Research groups | Evaluate memory augmentation, retrieval modes, and continual-learning boundaries. |
| Privacy-sensitive organizations | Keep user memory local or self-hosted with encryption and deletion hooks. |
| Multi-agent builders | Let planner, summarizer, and executor agents share one user's isolated memory. |
What Smriti AI Is Not
Smriti AI is not a hosted model provider, a replacement foundation model, or a fine-tuning method for inference-time recall. It does not magically improve every task. It improves tasks where persistent user facts, context continuity, retrieval, or persona stability matter.
The separate src/training/ package exists for replay/EWC research experiments. That training code is intentionally separate from the inference-time Smriti AI memory runtime.
Key Features
| Feature | Description |
|---|---|
| Semantic retrieval | Uses embeddings to retrieve meaning-similar facts even when wording changes. |
| Hierarchical memory | Stores memory as sessions -> topics -> facts, giving multi-user and multi-task isolation. |
| Temporal decay | Scores memories by cosine similarity multiplied by exp(-lambda * age). |
| TF-IDF mode | Keeps a lightweight lexical retrieval mode for compatibility and low-dependency environments. |
| Knowledge graph | Extracts simple subject-relation-object triples and injects related facts. |
| Identity fingerprint | Averages persona/self-description embeddings and detects drift in model outputs. |
| Memory compression | Summarizes older topic entries and archives originals before eviction. |
| Durable backends | JSON, SQLite, Redis, and Postgres backends through a common interface. |
| Encryption hooks | Optional symmetric encryption for memory blobs with SMRITI_MEMORY_KEY. |
| Deletion support | /memory/delete and CLI delete commands for user memory removal. |
| Audit dashboard | Authenticated memory table for search, edit, pin, archive, and per-entry deletion. |
| Provider adapters | Swap local Hugging Face for HF Endpoints, Ollama, vLLM, or OpenAI-compatible APIs. |
| FastAPI service | /chat, /memory/load, /memory/save, /memory/delete, /graph/query, /metrics, /health. |
| CLI | Local commands for config, chat, save/load/delete, graph query, server start, and benchmarks. |
| Docker | Compose stacks for API, Redis/Postgres, Prometheus/Grafana, demo, CPU/GPU images. |
| Benchmarks | Gemma 4 memory-retention, retrieval-mode comparison, latency, identity, LoCoMo-style long-memory, historical-protocol rerun. |
| CI/CD | GitHub Actions for tests, style checks, package build/install, Docker, release workflows. |
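The temporal-decay scoring in the table can be written out directly. The decay constant `lam` below is an illustrative value, not the library's default:

```python
import math

def decayed_score(cos_sim: float, age_seconds: float, lam: float = 1e-6) -> float:
    """Relevance = cosine similarity weighted by exponential time decay."""
    return cos_sim * math.exp(-lam * age_seconds)

# With this lambda, a fresher but slightly less similar memory can
# outrank a month-old near-exact match.
fresh = decayed_score(0.80, age_seconds=0)
stale = decayed_score(0.95, age_seconds=30 * 24 * 3600)
```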
Research Lineage And Principles
Smriti AI was built from scratch around a few durable ideas from memory-augmented small-model systems.
| Principle | Smriti AI Interpretation | Current Implementation |
|---|---|---|
| External memory | Memory should live outside model weights in a portable, inspectable store. | MemPalaceLite, SemanticMemory, durable backends, JSON export/import. |
| Training-free recall | User recall should improve at inference time without changing model weights. | Retrieved memory is injected into prompts on each call. |
| Identity continuity | Assistants should maintain persona and user-specific context across turns. | IdentityFingerprint detects embedding drift and can trigger refinement. |
| Small-model augmentation | Small models become more useful when paired with explicit state. | Works with Gemma 4 and other HuggingFace causal LMs. |
| Local-first privacy | Memory should be deployable on a user's own machine or infrastructure. | JSON/SQLite local stores, optional encryption, deletion endpoint. |
| MLOps reproducibility | Memory systems should be benchmarked, tested, packaged, monitored, and deployable. | CI, Docker, benchmark CSVs, reports, model card, monitoring stack. |
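The identity-continuity principle can be illustrated with a toy drift score: average the persona embeddings into a fingerprint, then measure how far an output embedding strays from it. This is a simplified stand-in for IdentityFingerprint, which adds adaptive thresholds and refinement prompts:

```python
import math

def drift_score(output_vec, persona_vecs):
    """Toy drift: 1 - cosine(output, mean persona vector)."""
    # Average the persona/self-description vectors into one fingerprint.
    n = len(output_vec)
    fingerprint = [sum(v[i] for v in persona_vecs) / len(persona_vecs) for i in range(n)]
    dot = sum(a * b for a, b in zip(output_vec, fingerprint))
    norm = math.sqrt(sum(a * a for a in output_vec)) * math.sqrt(sum(a * a for a in fingerprint))
    return 1.0 - dot / norm
```

A drift score near 0 means the output stays close to the persona; a score near 1 would trigger a refinement pass.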
Historical numbers from earlier writeups are treated as research lineage. Current claims should use the current Smriti AI benchmark artifacts in this repository.
For deeper lineage and reproducibility:
| Document | Purpose |
|---|---|
| `research/evidence/README.md` | Curated historical/current evidence policy. |
| `research/evidence/benchmark_lineage.csv` | Historical and current result ledger. |
| `research/README.md` | Curated original notebook/log/excerpt manifest. |
| `docs/memory_format.md` | Portable Smriti memory format and backend contract. |
| `docs/memory_spec.md` | Stable memory protocol for JSON, SQLite, Redis, and Postgres entries. |
| `docs/kaggle_colab.md` | Kaggle/Colab reproducibility guide using package imports. |
| `demos/smriti_kaggle.ipynb` / `demos/smriti_colab.ipynb` | Reproducible package-import notebooks. |
Architecture
```mermaid
flowchart TD
    U["User / Agent Message"] --> A["SmritiAILite.chat"]
    A --> Q["Session + Topic Scope"]
    Q --> R["Memory Retrieval"]
    R --> S["SemanticMemory or TF-IDF"]
    R --> G["KnowledgeGraphMemory"]
    S --> C["Context Builder"]
    G --> C
    C --> P["Augmented Prompt"]
    P --> M["Frozen HuggingFace Causal LM"]
    M --> I["IdentityFingerprint"]
    I -->|"aligned"| O["Final Response"]
    I -->|"drift"| F["Refinement Pass"]
    F --> O
    O --> E["Fact + Triple Extraction"]
    E --> B["Durable Backend"]
    B --> J["JSON / SQLite / Redis / Postgres"]
```
Core Modules
| Module | File | Responsibility |
|---|---|---|
| `SmritiAILite` | `src/smriti/agent.py` | Main model wrapper for HuggingFace generation plus memory updates. |
| `BaselineGemma` | `src/smriti/agent.py` | Plain model baseline with no memory layer. |
| `MemPalaceLite` | `src/smriti/core.py` | High-level memory facade and backward-compatible API. |
| `SemanticMemory` | `src/smriti/semantic_memory.py` | Hierarchical embedding memory with FAISS/NumPy retrieval, compression, JSON persistence. |
| `KnowledgeGraphMemory` | `src/smriti/knowledge_graph.py` | Triple extraction, graph storage, query traversal, natural-language rendering. |
| `IdentityFingerprint` | `src/smriti/identity_fingerprint.py` | Persona vectors, drift scoring, adaptive thresholds, refinement prompts. |
| `MACPLite` | `src/smriti/macp.py` | Compact reasoning continuity state. |
| Backends | `src/smriti/backends.py` | JSON, SQLite, Redis, Postgres, encryption, deletion. |
| Backend conformance | `src/smriti/backend_conformance.py` | Reusable compatibility checks for backend authors. |
| Backend migrations | `src/smriti/migrations.py`, `src/smriti/sql/` | Versioned SQLite/Postgres schema files. |
| Audit | `src/smriti/audit.py`, `src/smriti/audit_api.py` | Memory inspection, pin/archive/update/delete control plane. |
| Auth | `src/smriti/auth.py` | API-key authentication and user/admin RBAC checks. |
| Provider adapters | `src/smriti/adapters/` | Local HF, HF Endpoint, Ollama, vLLM, OpenAI-compatible generation. |
| Config | `src/smriti/config.py` | config.yaml and environment variable loading. |
| API | `src/smriti/api.py` | FastAPI app, observability, optional API-key auth. |
| CLI | `src/smriti/cli.py` | Local commands for config, memory, API, graph, and benchmark workflows. |
| Integrations | `src/smriti/integrations/` | LangChain and LlamaIndex adapters. |
| Training research | `src/training/ewc_replay.py` | Optional replay/EWC experiments, separate from runtime memory. |
Installation
Requirements
| Requirement | Notes |
|---|---|
| Python | 3.10 or newer. |
| OS | Tested locally on macOS; CI validates Linux. Windows helpers are included. |
| Model runtime | Optional unless using SmritiAILite with a real HuggingFace model. |
| Gemma 4 access | Public benchmark path uses google/gemma-4-E2B-it; users may need Hugging Face access/login. |
Install From GitHub
git clone https://github.com/Luciferai04/smriti-ai.git
cd smriti-ai
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[ml,bench]"
For development:
pip install -e ".[dev,ml,bench]"
For all optional integrations:
pip install -e ".[full]"
Install From PyPI
The public PyPI distribution is named smriti-memory-ai because the shorter smriti-ai project name is already taken on PyPI by another owner. The Python import path stays clean and stable:
from smriti import SmritiAILite
Recommended install:
pip install "smriti-memory-ai[ml]==1.0.5"
GitHub tag fallback:
pip install "smriti-memory-ai[ml] @ git+https://github.com/Luciferai04/smriti-ai.git@v1.0.5"
Then verify:
python -c "from smriti import SmritiAILite, SemanticMemory, KnowledgeGraphMemory; print('Smriti import OK')"
One-Shot Installer
Linux/macOS:
./install_smriti.sh
Windows PowerShell:
./install_smriti.ps1
Windows batch:
install_smriti.bat
The installer creates a local virtual environment, installs package extras, writes config.yaml, and can cache Gemma 4.
Model And API Keys
Smriti AI itself does not require a model-provider API key. It is a memory layer.
| Key | Required? | Purpose |
|---|---|---|
| `SMRITI_API_KEY` | Optional | Protects Smriti API routes. Clients send `x-api-key`. |
| `SMRITI_MEMORY_KEY` | Optional but recommended | Encrypts memory blobs before writing to disk or backend. |
| `HF_TOKEN` | Sometimes | Needed only if Hugging Face model access requires authentication. |
| Provider keys | Depends | Needed only if users connect Smriti to a hosted provider instead of a local model. |
For Gemma 4 via Hugging Face:
hf auth login
# or
export HF_TOKEN="your-huggingface-token"
For a protected Smriti API:
export SMRITI_API_KEY="replace-with-service-secret"
curl -H "x-api-key: $SMRITI_API_KEY" http://localhost:8000/health
For encrypted memory:
export SMRITI_MEMORY_KEY="replace-with-a-long-random-secret"
Do not commit real keys to Git.
Quick Start For Customers
Enterprise AI Teams
Deploy Smriti AI as a memory service behind your API gateway with Redis/Postgres and monitoring enabled:
cp api_keys.example.json api_keys.json
export AUTH_ENABLED=true
export SMRITI_API_KEYS_PATH=api_keys.json
COMPOSE_PROFILES=redis,monitoring SMRITI_MEMORY_BACKEND=redis docker compose up -d --build
Integrate by sending POST /chat requests with a stable user_id, then scrape /metrics with Prometheus and review dashboards in Grafana. Use admin keys only for support/audit workflows.
Indie Developers And Personal Assistants
Install locally and start with JSON or SQLite memory:
pip install "smriti-memory-ai @ git+https://github.com/Luciferai04/smriti-ai.git@v1.0.5"
smriti-cli config wizard --backend json --overwrite
smriti-cli --session-id alex --topic-id profile chat "My name is Alex and I work at Ocean Lab."
smriti-cli --session-id alex --topic-id profile chat "What do you remember about me?"
Use LangChain/LlamaIndex integrations when you want Smriti memory inside an existing agent framework.
Researchers And Startups
Clone the repo, run benchmarks, and compare retrieval modes:
git clone https://github.com/Luciferai04/smriti-ai.git
cd smriti-ai
pip install -e ".[dev,ml,bench]"
python benchmarks/run_benchmarks.py --model-preset gemma4 --max-new-tokens 16
python benchmarks/run_longmem.py --dataset-path path/to/locomo.json --retrieval-mode semantic
Use demos/smriti_kaggle.ipynb and demos/smriti_colab.ipynb for package-based notebook demos.
Privacy-Sensitive Deployments
Keep memory local, encrypted, auditable, and deletable:
export SMRITI_MEMORY_BACKEND=sqlite
export SMRITI_SQLITE_PATH=data/smriti_memory.sqlite3
export SMRITI_MEMORY_KEY="replace-with-a-long-random-secret"
export AUTH_ENABLED=true
smriti-cli start-server --host 127.0.0.1 --port 8000
Expose /memory/delete in your user data-deletion flow and run the authenticated audit UI only for trusted operators.
Quick Start: Python Library
Memory-Only Usage
This path does not require PyTorch, transformers, or a model download.
from smriti import MemPalaceLite
memory = MemPalaceLite(retrieval_mode="semantic", session_id="alex", topic_id="profile")
memory.add_fact("Alex is a marine biologist in Hawaii.")
context = memory.get_context("What do you remember about Alex?")
print(context)
Full Gemma 4 Usage
This uses a real model and the Smriti AI wrapper.
from transformers import AutoModelForCausalLM, AutoTokenizer
from smriti import SmritiAILite
model_id = "google/gemma-4-E2B-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
agent = SmritiAILite(
model=model,
tokenizer=tokenizer,
retrieval_mode="semantic",
session_id="alex",
topic_id="profile",
)
agent.chat("My name is Alex and I am a marine biologist.")
reply = agent.chat("What do you remember about me?")
print(reply)
Save And Load Memory
agent.save_memory("smriti_memory.json")
agent.load_memory("smriti_memory.json")
Direct Semantic Memory
from smriti import SemanticMemory
memory = SemanticMemory()
memory.add_entry("user-a", "profile", "Maya is a doctor at a community clinic.")
results = memory.retrieve("user-a", "profile", "physician medical work", k=1)
print(results[0].entry.text)
Direct Knowledge Graph
from smriti import KnowledgeGraphMemory
graph = KnowledgeGraphMemory()
graph.add_triple("science", "Marie Curie", "discovered", "radium", topic_id="chemistry")
graph.add_triple("science", "radium", "is a", "chemical element", topic_id="chemistry")
facts = graph.triples_to_text(graph.query_graph("science", "Marie Curie", depth=2, topic_id="chemistry"))
print(facts)
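The "simple triple extraction" behind KnowledgeGraphMemory can be illustrated with a toy pattern matcher. The patterns below are hypothetical stand-ins, not the library's actual extraction rules:

```python
import re

# Minimal relation patterns; illustrative only.
PATTERNS = [
    (re.compile(r"^(.+?) is a (.+?)\.?$"), "is a"),
    (re.compile(r"^(.+?) discovered (.+?)\.?$"), "discovered"),
]

def extract_triples(sentence: str):
    """Return (subject, relation, object) triples found in one sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        match = pattern.match(sentence)
        if match:
            triples.append((match.group(1), relation, match.group(2)))
    return triples
```

Triples extracted this way feed the per-session networkx graph and are rendered back to natural language at query time.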
Quick Start: CLI
Smriti AI installs these commands:
| Command | Purpose |
|---|---|
| `smriti-cli` | Main CLI. |
| `smriti` | Short alias. |
| `smriti-api` | Run the FastAPI service. |
| `mempalace` | Backward-compatible alias. |
Create config:
smriti-cli init config.yaml
Interactive backend wizard:
smriti-cli config wizard --backend json --overwrite
Store and retrieve local memory:
smriti-cli --session-id alex --topic-id profile --retrieval-mode tfidf chat "My name is Alex and I work at Ocean Lab."
smriti-cli --session-id alex --topic-id profile --retrieval-mode tfidf chat "Where do I work?"
Save, load, delete:
smriti-cli --session-id alex memory save smriti_memory.json
smriti-cli --session-id alex memory load smriti_memory.json
smriti-cli --session-id alex memory delete --path smriti_memory.json
Run backend compatibility checks:
smriti-cli --backend json --backend-path data/memory backend conformance
smriti-cli --backend sqlite --backend-path data/smriti_memory.sqlite3 backend conformance
Migrate one user's memory between backends:
smriti-cli --session-id alex migrate-backend \
--from-backend json --from-path data/memory \
--to-backend sqlite --to-path data/smriti_memory.sqlite3
Query graph memory:
smriti-cli --session-id alex --topic-id profile graph_query user --depth 1
Run benchmarks:
# Installed package smoke check; internal-only, not public benchmark evidence.
SMRITI_ALLOW_TEST_DOUBLES=1 smriti-cli benchmark --quick
# Full Gemma 4 benchmark; run from a cloned source checkout.
python benchmarks/run_benchmarks.py --model-preset gemma4 --max-new-tokens 16
Quick Start: FastAPI Service
Start locally:
smriti-cli start-server --host 0.0.0.0 --port 8000
# or
python -m smriti.api --host 0.0.0.0 --port 8000
Health check:
curl http://localhost:8000/health
OpenAPI docs:
http://localhost:8000/docs
Memory/chat request:
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{
"user_id": "alex",
"topic_id": "profile",
"message": "My name is Alex and I am a marine biologist.",
"retrieval_mode": "semantic"
}'
Recall request:
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{
"user_id": "alex",
"topic_id": "profile",
"message": "What do you remember about me?",
"retrieval_mode": "semantic"
}'
Important API Runtime Note
The default FastAPI service can operate as a memory service. If no model agent factory is configured, /chat returns memory-aware context and updates memory. To generate full model-backed assistant responses through the API, deploy the API process with a model runtime that registers an agent factory using set_agent_factory, or wrap Smriti inside your own service with SmritiAILite.
This separation keeps the memory service lightweight for enterprise integration while still supporting full local model-backed usage in Python.
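A factory registration might look like the sketch below. The `smriti.api` import path, the exact `set_agent_factory` signature, and the factory arguments are assumptions inferred from the note above; verify them against the installed package:

```python
def make_agent(user_id: str, topic_id: str):
    """Build a model-backed SmritiAILite for one user/topic (illustrative)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from smriti import SmritiAILite

    model_id = "google/gemma-4-E2B-it"
    return SmritiAILite(
        model=AutoModelForCausalLM.from_pretrained(model_id),
        tokenizer=AutoTokenizer.from_pretrained(model_id),
        retrieval_mode="semantic",
        session_id=user_id,
        topic_id=topic_id,
    )

if __name__ == "__main__":
    # Assumed registration hook; confirm the import path in smriti.api.
    from smriti.api import set_agent_factory
    set_agent_factory(make_agent)
```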
API Endpoints
| Method | Endpoint | Purpose |
|---|---|---|
| `GET` | `/health` | Liveness and loaded-user count. |
| `GET` | `/metrics` | Prometheus metrics. |
| `POST` | `/chat` | Retrieve context, update memory, optionally call configured model agent. |
| `POST` | `/memory/save` | Save one user's memory. |
| `POST` | `/memory/load` | Load memory from request body, backend, or path. |
| `POST` | `/memory/delete` | Delete one user's memory from RAM/backend/path. |
| `POST` | `/memory/list` | Audit/list one user's memory entries. |
| `POST` | `/memory/update` | Edit an individual memory entry. |
| `POST` | `/memory/pin` | Pin/unpin an important memory entry. |
| `POST` | `/memory/archive` | Archive/unarchive an entry. |
| `POST` | `/memory/entry/delete` | Delete one memory entry. |
| `POST` | `/graph/query` | Query session-scoped graph facts. |
| `GET` | `/docs` | FastAPI Swagger UI. |
Configuration
Smriti AI reads config.yaml by default or the file pointed to by SMRITI_CONFIG_PATH.
memory:
backend: json
memory_dir: data/memory
sqlite_path: data/smriti_memory.sqlite3
redis_url: redis://localhost:6379/0
postgres_dsn: ""
autosave: true
security:
encryption_key: ""
api:
host: 0.0.0.0
port: 8000
cors_origins:
- "*"
model:
adapter: local_hf
base_model_id: google/gemma-4-E2B-it
hf_endpoint_url: ""
ollama_url: http://localhost:11434
vllm_url: http://localhost:8000
openai_compatible_url: https://api.openai.com/v1
Environment variables override config values:
| Variable | Purpose |
|---|---|
| `SMRITI_CONFIG_PATH` | Path to config file. |
| `SMRITI_MEMORY_BACKEND` | json, sqlite, redis, or postgres. |
| `SMRITI_MEMORY_DIR` | JSON memory directory. |
| `SMRITI_SQLITE_PATH` | SQLite database path. |
| `SMRITI_REDIS_URL` | Redis connection URL. |
| `SMRITI_POSTGRES_DSN` | Postgres DSN. |
| `SMRITI_AUTOSAVE` | Save memory after API updates. |
| `SMRITI_MEMORY_KEY` | Encrypt memory blobs. |
| `SMRITI_API_KEY` | Protect API routes. |
| `AUTH_ENABLED` | Enable role-bound API-key auth. |
| `SMRITI_API_KEYS_PATH` | Path to api_keys.json with user/admin keys. |
| `SMRITI_CORS_ORIGINS` | Comma-separated CORS allowlist. |
| `SMRITI_HOST` | API host. |
| `SMRITI_PORT` | API port. |
| `SMRITI_MODEL_ADAPTER` | local_hf, hf_endpoint, ollama, vllm, or openai. |
| `BASE_MODEL_ID` | Base model ID, default google/gemma-4-E2B-it. |
| `HF_ENDPOINT_URL` | Hugging Face Inference Endpoint URL when using endpoint mode. |
| `OPENAI_API_KEY` | Provider key only when using OpenAI-compatible adapter. |
Durable Memory Backends
| Backend | Best For | Notes |
|---|---|---|
| JSON | Local experiments, privacy-first single-machine usage. | Simple files under data/memory. |
| SQLite | Local production, desktop apps, edge devices. | Single-file database, no external service. |
| Redis | Low-latency state for deployed agents. | Good for concurrent API use; use persistence in production. |
| Postgres | Enterprise durability and operational tooling. | Good fit for audited multi-user deployments. |
Examples:
SMRITI_MEMORY_BACKEND=json smriti-cli start-server
SMRITI_MEMORY_BACKEND=sqlite SMRITI_SQLITE_PATH=data/smriti.sqlite3 smriti-cli start-server
SMRITI_MEMORY_BACKEND=redis SMRITI_REDIS_URL=redis://localhost:6379/0 smriti-cli start-server
SMRITI_MEMORY_BACKEND=postgres SMRITI_POSTGRES_DSN=postgresql://host:5432/smriti smriti-cli start-server
Docker
Local CPU API
docker compose up -d --build api
API With Redis
COMPOSE_PROFILES=redis SMRITI_MEMORY_BACKEND=redis docker compose up -d --build
API With Postgres
COMPOSE_PROFILES=postgres SMRITI_MEMORY_BACKEND=postgres docker compose up -d --build
GPU-Capable Image
When Docker has NVIDIA runtime support:
SMRITI_DOCKERFILE=Dockerfile docker compose up -d --build api
Production Compose
docker compose -f docker-compose.prod.yml up -d
Monitoring Stack
COMPOSE_PROFILES=monitoring docker compose up -d --build
Then open:
| Service | URL |
|---|---|
| API | http://localhost:8000 |
| API docs | http://localhost:8000/docs |
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
Default Grafana credentials in local compose:
admin / smriti
Hugging Face Model-Style Deployment
Smriti AI can also be packaged as a Hugging Face model repository with a custom handler.py. This does not make Smriti AI a newly trained foundation model. It packages the memory wrapper, model card, endpoint config, example requests, and upload tooling so Hugging Face Inference Endpoints can serve a memory-augmented base model.
Deployment assets live in:
deploy/huggingface_model/
deploy/huggingface_dataset/
deploy/huggingface_space/
Local handler smoke test:
BASE_MODEL_ID=google/gemma-4-E2B-it \
HF_TOKEN=$HF_TOKEN \
SMRITI_MEMORY_BACKEND=json \
SMRITI_MEMORY_PATH=/tmp/smriti_hf_test.json \
python deploy/huggingface_model/test_handler_local.py
Upload to a Hugging Face model repo:
export HF_TOKEN=...
python deploy/huggingface_model/upload_model_repo.py \
--repo-id luciferai-devil/smriti-ai \
--private false
Official v1.0 Hugging Face targets:
| Asset | Repo |
|---|---|
| Model wrapper | luciferai-devil/smriti-ai |
| Benchmark dataset | luciferai-devil/smriti-ai-benchmarks |
| CPU-safe demo Space | luciferai-devil/smriti-ai-demo |
Upload sanitized benchmark artifacts:
python deploy/huggingface_dataset/upload_benchmark_dataset.py \
--repo-id luciferai-devil/smriti-ai-benchmarks \
--private false
Upload the public demo Space:
python deploy/huggingface_space/upload_space.py \
--repo-id luciferai-devil/smriti-ai-demo \
--private false
The Space runs in CPU-safe memory-only mode by default, warns users not to enter PII, and auto-deletes demo memory after inactivity.
Use BASE_MODEL_ID for a locally loaded model inside the endpoint, or HF_ENDPOINT_URL if the Smriti handler should call another model endpoint. Production endpoints should use external Redis/Postgres memory and must not store private user memory inside the model repository.
See docs/deploy_as_hf_model.md for the full deployment guide.
Monitoring And Observability
The API exports Prometheus metrics from /metrics.
| Metric | Meaning |
|---|---|
| `smriti_http_requests_total` | Request count by method, path, status. |
| `smriti_http_errors_total` | Server-side error count. |
| `smriti_http_request_latency_seconds` | End-to-end request latency histogram. |
| `smriti_retrieval_latency_seconds` | Memory retrieval latency histogram. |
| `smriti_tokens_total` | Approximate token count observed by API. |
| `smriti_user_memories` | Number of loaded user memory stores. |
| `smriti_user_memory_bytes` | Approximate serialized memory size by user. |
Observability helper:
python scripts/metrics_monitor.py --url http://localhost:8000 --output reports/metrics_report.md
Privacy And Security
Smriti AI stores user memory, so privacy is a core operational concern.
| Requirement | Smriti AI Support |
|---|---|
| User isolation | Memory is keyed by user_id / session_id and topic_id. |
| Deletion | /memory/delete and smriti-cli memory delete. |
| Encryption | Set SMRITI_MEMORY_KEY to encrypt backend blobs. |
| API protection | Set SMRITI_API_KEY or AUTH_ENABLED=true with api_keys.json. |
| RBAC | user keys can access only their bound user_id; admin keys can operate across users. |
| Local-first deployment | JSON/SQLite can run fully on-device. |
| Auditability | Memory can be exported, inspected, pinned, archived, edited, and deleted. |
Delete user memory through API:
curl -X POST http://localhost:8000/memory/delete \
-H "Content-Type: application/json" \
-d '{"user_id": "alex"}'
Read more in docs/privacy.md.
Role-bound API keys:
cp api_keys.example.json api_keys.json
export AUTH_ENABLED=true
export SMRITI_API_KEYS_PATH=api_keys.json
curl -H "x-api-key: replace-with-user-key" http://localhost:8000/health
Read more in docs/auth.md.
Framework Integrations
LangChain
from smriti.integrations.langchain import SmritiMemory
memory = SmritiMemory(session_id="alex", topic_id="profile")
memory.save_context(
{"input": "My name is Alex and I work at Ocean Lab."},
{"output": "Nice to meet you, Alex."},
)
print(memory.load_memory_variables({"input": "Where do I work?"}))
LlamaIndex
from smriti.integrations.llama_index import SmritiStorageContext
storage = SmritiStorageContext(session_id="alex", topic_id="profile")
storage.add_node("Alex is a marine biologist.")
print(storage.query("What does Alex do?"))
Web Demo
The demo app lets users inject facts, ask distractors, view retrieved memories, and delete user memory.
pip install -e ".[demo]"
uvicorn demo.app:app --port 8080
You can also run the packaged module directly:
python -m demo.app
See src/demo/README.md for details.
Memory Audit Dashboard
The audit dashboard is a separate authenticated UI for operators and privacy reviews:
export SMRITI_AUDIT_USER=admin
export SMRITI_AUDIT_PASSWORD="replace-with-a-strong-password"
uvicorn demo.audit_app:app --port 8090
Open http://127.0.0.1:8090 and sign in with the configured credentials. Use it to search, edit, pin, archive, and delete individual memories.
Model Provider Adapters
Smriti AI can wrap multiple generation providers through a small adapter interface:
from smriti.adapters import build_adapter
adapter = build_adapter("hf_endpoint")
text = adapter.generate("Augmented Smriti prompt", max_new_tokens=128)
Supported adapters:
| Adapter | Typical target |
|---|---|
| `local_hf` | Local Transformers Gemma 4. |
| `hf_endpoint` | Hugging Face Inference Endpoint. |
| `ollama` | Local Ollama REST server. |
| `vllm` | vLLM server. |
| `openai` | OpenAI-compatible hosted APIs. |
Read more in docs/adapters.md.
Benchmarks
Benchmark Policy
Public benchmark claims in this repository use real Gemma 4 only:
google/gemma-4-E2B-it
Deterministic test-double paths may exist for engineering tests, but they are not public model-quality claims.
Current Local Gemma 4 Results
These are current local CPU measurements from the checked-in CSV artifacts.
| Evaluation | Baseline Recall | Best Smriti AI Recall | Absolute Lift | Notes |
|---|---|---|---|---|
| Gemma-style three-fact protocol | 0/3 | 3/3 | +3 facts | Baseline 5.71s, Semantic+Graph+Identity 4.99s avg CPU latency. |
| Five-mode comparison (`max_new_tokens=16`) | 0/3 | 3/3 | +3 facts | Fastest successful memory mode: Semantic+Graph at 2.78s avg CPU latency. |
| Original broader protocol rerun (`max_new_tokens=256`) | 0/3 | 3/3 | +3 facts | Overall average improved from 0.524 to 0.832 (+58.9%). |
Five-mode comparison:
| Configuration | Recall | Avg Latency | Context Coherence | Notes |
|---|---|---|---|---|
| Baseline | 0/3 | 4.927s | 0.000 | Frozen Gemma 4, no memory layer. |
| TF-IDF | 3/3 | 3.481s | 0.667 | Lexical memory mode. |
| Semantic | 3/3 | 2.857s | 0.333 | Embedding-based memory mode. |
| Semantic + Graph | 3/3 | 2.781s | 0.667 | Fastest successful memory mode in this CPU run. |
| Semantic + Graph + Identity | 3/3 | 5.164s | 0.000 | Adds persona governance overhead. |
Original broader protocol rerun:
| Metric | Baseline | Smriti AI | Delta |
|---|---|---|---|
| Memory retention | 0.000 | 1.000 | +inf% |
| Response consistency | 0.571 | 0.496 | -13.2% |
| Context coherence | 1.000 | 1.000 | +0.0% |
| Overall average | 0.524 | 0.832 | +58.9% |
The older +31.2% overall number from earlier writeups remains historical lineage. The current comparable broader-protocol rerun is +58.9% under this local Gemma 4 CPU setup with max_new_tokens=256.
Run Benchmarks
Install benchmark and ML extras:
pip install -e ".[ml,bench]"
Gemma-style memory retention:
python benchmarks/run_gemma_eval.py
Five-configuration comparison:
python benchmarks/run_benchmarks.py \
--model-preset gemma4 \
--configurations tfidf semantic semantic_graph semantic_graph_identity \
--devices auto \
--max-new-tokens 16 \
--output benchmarks/results_comparison.csv
Original broader protocol rerun:
python benchmarks/run_historical_protocol.py --max-new-tokens 256
Cross-model harness:
python benchmarks/run_benchmarks.py \
--model-preset cross_model \
--output benchmarks/cross_model_results.csv \
--summary-output benchmarks/summary.md
Long-memory / LoCoMo-style runner:
python benchmarks/run_longmem.py --dataset-path path/to/locomo.json --retrieval-mode semantic
Identity drift benchmark:
python benchmarks/run_identity_bench.py --output reports/identity_evaluation.csv
Aggregate summaries:
python benchmarks/summarize_results.py
Benchmark Evidence Files
| File | Purpose |
|---|---|
| `benchmarks/results_gemma_eval.csv` | Gemma 4 baseline vs Smriti three-fact evaluation. |
| `benchmarks/results_comparison.csv` | Baseline, TF-IDF, semantic, semantic+graph, semantic+graph+identity. |
| `benchmarks/results_historical_protocol.csv` | Current rerun of the older broader protocol. |
| `benchmarks/results_historical_protocol_responses.json` | Response audit trail for the broader-protocol rerun. |
| `benchmarks/cross_model_results.csv` | Optional cross-model memory-retention comparison. |
| `benchmarks/longmem_results.csv` | Optional LoCoMo-style long-memory output. |
| `benchmarks/latency_gemma4.csv` | Dedicated Gemma 4 latency/token probe. |
| `reports/identity_evaluation.csv` | Persona drift detection benchmark. |
| `results/summary.md` | Human-readable aggregate summary. |
| `benchmarks/README.md` | Generated benchmark table. |
| `model_card_smriti.md` | Model card and result disclaimer. |
| `research/evidence/benchmark_lineage.csv` | Historical/current result ledger and claim-status labels. |
Testing
Run the full test suite:
pytest -q
Run the production hardening matrices:
make test # unit + deterministic test-double integration
make test-security # prompt injection, redaction, auth/RBAC, delete/encryption
make test-benchmarks # deterministic benchmark artifacts and budgets
make production-gates # manifest, regression, privacy, and gate report checks
make end-user-readiness # first-run install/docs/CLI/deployment readiness checks
These PR-safe tests use deterministic test-double paths. Gemma 4 and other real-model benchmarks are reserved for nightly/manual runs so ordinary contributors do not need to download large gated checkpoints.
Run with coverage:
pytest --cov=smriti --cov-report=term-missing --cov-report=html:reports/coverage/html
Run style checks:
ruff check benchmarks scripts src tests
Build and install the wheel locally:
python -m build
python -m venv .venv-wheel
source .venv-wheel/bin/activate
pip install dist/smriti_ai-*.whl
python -c "from smriti import SmritiAILite, SemanticMemory, KnowledgeGraphMemory; print('wheel OK')"
Smoke Tests
Local API smoke test:
bash scripts/smoke_test.sh
Latency probe:
python scripts/measure_latency.py --retrieval-modes tfidf semantic --output benchmarks/latency_results.csv
Load test helper:
python scripts/load_test_runner.py --users 10 --spawn-rate 10 --run-time 30s --backend json
See docs/load_testing.md for the 10/100/1000-user matrix and report files.
Fault-tolerance probe:
python scripts/fault_tolerance_tests.py --url http://localhost:8000
Agentic Harness Evolution
Smriti AI now includes an AHE-inspired loop for improving the inference-time memory harness while keeping Gemma 4 or any other base model frozen.
| Layer | File | Purpose |
|---|---|---|
| Harness config | configs/harness_params.yaml | Editable retrieval, graph, compression, and identity-governance parameters. |
| Evidence collection | benchmarks/collect_evidence.py | Runs memory-retention or JSON/JSONL long-memory tasks and writes summary/log evidence. |
| Evolution decision | evolve_harness.py | Applies bounded heuristics and appends a predicted-impact manifest entry. |
| Closed loop | run_evolution.py | Re-evaluates proposed configs, reverts regressions, and can tag Git iterations. |
| Audit trail | manifests/evolve_manifest.jsonl | JSONL history of the component changed, previous/new values, reason, prediction, observed effect, and config snapshots. |
| Harness registry | harnesses/ | Versioned seed/evolved harness artifacts with metadata, results, and production status. |
| Manifest verifier | src/smriti/manifest_verifier.py | Validates that each accepted/rejected change has before/after evidence. |
| Production gates | src/smriti/production_gates.py | Runs tests, backend/privacy checks, validation, holdout, cross-model, latency, token, and identity gates before promotion. |
| Canary routing | src/smriti/canary.py | Sticky per-user canary routing for evolved harnesses with rollback conditions. |
Quick local loop:
python benchmarks/collect_evidence.py \
--config configs/harness_params.yaml \
--summary benchmarks/evidence_summary.json
python evolve_harness.py \
--config configs/harness_params.yaml \
--evidence benchmarks/evidence_summary.json
python run_evolution.py --iterations 5 --no-commit
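Each loop iteration appends one JSON object per line to manifests/evolve_manifest.jsonl. A minimal sketch of reading such an entry is below; the field names are assumptions inferred from the documented contents (component changed, previous/new values, reason, prediction, observed effect), not the exact manifest schema.

```python
import json

# Illustrative: parse one JSONL manifest entry. Field names are assumptions
# for this sketch; inspect manifests/evolve_manifest.jsonl for the real keys.
line = json.dumps({
    "component": "retrieval.top_k",
    "previous": 3,
    "new": 5,
    "reason": "recall below target on long-memory tasks",
    "predicted_impact": "higher recall, more token overhead",
    "observed_effect": "recall +0.1, latency +4%",
})
entry = json.loads(line)
print(f"{entry['component']}: {entry['previous']} -> {entry['new']}")
```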
Validation and release-gate loop:
python benchmarks/validate_harness_evolution.py \
--seed-config harnesses/seed/harness_params.yaml \
--evolved-config harnesses/evolved-v1/harness_params.yaml
python benchmarks/run_holdout_eval.py \
--config harnesses/evolved-v1/harness_params.yaml
python benchmarks/run_cross_model_harness_eval.py \
--seed-config harnesses/seed/harness_params.yaml \
--evolved-config harnesses/evolved-v1/harness_params.yaml
python harness/verify_manifest.py
python harness/production_gates.py evolved-v1 --to candidate
Harness registry CLI:
smriti-cli harness list
smriti-cli harness show evolved-v1
smriti-cli harness compare seed evolved-v1
smriti-cli harness activate evolved-v1
smriti-cli harness rollback seed
smriti-cli harness verify-manifest
smriti-cli harness promote evolved-v1 --to production
smriti-cli harness regression-test
API/dashboard support:
| Endpoint | Purpose |
|---|---|
| GET /harness/current | Show active harness parameters and registry entries. |
| GET /harness/history | Return manifest history. |
| GET /harness/metrics | Return validation and canary metrics. |
| POST /harness/rollback | Roll back to a registry harness. Admin-only when auth is enabled. |
| POST /harness/evaluate | Run seed-vs-evolved validation. Admin-only when auth is enabled. |
| GET /harness/canary/status | Show active/canary routing status. |
| POST /harness/canary/start | Start sticky canary routing. Admin-only when auth is enabled. |
| POST /harness/canary/stop | Stop canary routing. |
| POST /harness/canary/promote | Promote the canary harness. |
| POST /harness/canary/rollback | Roll back the canary harness. |
The web dashboard also exposes a harness cockpit with current parameters,
recent manifest entries, manual overrides, rollback controls, quick evaluation,
seed comparison, and report export. See docs/research_lineage.md for the
research rationale and AHE mapping.
Generated harness artifacts:
| Artifact | Purpose |
|---|---|
| results/harness_evolution_validation.md | Baseline vs seed vs evolved harness validation. |
| results/evolution_generalization_report.md | Final holdout evaluation. |
| results/cross_model_harness_eval.md | Cross-model deterministic harness validation. |
| results/manifest_verification.md | Manifest integrity report. |
| results/production_gate_report.md | Promotion gate verdict. |
| results/canary_report.md | Canary routing status and metrics. |
| reports/evolution_report.md | Stakeholder-readable evolution report. |
GPU And CPU Behavior
Smriti AI is designed to fall back cleanly to CPU.
| Component | CPU | GPU |
|---|---|---|
| Base model | Works, slower for Gemma 4. | Moves model to CUDA when available. |
| Generation dtype | float32 on CPU. | bfloat16 if supported, else float16. |
| Embeddings | Sentence-transformers can run on CPU. | Embedding model can move to CUDA. |
| FAISS | Uses CPU by default. | Attempts GPU indices when CUDA FAISS support is available. |
For practical demos with Gemma 4, GPU is recommended. CPU is acceptable for reproducibility but slower.
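The dtype fallback in the table above reduces to a small decision rule. This pure-Python sketch illustrates the logic only; the actual device handling lives in src/smriti and uses torch, which is omitted here to keep the example self-contained.

```python
# Illustrative decision rule for the generation dtype described above:
# float32 on CPU; bfloat16 on GPU when supported, otherwise float16.
def pick_dtype(cuda_available: bool, bf16_supported: bool) -> str:
    if not cuda_available:
        return "float32"
    return "bfloat16" if bf16_supported else "float16"

print(pick_dtype(False, False))  # -> float32
print(pick_dtype(True, True))    # -> bfloat16
print(pick_dtype(True, False))   # -> float16
```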
Training Research Package
The Smriti AI runtime is training-free. The training package is separate and optional.
pip install -e ".[training]"
python -m training.ewc_replay --model google/gemma-4-E2B-it --dataset path/to/data.jsonl --dry-run
The training module includes replay/EWC experiment scaffolding and logs metrics under training/. It is not imported by smriti during inference.
CI/CD
| Workflow | Trigger | Purpose |
|---|---|---|
| .github/workflows/ci.yml | Push / PR | Install, lint, test, compile, build, install wheel, audit, upload artifacts. |
| .github/workflows/test_agent_hardening.yml | Push / PR / manual | Unit, test-double integration, OWASP-style security, benchmark smoke, production gates, optional backend jobs. |
| .github/workflows/nightly_benchmarks.yml | Nightly / manual | Real-model Gemma-style, retrieval, holdout, and identity benchmarks. |
| .github/workflows/harness_production_gate.yml | Harness/API/benchmark changes | Verify the manifest and run production gates for evolved harnesses. |
| .github/workflows/benchmark.yml | Nightly / manual | Run the benchmark suite on a small configured setup. |
| .github/workflows/load-test.yml | Push / nightly / manual | Run a 10-user API load smoke test and upload reports. |
| .github/workflows/docker.yml | Tags / manual | Build API/demo/training Docker images. |
| .github/workflows/release.yml | Tag push | Build the package and publish release artifacts. |
The latest local push for the harness-production work passed the CI, Harness Production Gate, and Load Test workflows.
Repository Layout
smriti-ai/
|-- src/smriti/ # Runtime memory package
|-- src/training/ # Optional replay/EWC research code
|-- src/demo/ # Small web demo
|-- demos/ # Kaggle/Colab package-import notebooks
|-- benchmarks/ # Gemma 4 evaluations and CSV results
|-- configs/ # Harness and runtime parameter files
|-- harness/ # Manifest verification and production-gate wrappers
|-- harnesses/ # Versioned seed/evolved harness registry
|-- tests/ # Unit and integration tests
|-- scripts/ # Setup, smoke, latency, load, fault probes
|-- docs/ # Privacy and API documentation
|-- manifests/ # AHE JSONL evolution audit trail
|-- research/artifacts/ # Curated original notebooks, logs, and excerpts
|-- research/evidence/ # Curated benchmark lineage and evidence policy
|-- monitoring/ # Prometheus/Grafana assets
|-- support/ # Troubleshooting and sample configs
|-- notebooks/ # Package-based demo notebook
|-- reports/ # Readiness, coverage, metrics reports
|-- Dockerfile # GPU-capable API image
|-- Dockerfile.cpu # Lightweight CPU API image
|-- Dockerfile.demo # Demo image
|-- Dockerfile.training # Training/research image
|-- docker-compose.yml # Local API/backends/monitoring stack
|-- docker-compose.prod.yml # Production-oriented compose stack
|-- pyproject.toml # Package metadata and extras
|-- config.yaml # Local config template
|-- evolve_harness.py # One-step harness evolution proposal script
|-- run_evolution.py # Closed-loop evidence/evolve/verify driver
|-- ROADMAP.md # Post-v1 roadmap including AHE hardening
|-- model_card_smriti.md # Model card and benchmark disclosure
Production Readiness Notes
| Area | Recommendation |
|---|---|
| Auth | Set SMRITI_API_KEY or place API behind an authenticated gateway. |
| RBAC | Use AUTH_ENABLED=true and role-bound keys for production endpoints. |
| Secrets | Use environment variables or a secret manager, not committed config files. |
| Storage | Use SQLite for local apps, Redis/Postgres for server deployments. |
| Encryption | Set SMRITI_MEMORY_KEY for sensitive memory. |
| Backups | Back up JSON/SQLite/Postgres memory stores according to your RPO/RTO. |
| Observability | Scrape /metrics and use Grafana dashboard panels. |
| Load testing | Run Locust/wrk-style load tests before enterprise rollout. |
| Deletion | Wire /memory/delete into user data deletion workflows. |
| Audit | Protect demo.audit_app and audit endpoints before exposing memory inspection. |
| Benchmarking | Rerun Gemma 4 benchmarks on your hardware before publishing claims. |
See reports/production_readiness.md for the latest QA snapshot.
Troubleshooting
| Problem | Likely Cause | Fix |
|---|---|---|
| ModuleNotFoundError: smriti | Package not installed in the current environment. | Run pip install -e . or activate the correct venv. |
| ModuleNotFoundError: transformers | ML extras not installed. | Run pip install -e ".[ml]". |
| Gemma 4 fails to load | Missing Hugging Face access or incompatible Transformers stack. | Run hf auth login and update ML dependencies. |
| API returns memory-only text | No model agent factory is registered. | Use SmritiAILite in Python or deploy the API with an agent factory. |
| API returns 401 | SMRITI_API_KEY is set. | Send the x-api-key: <key> header. |
| Memory not persisted | Autosave disabled or backend not configured. | Set SMRITI_AUTOSAVE=1 and configure a backend. |
| Encrypted memory cannot load | Missing or wrong SMRITI_MEMORY_KEY. | Set the same key used when saving. |
| Docker image is large | ML dependencies and model runtimes are heavy. | Use Dockerfile.cpu for API-only memory service. |
| Benchmarks are slow | Gemma 4 on CPU is heavy. | Use GPU or reduce max_new_tokens for local checks. |
Roadmap
| Release | Theme | Planned Work |
|---|---|---|
| v1.1 | Memory quality | Hot/cold memory tiers, stronger compression, multilingual embeddings, cross-lingual recall tests, configurable decay/top-K/summarization thresholds. |
| v1.2 | Scalability | Async backend paths, asyncpg Postgres option, batched writes, embedding cache, 100/500/1000-user load reports. |
| v1.3 | Research | LongMemEval/MemoryBench tracking, SmritiBench design, temporal/weighted graph memory, cross-agent shared memory with strict isolation. |
| Ongoing | Community | Good-first issues, backend/adapter contribution guides, benchmark reproducibility reports, pilot-user feedback loops. |
See ROADMAP.md for the living post-v1 roadmap.
Contributing
See CONTRIBUTING.md for development setup and contribution guidance.
Recommended local loop:
pip install -e ".[dev,bench]"
ruff check src benchmarks scripts tests
pytest -q
python -m build
Release history and stakeholder-facing notes live in CHANGELOG.md and RELEASE_NOTES_v1.0.5.md. A tutorial draft for the v1 memory protocol, audit UI, and benchmark evidence lives in docs/blog/smriti-ai-v1-memory-layer.md.
License
Apache-2.0. See pyproject.toml for package metadata.
Harness Evolution Results
The base model remains frozen. Smriti AI is not fine-tuned; these numbers come from memory-harness evaluation.
| System | Recall | Precision@K | p95 latency ms | Token overhead | Privacy delete |
|---|---|---|---|---|---|
| baseline_frozen_model | 0.000 | 0.000 | 0.000 | 0 | True |
| smriti_seed_harness | 1.000 | 0.333 | 0.525 | 328 | True |
| smriti_evolved_harness | 1.000 | 0.333 | 0.168 | 328 | True |
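The evolved harness's main gain over the seed harness is latency at identical recall, precision, and token overhead. A quick illustrative check of the improvement implied by the table above:

```python
# Relative p95 latency reduction of the evolved harness vs the seed harness,
# using the values reported in the table above.
seed_p95, evolved_p95 = 0.525, 0.168
reduction_pct = (seed_p95 - evolved_p95) / seed_p95 * 100
print(f"{reduction_pct:.0f}% lower p95 latency")  # -> 68% lower p95 latency
```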
Cross-model harness validation:
| Model | Seed recall | Evolved recall | Gate |
|---|---|---|---|
| google/gemma-4-E2B-it | 1.000 | 1.000 | pass |
| meta-llama/Llama-3.2-1B | 1.000 | 1.000 | pass |
| microsoft/Phi-3-mini-4k-instruct | 1.000 | 1.000 | pass |
| mistralai/Mistral-7B-Instruct-v0.3 | 1.000 | 1.000 | pass |
| Qwen/Qwen2.5-1.5B-Instruct | 1.000 | 1.000 | pass |
Production gate report: results/production_gate_report.md
Deterministic test doubles are used only for CI stability and are never counted as public benchmark evidence.
By Soumyajit Ghosh
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file smriti_memory_ai-1.0.5.tar.gz.
File metadata
- Download URL: smriti_memory_ai-1.0.5.tar.gz
- Upload date:
- Size: 192.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbb52ac8da5fdcea5a50661b8dd2a06c19527b045f115d9be9343dcf79e42cb7 |
| MD5 | 7439ceecc677371dfcac070394b3e632 |
| BLAKE2b-256 | bfb7467204bb7f22b261a96070a98869596d111da8f7e0e9ce5cfd99a1104935 |
File details
Details for the file smriti_memory_ai-1.0.5-py3-none-any.whl.
File metadata
- Download URL: smriti_memory_ai-1.0.5-py3-none-any.whl
- Upload date:
- Size: 177.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a28883ce5ad87d77c17568a3aab587f4fc3b060d0e08f54f3ce704c372716317 |
| MD5 | 7e6c1c6353ae0119688cd4df9a49569d |
| BLAKE2b-256 | 82b69969ada8cc8fc38026c5fadfa487309a3a7075e48ae07695e2467f37c246 |