Local-first agent-memory layer with hybrid retrieval (BM25 + cosine). Drop-in for vector-store + RAG, benchmarked to beat vector DBs on QA accuracy. Portable as a single directory. LLM-agnostic.
Reason this release was yanked:
Superseded by 0.2.0rc6 — ships real fixes for correctness (bundle reload under the wrong embedder), packaging (CLI subcommands failed after pip install), plus a POST /auth/revoke HTTP route and doc example fixes.
Project description
SOMA
Local-first agent-memory layer with hybrid retrieval (BM25 + cosine). Drop-in for vector-store + RAG, benchmarked to beat vector DBs on QA accuracy. Store text, retrieve by meaning and keywords, reconcile conversational facts into durable memory. Portable as a single directory. LLM-agnostic.
M1 — Hybrid retrieval validated (2026-04-20): +22.8 % F1 and +15.6 % rank-1 over a Chroma-cosine baseline on LongMemEval N=500, same embedder, same LLM, matched context budgets. Triple-cross-validated (direct Token-F1 +22.8 %, qwen-as-judge +22.2 %, Claude-as-judge +23.7 %) and reproducible across six axes — cross-LLM (qwen9b + Claude), cross-embedder, cross-benchmark, cross-judge, α-sweep. Milestone doc:
docs/milestones/2026-04-20-hybrid-retrieval-validated.md. Evidence: research/developmental/results/longmemeval_full_evidence_roundup.md.
60-second tour: install, store a fact, retrieve it — see the Quick start below or the full end-to-end flow in
docs/quickstart.md. Picking SOMA over Mem0/Letta/Zep/Chroma? See docs/comparison.md. Patterns + recipes: docs/cookbook.md. Positioning: docs/positioning.md.
Install
# Default (quality retrieval — what the Quick start below uses):
pip install "soma-memory[sbert]"
# REST API server + JWT auth:
pip install "soma-memory[serve]"
# FAISS ANN (>10K entries):
pip install "soma-memory[ann]"
# Prometheus /metrics + OpenTelemetry tracing:
pip install "soma-memory[metrics]"
pip install "soma-memory[otel]"
# Alternative vector backends:
pip install "soma-memory[qdrant]" # Qdrant (local file or HTTP)
pip install "soma-memory[lancedb]" # embedded arrow-native (10M+ scale)
pip install "soma-memory[chroma]" # drop-in for existing Chroma users
pip install "soma-memory[pgvector]" # Postgres + pgvector
# Cloud object-store bundles:
pip install "soma-memory[s3]" # s3:// URLs on save/load
pip install "soma-memory[gcs]" # gs:// URLs on save/load
# Framework adapters:
pip install "soma-memory[langchain]"
pip install "soma-memory[llamaindex]"
# Everything runtime-useful:
pip install "soma-memory[sbert,ann,serve,metrics,otel,qdrant,lancedb,chroma,pgvector,s3,gcs,langchain,llamaindex]"
# Absolute minimum (no sbert — you must pass your own embed_fn):
pip install "soma-memory"
Developing on SOMA? Clone the repo and use the editable variant with the same set of [extras]: pip install -e ".[sbert]", pip install -e ".[sbert,serve,metrics]", etc. See CONTRIBUTING.md.
Quick start
from soma.memory import MemoryLayer
mem = MemoryLayer.with_sbert() # all-MiniLM-L6-v2
mem.store("user lives in Portland, OR", metadata={"user": "alex"})
mem.store("user is vegetarian", metadata={"user": "alex"})
mem.store("user's dog is named Luna", metadata={"user": "alex"})
hits = mem.retrieve("dietary restrictions", k=3, where={"user": "alex"})
mem.save("my-brain/") # portable bundle
mem = MemoryLayer.load("my-brain/") # resume anywhere
For the end-to-end agent flow — soma serve, JWT issue + revoke, ConversationalMemory fact extraction, multi-user scoping, Grafana dashboard import — see docs/quickstart.md.
Runnable examples
Self-contained scripts under examples/ that exercise the core API end-to-end:
- 01_quickstart.py — the 10-line Python API tour (store, retrieve, save/load round-trip with metadata filters).
- 02_persistent_chat_agent.py — chat agent whose memory survives process restarts. Stub LLM inline; hook your own with ~5 lines.
- 03_multi_tenant_bundle.py — one process, many isolated per-tenant bundles, with a cross-tenant-leak check.
- cloud_s3_demo.py — round-trip a bundle through s3:// object storage.
Run any of them from a clone with python examples/<name>.py after pip install -e ".[sbert]".
How it compares
| Capability | Chroma | Mem0 / Zep | Pinecone | SOMA |
|---|---|---|---|---|
| Vector retrieval | yes | yes | yes | yes |
| Local-first, zero cloud deps | yes | partial | no | yes |
| Metadata where filter at retrieve | yes | yes | yes | yes |
| Hybrid BM25 + vector (built-in) | no | partial | partial | yes |
| Cross-encoder rerank (built-in) | no | no | partial | yes |
| LLM query expansion (built-in) | no | partial | no | yes |
| Conversational extract + reconcile (built-in) | no | yes | no | yes |
| Multi-user scoping on a shared bundle | no | partial | no | yes |
| Plug-and-play LLM backends | no | partial | no | yes (5 shipped) |
| Plastic graph substrate (research only — see Scope) | no | no | no | yes |
| Single-directory brain portability | partial | no | no | yes |
| Multi-tenant REST (bundles/{name}) | no | yes | yes | yes |
| Per-bundle JWT auth + revocation blocklist | no | partial | yes | yes |
| Crash-safe WAL + auto-compaction | partial | yes | yes | yes |
| Prometheus metrics + importable Grafana dashboards | no | no | partial | yes |
| Pluggable vector backends (adapter protocol) | no | no | no | yes (InProc + Qdrant + LanceDB + Chroma + pgvector) |
| Bundles on S3 / GCS (scale-to-zero ready) | no | no | no | yes (s3:// / gs:// URLs) |
| GDPR-grade forgetting with audit trail | no | no | no | yes (POST /forget + docs/gdpr.md) |
| Typed schemas (31 built-in, extensible) | no | no | no | yes (8 domains, context packer) |
Full comparison + migration notes: docs/comparison.md.
Scope
SOMA ships two things in the same repo; the product and the research substrate are separable.
The product (what pip install soma-memory gets you): a local-first agent-memory layer with hybrid BM25 + cosine retrieval, multi-tenant REST, per-bundle JWT auth, pluggable vector backends (InProc / Qdrant / LanceDB / Chroma / pgvector), and crash-safe WAL. This lives under src/soma/memory/, src/soma/llm/, src/soma/cli.py, src/soma/serve.py, and src/soma/integrations/. Every benchmark number on this page measures this surface.
The research substrate (ships with the same package, but is not part of the memory-layer API): a plastic-graph / growth / pruning / consolidation pipeline under src/soma/core/, src/soma/growth/, src/soma/metacognition/, src/soma/consolidation/, src/soma/io/, src/soma/deploy/. It runs end-to-end, but three serious attempts to route its learning signal into retrieval (plastic-graph activation, Direction 4a LLM-distilled projections, Direction 4b spatial distillation) are null on real corpora — see M1 milestone "What's ruled out". We keep it in-tree as the measurement substrate for Path A / Path B research (docs/plans/2026-04-20-path-a-biophysical-representation-layer-design.md, docs/plans/2026-04-20-path-b-bio-validated-primitives-design.md), not because it currently improves the product.
If you're evaluating SOMA as a vector-DB / RAG replacement, the product is what matters. If you're interested in the research agenda, docs/milestones/2026-04-20-hybrid-retrieval-validated.md is the starting point and docs/plans/ has the full trail. Contributor-facing split: CONTRIBUTING.md §Scope.
Benchmark (same sbert embedder, measured vs Chroma, reports in benchmarks/reports/):
- Quality parity: identical Recall@3 / MRR@3 / NDCG@3 at same embedder (by construction).
- Disk: 22.6× smaller at 50 facts, narrowing to 1.4× at 20K and 1.42× at 100K.
- Store (full pipeline 1K–20K): 3.2–3.6× faster per op; index-only 100K ingest takes 0.4 s vs Chroma's 23.6 min because SOMA's store is a tensor append while Chroma pays ~14 ms/op for SQLite+HNSW metadata (scale_enterprise_100k.md).
- Retrieve (HNSW backend): 1.18–1.25× faster at 1K–20K, growing to 5.12× at 100K while preserving identical recall.
- Drift: 30-day simulation, old-fact Recall@3 = 0.883 ≈ recent 0.938 (memory doesn't rot).
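The store-speed gap above comes down to write patterns: an append-only log costs one sequential write per op, while a transactional store pays per-op index and metadata overhead on every insert. A toy illustration of the append-and-replay idea (a pure-Python sketch; AppendLog is a hypothetical name, not SOMA's actual storage code):

```python
import json
import os
import tempfile


class AppendLog:
    """Toy append-only store: one JSON line per record; replay rebuilds state."""

    def __init__(self, path):
        self.path = path
        self.entries = []
        if os.path.exists(path):  # crash recovery == replay the log
            with open(path) as f:
                self.entries = [json.loads(line) for line in f]

    def store(self, text, metadata=None):
        rec = {"text": text, "metadata": metadata or {}}
        with open(self.path, "a") as f:  # sequential append, no index rewrite
            f.write(json.dumps(rec) + "\n")
        self.entries.append(rec)


path = os.path.join(tempfile.mkdtemp(), "log.jsonl")
log = AppendLog(path)
log.store("user lives in Portland, OR", {"user": "alex"})
log.store("user is vegetarian", {"user": "alex"})

reloaded = AppendLog(path)  # simulate a process restart
```

Per-op cost here is a single file append; the per-op SQLite+HNSW bookkeeping a conventional store performs is what the ~14 ms/op figure above reflects.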
Recall boosters — SOMA goes beyond the same-embedder ceiling:
Peer vector DBs all tie SOMA on recall when using the same embedder (identical cosine over identical vectors). To beat them, SOMA ships three opt-in boosters:
| Retrieval strategy | R@1 | R@5 | Lift R@5 vs cosine |
|---|---|---|---|
| Pure cosine (peer DB ceiling) | 0.098 | 0.238 | — |
| Hybrid BM25+cosine | 0.207 | 0.415 | +17.7 pp (+74%) |
| Cross-encoder rerank | 0.203 | 0.309 | +7.1 pp |
| Hybrid + rerank | 0.287 | 0.450 | +21.2 pp (+89%) |
Measured on LoCoMo (5,882 turns, 1,982 questions). With both knobs on, R@1 roughly triples, adding ~34 ms on top of the 13 ms baseline. The full suite lives under benchmarks/reports/, with the paper-draft.md aggregator wiring every number back to its script + report.
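The hybrid row in the table is, at bottom, a weighted fusion of normalized lexical and vector scores. A minimal sketch of the idea (toy scoring functions and toy 2-d "embeddings", not SOMA's retrieval code; the alpha weight corresponds to the α-sweep mentioned under M1):

```python
import math
from collections import Counter

docs = ["user lives in Portland", "user is vegetarian", "the dog is named Luna"]


def bm25ish(query, doc, k1=1.5):
    # Toy lexical score: BM25-style term-frequency saturation, IDF omitted for brevity.
    tf = Counter(doc.lower().split())
    return sum(tf[t] * (k1 + 1) / (tf[t] + k1) for t in query.lower().split())


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def minmax(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def hybrid(query, qvec, dvecs, alpha=0.5):
    # Normalize each signal to [0, 1], then mix: alpha * vector + (1 - alpha) * lexical.
    lex = minmax([bm25ish(query, d) for d in docs])
    vec = minmax([cosine(qvec, v) for v in dvecs])
    scores = [alpha * v + (1 - alpha) * l for l, v in zip(lex, vec)]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


dvecs = [[0.1, 0.9], [0.9, 0.2], [0.3, 0.4]]
ranking = hybrid("vegetarian diet", [1.0, 0.0], dvecs, alpha=0.5)  # -> [1, 2, 0]
```

The lexical term catches exact keyword matches ("vegetarian") that cosine alone can miss, which is the mechanism behind the R@5 lift in the table.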
REST API + Docker
# Local:
soma serve --port 8420
# Docker:
docker compose up
Endpoints: /health, /version, /status, /store, /store_batch, /retrieve, /get/{id}, /related/{id}, /recent, /forget, /consolidate, /save, /snapshot, /auth/refresh, plus /bundles/{name}/... multi-tenant variants under per-bundle JWT auth. Full route reference, auth requirements, and request/response shapes: docs/rest-api.md.
Auth (pip install "soma-memory[serve]"): per-bundle JWTs with read/write/admin scopes, HS256 or RS256, rotation via soma auth rotate-secret, single-token revocation via a file-backed blocklist (SOMA_JWT_BLOCKLIST_PATH). Full reference: docs/auth.md.
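The HS256 signing and file-backed revocation described above rest on simple primitives. A stdlib-only sketch of the mechanics (illustrative only — function names and the hash-per-line blocklist format here are assumptions, not SOMA's implementation):

```python
import base64
import hashlib
import hmac
import json
import os
import tempfile


def b64url(b: bytes) -> str:
    return base64.urlsafe_b64encode(b).rstrip(b"=").decode()


def sign_hs256(payload: dict, secret: bytes) -> str:
    # JWT shape: base64url(header).base64url(payload).base64url(HMAC-SHA256 signature)
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"


def verify(token: str, secret: bytes, blocklist_path: str):
    # Single-token revocation: a revoked token's hash lives in a flat file.
    digest = hashlib.sha256(token.encode()).hexdigest()
    if os.path.exists(blocklist_path):
        with open(blocklist_path) as f:
            if digest in {line.strip() for line in f}:
                return None
    header, body, sig = token.split(".")
    expected = b64url(hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))


secret = b"not-a-real-secret"
blocklist = os.path.join(tempfile.mkdtemp(), "blocklist")
token = sign_hs256({"sub": "alex", "scopes": ["read", "write"]}, secret)
claims = verify(token, secret, blocklist)  # valid -> claims dict
with open(blocklist, "a") as f:            # revoke this one token
    f.write(hashlib.sha256(token.encode()).hexdigest() + "\n")
revoked = verify(token, secret, blocklist)  # now None
```

Checking the blocklist before signature verification keeps revocation O(1) file reads per request, which is the tradeoff a file-backed blocklist makes versus a shared store like Redis.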
Observability (pip install "soma-memory[metrics]"): GET /metrics exposes 18+ Prometheus counters/gauges/histograms covering every MemoryLayer hot path plus per-route HTTP timings. Three importable Grafana dashboards ship under deploy/grafana/ (RED overview, auth, USE bundle-health). Set SOMA_LOG_JSON=1 for Loki/Datadog-ready structured logs. OpenTelemetry spans via [otel] + SOMA_OTEL_ENABLED=1. Metric reference: docs/observability.md.
TypeScript client
npm install soma-memory
import { createClient } from "soma-memory";
const soma = createClient({ baseUrl: "http://localhost:8420", token: process.env.SOMA_TOKEN });
await soma.POST("/store", { body: { text: "Paris is the capital of France." } });
const { data } = await soma.POST("/retrieve", { body: { query: "capital?", k: 3 } });
Works in Node 18+, browsers, Deno, Bun, Cloudflare Workers. Types regenerate from the live /openapi.json on every PR — see docs/clients.md.
CLI
soma index --wiki path/to/docs --bundle my-brain/ # ingest folder
soma chat --bundle my-brain/ # auto-picks LLM backend
soma stats --bundle my-brain/ # entry count, disk
soma search --bundle my-brain/ --query "..." # vector search, no LLM
soma serve --port 8420 # REST API
soma bundle list ./data/bundles # lifecycle: list | info | delete
soma auth issue --sub alex --bundle alex:read,write --expires 30d
soma auth revoke --token $LEAKED --reason "leaked on slack"
soma chat auto-detects a backend: Ollama if running, OpenAI/Anthropic if OPENAI_API_KEY/ANTHROPIC_API_KEY is set, otherwise local HuggingFace. Override with --backend. See docs/llm-backends.md.
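That precedence order is easy to express as a fallback chain. An illustrative sketch (not SOMA's actual detection code — `ollama_running` stands in for the real liveness probe, and the env dict stands in for os.environ):

```python
def pick_backend(env: dict, ollama_running: bool) -> str:
    """Documented precedence: Ollama > OpenAI > Anthropic > local HuggingFace."""
    if ollama_running:
        return "ollama"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    return "huggingface"  # offline fallback, no key required


choice = pick_backend({"ANTHROPIC_API_KEY": "sk-ant-..."}, ollama_running=False)
```

The `--backend` flag mentioned above would simply bypass this chain.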
Cloud deploy
Fly.io: fly launch --from https://github.com/danthi123/soma --copy-config. Kubernetes (Helm 3.14+): a Helm chart ships under deploy/helm/ — helm install soma deploy/helm/soma — runbook in docs/deployment-k8s.md. Per-platform notes: docs/deployment-cloud.md. Minimum tier: 2 GB RAM.
Development
pip install -e ".[dev]"
pytest tests/ -v
ruff check src/ tests/
mypy src/soma/
Docs
- Quickstart — end-to-end agent-memory flow (install → serve → JWT → ConversationalMemory → Grafana).
- Comparison — SOMA vs Chroma / Mem0 / Letta / Zep / Pinecone.
- Cookbook — recipes for hybrid retrieval, rerank, multi-tenant REST, ConversationalMemory (sync/async/batch), multi-user, migrations, streaming chat, cloud bundles, typed schemas, context packing.
- Typed Schemas — define, store, retrieve, extend, and pack typed memory entries (31 built-in schemas across 8 domains).
- Auth — per-bundle JWTs, RS256 split, revocation, rotation.
- Observability — Prometheus metrics, JSON logs, OTel, Grafana dashboards.
- Backends — InProc / Qdrant / LanceDB / Chroma / pgvector adapter tradeoffs.
- Cloud — S3/GCS bundle URLs + Lambda / Cloud Run / Fly deploy recipes.
- GDPR forgetting — POST /forget, audit trail, summary cascade, compliance posture.
- LLM backends — Ollama / OpenAI / Anthropic / vLLM / HF.
- Recall improvements — hybrid BM25, rerank, query expansion research agenda.
- Clients — TypeScript client, auth modes, retry middleware.
- Demos — every shipped demo, when to run it.
- Positioning · Pivot + roadmap · Whitepaper · Paper draft
Full hierarchy: docs/README.md.
Status & known limitations
Release status: Development Status :: 4 - Beta (pyproject.toml). Latest on PyPI: soma-memory==0.2.0rc4 (2026-04-20).
Production-ready surface:
- MemoryLayer API: store / retrieve / save / load, hybrid BM25+cosine, cross-encoder rerank, context packing, GDPR forget.
- REST API under soma serve (full route reference).
- Per-bundle JWT auth with HS256 / RS256 + file-backed or Redis revocation blocklist.
- Vector backends: InProc (default), LanceDB, Qdrant, Chroma, pgvector (pgvector is newer — see Experimental below).
- Bundle storage: local filesystem, S3, GCS (and S3-compat: MinIO, Cloudflare R2, DigitalOcean Spaces).
- Typed schemas (31 built-in, extensible) + context packer for LLM prompts.
Known limitations today:
- Single-writer WAL. Multi-process on one bundle works (peer-reload every 30 s); the Helm chart locks replicaCount: 1. Horizontal scale lands when the external-backend path replaces the in-process WAL as source of truth.
- Helm chart auth is SOMA_API_KEY only. The server supports both JWT and the legacy key; the chart template hasn't been updated for JWT yet. Raw-Deployment k8s users can wire SOMA_JWT_SECRET today — see docs/auth.md + docs/deployment-k8s.md for the current chart scope.
- Helm chart OCI registry is not yet published. Install from the in-tree chart: helm install soma deploy/helm/soma. Future-state OCI one-liner documented in docs/deployment-k8s.md.
- ghcr.io/soma-ai/soma container image is not published. Operators who use the Helm chart need to build + push their own image from the in-tree Dockerfile and override image.repository. The Dockerfile itself is production-ready.
- No distributed multi-node vector scale. For >10M vectors, use QdrantBackend(mode="http"); SOMA orchestrates and Qdrant scales horizontally.
- No hosted / managed offering. SOMA is self-host only — see docs/plans/2026-04-15-memory-layer-pivot.md for rationale.
Research substrate (not part of the shipped product — see Scope above): the plastic-graph / growth / pruning pipeline runs end-to-end but does not currently lift retrieval on any tested corpus. Three closed-out experiments documented in the M1 milestone "What's ruled out". Path A / Path B design docs lay out the next experiments.
Experimental / newer surface (shipped but less battle-tested at scale):
- ConversationalMemory async extraction mode (extraction_mode="async").
- In-process rate limiter (SOMA_RATE_LIMIT_RPS) — for dev / homelab, not a WAF.
- PgvectorBackend — passes the in-tree testcontainers suite against pgvector/pgvector:pg16, but hasn't seen production traffic yet.
License
MIT
Download files
File details
Details for the file soma_memory-0.2.0rc5.tar.gz.
File metadata
- Download URL: soma_memory-0.2.0rc5.tar.gz
- Upload date:
- Size: 380.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 83b16f664f277c208afda5d849a79aee0c2ef0f046c9c83bc958502f9e13ccf1 |
| MD5 | 6bc9046e718f3be9b8b28b7a118a133c |
| BLAKE2b-256 | 8530eddc7529c08914ac9a495348bf6eff9ca4c3ff1cc81fabd4fd0b7a840044 |
Provenance
The following attestation bundles were made for soma_memory-0.2.0rc5.tar.gz:
- Publisher: release.yml on danthi123/soma
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: soma_memory-0.2.0rc5.tar.gz
- Subject digest: 83b16f664f277c208afda5d849a79aee0c2ef0f046c9c83bc958502f9e13ccf1
- Sigstore transparency entry: 1359300848
- Permalink: danthi123/soma@aaac8c4f6606b1417891db341dc2aecbf80887ac
- Branch / Tag: refs/tags/v0.2.0rc5
- Owner: https://github.com/danthi123
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@aaac8c4f6606b1417891db341dc2aecbf80887ac
- Trigger Event: push
File details
Details for the file soma_memory-0.2.0rc5-py3-none-any.whl.
File metadata
- Download URL: soma_memory-0.2.0rc5-py3-none-any.whl
- Upload date:
- Size: 420.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 523cb09b72b9aea96944bf7984325e3fca484e0125fcd9373fcce8ad374cd583 |
| MD5 | dee471bddec6ecba0b09eae469941c0d |
| BLAKE2b-256 | 2b2b5a90c59a0a38b82c0914636c041114558d77ce17e41c1a7ab0cd4d11cc06 |
Provenance
The following attestation bundles were made for soma_memory-0.2.0rc5-py3-none-any.whl:
- Publisher: release.yml on danthi123/soma
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: soma_memory-0.2.0rc5-py3-none-any.whl
- Subject digest: 523cb09b72b9aea96944bf7984325e3fca484e0125fcd9373fcce8ad374cd583
- Sigstore transparency entry: 1359300885
- Permalink: danthi123/soma@aaac8c4f6606b1417891db341dc2aecbf80887ac
- Branch / Tag: refs/tags/v0.2.0rc5
- Owner: https://github.com/danthi123
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@aaac8c4f6606b1417891db341dc2aecbf80887ac
- Trigger Event: push