Semantic memory for LLM agent calls with an equivalence-first cache architecture.

SmartMemo

SmartMemo is a semantic memory and caching layer for LLM agent calls. Its core thesis is simple: cosine similarity is a useful candidate selector, but it is not semantic equivalence. SmartMemo uses embedding search to find likely cache candidates, then uses a learned equivalence classifier to decide whether a cached response is safe to reuse.

As of 0.1.0, SmartMemo ships a pretrained classifier, so that decision needs no training: opting in takes one line. The release includes:

  • async SmartMemo.get_or_call(...)
  • a bundled pretrained equivalence classifier (classifier-v1), opt-in with one line
  • SQLite persistence
  • embedding provider protocol with SentenceTransformers embeddings and FAISS vector search
  • a reproducible local-LLM training-data pipeline and a hand-curated gold test set
  • classifier training, evaluation, checkpoint inference, and classifier-gated cache hits
  • durable feedback export and manual feedback-driven retraining with validation gates

Without a classifier, SmartMemo decides cache hits with a cosine threshold; this is the baseline that the measurements below compare against. With the bundled classifier, cosine search becomes the candidate selector and the learned classifier makes the final cache-hit decision.
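The two decision modes can be illustrated with a minimal sketch in plain Python. This is not SmartMemo's implementation: decide_hit, its parameters, the top-1 candidate rule, and the stand-in classifier callable are all hypothetical names chosen for illustration.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def decide_hit(
    query: Vector,
    cached: dict[str, Vector],
    cosine_threshold: float,
    classifier: Optional[Callable[[Vector, Vector], float]] = None,
    classifier_threshold: float = 0.95,
) -> Optional[str]:
    """Return the key of a reusable cached entry, or None for a miss.

    Hypothetical helper: illustrates the idea only, not the library's API.
    """
    if not cached:
        return None
    # Stage 1: cosine search selects the most similar cached entry.
    best_key = max(cached, key=lambda k: cosine(query, cached[k]))
    if classifier is None:
        # Baseline mode: the cosine score itself decides the hit.
        similarity = cosine(query, cached[best_key])
        return best_key if similarity >= cosine_threshold else None
    # Stage 2: the classifier score, not the cosine score, makes the final call.
    score = classifier(query, cached[best_key])
    return best_key if score >= classifier_threshold else None
```

With a cached vector that is close to the query, the baseline mode reports a hit as soon as the cosine threshold is met, while a classifier that scores the pair below its threshold turns the same lookup into a miss. That difference is the failure mode the classifier is meant to catch.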

Install

SmartMemo's embedding and classifier stack depends on PyTorch, FAISS, and SentenceTransformers, so install the ml extra:

pip install "smartmemo[ml]"

For local development:

uv sync --all-extras
uv run pytest
uv run ruff check
uv run pyright

Minimal Example

import asyncio

from smartmemo import ClassifierConfig, SmartMemo

cache = SmartMemo(
    domain="customer-support",
    classifier=ClassifierConfig.bundled(),
)

async def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return "fresh LLM response"

async def main() -> None:
    result = await cache.get_or_call(
        prompt="Summarize this customer's latest billing ticket",
        llm_function=call_llm,
    )

    print(result.response)
    print(result.was_cache_hit)
    print(result.classifier_score)

# get_or_call is a coroutine, so it needs a running event loop.
asyncio.run(main())

The Bundled Classifier

classifier-v1 is a generic, cross-domain equivalence classifier shipped inside the package at smartmemo/_models/classifier-v1.pt. It is a small MLP over all-MiniLM-L6-v2 embeddings, trained on ~8,800 labeled prompt pairs: positives come from a local LLM paraphraser, and hard negatives come from templated same-object/opposite-action swaps. The whole data-generation pipeline lives in scripts/generate_training_data.py.
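To make the idea of a templated same-object/opposite-action swap concrete, here is a minimal sketch. It is not the code in scripts/generate_training_data.py; the action pairs, the objects, and the record fields prompt_a, prompt_b, and label are assumptions invented for this illustration.

```python
from itertools import product

# Hypothetical templates: opposite actions applied to the same object.
OPPOSITE_ACTIONS = [
    ("Approve", "Deny"),
    ("Enable", "Disable"),
    ("Subscribe the customer to", "Unsubscribe the customer from"),
]
OBJECTS = ["the customer's refund request", "the weekly billing digest"]


def hard_negatives(action_pairs, objects):
    """Build prompt pairs that share an object but flip the action.

    Each pair is labeled 0 (not equivalent). The field names are illustrative,
    not SmartMemo's documented JSONL schema.
    """
    pairs = []
    for (action, opposite), obj in product(action_pairs, objects):
        pairs.append(
            {
                "prompt_a": f"{action} {obj}",
                "prompt_b": f"{opposite} {obj}",
                "label": 0,
            }
        )
    return pairs


for pair in hard_negatives(OPPOSITE_ACTIONS, OBJECTS):
    print(pair["prompt_a"], "|", pair["prompt_b"])
```

Pairs like these share most of their wording, so their embeddings tend to sit close together even though reusing one answer for the other would be wrong. That is what makes them useful as hard negatives.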

Measured on a hand-curated gold set of 84 prompt pairs (31 equivalent, 53 not):

Decision method                     Precision   Recall   F1
Cosine baseline (at equal recall)   0.53        0.90     0.67
classifier-v1 (threshold 0.95)      0.85        0.90     0.88

That is +32 precision points at equal recall: on this gold set the cosine baseline makes 25 false-positive cache hits where classifier-v1 makes 5. The full, auditable model card is smartmemo/_models/classifier-v1.report.json.
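The table's figures can be rechecked from the counts quoted above. The sketch below assumes 28 true positives for both methods; that number is not stated in the text, but it is the only integer count that rounds to a recall of 0.90 on 31 equivalent pairs.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1


POSITIVES = 31  # equivalent pairs in the gold set (from the text)
TP = 28         # assumption: inferred from recall 0.90, not stated in the text
FN = POSITIVES - TP

cosine_metrics = precision_recall_f1(TP, fp=25, fn=FN)     # 25 false positives (from the text)
classifier_metrics = precision_recall_f1(TP, fp=5, fn=FN)  # 5 false positives (from the text)

for name, metrics in (("cosine", cosine_metrics), ("classifier-v1", classifier_metrics)):
    print(name, [round(m, 2) for m in metrics])
```

Under that assumption, precision is 28/53 for the cosine baseline and 28/33 for classifier-v1, and F1 for classifier-v1 is exactly 56/64 = 0.875, which the table shows as 0.88.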

classifier-v1 is a cold-start model. It is bound to the all-MiniLM-L6-v2 embedding space (384 dimensions), and per-domain accuracy improves with the feedback-driven retraining loop below.

Benchmarks

uv run python benchmarks/cosine_baseline_customer_support.py
uv run python benchmarks/classifier_vs_cosine.py

The first benchmark shows the cosine baseline's false-positive failure mode on customer-support prompts. The second scores the bundled classifier against the cosine baseline on the gold set and writes benchmarks/results/classifier_vs_cosine.json.
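Comparing a cosine baseline "at equal recall" means choosing the cosine threshold that reaches the same recall as the classifier and then reading off its precision. The sketch below shows one way to do that on scored pairs. It is not the code in benchmarks/classifier_vs_cosine.py, and the function names and toy data are hypothetical.

```python
import math


def threshold_for_recall(scores, labels, target_recall):
    """Largest threshold whose recall on the positives is at least target_recall."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive pair")
    # Number of positives that must score at or above the threshold.
    needed = max(1, math.ceil(target_recall * len(positives)))
    return positives[needed - 1]


def precision_at(scores, labels, threshold):
    """Precision when every pair scoring at or above threshold counts as a hit."""
    predicted = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(predicted) / len(predicted) if predicted else 0.0


# Toy cosine scores with labels (1 = equivalent, 0 = not); illustrative only.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60]
labels = [1, 0, 1, 0, 1, 0]

threshold = threshold_for_recall(scores, labels, target_recall=1.0)
print(threshold, precision_at(scores, labels, threshold))
```

Lowering the threshold far enough to recover every equivalent pair also admits the high-scoring non-equivalent pairs, which is why precision drops as the baseline's recall is pushed up.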

Training Your Own Classifier

SmartMemo includes a trainable pair classifier over prompt embeddings. To reproduce the shipped model from the committed dataset:

uv run python scripts/train_classifier_v1.py

To train on your own JSONL prompt pairs:

uv run smartmemo train-classifier \
  --data data/fixtures/customer_support_pairs.jsonl \
  --out models/classifier-custom.pt \
  --domain customer-support \
  --epochs 5

Then point SmartMemo at the checkpoint:

from smartmemo import ClassifierConfig, SmartMemo

cache = SmartMemo(
    domain="customer-support",
    classifier=ClassifierConfig(model_path="models/classifier-custom.pt"),
)

Feedback Export

SmartMemo records cache-hit lookups so explicit feedback can become training data:

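# Run inside an async function, with `cache` and `call_llm` from the Minimal Example.
result = await cache.get_or_call(
    prompt="Approve the customer's refund request",
    llm_function=call_llm,
)

# `user_rejected_answer` is your own signal, for example a thumbs-down in the UI.
if result.was_cache_hit and user_rejected_answer:
    await cache.report_bad_hit(result.query_id, reason="wrong refund decision")

# Write the recorded feedback as JSONL prompt pairs.
written = cache.export_feedback_pairs("data/feedback_pairs.jsonl")
print(written)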

The exported JSONL uses the same prompt-pair shape accepted by smartmemo train-classifier.

Manual Retraining

Use smartmemo retrain to turn durable feedback into a candidate classifier checkpoint:

uv run smartmemo --db-path .smartmemo/cache.db retrain \
  --out models/classifier-candidate.pt \
  --validation-data data/validation_pairs.jsonl \
  --seed-data data/fixtures/customer_support_pairs.jsonl \
  --domain customer-support \
  --min-precision 0.95 \
  --promote-to models/classifier-active.pt

The command always trains a candidate and writes an auditable <checkpoint>.report.json. Promotion only copies the candidate to --promote-to when the validation gates pass. SmartMemo does not run background retraining or automatically reload classifiers at runtime.
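The promotion rule can be pictured as a simple gate. The sketch below is an assumption-laden illustration rather than the logic of smartmemo retrain: promote_if_passing is a hypothetical helper, and it checks only a single precision gate.

```python
import shutil
from pathlib import Path


def promote_if_passing(
    candidate: Path,
    target: Path,
    precision: float,
    min_precision: float,
) -> bool:
    """Copy the candidate checkpoint to target only when the gate passes.

    Hypothetical helper: the real command may apply additional gates.
    """
    if precision < min_precision:
        # Gate failed: the candidate stays on disk, the active model is untouched.
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(candidate, target)
    return True
```

Because the candidate is copied rather than moved, a failed gate leaves both the candidate and the currently active checkpoint in place for inspection.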

Release

Version 0.1.0 is configured for PyPI as smartmemo. The repository publishes through GitHub Actions trusted publishing from .github/workflows/publish-pypi.yml with the pypi environment.

git tag v0.1.0
git push origin v0.1.0

Pushing that tag triggers the workflow, which builds the source distribution and wheel, then uploads them to PyPI.



