
SAID‑LAM‑v1

LAM (Linear Attention Models) — a new family beyond semantic transformers.
SAID‑LAM‑v1 is Linear Attention Memory.

PHILOSOPHY: DETERMINISM OVER PROBABILITY

"The answer IS X. Because I Said so." — At ANY scale


Menu

  1. Quick-setup
  2. MTEB testing (benchmarks)
  3. Model Details
  4. Performance
  5. Drop-in sentence-transformers Replacement
  6. Usage
  7. API Reference
  8. Model Files
  9. Citation
  10. Links

Quick-setup

This quick setup ensures:

  • Code is loaded from your pip-installed said-lam package (not a local checkout).
  • Weights are loaded from the Hugging Face cache (auto-downloaded on first run).

venv is the standard, recommended way to keep dependencies isolated. You can install globally, but using a virtual environment avoids conflicts.

macOS / Linux (bash, zsh)

cd /path/to/your/project

# 1) Create & activate a clean virtualenv
python3 -m venv .venv
. .venv/bin/activate
python -m pip install --upgrade pip

# 2) Install SAID-LAM (CPU)
pip install said-lam

# 3) Run the quick end-to-end sanity test (CPU)
python said_quick_test.py

Windows (PowerShell)

cd C:\path\to\your\project

# 1) Create & activate a clean virtualenv
py -m venv .venv
.\.venv\Scripts\Activate.ps1
py -m pip install --upgrade pip

# 2) Install SAID-LAM (CPU)
pip install said-lam

# 3) Run the quick end-to-end sanity test (CPU)
py .\said_quick_test.py

If PowerShell blocks activation, run this once (then retry the activate step):

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

GPU upgrade (CUDA) uses a separate package. Uninstall the CPU package first to avoid namespace conflicts (both packages provide the same import said_lam namespace):

pip uninstall -y said-lam
pip install --upgrade said-lam-gpu

# Run the same quick test on CUDA (if available)
python said_quick_test.py

Windows (PowerShell) equivalent:

pip uninstall -y said-lam
pip install --upgrade said-lam-gpu
py .\said_quick_test.py
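
said_quick_test.py is referenced above but not described; if it is not present in your project directory, a minimal hand-written equivalent (hypothetical script name said_sanity.py, using only the documented encode() API) confirms both setup goals: that code imports from the pip-installed package, and that weights download into the Hugging Face cache.

# said_sanity.py: minimal stand-in for said_quick_test.py (hypothetical name)
import said_lam
from said_lam import LAM

# Confirm the import resolves to the pip-installed package, not a local checkout
print("said_lam loaded from:", said_lam.__file__)

# First run auto-downloads the weights into the Hugging Face cache
model = LAM("SAIDResearch/SAID-LAM-v1")
emb = model.encode(["hello", "world"])
print("embeddings shape:", emb.shape)  # expected: (2, 384)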

MTEB testing (benchmarks)

For full benchmark-style evaluation (STS tasks, LongEmbed retrieval tasks, cache controls, and result JSON export), use mteb_test.py. This is a heavier workflow than said_quick_test.py and is intended for evaluation/benchmarking:

CPU:

. .venv/bin/activate
pip install -r requirements.txt mteb

# CPU smoke (fast coverage)
python mteb_test.py --smoke --device cpu --no-cache --output-dir ./smoke_results_cpu

# Example: run specific tasks
python mteb_test.py --tasks STS12 STS13 --device cpu --no-cache --output-dir ./results_cpu

GPU (CUDA):

. .venv/bin/activate
pip uninstall -y said-lam
pip install --upgrade said-lam-gpu
pip install -r requirements.txt mteb

# GPU smoke (fast coverage)
python mteb_test.py --smoke --device cuda --no-cache --output-dir ./smoke_results_gpu

# Example: run specific tasks
python mteb_test.py --tasks STS12 STS13 --device cuda --no-cache --output-dir ./results_gpu

If no CUDA device is available in the current runtime, the GPU wheel may fall back to CPU automatically.

SAID-LAM-v1 is a 23.85M parameter embedding model with O(n) linear complexity. Where standard transformers rely on O(n²) attention that slows and runs out of memory as context grows, LAM models replace this entirely with a recurrent state update that runs in strict O(n) time and constant memory, defining a new direction separate from transformer-based semantic models.
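
The exact LAM/SCA update rule is not published, but the kind of recurrence this paragraph describes can be sketched in a few lines of NumPy: the O(n²) score matrix is never materialized; instead a fixed-size state is updated once per token. This is a generic linear-attention sketch (the feature map phi = elu + 1 is an assumption), not the actual SAID-LAM internals:

import numpy as np

def phi(x):
    # A positive feature map (elu(x) + 1), one common choice in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Q, K: (n, d_k); V: (n, d_v). O(n) time, constant O(d_k * d_v) memory."""
    n, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))  # running sum of outer(phi(k_t), v_t)
    z = np.zeros(d_k)         # running sum of phi(k_t), used for normalization
    out = np.zeros((n, d_v))
    for t in range(n):
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])                 # constant-size state update
        z += k
        out[t] = (q @ S) / (q @ z + 1e-8)      # per-token readout from the state
    return out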

SAID-LAM-v1 is distilled from all-MiniLM-L6-v2, extends the context window from 512 tokens to 32K+ tokens, and demonstrates 100% recall on LongEmbed Needle-in-a-Haystack benchmarks across evaluated scales.

Model Details

Property             Value
Model Category       LAM (Linear Attention Models) — SAID-LAM-v1: Linear Attention Memory
Parameters           23,848,788
Embedding Dimension  384
Max Context Length   32,768 tokens
Memory Usage         ~95 MB
Complexity           O(n) linear — time AND memory
Framework            Pure Rust (Candle) — no PyTorch required
Package Size         ~6 MB binary + 92 MB weights (auto-downloaded)
License              Apache 2.0 (weights) / Proprietary (code)

Performance

O(n) Linear Scaling

LAM scales linearly with input length — empirically validated up to 1M words with R²=1.000, with memory growth from ~0 MB at small inputs up to ~15 MB at 1M words.
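
The linearity claim is straightforward to spot-check on your own machine (a sketch; absolute timings and the exact fit depend on hardware, and encode() processes long inputs in 12K-token chunks):

import time
import numpy as np
from said_lam import LAM

model = LAM("SAIDResearch/SAID-LAM-v1")
lengths = [1_000, 10_000, 100_000]  # words per document
times = []
for n in lengths:
    doc = "word " * n
    t0 = time.perf_counter()
    model.encode([doc])
    times.append(time.perf_counter() - t0)

# Fit time = a*n + b; R^2 near 1.0 indicates linear scaling
t_arr = np.asarray(times)
a, b = np.polyfit(lengths, t_arr, 1)
resid = t_arr - (a * np.asarray(lengths) + b)
r2 = 1.0 - resid @ resid / np.sum((t_arr - t_arr.mean()) ** 2)
print(f"R^2 = {r2:.3f}")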

STS-B Semantic Quality

Spearman r = 0.8181 on the STS-B test set (1,379 sentence pairs).

MTEB LongEmbed Benchmarks

Combined LongEmbed score (SAID-LAM-v1, average over all six tasks): ~91.0%.

Task                       Score
LEMBNeedleRetrieval        100.00%
LEMBPasskeyRetrieval       100.00%
LEMBNarrativeQARetrieval    69.93%
LEMBSummScreenFDRetrieval   96.59%
LEMBQMSumRetrieval          85.76%
LEMBWikimQARetrieval        93.98%

LongEmbed SOTA comparison

Task                       SAID-LAM-v1 (23M)   Global SOTA
LEMBNeedleRetrieval        100.00%             100.00%
LEMBPasskeyRetrieval       100.00%             100.00%
LEMBNarrativeQARetrieval    69.93%              66.10%
LEMBSummScreenFDRetrieval   96.59%              99.10%
LEMBQMSumRetrieval          85.76%              83.70%
LEMBWikimQARetrieval        93.98%              91.20%

Drop-in sentence-transformers Replacement

LAM is a drop-in replacement for sentence-transformers. Same API, same output format.

Before (sentence-transformers):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Hello world", "Semantic search"])
# embeddings.shape == (2, 384), float32, L2-normalized
similarity = embeddings[0] @ embeddings[1]

After (LAM):

from said_lam import LAM

model = LAM("SAIDResearch/SAID-LAM-v1")
embeddings = model.encode(["Hello world", "Semantic search"])
# embeddings.shape == (2, 384), float32, L2-normalized
similarity = embeddings[0] @ embeddings[1]

Same output format, same shapes, same downstream compatibility. Everything that works with sentence-transformers embeddings (FAISS, ChromaDB, Pinecone, numpy dot product) works with LAM embeddings.
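
For example, LAM embeddings drop straight into FAISS (assuming faiss-cpu is installed; the corpus below is illustrative):

import faiss  # pip install faiss-cpu
from said_lam import LAM

model = LAM("SAIDResearch/SAID-LAM-v1")
docs = ["Paris is in France", "Berlin is in Germany", "The sky is blue"]
emb = model.encode(docs)  # (3, 384) float32, L2-normalized

index = faiss.IndexFlatIP(emb.shape[1])  # inner product == cosine on unit vectors
index.add(emb)
scores, ids = index.search(model.encode(["capital of France"]), 2)
print([docs[i] for i in ids[0]], scores[0])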

Property                           sentence-transformers      LAM
Output                             (N, 384) ndarray, float32  (N, 384) ndarray, float32
L2-normalized                      Yes (default)              Yes (default)
Cosine sim = dot product           Yes                        Yes
Max tokens                         512                        12K (encode) / 32K (SCA)
Complexity                         O(n²) attention            O(n) linear
Framework                          PyTorch (~2 GB)            Rust (~6 MB)
Memory at 1M tokens (no chunking)  OOM / impractical          ~15 MB

Usage

FREE Tier — Embeddings (up to 12K tokens)

from said_lam import LAM

model = LAM("SAIDResearch/SAID-LAM-v1")
embeddings = model.encode(["Hello world", "Semantic search is powerful"])
# embeddings.shape == (2, 384)

# Cosine similarity (L2-normalized by default)
similarity = embeddings[0] @ embeddings[1]
print(f"Similarity: {similarity:.4f}")

BETA SCA (SAID Crystalline Attention) — MTEB testing only

BETA SCA (SAID Crystalline Attention) is enabled for MTEB testing only, where it provides the perfect long-context retrieval seen on LongEmbed tasks (e.g. LEMBNeedleRetrieval, LEMBPasskeyRetrieval). Use the MTEB evaluation flow; no signup or activation is required for benchmarking.

Common Patterns

Similarity Between Texts

Embeddings are L2-normalized — cosine similarity is just a dot product:

emb = model.encode(["The cat sat on the mat", "A kitten rested on the rug"])
similarity = float(emb[0] @ emb[1])
print(f"Similarity: {similarity:.4f}")  # ~0.5761

Batch Similarity Matrix

import numpy as np

queries = ["How is the weather?", "What time is it?"]
candidates = ["Is it raining today?", "Do you have the time?", "Nice shoes"]

emb_q = model.encode(queries)      # (2, 384)
emb_c = model.encode(candidates)   # (3, 384)
sim_matrix = emb_q @ emb_c.T       # (2, 3)

Semantic Search Over a Corpus (FREE Tier)

import numpy as np

corpus = ["Python is a language", "The Eiffel Tower is in Paris",
          "ML uses neural networks", "Speed of light is 299792458 m/s"]
corpus_emb = model.encode(corpus)

query_emb = model.encode(["fastest thing in physics"])
scores = (query_emb @ corpus_emb.T)[0]
ranked = np.argsort(scores)[::-1]
for i in ranked:
    print(f"  {scores[i]:.4f}  {corpus[i]}")

Matryoshka Dimensionality Reduction

emb_128 = model.encode(["Hello world"], output_dim=128)  # (1, 128)
emb_64  = model.encode(["Hello world"], output_dim=64)   # (1, 64)
# Automatically truncated and re-normalized to unit length
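
output_dim is conceptually just truncation plus re-normalization; the same result can be produced by hand (a sketch of the idea, not the library's internals):

import numpy as np

emb = model.encode(["Hello world"])   # (1, 384)
emb_128 = emb[:, :128]                # keep the leading 128 dims
emb_128 = emb_128 / np.linalg.norm(emb_128, axis=1, keepdims=True)  # back to unit length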

Example impact on STS12 (cosine main_score, GPU):

dim STS12 score rel. to 384d
384 0.7493 100.0%
256 0.7472 99.7%
128 0.7459 99.6%
64 0.7327 97.8%

Token Limits

  • encode(): Up to 12,000 tokens per text. Returns embeddings for your RAG.
  • index() + search(): Up to 32,768 tokens per text (BETA SCA; MTEB/LongEmbed testing group only). SCA streams through the text rather than producing embeddings, giving perfect recall.

encode() — returns one embedding per input text, capped at 12K tokens:

# Each text gets one embedding — long texts are chunked at 12K tokens
embeddings = model.encode(["short text", "very long text..."])  # (2, 384)

# Use output_dim for smaller embeddings (Matryoshka)
embeddings = model.encode(["short text", "very long text..."], output_dim=128)  # (2, 128)

Long documents? encode() caps at 12K tokens. For LongEmbed benchmarks (MTEB BETA SCA testing only), index() + search() support up to 32K tokens via SCA.
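
Under the MTEB testing tier that flow looks like this (a sketch inferred from the method signatures in the API Reference below; the file names are placeholders):

from said_lam import LAM

model = LAM("SAIDResearch/SAID-LAM-v1")

# Index long documents (up to 32K tokens each) under an ID
model.index("doc-1", open("long_report.txt").read())
model.index("doc-2", open("long_transcript.txt").read())

# Retrieve the best-matching documents for a query
hits = model.search("what did the committee decide?", top_k=2)
print(hits)

model.clear()  # drop the indexed documents afterwards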

MTEB Evaluation

One model, one class: pass the same LAM instance to mteb.evaluate() (LAM implements the standard MTEB encoder protocol).

from said_lam import LAM
import mteb

model = LAM("SAIDResearch/SAID-LAM-v1")
tasks = mteb.get_tasks(tasks=["LEMBNeedleRetrieval", "LEMBPasskeyRetrieval"])
results = mteb.evaluate(model=model, tasks=tasks)

API Reference

LAM(model_name_or_path, device)

Parameter           Default                     Description
model_name_or_path  "SAIDResearch/SAID-LAM-v1"  Hugging Face model ID to load, or a local directory path pointing to the model files
device              None (auto)                 Auto-selects CUDA GPU if available, otherwise CPU
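
For example (the "cpu" and "cuda" device strings are an assumption following the usual convention; the table only documents the None default):

from said_lam import LAM

model = LAM()                                          # defaults: hub model, auto device
model = LAM("SAIDResearch/SAID-LAM-v1", device="cpu")  # pin to CPU
model = LAM("/path/to/local/model_dir")                # load from a local directory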

Core Methods

Method                                     Tier   Description
model.encode(sentences, output_dim=None)   FREE+  Encode to embeddings (384, 256, 128, or 64 dims)
model.index(doc_id, text)                  MTEB   Index a document for search (benchmarks)
model.search(query, top_k)                 MTEB   Retrieve documents by query (benchmarks)
model.truncate_embeddings(emb, dim)        FREE+  Matryoshka truncation (64/128/256)
model.clear()                              MTEB   Clear indexed documents (benchmarks)
model.stats()                              FREE+  Model statistics
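
A short sketch of the FREE+ helpers (assuming truncate_embeddings mirrors the output_dim behaviour shown earlier):

emb = model.encode(["Hello world"])              # (1, 384)
emb_small = model.truncate_embeddings(emb, 128)  # truncated and re-normalized
print(model.stats())                             # model statistics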

Tier System

Tier      encode()   SCA        How to Get     Features
FREE      12K        —          Default        encode() only — embeddings for RAG
MTEB      12K        32K        Auto-detected  SCA for LongEmbed retrieval (benchmarks only)
LICENSED  32K        32K        Coming soon    + persistent storage + cloud sync
INFINITE  Unlimited  Unlimited  Coming soon    Oracle mode

Model Files

File                     Size    Description
model.safetensors        92 MB   Model weights (SafeTensors format)
config.json              1 KB    Model configuration
tokenizer.json           467 KB  Tokenizer vocabulary
tokenizer_config.json    350 B   Tokenizer settings
vocab.txt                232 KB  WordPiece vocabulary
special_tokens_map.json  112 B   Special token definitions

Citation

@misc{said-lam-v1,
  title={SAID-LAM-v1: Linear Attention Memory},
  author={SAIDResearch},
  year={2026},
  url={https://saidhome.ai},
  note={23.85M parameter embedding model with O(n) linear complexity.
        384-dim embeddings, 32K context window, 100% NIAH recall.
        Distilled from all-MiniLM-L6-v2. Pure Rust (Candle) implementation.}
}

Links
