LAM (Linear Attention Models) — Deterministic recall with SAID Crystalline Attention. SAID-LAM-v1 is Linear Attention Memory.
SAID-LAM-v1
LAM (Linear Attention Models) — a new family beyond semantic transformers. SAID‑LAM‑v1 is Linear Attention Memory.
"The answer IS X. Because I Said so." — At ANY scale.
SAID-LAM-v1 is a 23.85M parameter embedding model with O(n) linear complexity. Where standard transformers rely on O(n²) attention that slows and runs out of memory as context grows, LAM models replace this entirely with a recurrent state update that runs in strict O(n) time and constant memory, defining a new direction separate from transformer-based semantic models.
SAID-LAM-v1 is distilled from all-MiniLM-L6-v2 while extending context from 512 tokens to 32K+ tokens, and it demonstrates 100% recall on LongEmbed Needle-in-a-Haystack benchmarks across evaluated scales.
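To make the idea of a recurrent state update concrete, here is a minimal sketch of generic causal linear attention in plain Python. It is not the SAID-LAM-v1 architecture, which this README does not describe; the `elu(x) + 1` feature map and the names `feature_map` and `linear_attention` are assumptions taken from the common textbook formulation. It shows why the cost is linear: each token updates a fixed-size state once, and no token-by-token attention matrix is ever built.

```python
import math


def feature_map(x):
    # elu(x) + 1: keeps every feature strictly positive (an assumed, commonly used choice).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Causal linear attention: one pass over the sequence, fixed-size state."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) * v^T
    norm = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        for i in range(d_k):
            norm[i] += phi_k[i]
            for j in range(d_v):
                state[i][j] += phi_k[i] * v[j]
        phi_q = feature_map(q)
        denom = sum(phi_q[i] * norm[i] for i in range(d_k))
        outputs.append(
            [sum(phi_q[i] * state[i][j] for i in range(d_k)) / denom for j in range(d_v)]
        )
    return outputs


# Three tokens with 2-dimensional queries/keys and 2-dimensional values (toy inputs).
q = [[0.1, 0.2], [0.0, -0.3], [0.5, 0.5]]
k = [[0.3, -0.1], [0.2, 0.4], [-0.2, 0.1]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = linear_attention(q, k, v)
print(len(out), len(out[0]))  # 3 2
```

The state holds `d_k * d_v` numbers regardless of sequence length, so doubling the input doubles the work but leaves the state size unchanged.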
Model Details
| Property | Value |
|---|---|
| Model Category | LAM (Linear Attention Models) — SAID-LAM-v1: Linear Attention Memory |
| Parameters | 23,848,788 |
| Embedding Dimension | 384 |
| Max Context Length | 32,768 tokens |
| Memory Usage | ~95 MB |
| Complexity | O(n) linear — time AND memory |
| Framework | Pure Rust (Candle) — no PyTorch required |
| Package Size | ~6 MB binary + 92 MB weights (auto-downloaded) |
| License | Apache 2.0 (weights) / Proprietary (code) |
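As a rough sanity check on the memory figure, the parameter count can be converted to bytes. The sketch below assumes every parameter is stored as a 4-byte float32, which this README does not state explicitly; under that assumption the weights come to about 95.4 MB in decimal units, or roughly 91 MiB, in the same range as the ~95 MB and 92 MB figures in the table.

```python
PARAMETERS = 23_848_788   # from the Model Details table
BYTES_PER_FLOAT32 = 4     # assumption: weights stored as float32

weight_bytes = PARAMETERS * BYTES_PER_FLOAT32
weight_mb = weight_bytes / 1_000_000   # decimal megabytes
weight_mib = weight_bytes / 2**20      # binary mebibytes

print(f"{weight_mb:.1f} MB, {weight_mib:.1f} MiB")  # 95.4 MB, 91.0 MiB
```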
Performance
O(n) Linear Scaling
LAM scales linearly with input length. This was empirically validated up to 1M words with R²=1.000, with memory growing from ~0 MB at small inputs to ~15 MB at 1M words.
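For readers who want to run a similar scaling check on their own hardware, the sketch below fits a straight line to (input size, runtime) pairs and reports R². The helper name `linear_fit_r2` is hypothetical, and the numbers in the example are synthetic placeholders generated from an exact line, not measurements of LAM.

```python
def linear_fit_r2(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic placeholder data: runtimes that follow an exact line.
sizes = [1_000, 10_000, 100_000, 1_000_000]
runtimes = [2.0 * s + 5.0 for s in sizes]

slope, intercept, r2 = linear_fit_r2(sizes, runtimes)
print(f"R^2 = {r2:.3f}")  # R^2 = 1.000
```

In practice the runtimes would come from timing real calls (for example with `time.perf_counter`); an R² close to 1 means a straight line explains almost all of the variation in runtime.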
STS-B Semantic Quality
Spearman r = 0.8181 on the STS-B test set (1,379 sentence pairs).
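Spearman correlation compares the ranking of predicted similarities against the ranking of human scores. The sketch below computes it with the standard library only, using average ranks for ties; the helper names and the four toy pairs are illustrative assumptions, not STS-B data or the evaluation code used for the figure above.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(average_ranks(a), average_ranks(b))


# Toy example: human scores (0-5) and model cosine similarities for four pairs.
gold = [4.8, 1.2, 3.5, 0.4]
predicted = [0.91, 0.35, 0.62, 0.10]
print(f"{spearman(gold, predicted):.4f}")  # 1.0000
```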
MTEB LongEmbed Benchmarks
Combined LongEmbed score (SAID-LAM-v1, average over all six tasks): ~91.0%.
| Task | Score |
|---|---|
| LEMBNeedleRetrieval | 100.00% |
| LEMBPasskeyRetrieval | 100.00% |
| LEMBNarrativeQARetrieval | 69.93% |
| LEMBSummScreenFDRetrieval | 96.59% |
| LEMBQMSumRetrieval | 85.76% |
| LEMBWikimQARetrieval | 93.98% |
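The combined score is the unweighted mean of the six task scores in the table above, which can be checked directly:

```python
# Per-task LongEmbed scores (percent), copied from the table above.
scores = {
    "LEMBNeedleRetrieval": 100.00,
    "LEMBPasskeyRetrieval": 100.00,
    "LEMBNarrativeQARetrieval": 69.93,
    "LEMBSummScreenFDRetrieval": 96.59,
    "LEMBQMSumRetrieval": 85.76,
    "LEMBWikimQARetrieval": 93.98,
}

combined = sum(scores.values()) / len(scores)
print(f"{combined:.2f}%")  # 91.04%
```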
LongEmbed SOTA comparison
| Task | SAID-LAM-v1 (23M) | Global SOTA |
|---|---|---|
| LEMBNeedleRetrieval | 100.00% | 100.00% |
| LEMBPasskeyRetrieval | 100.00% | 100.00% |
| LEMBNarrativeQARetrieval | 69.93% | 66.10% |
| LEMBSummScreenFDRetrieval | 96.59% | 99.10% |
| LEMBQMSumRetrieval | 85.76% | 83.70% |
| LEMBWikimQARetrieval | 93.98% | 91.20% |
Install
pip install said-lam
CUDA (GPU) wheels are published under a separate PyPI project:
pip install said-lam-gpu
To upgrade an existing installation to the latest version:
pip install --upgrade said-lam
# or for the GPU version:
pip install --upgrade said-lam-gpu
To install a specific older version:
pip install said-lam==1.0.2
# or for the GPU version:
pip install said-lam-gpu==1.0.2
To uninstall the package:
pip uninstall said-lam
# or for the GPU version:
pip uninstall said-lam-gpu
Note: Both packages install the same said_lam import namespace. Keep only one of them installed at a time to avoid conflicts.
On first use, model weights (~92 MB) are automatically downloaded from HuggingFace and cached locally. The pip package itself is only ~6 MB (compiled Rust binary — weights are NOT bundled).
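Because said-lam and said-lam-gpu share one import namespace, it can help to check which of the two is installed before debugging import problems. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_lam_distributions` is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_lam_distributions(names=("said-lam", "said-lam-gpu")):
    """Return {distribution name: version} for each listed package that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            continue
    return found


found = installed_lam_distributions()
if len(found) > 1:
    print("Both CPU and GPU packages are installed; uninstall one:", found)
else:
    print("Installed:", found or "neither package")
```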
Drop-in sentence-transformers Replacement
LAM is a drop-in replacement for sentence-transformers. Same API, same output format.
Before (sentence-transformers):
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Hello world", "Semantic search"])
# embeddings.shape == (2, 384), float32, L2-normalized
similarity = embeddings[0] @ embeddings[1]
After (LAM):
from said_lam import LAM
model = LAM("SAIDResearch/SAID-LAM-v1")
embeddings = model.encode(["Hello world", "Semantic search"])
# embeddings.shape == (2, 384), float32, L2-normalized
similarity = embeddings[0] @ embeddings[1]
Same output format, same shapes, same downstream compatibility. Everything that works with sentence-transformers embeddings (FAISS, ChromaDB, Pinecone, numpy dot product) works with LAM embeddings.
| Property | sentence-transformers | LAM |
|---|---|---|
| Output | (N, 384) ndarray, float32 | (N, 384) ndarray, float32 |
| L2-normalized | Yes (default) | Yes (default) |
| Cosine sim = dot product | Yes | Yes |
| Max tokens | 512 | 12K (encode) / 32K (SCA) |
| Complexity | O(n²) attention | O(n) linear |
| Framework | PyTorch (~2 GB) | Rust (~6 MB) |
| Memory at 1M tokens (no chunking) | OOM / impractical | ~15 MB |
Usage
FREE Tier — Embeddings (up to 12K tokens)
from said_lam import LAM
model = LAM("SAIDResearch/SAID-LAM-v1")
embeddings = model.encode(["Hello world", "Semantic search is powerful"])
# embeddings.shape == (2, 384)
# Cosine similarity (L2-normalized by default)
similarity = embeddings[0] @ embeddings[1]
print(f"Similarity: {similarity:.4f}")
BETA SCA (SAID Crystalline Attention) — MTEB testing only
BETA SCA (SAID Crystalline Attention) is activated for MTEB testing only, to enable perfect LongEmbed context retrieval (e.g. LEMBNeedleRetrieval, LEMBPasskeyRetrieval). Use the MTEB evaluation flow; no signup or activation required for benchmarking.
Common Patterns
Similarity Between Texts
Embeddings are L2-normalized — cosine similarity is just a dot product:
emb = model.encode(["The cat sat on the mat", "A kitten rested on the rug"])
similarity = float(emb[0] @ emb[1])
print(f"Similarity: {similarity:.4f}") # ~0.5761
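The reason a dot product is enough: cosine similarity divides the dot product by both vector lengths, and for unit-length vectors those lengths are 1. A small standard-library sketch with made-up 2-dimensional vectors (not model output) shows the two quantities agree:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(f"{cosine(a, b):.2f}")                           # 0.96
print(f"{dot(l2_normalize(a), l2_normalize(b)):.2f}")  # 0.96
```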
Batch Similarity Matrix
import numpy as np
queries = ["How is the weather?", "What time is it?"]
candidates = ["Is it raining today?", "Do you have the time?", "Nice shoes"]
emb_q = model.encode(queries) # (2, 384)
emb_c = model.encode(candidates) # (3, 384)
sim_matrix = emb_q @ emb_c.T # (2, 3)
Semantic Search Over a Corpus (FREE Tier)
import numpy as np
corpus = ["Python is a language", "The Eiffel Tower is in Paris",
"ML uses neural networks", "Speed of light is 299792458 m/s"]
corpus_emb = model.encode(corpus)
query_emb = model.encode(["fastest thing in physics"])
scores = (query_emb @ corpus_emb.T)[0]
ranked = np.argsort(scores)[::-1]
for i in ranked:
    print(f" {scores[i]:.4f} {corpus[i]}")
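For a large corpus you usually want only the best few hits rather than a full sort. The sketch below picks the top k scores with `heapq.nlargest` from the standard library; `top_k` is a hypothetical helper and the score values are placeholders, not output from the model.

```python
import heapq


def top_k(scores, k):
    """Return the k highest (index, score) pairs, best first."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


# Placeholder similarity scores for a four-document corpus.
example_scores = [0.1, 0.9, 0.4, 0.7]
for index, score in top_k(example_scores, 2):
    print(index, score)
# 1 0.9
# 3 0.7
```

With the NumPy scores from the search example, the array can be passed through `scores.tolist()` first so the helper works on plain floats.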
Matryoshka Dimensionality Reduction
emb_128 = model.encode(["Hello world"], output_dim=128) # (1, 128)
emb_64 = model.encode(["Hello world"], output_dim=64) # (1, 64)
# Automatically truncated and re-normalized to unit length
Example impact on STS12 (cosine main_score, GPU):
| dim | STS12 score | rel. to 384d |
|---|---|---|
| 384 | 0.7493 | 100.0% |
| 256 | 0.7472 | 99.7% |
| 128 | 0.7459 | 99.6% |
| 64 | 0.7327 | 97.8% |
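The truncate-and-renormalize step can be sketched in a few lines: keep the first `dim` components, then rescale to unit length so dot products remain cosine similarities. This is an illustration of the operation under those assumptions, not the library's own code; `truncate_and_renormalize` is a hypothetical name and the input vector is a toy example.

```python
import math


def truncate_and_renormalize(embedding, dim):
    """Keep the leading `dim` components and rescale them to unit L2 norm."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and the embedding length")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


# Toy unit vector with four components, truncated to two.
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_and_renormalize(full, 2)
print(len(short), round(sum(x * x for x in short), 6))  # 2 1.0
```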
Token Limits
- encode(): Up to 12,000 tokens per text. Returns embeddings for your RAG.
- index() + search(): Up to 32,768 tokens per text (MTEB BETA SCA; LongEmbed/MTEB testing group only). SCA streaming: no embeddings, perfect recall.
encode() — returns one embedding per input text, capped at 12K tokens:
# Each text gets one embedding — long texts are chunked at 12K tokens
embeddings = model.encode(["short text", "very long text..."]) # (2, 384)
# Use output_dim for smaller embeddings (Matryoshka)
embeddings = model.encode(["short text", "very long text..."], output_dim=128) # (2, 128)
Long documents?
encode() caps at 12K tokens. For LongEmbed benchmarks (MTEB BETA SCA testing only), index() + search() support up to 32K tokens via SCA.
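If you would rather control how a long document is split before it reaches encode(), one option is to pre-chunk it yourself. The sketch below splits on whitespace, which is only a rough proxy for model tokens (a WordPiece tokenizer usually produces more tokens than words), so a budget well below 12,000 is the safer choice. The helper name `split_by_word_budget` is hypothetical.

```python
def split_by_word_budget(text, max_words=12_000):
    """Split text into consecutive chunks of at most `max_words` whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


chunks = split_by_word_budget("a b c d e", max_words=2)
print(chunks)  # ['a b', 'c d', 'e']
```

Each chunk can then be passed to `model.encode(chunks)` to get one embedding per chunk.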
MTEB Evaluation
One model, one class: use the same LAM with mteb.evaluate() (LAM implements the global MTEB encoder protocol).
from said_lam import LAM
import mteb
model = LAM("SAIDResearch/SAID-LAM-v1")
tasks = mteb.get_tasks(tasks=["LEMBNeedleRetrieval", "LEMBPasskeyRetrieval"])
results = mteb.evaluate(model=model, tasks=tasks)
API Reference
LAM(model_name_or_path, device)
| Parameter | Default | Description |
|---|---|---|
| model_name_or_path | "SAIDResearch/SAID-LAM-v1" | Hugging Face model ID to load (default: SAIDResearch/SAID-LAM-v1), or a local directory path pointing to the model files |
| device | None (auto) | Auto-selects CUDA GPU if available, otherwise CPU |
Core Methods
| Method | Tier | Description |
|---|---|---|
| model.encode(sentences, output_dim=None) | FREE+ | Encode to embeddings (384, 256, 128, or 64 dims) |
| model.index(doc_id, text) | MTEB | Index a document for search (benchmarks) |
| model.search(query, top_k) | MTEB | Retrieve documents by query (benchmarks) |
| model.truncate_embeddings(emb, dim) | FREE+ | Matryoshka truncation (64/128/256) |
| model.clear() | MTEB | Clear indexed documents (benchmarks) |
| model.stats() | FREE+ | Model statistics |
Tier System
| Tier | encode() | (SCA) | How to Get | Features |
|---|---|---|---|---|
| FREE | 12K | - | Default | encode() only; embeddings for RAG |
| MTEB | 12K | 32K | Auto-detected | SCA for LongEmbed retrieval (benchmarks only) |
| LICENSED | 32K | 32K | Coming soon | + persistent storage + cloud sync |
| INFINITE | Unlimited | Unlimited | Coming soon | Oracle mode |
GPU Support
CPU wheels are installed by default. For GPU acceleration:
# Build from source with CUDA (Linux)
pip install maturin
maturin build --release --features cuda
# Metal (macOS Apple Silicon)
maturin build --release --features metal
Model Files
| File | Size | Description |
|---|---|---|
| model.safetensors | 92 MB | Model weights (SafeTensors format) |
| config.json | 1 KB | Model configuration |
| tokenizer.json | 467 KB | Tokenizer vocabulary |
| tokenizer_config.json | 350 B | Tokenizer settings |
| vocab.txt | 232 KB | WordPiece vocabulary |
| special_tokens_map.json | 112 B | Special token definitions |
Citation
@misc{said-lam-v1,
title={SAID-LAM-v1: Linear Attention Memory},
author={SAIDResearch},
year={2026},
url={https://saidhome.ai},
note={23.85M parameter embedding model with O(n) linear complexity.
384-dim embeddings, 32K context window, 100% NIAH recall.
Distilled from all-MiniLM-L6-v2. Pure Rust (Candle) implementation.}
}
Links
- Organization: SAIDResearch
- PyPI: said-lam
- Distilled From: all-MiniLM-L6-v2
- Framework: Candle (Hugging Face Rust ML)
- Contact: research@saidhome.ai