
Project description

SAID-LAM-v1

LAM (Linear Attention Models) — a new family beyond semantic transformers, with deterministic recall via SAID Crystalline Attention. SAID-LAM-v1 is Linear Attention Memory.

"The answer IS X. Because I Said so." — At ANY scale.

SAID-LAM-v1 is a 23.85M parameter embedding model with O(n) linear complexity. Where standard transformers rely on O(n²) attention that slows and runs out of memory as context grows, LAM models replace this entirely with a recurrent state update that runs in strict O(n) time and constant memory, defining a new direction separate from transformer-based semantic models.

SAID-LAM-v1 is distilled from all-MiniLM-L6-v2 while extending the context window from 512 tokens to 32K+ tokens, and it demonstrates 100% recall on the LongEmbed Needle-in-a-Haystack benchmarks across all evaluated scales.
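
To make the complexity claim concrete, below is a minimal NumPy sketch of the generic linear-attention recurrence (the textbook formulation, not SAID-LAM's proprietary kernel). The state is a fixed d×d matrix updated once per token, so time grows linearly with sequence length and memory stays constant:

import numpy as np

def linear_attention(q, k, v):
    # q, k, v: (seq_len, d) arrays. One left-to-right pass:
    # O(seq_len) time, O(d*d) memory, no (seq_len x seq_len) matrix.
    d = q.shape[1]
    state = np.zeros((d, d))            # running sum of outer(k_t, v_t)
    out = np.empty_like(v)
    for t in range(q.shape[0]):
        state += np.outer(k[t], v[t])   # constant-size state update
        out[t] = q[t] @ state           # readout uses only the state
    return out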

Model Details

| Property | Value |
|---|---|
| Model Category | LAM (Linear Attention Models) — SAID-LAM-v1: Linear Attention Memory |
| Parameters | 23,848,788 |
| Embedding Dimension | 384 |
| Max Context Length | 32,768 tokens |
| Memory Usage | ~95 MB |
| Complexity | O(n) linear — time AND memory |
| Framework | Pure Rust (Candle) — no PyTorch required |
| Package Size | ~6 MB binary + 92 MB weights (auto-downloaded) |
| License | Apache 2.0 (weights) / Proprietary (code) |

Performance

O(n) Linear Scaling

LAM scales linearly with input length, empirically validated up to 1M words (R² = 1.000). Memory grows from near zero at small inputs to ~15 MB at 1M words.
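
A rough way to sanity-check the scaling yourself (a sketch only; absolute timings depend on hardware, and encode() chunks internally above 12K tokens):

import time
from said_lam import LAM

model = LAM("SAIDResearch/SAID-LAM-v1")
for n_words in (1_000, 10_000, 100_000):
    text = "word " * n_words
    start = time.perf_counter()
    model.encode([text])
    # With O(n) scaling, each 10x in input length should cost roughly 10x in time.
    print(f"{n_words:>9} words: {time.perf_counter() - start:.2f}s")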

STS-B Semantic Quality

Spearman r = 0.8181 on the STS-B test set (1,379 sentence pairs).

MTEB LongEmbed Benchmarks

Combined LongEmbed score (SAID-LAM-v1, average over all six tasks): ~91.0%.

| Task | Score |
|---|---|
| LEMBNeedleRetrieval | 100.00% |
| LEMBPasskeyRetrieval | 100.00% |
| LEMBNarrativeQARetrieval | 69.93% |
| LEMBSummScreenFDRetrieval | 96.59% |
| LEMBQMSumRetrieval | 85.76% |
| LEMBWikimQARetrieval | 93.98% |

LongEmbed SOTA comparison

| Task | SAID-LAM-v1 (23M) | Global SOTA |
|---|---|---|
| LEMBNeedleRetrieval | 100.00% | 100.00% |
| LEMBPasskeyRetrieval | 100.00% | 100.00% |
| LEMBNarrativeQARetrieval | 69.93% | 66.10% |
| LEMBSummScreenFDRetrieval | 96.59% | 99.10% |
| LEMBQMSumRetrieval | 85.76% | 83.70% |
| LEMBWikimQARetrieval | 93.98% | 91.20% |

Install

pip install said-lam

CUDA (GPU) wheels are published under a separate PyPI project:

pip install said-lam-gpu

To upgrade an existing installation to the latest version:

pip install --upgrade said-lam
# or for the GPU version:
pip install --upgrade said-lam-gpu

To install a specific older version:

pip install said-lam==1.0.2
# or for the GPU version:
pip install said-lam-gpu==1.0.2

To uninstall the package:

pip uninstall said-lam
# or for the GPU version:
pip uninstall said-lam-gpu

Note: Both packages expose the same said_lam import namespace, so install only one of them at a time to avoid conflicts.

On first use, model weights (~92 MB) are automatically downloaded from HuggingFace and cached locally. The pip package itself is only ~6 MB (compiled Rust binary — weights are NOT bundled).
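
If you want the weights available ahead of time (air-gapped machines, CI caches), the standard huggingface_hub API can pre-fetch the repository. This is a generic pre-download sketch, not a said-lam-specific feature; per the API Reference below, LAM() also accepts a local directory path:

from huggingface_hub import snapshot_download

# Download the model repo files into the local Hugging Face cache
# and return the local directory path.
local_dir = snapshot_download("SAIDResearch/SAID-LAM-v1")
print(local_dir)   # this path can be passed to LAM(local_dir) if desired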

Drop-in sentence-transformers Replacement

LAM is a drop-in replacement for sentence-transformers. Same API, same output format.

Before (sentence-transformers):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Hello world", "Semantic search"])
# embeddings.shape == (2, 384), float32, L2-normalized
similarity = embeddings[0] @ embeddings[1]

After (LAM):

from said_lam import LAM

model = LAM("SAIDResearch/SAID-LAM-v1")
embeddings = model.encode(["Hello world", "Semantic search"])
# embeddings.shape == (2, 384), float32, L2-normalized
similarity = embeddings[0] @ embeddings[1]

Same output format, same shapes, same downstream compatibility. Everything that works with sentence-transformers embeddings (FAISS, ChromaDB, Pinecone, numpy dot product) works with LAM embeddings.

| Property | sentence-transformers | LAM |
|---|---|---|
| Output | (N, 384) ndarray, float32 | (N, 384) ndarray, float32 |
| L2-normalized | Yes (default) | Yes (default) |
| Cosine sim = dot product | Yes | Yes |
| Max tokens | 512 | 12K (encode) / 32K (SCA) |
| Complexity | O(n²) attention | O(n) linear |
| Framework | PyTorch (~2 GB) | Rust (~6 MB) |
| Memory at 1M tokens (no chunking) | OOM / impractical | ~15 MB |
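
As a concrete compatibility check, LAM embeddings drop straight into a FAISS inner-product index (a sketch assuming faiss-cpu is installed; because the embeddings are L2-normalized, inner product equals cosine similarity):

import faiss
from said_lam import LAM

model = LAM("SAIDResearch/SAID-LAM-v1")
docs = ["LAM uses linear attention", "Standard transformers use quadratic attention"]
doc_emb = model.encode(docs)                  # (2, 384) float32, unit-norm

index = faiss.IndexFlatIP(doc_emb.shape[1])   # inner product == cosine here
index.add(doc_emb)

scores, ids = index.search(model.encode(["which model scales linearly?"]), 2)
print(scores, ids)                            # top-2 documents for the query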

Usage

FREE Tier — Embeddings (up to 12K tokens)

from said_lam import LAM

model = LAM("SAIDResearch/SAID-LAM-v1")
embeddings = model.encode(["Hello world", "Semantic search is powerful"])
# embeddings.shape == (2, 384)

# Cosine similarity (L2-normalized by default)
similarity = embeddings[0] @ embeddings[1]
print(f"Similarity: {similarity:.4f}")

BETA SCA (SAID Crystalline Attention) — MTEB testing only

SCA (SAID Crystalline Attention) is a beta feature activated for MTEB testing only. It enables perfect long-context retrieval on LongEmbed tasks such as LEMBNeedleRetrieval and LEMBPasskeyRetrieval. Use the MTEB evaluation flow described below; no signup or activation is required for benchmarking.

Common Patterns

Similarity Between Texts

Embeddings are L2-normalized — cosine similarity is just a dot product:

emb = model.encode(["The cat sat on the mat", "A kitten rested on the rug"])
similarity = float(emb[0] @ emb[1])
print(f"Similarity: {similarity:.4f}")  # ~0.5761

Batch Similarity Matrix

import numpy as np

queries = ["How is the weather?", "What time is it?"]
candidates = ["Is it raining today?", "Do you have the time?", "Nice shoes"]

emb_q = model.encode(queries)      # (2, 384)
emb_c = model.encode(candidates)   # (3, 384)
sim_matrix = emb_q @ emb_c.T       # (2, 3)

Semantic Search Over a Corpus (FREE Tier)

import numpy as np

corpus = ["Python is a language", "The Eiffel Tower is in Paris",
          "ML uses neural networks", "Speed of light is 299792458 m/s"]
corpus_emb = model.encode(corpus)

query_emb = model.encode(["fastest thing in physics"])
scores = (query_emb @ corpus_emb.T)[0]
ranked = np.argsort(scores)[::-1]
for i in ranked:
    print(f"  {scores[i]:.4f}  {corpus[i]}")

Matryoshka Dimensionality Reduction

emb_128 = model.encode(["Hello world"], output_dim=128)  # (1, 128)
emb_64  = model.encode(["Hello world"], output_dim=64)   # (1, 64)
# Automatically truncated and re-normalized to unit length

Example impact on STS12 (cosine main_score, GPU):

| dim | STS12 score | rel. to 384d |
|---|---|---|
| 384 | 0.7493 | 100.0% |
| 256 | 0.7472 | 99.7% |
| 128 | 0.7459 | 99.6% |
| 64 | 0.7327 | 97.8% |

Token Limits

  • encode(): Up to 12,000 tokens per text. Returns embeddings for your RAG.
  • index() + search(): Up to 32,768 tokens per text (MTEB BETA SCA — LongEmbed/MTEB testing group only). SCA streaming — no embeddings, perfect recall.

encode() — returns one embedding per input text, capped at 12K tokens:

# Each text gets one embedding — long texts are chunked at 12K tokens
embeddings = model.encode(["short text", "very long text..."])  # (2, 384)

# Use output_dim for smaller embeddings (Matryoshka)
embeddings = model.encode(["short text", "very long text..."], output_dim=128)  # (2, 128)

Long documents? encode() caps at 12K tokens. For LongEmbed benchmarks (MTEB BETA SCA testing only), index() + search() support up to 32K tokens via SCA.
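
If you need a single embedding for a document beyond the 12K-token cap on the FREE tier, one common generic pattern (not a said-lam feature; long_text is a hypothetical variable here) is to chunk, embed, and mean-pool:

import numpy as np

# Chunk by characters as a rough proxy for tokens, embed, then average.
chunks = [long_text[i:i + 4000] for i in range(0, len(long_text), 4000)]
chunk_emb = model.encode(chunks)        # (n_chunks, 384)
doc_emb = chunk_emb.mean(axis=0)        # pool chunk embeddings
doc_emb /= np.linalg.norm(doc_emb)      # re-normalize to unit length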

MTEB Evaluation

One model, one class: use the same LAM instance with mteb.evaluate() (LAM implements the standard MTEB encoder protocol).

from said_lam import LAM
import mteb

model = LAM("SAIDResearch/SAID-LAM-v1")
tasks = mteb.get_tasks(tasks=["LEMBNeedleRetrieval", "LEMBPasskeyRetrieval"])
results = mteb.evaluate(model=model, tasks=tasks)

API Reference

LAM(model_name_or_path, device)

| Parameter | Default | Description |
|---|---|---|
| model_name_or_path | "SAIDResearch/SAID-LAM-v1" | Hugging Face model ID to load, or a local directory path containing the model files |
| device | None (auto) | Auto-selects CUDA GPU if available, otherwise CPU |

Core Methods

| Method | Tier | Description |
|---|---|---|
| model.encode(sentences, output_dim=None) | FREE+ | Encode to embeddings (384, 256, 128, or 64 dims) |
| model.index(doc_id, text) | MTEB | Index a document for search (benchmarks) |
| model.search(query, top_k) | MTEB | Retrieve documents by query (benchmarks) |
| model.truncate_embeddings(emb, dim) | FREE+ | Matryoshka truncation (64/128/256) |
| model.clear() | MTEB | Clear indexed documents (benchmarks) |
| model.stats() | FREE+ | Model statistics |
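
A sketch of the MTEB-tier index/search flow, based only on the signatures above (the exact return format of search() is not documented here, so treat the hits value as illustrative):

model = LAM("SAIDResearch/SAID-LAM-v1")

# Index long documents (up to 32K tokens each on the MTEB tier, via SCA)
model.index("doc-1", "a long document containing the passkey 4821 ...")
model.index("doc-2", "another long document ...")

# Retrieve the best-matching documents for a query
hits = model.search("what is the passkey?", top_k=2)
print(hits)

model.clear()   # drop all indexed documents when done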

Tier System

| Tier | encode() | SCA | How to Get | Features |
|---|---|---|---|---|
| FREE | 12K | — | Default | encode() only — embeddings for RAG |
| MTEB | 12K | 32K | Auto-detected | SCA for LongEmbed retrieval (benchmarks only) |
| LICENSED | 32K | 32K | Coming soon | + persistent storage + cloud sync |
| INFINITE | Unlimited | Unlimited | Coming soon | Oracle mode |

GPU Support

CPU wheels are installed by default. For GPU acceleration, install the prebuilt said-lam-gpu wheels (see Install above) or build from source:

# Build from source with CUDA (Linux)
pip install maturin
maturin build --release --features cuda

# Metal (macOS Apple Silicon)
maturin build --release --features metal

Model Files

| File | Size | Description |
|---|---|---|
| model.safetensors | 92 MB | Model weights (SafeTensors format) |
| config.json | 1 KB | Model configuration |
| tokenizer.json | 467 KB | Tokenizer vocabulary |
| tokenizer_config.json | 350 B | Tokenizer settings |
| vocab.txt | 232 KB | WordPiece vocabulary |
| special_tokens_map.json | 112 B | Special token definitions |

Citation

@misc{said-lam-v1,
  title={SAID-LAM-v1: Linear Attention Memory},
  author={SAIDResearch},
  year={2026},
  url={https://saidhome.ai},
  note={23.85M parameter embedding model with O(n) linear complexity.
        384-dim embeddings, 32K context window, 100% NIAH recall.
        Distilled from all-MiniLM-L6-v2. Pure Rust (Candle) implementation.}
}
