# local-llm-embed

Sentence embeddings from any local causal LLM, no fine-tuning.
Get sentence embeddings out of any local causal LLM you already have running. No fine-tuning, no separate encoder model.
If you're running Llama / Qwen / Phi / Mistral / TinyLlama via transformers, Ollama, vLLM, or llama.cpp and want embeddings for RAG / retrieval / classification — this library extracts them from the model you already have, in a few lines of code.
## Why

| | dedicated encoder (BGE-M3, MiniLM) | this library |
|---|---|---|
| Separate ~500 MB model load | yes | no (reuses your LLM) |
| Fine-tuning needed | already trained | none |
| Works on any HF causal LM | n/a | yes |
| STS Spearman vs MiniLM-L6 | 0.867 (baseline) | 0.806 (Phi-3.5) |
| Banking77 accuracy vs MiniLM-L6 | 0.5500 | 0.5540 (wins) |
The trade-off is honest: dedicated encoders still beat raw LLM probes on pure semantic similarity (STS) by ~6 points. But on classification-style tasks like Banking77, this library matches or slightly beats the baseline — using a model you already have in memory.
## Install

```bash
pip install local-llm-embed
```

Or with the Hugging Face Hub helpers (for downloading pre-fit whiteners):

```bash
pip install "local-llm-embed[hub]"
```
## Quick start

```python
from local_llm_embed import LocalLLMEmbedder

embedder = LocalLLMEmbedder("Qwen/Qwen2.5-0.5B-Instruct")
emb = embedder.encode(["The cat sat on the mat.", "A feline rested."])
print(emb.shape)   # (2, 896)
print(emb @ emb.T) # cosine similarity matrix
```
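Since `emb @ emb.T` above is already a cosine matrix, the rows come back unit-normalized and retrieval is just a dot product. A minimal search sketch reusing the `embedder` from above; the corpus and query texts are placeholders:

```python
import numpy as np

corpus = ["The cat sat on the mat.", "Stocks fell sharply on Monday."]
query = ["Where is the cat?"]

doc_emb = embedder.encode(corpus)  # (n_docs, hidden_dim), unit-norm rows
q_emb = embedder.encode(query)     # (1, hidden_dim)

scores = (q_emb @ doc_emb.T)[0]    # cosine similarity of query vs each doc
best = int(np.argmax(scores))
print(corpus[best], float(scores[best]))
```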
By default this uses `prefix+whiten` (the most universally strong variation in our benchmarks). The whitener is fit lazily on the first batch you encode.
### Use a calibration set for better whitening

```python
calibration = [...]  # ~1000 representative texts from your domain
embedder.fit_whitener(calibration)
embedder.save_whitener("./domain_whiten.npz")

# later, in another session:
embedder = LocalLLMEmbedder(
    "Qwen/Qwen2.5-0.5B-Instruct",
    whitener_path="./domain_whiten.npz",
)
```
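For reference, the transform being saved is standard BERT-whitening: center the pooled vectors, then apply a ZCA transform built from the eigendecomposition of their covariance. A standalone numpy sketch of that math (illustrative only, not the library's internal code; `eps` guards near-zero eigenvalues):

```python
import numpy as np

def fit_zca_whitener(X: np.ndarray, eps: float = 1e-5):
    """X: (n, d) pooled vectors from the calibration set."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)  # covariance is symmetric, so eigh
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T  # ZCA transform
    return mu, W

def whiten(X: np.ndarray, mu: np.ndarray, W: np.ndarray) -> np.ndarray:
    Z = (X - mu) @ W
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)  # re-unit-normalize
```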
### Pick a different variation

```python
embedder = LocalLLMEmbedder(
    "microsoft/Phi-3.5-mini-instruct",
    variation="echo+whiten",  # best for STS
    layer="final",
    pooling="weighted_mean",
)
```
## Variations

Three train-free recipes are bundled. They're combinations of well-known techniques, ranked by how well they performed in our internal benchmark (STSB validation, Banking77; see BENCHMARKS.md):
- `prefix+whiten` (default) — feed the text in, take the chosen layer + pooling, then center & ZCA-whiten the resulting matrix. Whitening removes the anisotropy ("everything-looks-similar") problem that causal LMs suffer from: a universal +0.12 to +0.22 STS Spearman over no whitening.
- `echo+whiten` — duplicate the text and pool only over the second copy, so each pooled token has seen the full sentence (works around the causal mask). Best STS combination in our tests; see the sketch after this list.
- `prefix` — no transformation. For comparison / debugging.
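Roughly what the echo trick does, in plain `transformers`. This is an illustrative sketch, not this library's internals: the token boundary between the two copies is approximated, and the original Echo Embeddings paper wraps the text in a rewriting prompt, which is omitted here:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = AutoModel.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model.eval()

def echo_embed(text: str) -> torch.Tensor:
    # Approximate where the second copy starts by tokenizing the first alone.
    start = tok(text, return_tensors="pt")["input_ids"].shape[1]
    both = tok(text + " " + text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**both).last_hidden_state[0]  # (seq_len, hidden_dim)
    # Mean-pool only over the echoed copy: those positions have attended
    # to the entire sentence despite the causal mask.
    emb = hidden[start:].mean(dim=0)
    return emb / emb.norm()
```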
## Hardware notes

Causal LMs at fp32 are heavy on RAM. We default to bf16 if your CPU reports the `avx512_bf16` flag (recent AMD Zen 4+ desktop CPUs do; on Intel it is mostly server parts); on GPU, just pass `device="cuda"` and the same bf16 default applies.
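If you're not sure whether your CPU advertises that flag, a quick Linux-only check (outside this library):

```python
# Linux-only: CPU feature flags are listed in /proc/cpuinfo.
with open("/proc/cpuinfo") as f:
    print("avx512_bf16" in f.read())
```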
## Limitations

- For pure semantic textual similarity, dedicated contrastive encoders (BGE-M3, all-MiniLM-L6-v2) still win by ~6 STS points. This library is for the case where you already have a causal LM loaded.
- Whitening works best with a calibration set of at least a few hundred texts. Without one, self-whitening is used at encode time (fit on the batch you're encoding); that's worse than a good calibration set but still better than raw probes.
- Bidirectional inference (LLM2Vec-style attention-mask removal) is not bundled. We benchmarked it; it consistently hurt without fine-tuning, and we don't want to ship a footgun.
## Acknowledgements

The technique is a combination of BERT-whitening (Su et al., 2021), Echo Embeddings (Springer et al., 2024), and PromptEOL (Jiang et al., 2023). This library packages the train-free subset.
## License

MIT.