Embedding extractor for decoder-only LLMs

llemb

Unified embedding extraction from decoder-only LLMs.

Features

  • Backends: Support for Hugging Face Transformers.
  • Pooling Strategies:
    • mean: Average pooling over all token embeddings (excluding padding).
    • last_token: Embedding of the final token.
    • eos_token: Embedding at the EOS token position.
    • prompt_eol: Last-token embedding after wrapping the input in a prompt template (PromptEOL).
    • pcoteol: "Pretended Chain of Thought" - wraps input in a reasoning template.
    • ke: "Knowledge Enhancement" - wraps input in a context-aware template.
  • Quantization: Support for 4-bit and 8-bit quantization via bitsandbytes.
  • Layer Selection: Extract embeddings from any layer.
    • Defaults to -1 (last layer) for standard strategies.
    • Defaults to -2 (second-to-last layer) for pcoteol and ke (as recommended by Zhang et al., 2024; see References).
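
A minimal sketch of selecting these strategies, assuming each name listed above is a valid value for the pooling argument of encode() (the call shown in the Quick Start below):

import llemb

enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# Compare the vector produced by each pooling strategy for the same input.
# Layer defaults are handled internally: -1 for the standard strategies,
# -2 for pcoteol and ke.
for pooling in ["mean", "last_token", "eos_token", "prompt_eol", "pcoteol", "ke"]:
    emb = enc.encode("Hello world", pooling=pooling)
    print(pooling, emb.shape)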

Installation

Install using uv:

uv add llemb

To include quantization support:

uv add "llemb[quantization]"

Quick Start

Initialize the encoder with minimal setup (defaults: the transformers backend, no quantization, and automatic CPU/CUDA detection):

import llemb

# Minimal setup
enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# Extract embeddings
embeddings = enc.encode("Hello world", pooling="mean")
print(embeddings.shape)
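
As a follow-up usage example, here is a minimal sketch of comparing two sentences via cosine similarity. It assumes encode() returns a single 1-D vector (a NumPy array or CPU tensor convertible with numpy.asarray) when given a single string, as the .shape call above suggests:

import numpy as np
import llemb

enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# Encode two sentences (assumed to yield 1-D vectors) and move them to NumPy.
a = np.asarray(enc.encode("The cat sat on the mat", pooling="mean"), dtype=np.float32)
b = np.asarray(enc.encode("A feline rested on the rug", pooling="mean"), dtype=np.float32)

# Cosine similarity between the two sentence embeddings.
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cos:.3f}")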

Advanced Usage

Initialize with specific options:

import llemb

# Initialize encoder with specific backend and configuration
enc = llemb.Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    backend="transformers",
    device="cuda", # Force CUDA
    quantization="4bit" # Use 4-bit quantization
)

# Extract embeddings using pcoteol strategy (automatically uses layer -2)
embeddings = enc.encode("Hello world", pooling="pcoteol")

Transformers Backend Configuration

When using the transformers backend, you can pass standard Hugging Face AutoModel.from_pretrained() keyword arguments directly to the Encoder.

Example 1: Using Flash Attention 2

import torch

from llemb import Encoder

encoder = Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    backend="transformers",
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    torch_dtype=torch.bfloat16,
)

Example 2: Custom Quantization Config

import torch
from transformers import BitsAndBytesConfig

from llemb import Encoder

# Custom 4-bit quantization config, passed through to the underlying model loader.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

encoder = Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    backend="transformers",
    quantization_config=bnb_config,
)

References

  • PromptEOL:

    Ting Jiang, Shaohan Huang, Zhongzhi Luan, Deqing Wang, and Fuzhen Zhuang. 2024. Scaling Sentence Embeddings with Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2024.

  • PCoTEOL and KE:

    Bowen Zhang, Kehua Chang, and Chunping Li. 2024. Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models. arXiv preprint arXiv:2404.03921.

Development

Clone the repository and sync dependencies:

git clone https://github.com/j341nono/llemb.git
cd llemb
uv sync --all-extras --dev

Run tests:

uv run pytest

Run static analysis:

uv run ruff check src
uv run mypy src
