llemb
Unified embedding extraction from decoder-only LLMs.
Features
- Backends: Support for Hugging Face Transformers.
- Pooling Strategies:
  - mean: average pooling over all tokens, excluding padding (see the sketch after this list).
  - last_token: hidden state of the last token.
  - eos_token: hidden state at the EOS token position.
  - prompt_eol: embedding extracted with a prompt template targeting the last token (PromptEOL).
  - pcoteol: "Pretended Chain of Thought"; wraps the input in a reasoning template.
  - ke: "Knowledge Enhancement"; wraps the input in a context-aware template.
- Quantization: Support for 4-bit and 8-bit quantization via bitsandbytes.
- Layer Selection: Extract embeddings from any layer.
  - Defaults to -1 (last layer) for standard strategies.
  - Defaults to -2 (second-to-last layer) for pcoteol and ke, as recommended by the referenced papers.
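The mean strategy masks out padding before averaging. A minimal sketch of that computation in raw Transformers (illustrative only, not llemb's internal code; the model name is just an example):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # illustrative; any decoder-only model works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tok.pad_token = tok.pad_token or tok.eos_token  # decoder-only models often lack a pad token

batch = tok(["Hello world", "A longer example sentence"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_size)

# Mean-pool over real tokens only: zero out padding positions, then
# divide by each sequence's true length.
mask = batch["attention_mask"].unsqueeze(-1)  # (batch, seq_len, 1)
mean_emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(mean_emb.shape)  # (batch, hidden_size)
```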
Installation
Install using uv:
```bash
uv add llemb
```
To include quantization support:
```bash
uv add "llemb[quantization]"
```
Quick Start
Initialize the encoder with minimal setup; it defaults to the transformers backend, no quantization, and automatic CPU/CUDA detection:
```python
import llemb

# Minimal setup
enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# Extract embeddings
embeddings = enc.encode("Hello world", pooling="mean")
print(embeddings.shape)
```
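A common use of the extracted vectors is comparing texts. A minimal sketch, assuming encode() returns a single embedding vector per input that can be converted to a NumPy array (the return type is not documented above, so treat this as illustrative):

```python
import numpy as np
import llemb

enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# Assumption: encode() returns one 1-D, NumPy-convertible vector per text.
a = np.asarray(enc.encode("The cat sat on the mat", pooling="mean"))
b = np.asarray(enc.encode("A feline rested on the rug", pooling="mean"))

# Cosine similarity between the two sentence embeddings.
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```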
Advanced Usage
Initialize with specific options:
```python
import llemb

# Initialize the encoder with a specific backend and configuration
enc = llemb.Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    backend="transformers",
    device="cuda",         # force CUDA
    quantization="4bit",   # use 4-bit quantization
)

# Extract embeddings using the pcoteol strategy (automatically uses layer -2)
embeddings = enc.encode("Hello world", pooling="pcoteol")
```
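For readers curious what "layer -2" means mechanically: with output_hidden_states=True, Transformers returns every layer's hidden states, and the second-to-last one can be indexed directly. A minimal raw-Transformers sketch (illustrative, not llemb's internal code):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # illustrative; any decoder-only model works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16)

inputs = tok("Hello world", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: (input embeddings, layer 1, ..., layer N).
# Index -2 is the second-to-last transformer layer; take its last token.
embedding = out.hidden_states[-2][0, -1]  # shape: (hidden_size,)
```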
Transformers Backend Configuration
When using the transformers backend, you can pass standard Hugging Face AutoModel arguments directly to the Encoder.
Example 1: Using Flash Attention 2
```python
import torch
from llemb import Encoder

encoder = Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    backend="transformers",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
```
Example 2: Custom Quantization Config
```python
import torch
from llemb import Encoder
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

encoder = Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    backend="transformers",
    quantization_config=bnb_config,
)
```
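Since these arguments are forwarded to the underlying Hugging Face loader, other standard options (for example device_map="auto" or trust_remote_code=True) should presumably pass through the same way, though only the arguments shown above are documented here.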
References
- PromptEOL: Ting Jiang, Shaohan Huang, Zhongzhi Luan, Deqing Wang, and Fuzhen Zhuang. 2024. Scaling Sentence Embeddings with Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2024.
- PCoTEOL and KE: Bowen Zhang, Kehua Chang, and Chunping Li. 2024. Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models. arXiv preprint arXiv:2404.03921.
Development
Clone the repository and sync dependencies:
```bash
git clone https://github.com/j341nono/llemb.git
cd llemb
uv sync --all-extras --dev
```
Run tests:
```bash
uv run pytest
```
Run static analysis:
```bash
uv run ruff check src
uv run mypy src
```
Download files
File details
Details for the file llemb-0.1.0.tar.gz.
File metadata
- Download URL: llemb-0.1.0.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ce829d8b4c2dfe2b4ea0adb32708c5c2885a9a3e381b6e2ff232ec18b1a678d1 |
| MD5 | 16126c7c8f9dc738dab7a5346d502d4f |
| BLAKE2b-256 | d55d037ed73ccbb96986e3618c3e1d2f84b8ab0b8a4b2c8223c4e1c33c606848 |
Provenance
The following attestation bundles were made for llemb-0.1.0.tar.gz:
Publisher: publish.yml on j341nono/llemb
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llemb-0.1.0.tar.gz
- Subject digest: ce829d8b4c2dfe2b4ea0adb32708c5c2885a9a3e381b6e2ff232ec18b1a678d1
- Sigstore transparency entry: 836125591
- Sigstore integration time:
- Permalink: j341nono/llemb@3d57ed79268f9881e61a9b0b8b8fec4916d7397e
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/j341nono
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3d57ed79268f9881e61a9b0b8b8fec4916d7397e
- Trigger Event: release
File details
Details for the file llemb-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llemb-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 70926555fd125f2bc2be14e67ad6921ff44c05c3c91632dbfcec89f6666b6db6 |
| MD5 | c4482a5b44cf672f937141b5a5001386 |
| BLAKE2b-256 | 6b5982dc80cf3c11b3f8dc1b4456856b34b12fcb43c91064d57b4b53de81b6b8 |
Provenance
The following attestation bundles were made for llemb-0.1.0-py3-none-any.whl:
Publisher: publish.yml on j341nono/llemb
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llemb-0.1.0-py3-none-any.whl
- Subject digest: 70926555fd125f2bc2be14e67ad6921ff44c05c3c91632dbfcec89f6666b6db6
- Sigstore transparency entry: 836125631
- Sigstore integration time:
- Permalink: j341nono/llemb@3d57ed79268f9881e61a9b0b8b8fec4916d7397e
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/j341nono
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3d57ed79268f9881e61a9b0b8b8fec4916d7397e
- Trigger Event: release