# llemb: Unified Embedding Extraction from Decoder-only LLMs
llemb is a lightweight framework for extracting high-quality sentence embeddings from decoder-only Large Language Models (LLMs) such as Llama and Mistral. It unifies various state-of-the-art pooling strategies and efficiency optimizations into a simple, coherent interface.
With llemb, you can easily leverage powerful LLMs for embedding tasks using advanced techniques like PromptEOL and PCoTEOL, with built-in support for quantization to run on consumer hardware.
## Features
- Flexible Backends: Seamless support for Hugging Face Transformers.
- Advanced Pooling Strategies:
  - Standard: `mean`, `last_token`, `eos_token`
  - Research-grade: `prompt_eol`, `pcoteol` (Pretended Chain of Thought), `ke` (Knowledge Enhancement)
- Efficient Inference: Native support for 4-bit and 8-bit quantization via `bitsandbytes`.
- Granular Control: Extract embeddings from any layer (defaults to recommended layers based on research; see the sketch after this list).
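As a quick illustration of layer selection, here is a minimal sketch. Note that the `layer` keyword shown is an assumption about the API, not a confirmed signature; check `Encoder.encode` in your installed version:

```python
import llemb

enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# Assumption: encode() exposes a layer index, since the library advertises
# extraction from any layer. The keyword name "layer" is hypothetical.
embeddings = enc.encode("Hello world", pooling="mean", layer=-2)
```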
## Installation

Install from PyPI with pip or uv.
### Basic Installation

```bash
pip install llemb
# or
uv add llemb
```
### With Quantization Support

To enable 4-bit/8-bit quantization (recommended for large models):

```bash
pip install "llemb[quantization]"
# or
uv add "llemb[quantization]"
```
## Getting Started
Initialize the encoder and start extracting embeddings in just a few lines of code.
### Basic Usage

```python
import llemb

# 1. Initialize the encoder (defaults to auto-device detection)
enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# 2. Extract embeddings using mean pooling
embeddings = enc.encode("Hello world", pooling="mean")
print(embeddings.shape)
# => (1, 4096)
```
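Embeddings from `encode` drop straight into standard similarity computations. A minimal sketch, assuming `encode` returns a `(1, hidden_dim)` array-like as the shape above suggests:

```python
import numpy as np

import llemb

enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# Assumption: encode() returns a (1, hidden_dim) array-like, matching the
# shape printed above; convert first if your version returns a GPU tensor.
a = np.asarray(enc.encode("The cat sat on the mat", pooling="mean")).ravel()
b = np.asarray(enc.encode("A feline rested on the rug", pooling="mean")).ravel()

cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```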
### Advanced Usage (Quantization & Research Strategies)

Use quantization to reduce memory usage, and apply advanced pooling strategies such as `pcoteol` for stronger representations.

```python
import llemb

# Initialize with 4-bit quantization and force CUDA
enc = llemb.Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    backend="transformers",
    device="cuda",
    quantization="4bit",
)

# Extract using the "Pretended Chain of Thought" strategy.
# Note: automatically uses the second-to-last layer (layer -2), as recommended.
embeddings = enc.encode("Hello world", pooling="pcoteol")
```
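Because the strategy is selected per call, you can compare them on the same encoder instance (continuing the example above; the pooling names are those documented in the table below):

```python
# Continuing from the quantized encoder above: compare pooling strategies
# on one input. Each name is documented in "Supported Pooling Strategies".
for strategy in ("mean", "last_token", "prompt_eol", "pcoteol"):
    vec = enc.encode("Hello world", pooling=strategy)
    print(strategy, vec.shape)
```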
## Configuration & Optimization
llemb passes arguments directly to the backend, allowing for deep customization.
### Using Flash Attention 2

```python
import torch

import llemb

# Requires the flash-attn package to be installed.
encoder = llemb.Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
```
### Custom Quantization Config

```python
import torch
from transformers import BitsAndBytesConfig

import llemb

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

encoder = llemb.Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    quantization_config=bnb_config,
)
```
## Supported Pooling Strategies

| Strategy | Description | Recommended Layer |
|---|---|---|
| `mean` | Average pooling over all tokens (excluding padding). | -1 (last) |
| `last_token` | Hidden state of the last token in the sequence. | -1 (last) |
| `eos_token` | Hidden state at the EOS token position. | -1 (last) |
| `prompt_eol` | Embedding extracted with a prompt template targeting the last token. | -1 (last) |
| `pcoteol` | "Pretended Chain of Thought": wraps the input in a reasoning template. | -2 |
| `ke` | "Knowledge Enhancement": wraps the input in a context-aware template. | -2 |
## Development
Clone the repository and sync dependencies using uv:

```bash
git clone https://github.com/j341nono/llemb.git
cd llemb
uv sync --all-extras --dev
```
### Run Tests

```bash
uv run pytest
```
### Static Analysis

```bash
uv run ruff check src
uv run mypy src
```
## Citations
If you use the advanced pooling strategies implemented in this library, please cite the respective original papers:
PromptEOL:

```bibtex
@inproceedings{jiang-etal-2024-scaling,
  title     = "Scaling Sentence Embeddings with Large Language Models",
  author    = "Jiang, Ting and Huang, Shaohan and Luan, Zhongzhi and Wang, Deqing and Zhuang, Fuzhen",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
  year      = "2024"
}
```
PCoTEOL and KE:

```bibtex
@article{zhang2024simple,
  title   = {Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models},
  author  = {Zhang, Bowen and Chang, Kehua and Li, Chunping},
  journal = {arXiv preprint arXiv:2404.03921},
  year    = {2024}
}
```
## License
This project is open source and available under the Apache-2.0 license.