Embedding extractor for decoder-only LLMs


llemb: Unified Embedding Extraction from Decoder-only LLMs

llemb is a lightweight framework for extracting high-quality sentence embeddings from decoder-only large language models (LLMs) such as Llama and Mistral. It unifies various state-of-the-art pooling strategies and efficiency optimizations into a simple, coherent interface.

With llemb, you can easily leverage powerful LLMs for embedding tasks using advanced techniques like PromptEOL and PCoTEOL, with built-in support for quantization to run on consumer hardware.

Features

  • Flexible Backends: Seamless support for Hugging Face Transformers.
  • Advanced Pooling Strategies:
    • Standard: mean, last_token, eos_token
    • Research-grade: prompt_eol, pcoteol (Pretended Chain of Thought), ke (Knowledge Enhancement)
  • Efficient Inference: Native support for 4-bit and 8-bit quantization via bitsandbytes.
  • Granular Control: Extract embeddings from any layer (defaults to recommended layers based on research).

Installation

Install via PyPI using pip or uv.

Basic Installation

pip install llemb
# or
uv add llemb

With Quantization Support

To enable 4-bit/8-bit quantization (recommended for large models):

pip install "llemb[quantization]"
# or
uv add "llemb[quantization]"
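
As a rough rule of thumb, weight memory scales with the bit width, which is why quantization is recommended for large models. The sketch below is a back-of-envelope estimate only; it ignores activations, the KV cache, and quantization overhead such as scales and zero-points.

```python
def weight_memory_gb(n_params: float, bits: int) -> float:
    """Approximate memory for model weights alone, in GB.

    Ignores activations, KV cache, and quantization metadata.
    """
    return n_params * bits / 8 / 1e9

# An 8B-parameter model, by this rough estimate:
print(weight_memory_gb(8e9, 16))  # fp16/bf16 -> 16.0 GB
print(weight_memory_gb(8e9, 4))   # 4-bit     -> 4.0 GB
```

In practice, expect a few extra GB on top of the weight estimate for activations and the KV cache.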

Getting Started

Initialize the encoder and start extracting embeddings in just a few lines of code.

Basic Usage

import llemb

# 1. Initialize the encoder (defaults to auto-device detection)
enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# 2. Extract embeddings using mean pooling
embeddings = enc.encode("Hello world", pooling="mean")

print(embeddings.shape)
# => (1, 4096)
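
Sentence embeddings are typically compared with cosine similarity. A minimal sketch, assuming `encode` returns a NumPy array of shape `(n, dim)` as the output above suggests; the stand-in vectors here are placeholders for real `enc.encode(...)` outputs.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice use rows of enc.encode(...).
a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 0.0])

print(cosine_similarity(a, b))  # identical vectors  -> 1.0
print(cosine_similarity(a, c))  # orthogonal vectors -> 0.0
```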

Advanced Usage (Quantization & Research Strategies)

Use quantization to reduce memory usage, and apply advanced pooling strategies such as pcoteol for better representations.

import llemb

# Initialize with 4-bit quantization and force CUDA
enc = llemb.Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    backend="transformers",
    device="cuda",
    quantization="4bit"
)

# Extract using "Pretended Chain of Thought" strategy
# Note: Automatically uses the second-to-last layer (layer -2) as recommended
embeddings = enc.encode("Hello world", pooling="pcoteol")
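
Prompt-based strategies like `prompt_eol` and `pcoteol` work by wrapping the input in a template and reading out the hidden state at the final token position. The sketch below illustrates the idea with a PromptEOL-style template; the exact wording is an approximation of Jiang et al. (2024), not necessarily what llemb uses internally.

```python
def prompt_eol(text: str) -> str:
    """Build a PromptEOL-style template (approximation of Jiang et al., 2024).

    The model's hidden state at the final position, right after the
    closing quote, is taken as the sentence embedding.
    """
    return f'This sentence : "{text}" means in one word:"'

print(prompt_eol("Hello world"))
```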

Configuration & Optimization

llemb passes arguments directly to the backend, allowing for deep customization.

Using Flash Attention 2

import llemb
import torch

encoder = llemb.Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16
)

Custom Quantization Config

import llemb
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

encoder = llemb.Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    quantization_config=bnb_config
)

Supported Pooling Strategies

| Strategy     | Description                                                            | Recommended Layer |
|--------------|------------------------------------------------------------------------|-------------------|
| `mean`       | Average pooling over all tokens (excluding padding).                   | -1 (last)         |
| `last_token` | Hidden state of the final input token.                                 | -1 (last)         |
| `eos_token`  | Hidden state at the EOS token position.                                | -1 (last)         |
| `prompt_eol` | Extraction via a prompt template targeting the last token (PromptEOL). | -1 (last)         |
| `pcoteol`    | "Pretended Chain of Thought": wraps input in a reasoning template.     | -2                |
| `ke`         | "Knowledge Enhancement": wraps input in a context-aware template.      | -2                |
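
The `mean` row above excludes padding, which matters for batched inputs. A minimal NumPy sketch of masked mean pooling over last-layer hidden states, with made-up toy shapes (not llemb internals):

```python
import numpy as np

def masked_mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Mean-pool token vectors, excluding padding positions.

    hidden: (batch, seq_len, dim) last-layer hidden states
    mask:   (batch, seq_len) attention mask, 1 = real token, 0 = padding
    """
    m = mask[:, :, None].astype(hidden.dtype)  # (batch, seq_len, 1)
    summed = (hidden * m).sum(axis=1)          # (batch, dim)
    counts = m.sum(axis=1)                     # (batch, 1)
    return summed / counts

# Toy example: batch of 1, three positions, last one is padding.
hidden = np.array([[[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(masked_mean_pool(hidden, mask))  # -> [[2. 4.]]
```

Without the mask, the padding row would dominate the average; with it, only the two real tokens contribute.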

Development

Clone the repository and sync dependencies using uv:

git clone https://github.com/j341nono/llemb.git
cd llemb
uv sync --all-extras --dev

Run Tests

uv run pytest

Static Analysis

uv run ruff check src
uv run mypy src

Citations

If you use the advanced pooling strategies implemented in this library, please cite the respective original papers:

PromptEOL:

@inproceedings{jiang-etal-2024-scaling,
    title = "Scaling Sentence Embeddings with Large Language Models",
    author = "Jiang, Ting and Huang, Shaohan and Luan, Zhongzhi and Wang, Deqing and Zhuang, Fuzhen",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    year = "2024"
}

PCoTEOL and KE:

@article{zhang2024simple,
    title={Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models},
    author={Zhang, Bowen and Chang, Kehua and Li, Chunping},
    journal={arXiv preprint arXiv:2404.03921},
    year={2024}
}

License

This project is open source and available under the Apache-2.0 license.
