Embedding extractor for decoder-only LLMs

llemb

Unified embedding extraction from decoder-only LLMs.

Features

  • Backends: Support for Hugging Face Transformers.
  • Pooling Strategies:
    • mean: Average pooling over all token embeddings (excluding padding).
    • last_token: Embedding of the final token.
    • eos_token: Embedding at the EOS token position.
    • prompt_eol: Last-token embedding after wrapping the input in a prompt template (PromptEOL).
    • pcoteol: "Pretended Chain of Thought" - wraps input in a reasoning template.
    • ke: "Knowledge Enhancement" - wraps input in a context-aware template.
  • Quantization: Support for 4-bit and 8-bit quantization via bitsandbytes.
  • Layer Selection: Extract embeddings from any layer.
    • Defaults to -1 (last layer) for standard strategies.
    • Defaults to -2 (second-to-last layer) for pcoteol and ke (as recommended by Zhang et al., 2024; see References).
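
A minimal sketch of selecting these strategies, assuming each name listed above is a valid value for the pooling argument of encode() (the call shown in the Quick Start below):

import llemb

enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# Compare the vector produced by each pooling strategy for the same input.
# Layer defaults are handled internally: -1 for the standard strategies,
# -2 for pcoteol and ke.
for pooling in ["mean", "last_token", "eos_token", "prompt_eol", "pcoteol", "ke"]:
    emb = enc.encode("Hello world", pooling=pooling)
    print(pooling, emb.shape)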

Installation

Install using uv:

uv add llemb

To include quantization support:

uv add "llemb[quantization]"

Quick Start

Initialize the encoder with minimal setup (defaults: the transformers backend, no quantization, and automatic CPU/CUDA detection):

import llemb

# Minimal setup
enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# Extract embeddings
embeddings = enc.encode("Hello world", pooling="mean")
print(embeddings.shape)
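
As a follow-up usage example, here is a minimal sketch of comparing two sentences via cosine similarity. It assumes encode() returns a single 1-D vector (a NumPy array or CPU tensor convertible with numpy.asarray) when given a single string, as the .shape call above suggests:

import numpy as np
import llemb

enc = llemb.Encoder("meta-llama/Llama-3.1-8B")

# Encode two sentences (assumed to yield 1-D vectors) and move them to NumPy.
a = np.asarray(enc.encode("The cat sat on the mat", pooling="mean"), dtype=np.float32)
b = np.asarray(enc.encode("A feline rested on the rug", pooling="mean"), dtype=np.float32)

# Cosine similarity between the two sentence embeddings.
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cos:.3f}")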

Advanced Usage

Initialize with specific options:

import llemb

# Initialize encoder with specific backend and configuration
enc = llemb.Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    backend="transformers",
    device="cuda", # Force CUDA
    quantization="4bit" # Use 4-bit quantization
)

# Extract embeddings using pcoteol strategy (automatically uses layer -2)
embeddings = enc.encode("Hello world", pooling="pcoteol")

Transformers Backend Configuration

When using the transformers backend, you can pass standard Hugging Face AutoModel.from_pretrained() keyword arguments directly to the Encoder.

Example 1: Using Flash Attention 2

import torch

from llemb import Encoder

encoder = Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    backend="transformers",
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    torch_dtype=torch.bfloat16,
)

Example 2: Custom Quantization Config

import torch
from transformers import BitsAndBytesConfig

from llemb import Encoder

# Custom 4-bit quantization config, passed through to the underlying model loader.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

encoder = Encoder(
    model_name="meta-llama/Llama-3.1-8B",
    backend="transformers",
    quantization_config=bnb_config,
)

References

  • PromptEOL:

    Ting Jiang, Shaohan Huang, Zhongzhi Luan, Deqing Wang, and Fuzhen Zhuang. 2024. Scaling Sentence Embeddings with Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2024.

  • PCoTEOL and KE:

    Bowen Zhang, Kehua Chang, and Chunping Li. 2024. Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models. arXiv preprint arXiv:2404.03921.

Development

Clone the repository and sync dependencies:

git clone https://github.com/j341nono/llemb.git
cd llemb
uv sync --all-extras --dev

Run tests:

uv run pytest

Run static analysis:

uv run ruff check src
uv run mypy src
