Fast and Lightweight Text Embedding

Project description

LightEmbed

LightEmbed is a lightweight, fast, and efficient tool for generating sentence embeddings. It does not rely on heavy dependencies such as PyTorch or Transformers, making it suitable for environments with limited resources.

Benefits

1. Lightweight

  • Minimal Dependencies: LightEmbed does not depend on PyTorch or Transformers.
  • Low Resource Requirements: Runs smoothly on minimal hardware: 1 GB RAM, 1 CPU, and no GPU required.

2. Fast

  • ONNX Runtime: Uses ONNX Runtime, which is significantly faster than running Sentence Transformers models on PyTorch.

3. Consistent with Sentence Transformers

  • Consistency: Incorporates all modules from a Sentence Transformer model, including normalization and pooling.
  • Accuracy: Produces embedding vectors identical to those from Sentence Transformers (a quick check is sketched below).
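
A minimal sketch of such a check, assuming sentence-transformers is also installed alongside light-embed; numpy.allclose is used so that tiny floating-point differences between runtimes do not count as mismatches:

import numpy as np
from light_embed import TextEmbedding
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

# Encode the same sentences with both backends.
onnx_model = TextEmbedding(model_name_or_path='sentence-transformers/all-MiniLM-L6-v2')
torch_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

onnx_embeddings = onnx_model.encode(sentences)
torch_embeddings = torch_model.encode(sentences)

# Compare with a small tolerance to absorb runtime-level floating-point noise.
print(np.allclose(onnx_embeddings, torch_embeddings, atol=1e-5))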

4. Supports models not managed by LightEmbed

LightEmbed can work with any Hugging Face repository, not only the ONNX models it manages, as long as the repository includes ONNX files.

5. Local Model Support

LightEmbed can load models from the local file system, enabling faster loading times and functionality in environments without internet access, such as AWS Lambda or EC2 instances in private subnets.
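
For offline use, one option is to download the model files ahead of time and bundle them with your deployment. A minimal sketch, assuming the huggingface_hub package is available; the repository name matches the managed ONNX model used in the examples below, and the target directory is only illustrative:

from huggingface_hub import snapshot_download

# Download the ONNX model repository once (e.g. at build time) so it can be
# loaded from disk later without internet access.
local_dir = snapshot_download(
    repo_id="onnx-models/all-MiniLM-L6-v2-onnx",
    local_dir="/path/to/the/local/model/all-MiniLM-L6-v2-onnx",
)
print(local_dir)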

Installation

pip install -U light-embed

Usage

After installation, you can specify the original Sentence Transformers model name like this:

from light_embed import TextEmbedding
sentences = ["This is an example sentence", "Each sentence is converted"]

model = TextEmbedding(model_name_or_path='sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings)
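
The encode call returns one embedding per input sentence; for all-MiniLM-L6-v2 each embedding has 384 dimensions, so the example above yields a 2 x 384 array.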

Alternatively, you can specify the ONNX model name like this:

from light_embed import TextEmbedding
sentences = ["This is an example sentence", "Each sentence is converted"]

model = TextEmbedding(model_name_or_path='onnx-models/all-MiniLM-L6-v2-onnx')
embeddings = model.encode(sentences)
print(embeddings)

Using a Non-Managed Model: To use a model from its original repository without relying on the managed Hugging Face ONNX models, specify the model name and provide a model_config, as long as the original repository includes ONNX files.

from light_embed import TextEmbedding
sentences = ["This is an example sentence", "Each sentence is converted"]

model_config = {
    "model_file": "onnx/model.onnx",
    "pooling_config_path": "1_Pooling",
    "normalize": False
}
model = TextEmbedding(
    model_name_or_path='sentence-transformers/all-MiniLM-L6-v2',
    model_config=model_config
)
embeddings = model.encode(sentences)
print(embeddings)
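
In this configuration, model_file points to the ONNX file inside the repository, pooling_config_path points to the folder holding the pooling configuration (1_Pooling in standard Sentence Transformers repositories), and normalize controls whether the output embeddings are L2-normalized.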

Using a Local Model: To use a local model, specify the path to the model's folder and provide the model_config.

from light_embed import TextEmbedding
sentences = ["This is an example sentence", "Each sentence is converted"]

model_config = {
    "model_file": "onnx/model.onnx",
    "pooling_config_path": "1_Pooling",
    "normalize": False
}
model = TextEmbedding(
    model_name_or_path='/path/to/the/local/model/all-MiniLM-L6-v2-onnx',
    model_config=model_config
)
embeddings = model.encode(sentences)
print(embeddings)
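
As a follow-up sketch, the embeddings can be compared with cosine similarity; because this model_config sets normalize to False, the vectors are normalized here first (np.asarray is used in case encode returns a plain sequence):

import numpy as np

# Ensure a NumPy array, L2-normalize the vectors (normalize was set to False above),
# then compute the pairwise cosine-similarity matrix.
vectors = np.asarray(embeddings)
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
similarity = vectors @ vectors.T
print(similarity)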

Citing & Authors

Binh Nguyen / binhcode25@gmail.com

Download files

Download the file for your platform.

Source Distribution

light_embed-1.0.1.tar.gz (13.7 kB)

Uploaded Source

Built Distribution

light_embed-1.0.1-py3-none-any.whl (15.5 kB)

Uploaded Python 3

File details

Details for the file light_embed-1.0.1.tar.gz.

File metadata

  • Download URL: light_embed-1.0.1.tar.gz
  • Upload date:
  • Size: 13.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for light_embed-1.0.1.tar.gz
  • SHA256: cff57706619c92b6eb0787a9a1ebc885819a10f1ab9dc1957a9834d1c66c1f4d
  • MD5: 3d07a0f3fa11c76a050693038e097626
  • BLAKE2b-256: 6afd33dd2071495b8112a8ae753d2b377ac1faa204ff9213e03b8c1ce6161e78

File details

Details for the file light_embed-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: light_embed-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 15.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for light_embed-1.0.1-py3-none-any.whl
  • SHA256: 42442a558846e3246ead003c326e6b954a65a8e94629d74137904a7bdc5fad0d
  • MD5: 46fe7029867f857e5f0a0220f6b2d145
  • BLAKE2b-256: 026eb33c23a4fa3c14f6281d669cabfd581a7b38a56234eedc4b29e2c71aa175
