A stable, fast and easy-to-use inference library with a focus on a sync-to-async API

embed

A stable, blazing fast and easy-to-use inference library with a focus on a sync-to-async API

Installation

pip install embed

Why embed?

Embed makes it easy to load any embedding, classification, or reranking model from Hugging Face. It uses Infinity as the backend for async computation, batching, and Flash-Attention-2.
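To give a feel for what "async computation and batching" means here, the following is a minimal sketch of a batching worker that hands out Futures. It uses only the standard library and is not how Infinity or embed is implemented; `TinyBatcher`, `fake_model`, `max_batch_size`, and `max_wait_s` are hypothetical names chosen for illustration.

```python
import queue
import threading
from concurrent.futures import Future


class TinyBatcher:
    """Toy sync-to-async batcher: submit() returns a Future immediately."""

    def __init__(self, batch_fn, max_batch_size=4, max_wait_s=0.01):
        self._batch_fn = batch_fn              # processes a list of inputs in one call
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, item):
        fut = Future()
        self._queue.put((item, fut))
        return fut

    def stop(self):
        self._queue.put(None)                  # sentinel: ask the worker to exit
        self._worker.join()

    def _run(self):
        while True:
            first = self._queue.get()          # block until there is work
            if first is None:
                return
            batch = [first]
            stop_after = False
            # Collect more items until the batch is full or the queue stays empty.
            while len(batch) < self._max_batch_size:
                try:
                    nxt = self._queue.get(timeout=self._max_wait_s)
                except queue.Empty:
                    break
                if nxt is None:
                    stop_after = True
                    break
                batch.append(nxt)
            items = [item for item, _ in batch]
            try:
                results = self._batch_fn(items)    # one call for the whole batch
                for (_, fut), res in zip(batch, results):
                    fut.set_result(res)
            except Exception as exc:               # propagate failures to callers
                for _, fut in batch:
                    fut.set_exception(exc)
            if stop_after:
                return


def fake_model(texts):
    # Hypothetical stand-in for a model forward pass: returns text lengths.
    return [len(t) for t in texts]


batcher = TinyBatcher(fake_model)
futures = [batcher.submit(s) for s in ["Paris", "Berlin", "cats"]]
lengths = [f.result(timeout=5) for f in futures]
batcher.stop()
print(lengths)
```

The caller never sees the batch boundaries: each `submit()` call gets its own Future, while the worker decides how many requests to group into one model call.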

Figure: CPU benchmark diagram. Benchmarked on an Nvidia L4 instance. Note: the CPU benchmark uses bert-small; the CUDA benchmark uses bert-large. Methodology.

from embed import BatchedInference
from concurrent.futures import Future

# Run any model
register = BatchedInference(
    model_id=[
        # sentence-embeddings
        "michaelfeil/bge-small-en-v1.5",
        # sentence-embeddings and image-embeddings
        "jinaai/jina-clip-v1",
        # classification models
        "philschmid/tiny-bert-sst2-distilled",
        # rerankers
        "mixedbread-ai/mxbai-rerank-xsmall-v1",
    ],
    # engine: `torch` or `optimum`
    engine="torch",
    # device `cuda` (Nvidia/AMD) or `cpu`
    device="cpu",
)

sentences = ["Paris is in France.", "Berlin is in Germany.", "A image of two cats."]
images = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
question = "Where is Paris?"

# each call returns a Future immediately; .result() blocks until it is done
future: "Future" = register.embed(
    sentences=sentences, model_id="michaelfeil/bge-small-en-v1.5"
)
future.result()
# the calls below also return Futures; call .result() on them to get the outputs
register.rerank(
    query=question, docs=sentences, model_id="mixedbread-ai/mxbai-rerank-xsmall-v1"
)
register.classify(model_id="philschmid/tiny-bert-sst2-distilled", sentences=sentences)
register.image_embed(model_id="jinaai/jina-clip-v1", images=images)

# manually stop the register upon termination to free model memory.
register.stop()
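Because the register has to be stopped manually, one option is to wrap it so that `stop()` runs even when inference raises. The sketch below assumes only that the object has a `stop()` method, as shown above; `stopping` and `DummyRegister` are hypothetical helpers, and `DummyRegister` merely stands in for `BatchedInference` so the example runs without downloading models.

```python
from contextlib import contextmanager


@contextmanager
def stopping(register):
    """Yield `register` and always call its stop() method on exit."""
    try:
        yield register
    finally:
        register.stop()


class DummyRegister:
    # Hypothetical stand-in for a real register; only stop() is modelled.
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


dummy = DummyRegister()
try:
    with stopping(dummy) as reg:
        raise RuntimeError("simulated failure during inference")
except RuntimeError:
    pass

print(dummy.stopped)  # True: stop() ran despite the exception
```

With a real register, the same helper would be used as `with stopping(BatchedInference(...)) as register:`.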

All functions return a Future that resolves to a tuple, (vector_embedding, token_usage). This lets you wait for results only when you need them and removes batching logic from your code.

>>> embedding_fut = register.embed(sentences=sentences, model_id="michaelfeil/bge-small-en-v1.5")
>>> print(embedding_fut)
<Future at 0x7fa0e97e8a60 state=pending>
>>> import time
>>> time.sleep(1); print(embedding_fut)
<Future at 0x7fa0e97e9c30 state=finished returned tuple>
>>> embedding_fut.result()
([array([-3.35943862e-03, ..., -3.22808176e-02], dtype=float32)], 19)
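When several requests are in flight, the standard library's `concurrent.futures.as_completed` can be used to handle results as they finish. The sketch below is runnable without embed installed: `fake_embed` is a hypothetical stand-in that returns an (embeddings, token_usage) tuple in the same shape as the result shown above, with a word count in place of real token usage. It assumes the Futures returned by the register behave like ordinary `concurrent.futures.Future` objects, as the import in the first example suggests.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_embed(sentences):
    # Hypothetical stand-in: returns (embeddings, token_usage).
    # The "usage" here is a plain word count, not real tokenizer output.
    time.sleep(0.01)
    embeddings = [[float(len(s))] for s in sentences]
    usage = sum(len(s.split()) for s in sentences)
    return embeddings, usage


requests = {
    "capitals": ["Paris is in France.", "Berlin is in Germany."],
    "animals": ["A image of two cats."],
}

with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit everything first; each call returns a Future immediately.
    future_to_name = {pool.submit(fake_embed, s): n for n, s in requests.items()}
    results = {}
    # Handle results in completion order rather than submission order.
    for fut in as_completed(future_to_name, timeout=5):
        embeddings, usage = fut.result()
        results[future_to_name[fut]] = (len(embeddings), usage)

print(results["capitals"], results["animals"])
```

Submitting all requests before calling `.result()` on any of them is what allows a batching backend to group them.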

Licence and Contributions

embed is licensed under the MIT License. All contributions must adhere to the MIT License. Contributions are welcome.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

embed-0.3.0.tar.gz (4.2 kB)

Uploaded Source

Built Distribution

embed-0.3.0-py3-none-any.whl (4.7 kB)

Uploaded Python 3

File details

Details for the file embed-0.3.0.tar.gz.

File metadata

  • Download URL: embed-0.3.0.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.0rc1 Linux/5.15.153.1-microsoft-standard-WSL2

File hashes

Hashes for embed-0.3.0.tar.gz
Algorithm Hash digest
SHA256 bd6c88f220c41125842d57a0d80279c944b097e9333bb1f891dab7118870c38d
MD5 6034620bc07d1b97dd976c8dd9377a8c
BLAKE2b-256 4998c5face22698b98382999c90ed1a583cc738759056767caa5099cd361fbe4

See more details on using hashes here.

File details

Details for the file embed-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: embed-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 4.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.0rc1 Linux/5.15.153.1-microsoft-standard-WSL2

File hashes

Hashes for embed-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6cd08ba00e69a2c84d101a5550a5d66fb45e06c292b606cb6a8fbb3f30e3beaf
MD5 183b065128e43e3568d7a910bbd03ed8
BLAKE2b-256 99ab50a69429cd643732d206cc822439f583985378e3a43c40480e2b357596c5
