embed
A stable, blazing fast and easy-to-use inference library with a focus on a sync-to-async API
Installation

```shell
pip install embed
```
Why embed?
Embed makes it easy to load any embedding, classification, or reranking model from Hugging Face. It leverages Infinity as a backend for async computation, batching, and Flash-Attention-2.
(Benchmark chart omitted.) Benchmarked on an Nvidia L4 instance. Note: the CPU runs use bert-small, the CUDA runs use bert-large. See the methodology for details.
```python
from embed import BatchedInference
from concurrent.futures import Future

# Run any model
register = BatchedInference(
    model_id=[
        # sentence-embeddings
        "michaelfeil/bge-small-en-v1.5",
        # sentence-embeddings and image-embeddings
        "jinaai/jina-clip-v1",
        # classification models
        "philschmid/tiny-bert-sst2-distilled",
        # rerankers
        "mixedbread-ai/mxbai-rerank-xsmall-v1",
    ],
    # engine: `torch` or `optimum`
    engine="torch",
    # device: `cuda` (Nvidia/AMD) or `cpu`
    device="cpu",
)

sentences = ["Paris is in France.", "Berlin is in Germany.", "An image of two cats."]
images = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
question = "Where is Paris?"

future: "Future" = register.embed(
    sentences=sentences, model_id="michaelfeil/bge-small-en-v1.5"
)
future.result()

register.rerank(
    query=question, docs=sentences, model_id="mixedbread-ai/mxbai-rerank-xsmall-v1"
)
register.classify(model_id="philschmid/tiny-bert-sst2-distilled", sentences=sentences)
register.image_embed(model_id="jinaai/jina-clip-v1", images=images)

# manually stop the register upon termination to free model memory
register.stop()
```
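A common next step after `register.embed` is similarity scoring between the returned vectors. The sketch below uses made-up 4-dimensional vectors in place of real model output (real embeddings have hundreds of dimensions), so it runs without `embed` installed:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# made-up vectors standing in for real sentence embeddings
paris = [0.1, 0.3, 0.5, 0.2]
berlin = [0.1, 0.2, 0.6, 0.1]

print(cosine_similarity(paris, paris))   # identical vectors score 1.0
print(cosine_similarity(paris, berlin))  # similar vectors score close to 1.0
```

With real embeddings, you would unpack `vectors, usage = future.result()` and feed entries of `vectors` into the same function.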
All functions return a `Future` resolving to `(vector_embedding, token_usage)`, which lets you wait for results when you need them and keeps batching logic out of your code.
```python
>>> embedding_fut = register.embed(sentences=sentences, model_id="michaelfeil/bge-small-en-v1.5")
>>> print(embedding_fut)
<Future at 0x7fa0e97e8a60 state=pending>
>>> time.sleep(1); print(embedding_fut)
<Future at 0x7fa0e97e9c30 state=finished returned tuple>
>>> embedding_fut.result()
([array([-3.35943862e-03, ..., -3.22808176e-02], dtype=float32)], 19)
```
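The sync-to-async pattern above can be reproduced with the standard library alone. The sketch below is a stand-in, not the library's implementation: `fake_embed` mimics the `(vector_embedding, token_usage)` return shape while a thread pool provides the `Future`:

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time

def fake_embed(sentences):
    # stand-in for a model call: returns one dummy 3-dim vector
    # per sentence, plus a whitespace-token count as "usage"
    time.sleep(0.1)  # simulate model latency
    vectors = [[0.1, 0.2, 0.3] for _ in sentences]
    usage = sum(len(s.split()) for s in sentences)
    return vectors, usage

pool = ThreadPoolExecutor(max_workers=1)
# submit() returns immediately with a pending Future
fut: Future = pool.submit(fake_embed, ["Paris is in France."])
# .result() blocks until the background work finishes
vectors, usage = fut.result()
print(len(vectors), usage)
pool.shutdown()
```

The caller stays synchronous while batching and execution happen in the background, which is the essence of the sync-to-async API.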
License and Contributions
embed is licensed under the MIT License. All contributions must adhere to the MIT License. Contributions are welcome.
File details
Details for the file embed-0.3.0.tar.gz.
File metadata
- Download URL: embed-0.3.0.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.0rc1 Linux/5.15.153.1-microsoft-standard-WSL2
File hashes
Algorithm | Hash digest
---|---
SHA256 | bd6c88f220c41125842d57a0d80279c944b097e9333bb1f891dab7118870c38d
MD5 | 6034620bc07d1b97dd976c8dd9377a8c
BLAKE2b-256 | 4998c5face22698b98382999c90ed1a583cc738759056767caa5099cd361fbe4
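Published digests like the ones above let you verify a downloaded artifact before installing it. A minimal sketch using the standard library (the temporary file is a stand-in; for a real check, point `path` at the downloaded embed-0.3.0.tar.gz and compare against the SHA256 digest listed):

```python
import hashlib
import tempfile

def sha256_of(path, chunk_size=8192):
    # hash the file in chunks so large archives don't load into memory at once
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# stand-in file; replace with the path to the real downloaded archive
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"example payload")
    path = f.name

digest = sha256_of(path)
print(digest)  # compare to the published digest before trusting the file
```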
File details
Details for the file embed-0.3.0-py3-none-any.whl.
File metadata
- Download URL: embed-0.3.0-py3-none-any.whl
- Upload date:
- Size: 4.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.0rc1 Linux/5.15.153.1-microsoft-standard-WSL2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6cd08ba00e69a2c84d101a5550a5d66fb45e06c292b606cb6a8fbb3f30e3beaf
MD5 | 183b065128e43e3568d7a910bbd03ed8
BLAKE2b-256 | 99ab50a69429cd643732d206cc822439f583985378e3a43c40480e2b357596c5