Embeddings til infinity
Project description
Infinity ♾️
Embedding Inference Server - a TGI-style server for embeddings. Infinity is developed under the MIT License - https://github.com/michaelfeil/infinity
[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![MIT License][license-shield]][license-url]
[![LinkedIn][linkedin-shield]][linkedin-url]
Why Infinity:
Infinity provides the following features:
- Deploy virtually any SentenceTransformer model - serve the models you already know from SentenceTransformers
- Fast inference: The inference server is built on top of torch and ctranslate2 under the hood, getting the most out of your CUDA or CPU hardware.
- Dynamic batching: New embedding requests are queued while the GPU is busy with the previous ones. New requests are squeezed onto your GPU/CPU as soon as it is ready.
- Correct and tested implementation: Unit and end-to-end tested. Embeddings via Infinity are identical to SentenceTransformers (up to numerical precision). This lets API users create embeddings till infinity and beyond.
- Easy to use: The API is built on top of FastAPI and fully documented via Swagger. The API specs are aligned with OpenAI's. See below for how to get started.
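The dynamic batching described above can be pictured with a small sketch. This is a toy illustration, not Infinity's actual implementation: requests accumulate in a queue and are embedded in fixed-size batches as soon as the device is free.

```python
from collections import deque


class DynamicBatcher:
    """Toy sketch of dynamic batching: requests queue up while the
    device is busy and are flushed batch_size sentences at a time."""

    def __init__(self, embed_fn, batch_size=2):
        self.embed_fn = embed_fn  # e.g. a SentenceTransformer encode call
        self.batch_size = batch_size
        self.queue = deque()

    def submit(self, sentences):
        # New requests are queued; nothing is embedded yet.
        self.queue.extend(sentences)

    def flush(self):
        # Embed everything queued, one fixed-size batch at a time.
        results = []
        while self.queue:
            n = min(self.batch_size, len(self.queue))
            batch = [self.queue.popleft() for _ in range(n)]
            results.extend(self.embed_fn(batch))
        return results


# Toy embed function: the "embedding" is just the sentence length.
batcher = DynamicBatcher(embed_fn=lambda batch: [len(s) for s in batch])
batcher.submit(["hello", "world!"])
batcher.submit(["a"])
print(batcher.flush())  # [5, 6, 1]
```

The real server does this continuously and asynchronously; the sketch only shows the queue-then-batch idea.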
Demo:
With Infinity, you can launch any SentenceTransformer model via the API.
In the gif below, sentence-transformers/all-MiniLM-L6-v2 is deployed at batch-size=2. After initialization, three requests (payloads of 1, 1, and 5 sentences) are sent via cURL from a second terminal.
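The batching arithmetic for that demo can be worked out explicitly. `batch_sizes` below is a hypothetical helper for illustration, not part of Infinity: the 1 + 1 + 5 = 7 queued sentences end up as three full batches of 2 plus one remainder batch of 1.

```python
def batch_sizes(payload_sizes, batch_size=2):
    """How many sentences land in each forward pass when requests of
    the given payload sizes are queued back to back (illustrative)."""
    total = sum(payload_sizes)
    full, rest = divmod(total, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


# The demo's three cURL requests carry 1, 1, and 5 sentences:
print(batch_sizes([1, 1, 5]))  # [2, 2, 2, 1]
```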
Getting started
Install via Poetry and Python
cd libs/infinity_emb
poetry install --extras all
Launch via Python
from infinity_emb import create_server
create_server()
or launch create_server() via the CLI
infinity_emb --help
or launch the CLI using a pre-built docker container
model=sentence-transformers/all-MiniLM-L6-v2
port=8080
docker run -it --gpus all -p $port:$port michaelf34/infinity:latest --model-name-or-path $model --port $port --engine ctranslate2
The model download path at runtime can be controlled via the environment variable SENTENCE_TRANSFORMERS_HOME.
Documentation
After startup, the Swagger UI will be available under {url}:{port}/docs, in this case http://localhost:8080/docs.
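Once the server is up, it can be queried from Python. A minimal sketch, assuming the OpenAI-style request shape the README mentions ("model" and "input" fields); the exact endpoint path and response layout are assumptions and may differ, so check the Swagger UI:

```python
import json


def build_embedding_payload(texts, model="sentence-transformers/all-MiniLM-L6-v2"):
    """Build an OpenAI-style embeddings request body.

    The field names ("model", "input") are assumptions based on the
    OpenAI API shape that Infinity's API specs align with.
    """
    return json.dumps({"model": model, "input": texts})


# Sending it requires a running server (started as shown above):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8080/embeddings",  # assumed path; see /docs
#     data=build_embedding_payload(["embed this sentence"]).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     embedding = json.loads(resp.read())["data"][0]["embedding"]
```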
Contribute and Develop
Install via Poetry 1.6.1 and Python 3.10 on Ubuntu 22.04
cd libs/infinity_emb
poetry install --extras all --with test
To pass the CI:
cd libs/infinity_emb
make format
make lint
poetry run pytest ./tests
Hashes for infinity_emb-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9f290dde705ca9653a1b0f798bf6761df5b3688181ab877d6b08d13485c56dd9
MD5 | b5d1f9908b298ebf5a241e788d78ce67
BLAKE2b-256 | ad1a698d014110ac9fd3e86344d8d7cd3e57689f67364cc8a12906f82a3834c9