Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of sentence-transformer models and frameworks.
Project description
Infinity ♾️
Embedding Inference Server - finding TGI for embeddings. Infinity is developed under the MIT Licence: https://github.com/michaelfeil/infinity
Why Infinity:
Infinity provides the following features:
- Deploy virtually any SentenceTransformer - deploy the model you know from SentenceTransformers
- Fast inference: The inference server is built on top of torch and ctranslate2 under the hood, getting the most out of your CUDA or CPU hardware.
- Dynamic batching: New embedding requests are queued while the GPU is busy with the previous ones. New requests are squeezed into your GPU/CPU as soon as it is ready.
- Correct and tested implementation: Unit and end-to-end tested. Embeddings via infinity are identical to SentenceTransformers (up to numerical precision). Lets API users create embeddings till infinity and beyond.
- Easy to use: The API is built on top of FastAPI, and Swagger makes it fully documented. API specs are aligned to OpenAI. See below on how to get started.
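The dynamic batching idea from the feature list can be illustrated with a small, self-contained sketch. This is not Infinity's actual implementation: the DynamicBatcher class, the fake_embed function, and the max_batch_size parameter are hypothetical names chosen for illustration. It only shows the general pattern of queueing requests while a worker is busy and grouping whatever has accumulated into the next batch.

```python
import queue
import threading
import time
from concurrent.futures import Future


class DynamicBatcher:
    """Toy dynamic batcher (illustrative only, not Infinity's code)."""

    def __init__(self, embed_fn, max_batch_size=2):
        self._embed_fn = embed_fn  # hypothetical: list[str] -> list[vector]
        self._max_batch_size = max_batch_size
        self._queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, sentence):
        """Queue one sentence and return a Future for its embedding."""
        future = Future()
        self._queue.put((sentence, future))
        return future

    def _run(self):
        while True:
            # Block until at least one request is waiting.
            batch = [self._queue.get()]
            # Add requests that piled up while the worker was busy,
            # up to the batch size limit.
            while len(batch) < self._max_batch_size:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            try:
                vectors = self._embed_fn([s for s, _ in batch])
            except Exception as exc:
                for _, future in batch:
                    future.set_exception(exc)
                continue
            for (_, future), vector in zip(batch, vectors):
                future.set_result(vector)


def fake_embed(sentences):
    # Stand-in for a real model: [character count, word count].
    time.sleep(0.01)  # simulate inference time
    return [[float(len(s)), float(len(s.split()))] for s in sentences]


batcher = DynamicBatcher(fake_embed, max_batch_size=2)
futures = [batcher.submit(s) for s in ["hello world", "hi", "a b c"]]
results = [f.result(timeout=5) for f in futures]
print(results)
```

Each caller gets back its own result regardless of which batch its sentence landed in, which is what lets many small requests share the hardware.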
Infinity demo:
In the demo GIF, we use sentence-transformers/all-MiniLM-L6-v2, deployed at batch-size=2. After initialization, 3 requests (payloads of 1, 1, and 5 sentences) are sent via cURL from a second terminal.
Getting started
Install via pip
pip install infinity-emb[all]
Install from source with Poetry
Advanced: To install via Poetry, use Poetry 1.6.1 and Python 3.10 on Ubuntu 22.04.
git clone https://github.com/michaelfeil/infinity
cd infinity
cd libs/infinity_emb
poetry install --extras all
Launch via Python
from infinity_emb import create_server
create_server()
or launch the create_server() command via CLI
infinity_emb --help
or launch the CLI using a pre-built Docker container
model=sentence-transformers/all-MiniLM-L6-v2
port=8080
docker run -it --gpus all -p $port:$port michaelf34/infinity:latest --model-name-or-path $model --port $port --engine ctranslate2
The download path at runtime can be controlled via the environment variable SENTENCE_TRANSFORMERS_HOME.
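As a sketch of how the environment variable might be combined with the Docker launch above, the helper below assembles the same docker run arguments and adds a mounted cache directory. The function name docker_run_command, the host directory /tmp/infinity-cache, and the container path /cache are hypothetical and chosen for illustration; it assumes the variable is read inside the container, so the mount and the variable point at the same container path.

```python
import shlex


def docker_run_command(model, port, cache_dir, container_cache="/cache"):
    """Build a docker run argument list with a persistent model cache.

    cache_dir and container_cache are illustrative paths, not defaults
    defined by Infinity.
    """
    return [
        "docker", "run", "-it", "--gpus", "all",
        "-p", f"{port}:{port}",
        # Mount a host directory so downloaded models survive restarts.
        "-v", f"{cache_dir}:{container_cache}",
        # Point the download path at the mounted directory.
        "-e", f"SENTENCE_TRANSFORMERS_HOME={container_cache}",
        "michaelf34/infinity:latest",
        "--model-name-or-path", model,
        "--port", str(port),
        "--engine", "ctranslate2",
    ]


cmd = docker_run_command(
    "sentence-transformers/all-MiniLM-L6-v2", 8080, "/tmp/infinity-cache"
)
# Print a shell-safe command line that can be copied into a terminal.
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it could also be passed to subprocess.run without shell quoting issues.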
Documentation
After startup, the Swagger UI will be available under {url}:{port}/docs, in this case http://localhost:8080/docs.
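Once the server is running, a client request might look like the sketch below. The /v1/embeddings path and the "input"/"model" payload fields are assumptions based on the OpenAI embeddings API that the specs are said to align with; the Swagger UI is the place to confirm the exact route and schema. The helper name build_embedding_request is hypothetical. The example only builds the request, and the lines that send it are left commented out so the snippet runs without a live server.

```python
import json
import urllib.request


def build_embedding_request(base_url, sentences, model):
    """Build a POST request for an OpenAI-style embeddings endpoint.

    The path and payload shape are assumed, not taken from Infinity's docs.
    """
    payload = {"input": sentences, "model": model}
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",  # assumed route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_embedding_request(
    "http://localhost:8080",
    ["Hello", "World"],
    "sentence-transformers/all-MiniLM-L6-v2",
)
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```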
Contribute and Develop
Install via Poetry 1.6.1 and Python 3.10 on Ubuntu 22.04
cd libs/infinity_emb
poetry install --extras all --with test
To pass the CI:
cd libs/infinity_emb
make format
make lint
poetry run pytest ./tests
Hashes for infinity_emb-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ee7efd69efe0772715e2ce74a0fcdfe9c8fab2499c754d18c2ae1560a60f0db7
MD5 | f9a99b02aecac77d1cc5df3d71da94ea
BLAKE2b-256 | d520ef01b2446bda5b9fe7ae7a760c69e294d4b664394bd362bb34254c66f4bb