Open Source Text Embedding Models with OpenAI API-Compatible Endpoint
Many open source projects support the `completions` and `chat/completions` endpoints of the OpenAI API, but do not support the `embeddings` endpoint.

The goal of this project is to create an OpenAI API-compatible version of the `embeddings` endpoint, which serves open source `sentence-transformers` models and other models supported by LangChain's `HuggingFaceEmbeddings`, `HuggingFaceInstructEmbeddings` and `HuggingFaceBgeEmbeddings` classes.
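For orientation, the request and response of an OpenAI-compatible `embeddings` exchange look roughly like the sketch below. The field names follow the OpenAI API convention; the helper function and the truncated example vector are illustrative only:

```python
# Sketch of an OpenAI-compatible /v1/embeddings exchange.

def build_embeddings_request(model: str, texts: list[str]) -> dict:
    """Payload shape the endpoint expects, mirroring the OpenAI API."""
    return {"model": model, "input": texts}

# A compatible server responds with a structure like this (vector truncated):
example_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.01, -0.02]},
    ],
    "model": "intfloat/e5-large-v2",
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}

payload = build_embeddings_request("intfloat/e5-large-v2", ["hello world"])
print(payload)
```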
Supported Text Embeddings Models
Below is a compilation of open-source models that have been tested via the `embeddings` endpoint:
- BAAI/bge-large-en
- intfloat/e5-large-v2
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/all-mpnet-base-v2
- universal-sentence-encoder-large/5 (please refer to the `universal_sentence_encoder` branch for more details)
The models above have been personally tested and verified. It is worth noting that all `sentence-transformers` models are expected to work seamlessly with the endpoint.
Standalone FastAPI Server
To run the embeddings endpoint locally as a standalone FastAPI server, follow these steps:
- Install the dependencies:

  `pip install --no-cache-dir open-text-embeddings[server]`
- Run the server with the desired model. The following command enables normalized embeddings (omit `NORMALIZE_EMBEDDINGS` if the model doesn't support them):

  `MODEL=intfloat/e5-large-v2 NORMALIZE_EMBEDDINGS=1 python -m open.text.embeddings.server`
  If a GPU is detected in the runtime environment, the server will automatically run in `cuda` mode. However, you can set the `DEVICE` environment variable to choose between `cpu` and `cuda`. Here's an example of how to run the server with your desired configuration:

  `MODEL=intfloat/e5-large-v2 NORMALIZE_EMBEDDINGS=1 DEVICE=cpu python -m open.text.embeddings.server`

  This setup allows you to switch seamlessly between CPU and GPU modes, giving you control over the server's performance based on your specific requirements.
- You will see the following text in your console once the server has started:

      INFO:     Started server process [19705]
      INFO:     Waiting for application startup.
      INFO:     Application startup complete.
      INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
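With the server up, you can exercise the endpoint with plain HTTP. Below is a minimal sketch using only the Python standard library; it assumes the server exposes the OpenAI-style `/v1/embeddings` path on `localhost:8000`, and it degrades gracefully if no server is running:

```python
import json
import urllib.error
import urllib.request

# Assumed OpenAI-style endpoint path on the locally started server.
URL = "http://localhost:8000/v1/embeddings"

payload = json.dumps({
    "model": "intfloat/e5-large-v2",
    "input": ["The quick brown fox"],
}).encode("utf-8")

req = urllib.request.Request(
    URL, data=payload, headers={"Content-Type": "application/json"}
)
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    vector = body["data"][0]["embedding"]
    print(f"Got a {len(vector)}-dimensional embedding")
except urllib.error.URLError as err:
    # No server reachable at localhost:8000 -- start it first (see above).
    print(f"Could not reach the endpoint: {err}")
```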
AWS Lambda Function
To deploy the embeddings endpoint as an AWS Lambda Function using GitHub Actions, follow these steps:
- Fork the repo.
- Add your AWS credentials (`AWS_KEY` and `AWS_SECRET`) to the repository secrets. You can do this by navigating to https://github.com/username/open-text-embeddings/settings/secrets/actions.
- Manually trigger the `Deploy Dev` or `Remove Dev` GitHub Actions to deploy or remove the AWS Lambda Function.
Testing the Embeddings Endpoint
To test the embeddings endpoint, the repository includes an `embeddings.ipynb` notebook with a LangChain-compatible `OpenAIEmbeddings` class.
To get started:
- Install the dependencies:

  `pip install --no-cache-dir open-text-embeddings openai tiktoken`
- Execute the cells in the notebook to test the embeddings endpoint.
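A quick way to sanity-check vectors returned by the endpoint is cosine similarity: semantically similar inputs should score closer to 1. A minimal, dependency-free sketch (the vectors below are made-up stand-ins for real embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings fetched from the endpoint.
v1 = [0.1, 0.3, 0.5]
v2 = [0.1, 0.3, 0.5]
v3 = [0.5, -0.3, 0.1]

print(cosine_similarity(v1, v2))  # identical vectors -> approximately 1.0
print(cosine_similarity(v1, v3))
```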
Contributions
Thank you very much for the following contributions:
Source Distribution
Hashes for open_text_embeddings-1.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | caff2b7c25199d8bdc1326b19e8170e7567a6de25c01646437c6dd736f4bc96e
MD5 | 4bf6cfea2957eddc51470fa39cdbfbcc
BLAKE2b-256 | 28eb8c3a8771f3c499795ddff34ca390c08e65bc0027a73c4cbf45b7683aa087