A Python wrapper around TGI and TEI servers
Project description
Py-TGI (Py-TXI at this point xD)
Py-TGI is a Python wrapper around Text-Generation-Inference and Text-Embedding-Inference that enables creating and running TGI/TEI instances through the awesome docker-py
in a similar style to Transformers API.
Installation
pip install py-tgi
Py-TGI is designed to be used in a similar way to Transformers API. We use docker-py
(instead of a dirty subprocess
solution) so that the containers you run are linked to the main process and are stopped automatically when your code finishes or fails.
Usage
Here's an example of how to use it:
from py_tgi import TGI, is_nvidia_system, is_rocm_system
llm = TGI(
model="NousResearch/Llama-2-7b-hf",
devices=["/dev/kfd", "/dev/dri"] if is_rocm_system() else None,
gpus="all" if is_nvidia_system() else None,
)
output = llm.generate(["Hi, I'm a language model", "I'm fine, how are you?"])
print(output)
Output: [" and I'm here to help you with any questions you have. What can I help you with", "\nUser 0: I'm doing well, thanks for asking. I'm just a"]
from py_tgi import TEI, is_nvidia_system
embed = TEI(
model="BAAI/bge-large-en-v1.5",
dtype="float16",
pooling="mean",
gpus="all" if is_nvidia_system() else None,
)
output = embed.encode(["Hi, I'm an embedding model", "I'm fine, how are you?"])
print(output)
Output: [array([[ 0.01058742, -0.01588806, -0.03487622, ..., -0.01613717, 0.01772875, -0.02237891]], dtype=float32), array([[ 0.02815401, -0.02892136, -0.0536355 , ..., 0.01225784, -0.00241452, -0.02836569]], dtype=float32)]
That's it! Now you can write your Python scripts using the power of TGI and TEI without having to worry about the underlying Docker containers.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.