# 🤗 Hugging Face Inference Toolkit for Google Cloud Vertex AI

Similar to SageMaker's Hugging Face Inference Toolkit, but unofficial.

> [!WARNING]
> This is still at a very early stage and subject to major changes.
## Features

- 🤗 Straightforward deployment of models from the Hugging Face Hub in Vertex AI
- 🐳 Automatically builds Custom Prediction Routines (CPR) for Hugging Face Hub models using `transformers.pipeline`
- 📦 Everything is packaged within a single method, providing more flexibility and ease of use than the plain `google-cloud-aiplatform` SDK for custom models
- 🔌 Seamless integration for running inference on top of any model from the Hugging Face Hub in Vertex AI thanks to `transformers`
- 🌅 Support for `diffusers` models too!
- 🔍 Custom `logging` messages for better monitoring and debugging via Google Cloud Logging
## Get started

Install the `gcloud` CLI and authenticate with your Google Cloud account:

```bash
gcloud init
gcloud auth login
```

Then install `vertex-ai-huggingface-inference-toolkit` via `pip install` (quoted, so the shell does not treat `>=` as a redirection):

```bash
pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"
```

Or via `uv pip install` for faster installations using `uv`:

```bash
uv pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"
```
## Example

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.2.0",
    transformers_version="4.38.2",
    python_version="3.10",
    cuda_version="12.3.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
model.deploy(
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
```
Once deployed, we can send requests to it via cURL:

```bash
curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"sequences": "Messi is the GOAT", "candidate_labels": ["football", "basketball", "baseball"]}' \
    <VERTEX_AI_ENDPOINT_URL>/predict
```
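The same request can also be sent from Python; a minimal sketch using only the standard library `urllib`, where the endpoint URL placeholder and the payload shape are taken verbatim from the cURL example above:

```python
import json
from urllib import request

# Payload mirroring the cURL example above
payload = {
    "sequences": "Messi is the GOAT",
    "candidate_labels": ["football", "basketball", "baseball"],
}


def predict(endpoint_url: str) -> dict:
    """POST the JSON payload to the endpoint's /predict route and decode the response."""
    req = request.Request(
        f"{endpoint_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


# predict("https://<VERTEX_AI_ENDPOINT_URL>")  # requires a deployed endpoint
```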
### Example running on different versions (`torch`, CUDA, Python, etc.)

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.1.0",
    python_version="3.9",
    cuda_version="11.8.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
```
### Example running on an existing Docker image

To ensure consistency, the image should have been built with `vertex_ai_huggingface_inference_toolkit` in advance.

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    image_uri="us-east1-docker.pkg.dev/huggingface-cloud/vertex-ai-huggingface-inference-toolkit/py3.11-cu12.3.0-torch-2.2.0-transformers-4.38.2:latest",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
```
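The image tag above follows a `py<python>-cu<cuda>-<framework>-<framework_version>-transformers-<transformers_version>` naming pattern. As an illustration only (the `build_tag` helper below is hypothetical, not part of the toolkit, which names its images internally), a tag can be composed from the same version parameters used in the first example:

```python
def build_tag(python_version: str, cuda_version: str, framework: str,
              framework_version: str, transformers_version: str) -> str:
    """Compose an image tag following the naming pattern seen above.

    Illustrative only: the toolkit builds and names its images internally.
    """
    return (
        f"py{python_version}-cu{cuda_version}-"
        f"{framework}-{framework_version}-"
        f"transformers-{transformers_version}"
    )


tag = build_tag("3.11", "12.3.0", "torch", "2.2.0", "4.38.2")
# tag matches the tag portion of the image_uri above
```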
### Example running TinyLlama for `text-generation`

```python
from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    project_id="my-project",
    location="us-east1",
    model_name_or_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    model_kwargs={"torch_dtype": "float16", "attn_implementation": "flash_attention_2"},
    extra_requirements=["flash-attn --no-build-isolation"],
    environment_variables={
        "HF_TASK": "text-generation",
    },
)
```
## References / Acknowledgements

This work is heavily inspired by `sagemaker-huggingface-inference-toolkit`, early work from Philipp Schmid, Hugging Face, and Amazon Web Services.