
🤗 Hugging Face Inference Toolkit for Google Cloud Vertex AI

[!WARNING] This is still at a very early stage and subject to major changes.

Features

  • 🤗 Straightforward way of deploying models from the Hugging Face Hub in Vertex AI
  • 🐳 Automatically build Custom Prediction Routines (CPR) for Hugging Face Hub models using transformers.pipeline (see the sketch after this list)
  • 📦 Everything is packaged within a single method, providing more flexibility and ease of use than the google-cloud-aiplatform SDK offers for custom models
  • 🔌 Seamless integration for running inference on top of any model from the Hugging Face Hub in Vertex AI thanks to transformers
  • 🌅 Support for diffusers models too!
  • 🔍 Includes custom logging messages for better monitoring and debugging via Google Cloud Logging
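
Since the generated Custom Prediction Routine wraps transformers.pipeline, the JSON payload of a request maps directly onto pipeline arguments. As a rough, hypothetical sketch of what the built container does internally (this is not the toolkit's actual handler code; the model and payload are taken from the examples below):

from transformers import pipeline

# The container builds a pipeline for the task selected via the HF_TASK
# environment variable, e.g. zero-shot classification.
pipe = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Request payloads map onto pipeline keyword arguments.
result = pipe(
    sequences="Messi is the GOAT",
    candidate_labels=["football", "basketball", "baseball"],
)
print(result["labels"][0])  # highest-scoring label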

Get started

Install the gcloud CLI and authenticate with your Google Cloud account as follows:

gcloud init
gcloud auth login

Then install vertex-ai-huggingface-inference-toolkit via pip install:

pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"

Or, for faster installation, via uv pip install:

uv pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"

Example

from vertex_ai_huggingface_inference_toolkit import TransformersModel

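# Configure which Hub model to serve and how its CPR container is built
# (framework, framework version, transformers version, Python and CUDA versions).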
model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.2.0",
    transformers_version="4.38.2",
    python_version="3.10",
    cuda_version="12.3.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
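# Deploy the model to a Vertex AI Endpoint on a single NVIDIA T4 GPU.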
model.deploy(
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)

Once deployed, we can send requests to it via cURL:

curl -X POST -H "Content-Type: application/json" -d '{"sequences": "Messi is the GOAT", "candidate_labels": ["football", "basketball", "baseball"]}' <VERTEX_AI_ENDPOINT_URL>/predict
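
Alternatively, requests can be sent from Python via the google-cloud-aiplatform SDK. A minimal sketch, assuming the resource name of the endpoint created by model.deploy (the path below is a placeholder); note that Endpoint.predict wraps the payload in an "instances" list, so the exact schema accepted by the serving container may differ from the raw cURL payload above:

from google.cloud import aiplatform

# Placeholder endpoint resource name; replace it with the endpoint
# created by `model.deploy`.
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-east1/endpoints/1234567890"
)
response = endpoint.predict(
    instances=[
        {
            "sequences": "Messi is the GOAT",
            "candidate_labels": ["football", "basketball", "baseball"],
        }
    ]
)
print(response.predictions)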

Example of running on different versions (`torch`, CUDA, Ubuntu, etc.)

from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.1.0",
    python_version="3.9",
    cuda_version="11.8.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)

Example of running on an existing Docker image

To ensure consistency with this approach, the image should have been built with vertex_ai_huggingface_inference_toolkit in advance.

from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    image_uri="us-east1-docker.pkg.dev/huggingface-cloud/vertex-ai-huggingface-inference-toolkit/py3.11-cu12.3.0-torch-2.2.0-transformers-4.38.2:latest",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)

Example of running TinyLlama for `text-generation`

from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    project_id="my-project",
    location="us-east1",
    model_name_or_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    model_kwargs={"torch_dtype": "float16", "attn_implementation": "flash_attention_2"},
    extra_requirements=["flash-attn --no-build-isolation"],
    environment_variables={
        "HF_TASK": "text-generation",
    },
)
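
As in the first example, the model can then be deployed via model.deploy. A minimal sketch, assuming a GPU-backed machine; the g2-standard-4 / NVIDIA_L4 pairing is an assumption rather than part of the original example, chosen because flash_attention_2 requires an Ampere-or-newer GPU:

# flash_attention_2 needs an Ampere-or-newer GPU, so an older T4 would not work;
# an NVIDIA L4 on a g2-standard-4 machine is one valid Vertex AI combination.
model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)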

References / Acknowledgements

This work is heavily inspired by the early work of Philipp Schmid, Hugging Face, and Amazon Web Services on sagemaker-huggingface-inference-toolkit.

