
🤗 Hugging Face Inference Toolkit for Google Cloud Vertex AI

[!WARNING] This is still at a very early stage and subject to major changes.

Features

  • 🤗 Straightforward way of deploying models from the Hugging Face Hub in Vertex AI
  • 🐳 Automatically build Custom Prediction Routines (CPR) for Hugging Face Hub models using transformers.pipeline (see the sketch after this list)
  • 📦 Everything is packaged within a single method, providing more flexibility and ease of use than the google-cloud-aiplatform SDK offers for custom models
  • 🔌 Seamless integration for running inference on top of any model from the Hugging Face Hub in Vertex AI thanks to transformers
  • 🌅 Support for diffusers models too!
  • 🔍 Includes custom logging messages for better monitoring and debugging via Google Cloud Logging
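
Since the generated Custom Prediction Routine wraps transformers.pipeline, the JSON payload of a request maps directly onto pipeline arguments. As a rough, hypothetical sketch of what the built container does internally (this is not the toolkit's actual handler code; the model and payload are taken from the examples below):

from transformers import pipeline

# The container builds a pipeline for the task selected via the HF_TASK
# environment variable, e.g. zero-shot classification.
pipe = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Request payloads map onto pipeline keyword arguments.
result = pipe(
    sequences="Messi is the GOAT",
    candidate_labels=["football", "basketball", "baseball"],
)
print(result["labels"][0])  # highest-scoring label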

Get started

Install the gcloud CLI and authenticate with your Google Cloud account as follows:

gcloud init
gcloud auth login

Then install vertex-ai-huggingface-inference-toolkit via pip install:

pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"

Or, for faster installation, via uv pip install:

uv pip install "vertex-ai-huggingface-inference-toolkit>=0.0.2"

Example

from vertex_ai_huggingface_inference_toolkit import TransformersModel

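# Configure which Hub model to serve and how its CPR container is built
# (framework, framework version, transformers version, Python and CUDA versions).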
model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.2.0",
    transformers_version="4.38.2",
    python_version="3.10",
    cuda_version="12.3.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)
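# Deploy the model to a Vertex AI Endpoint on a single NVIDIA T4 GPU.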
model.deploy(
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)

Once deployed, we can send requests to it via cURL:

curl -X POST -H "Content-Type: application/json" -d '{"sequences": "Messi is the GOAT", "candidate_labels": ["football", "basketball", "baseball"]}' <VERTEX_AI_ENDPOINT_URL>/predict
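
Alternatively, requests can be sent from Python via the google-cloud-aiplatform SDK. A minimal sketch, assuming the resource name of the endpoint created by model.deploy (the path below is a placeholder); note that Endpoint.predict wraps the payload in an "instances" list, so the exact schema accepted by the serving container may differ from the raw cURL payload above:

from google.cloud import aiplatform

# Placeholder endpoint resource name; replace it with the endpoint
# created by `model.deploy`.
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-east1/endpoints/1234567890"
)
response = endpoint.predict(
    instances=[
        {
            "sequences": "Messi is the GOAT",
            "candidate_labels": ["football", "basketball", "baseball"],
        }
    ]
)
print(response.predictions)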

Example of running on different versions (`torch`, CUDA, Ubuntu, etc.)

from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    framework="torch",
    framework_version="2.1.0",
    python_version="3.9",
    cuda_version="11.8.0",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)

Example of running on an existing Docker image

To ensure consistency with this approach, the image should have been built with vertex_ai_huggingface_inference_toolkit in advance.

from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    model_name_or_path="facebook/bart-large-mnli",
    image_uri="us-east1-docker.pkg.dev/huggingface-cloud/vertex-ai-huggingface-inference-toolkit/py3.11-cu12.3.0-torch-2.2.0-transformers-4.38.2:latest",
    environment_variables={
        "HF_TASK": "zero-shot-classification",
    },
)

Example of running TinyLlama for `text-generation`

from vertex_ai_huggingface_inference_toolkit import TransformersModel

model = TransformersModel(
    project_id="my-project",
    location="us-east1",
    model_name_or_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    model_kwargs={"torch_dtype": "float16", "attn_implementation": "flash_attention_2"},
    extra_requirements=["flash-attn --no-build-isolation"],
    environment_variables={
        "HF_TASK": "text-generation",
    },
)
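
As in the first example, the model can then be deployed via model.deploy. A minimal sketch, assuming a GPU-backed machine; the g2-standard-4 / NVIDIA_L4 pairing is an assumption rather than part of the original example, chosen because flash_attention_2 requires an Ampere-or-newer GPU:

# flash_attention_2 needs an Ampere-or-newer GPU, so an older T4 would not work;
# an NVIDIA L4 on a g2-standard-4 machine is one valid Vertex AI combination.
model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)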

References / Acknowledgements

This work is heavily inspired by the early work of Philipp Schmid, Hugging Face, and Amazon Web Services on sagemaker-huggingface-inference-toolkit.

