
The Optimum library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from hardware partners and to interface with their specific functionality.


Optimum-AMD

🤗 Optimum-AMD is an extension to the Hugging Face libraries enabling performance optimizations for ROCm on AMD GPUs and for Ryzen AI on the AMD NPU accelerator.

Install

The Optimum-AMD library can be installed through pip:

pip install --upgrade-strategy eager optimum[amd]

It is also possible to install from source:

git clone https://github.com/huggingface/optimum-amd.git
cd optimum-amd
pip install -e .

ROCm support for AMD GPUs

Hugging Face libraries natively support AMD GPUs through PyTorch for ROCm, with no code changes required.

🤗 Transformers natively supports Flash Attention 2 and GPTQ quantization with ROCm. The 🤗 Text Generation Inference library for LLM deployment has native ROCm support as well, including Flash Attention 2, Paged Attention, and fused kernels for positional encoding and layer norm.
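For example, here is a minimal sketch of text generation with Flash Attention 2 on a ROCm GPU, assuming a ROCm build of PyTorch and the ROCm port of flash-attention are installed (the checkpoint is only an illustration):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Under ROCm, PyTorch exposes AMD GPUs through the usual "cuda" device string.
model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # requires the flash-attention ROCm build
).to("cuda")

inputs = tokenizer("What is Deep Learning?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))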

Find out more about these integrations in the documentation!

In the future, Optimum-AMD may host more ROCm-specific optimizations.

How to use it: Text Generation Inference

The Text Generation Inference library for LLM deployment supports AMD Instinct MI210 and MI250 GPUs. Deployment can be done as follows:

  1. Install ROCm 5.7 on the host machine.
  2. Example LLM server setup: launch a Falcon-7B model server in a ROCm-enabled Docker container.
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    --device=/dev/kfd --device=/dev/dri --group-add video \
    --ipc=host --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:1.2-rocm \
    --model-id $model
  3. Client setup: open another shell and run:
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
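
The same endpoint can also be queried from Python, assuming the huggingface_hub client library is installed:

from huggingface_hub import InferenceClient

# Point the client at the local TGI server started above.
client = InferenceClient("http://127.0.0.1:8080")
output = client.text_generation("What is Deep Learning?", max_new_tokens=20)
print(output)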

How to use it: ONNX Runtime with ROCm

The Optimum ONNX Runtime integration supports ROCm for AMD GPUs. Usage is as follows:

  1. Install ROCm 5.7 on the host machine.
  2. Use the example Dockerfile or install the onnxruntime-rocm package locally from source. Pip wheels are not available at this time.
  3. Run a BERT text classification ONNX model by using ROCMExecutionProvider:
from optimum.onnxruntime import ORTModelForSequenceClassification
from optimum.pipelines import pipeline
from transformers import AutoTokenizer

ort_model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    provider="ROCMExecutionProvider",
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer, device="cuda:0")
result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9997727274894714}]
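
As a quick sanity check (plain ONNX Runtime, not specific to Optimum), the installed build should list ROCMExecutionProvider among its available execution providers:

import onnxruntime

# "ROCMExecutionProvider" should appear if the ROCm build is installed correctly.
print(onnxruntime.get_available_providers())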

Ryzen AI

AMD's Ryzen™ AI family of laptop processors provides users with an integrated Neural Processing Unit (NPU), which offloads AI processing tasks from the host CPU and GPU. Ryzen™ AI software consists of the Vitis™ AI execution provider (EP) for ONNX Runtime, combined with quantization tools and a pre-optimized model zoo. All of this is built on the AMD XDNA™ architecture, purpose-built to run AI workloads efficiently and locally.

Optimum-AMD provides an easy interface for loading and running inference with Hugging Face models on the Ryzen AI accelerator.

Ryzen AI Environment setup

A Ryzen AI environment needs to be enabled to use this library. Please refer to the Ryzen AI Installation and Runtime Setup documentation.

How to use it?

  • Quantize the ONNX model with Optimum or with the Ryzen AI quantization tools

For more information on quantization, refer to the Model Quantization guide; a generic static-quantization sketch is shown below.
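
The exact flow for the Ryzen AI quantization tools is covered in that guide. As a rough, generic illustration only, plain ONNX Runtime static quantization looks like the sketch below; the input name "pixel_values", the input shape, and the random calibration data are all assumptions for an example image model, not the Ryzen AI tooling itself.

import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)

class RandomDataReader(CalibrationDataReader):
    # Feeds a few dummy samples to the quantizer; a real calibration
    # set should come from representative inputs.
    def __init__(self, input_name, shape, n=8):
        self.samples = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)} for _ in range(n)]
        )

    def get_next(self):
        return next(self.samples, None)

# "pixel_values" and the (1, 3, 224, 224) shape are assumptions for an image model.
quantize_static(
    "model.onnx",
    "model_quantized.onnx",
    calibration_data_reader=RandomDataReader("pixel_values", (1, 3, 224, 224)),
    quant_format=QuantFormat.QDQ,  # quantize/dequantize node format
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)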

  • Load model with Ryzen AI class

To load a model and run inference with Ryzen AI, simply replace your AutoModelForXxx class with the corresponding RyzenAIModelForXxx class.

import requests
from PIL import Image

- from transformers import AutoModelForImageClassification
+ from optimum.amd.ryzenai import RyzenAIModelForImageClassification
from transformers import AutoFeatureExtractor, pipeline

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model_id = <path of the model>
- model = AutoModelForImageClassification.from_pretrained(model_id)
+ model = RyzenAIModelForImageClassification.from_pretrained(model_id, vaip_config=<path to config file>)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
cls_pipe = pipeline("image-classification", model=model, feature_extractor=feature_extractor)
outputs = cls_pipe(image)

If you encounter any issues while using Optimum-AMD, please open an issue or a pull request.
