Optimum-NVIDIA is the interface between Hugging Face Transformers and NVIDIA GPUs.

Optimum-NVIDIA

Optimized inference with NVIDIA and Hugging Face

Optimum-NVIDIA delivers the best inference performance on the NVIDIA platform through Hugging Face. Run LLaMA 2 at 1,200 tokens/second (up to 28x faster than the framework) by changing just a single line in your existing transformers code.

Installation

Pip

Pip installation flow has been validated on Ubuntu only at this stage.

apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
python -m pip install --pre --extra-index-url https://pypi.nvidia.com optimum-nvidia

Developers who want the best possible performance should use one of the installation methods below.

Docker container

You can use a Docker container to try Optimum-NVIDIA today. Images are available on the Hugging Face Docker Hub.

docker pull huggingface/optimum-nvidia

Building from source

Instead of using the pre-built docker container, you can build Optimum-NVIDIA from source:

TARGET_SM="90-real;89-real"
git clone --recursive --depth=1 https://github.com/huggingface/optimum-nvidia.git
cd optimum-nvidia/third-party/tensorrt-llm
make -C docker release_build CUDA_ARCHS=$TARGET_SM
cd ../.. && docker build -t <organisation_name/image_name>:<version> -f docker/Dockerfile .
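The TARGET_SM value above is a semicolon-separated list of GPU compute capabilities with a "-real" suffix. As a minimal sketch, the helper below builds such a string from (major, minor) pairs. The function name cuda_archs is hypothetical and not part of Optimum-NVIDIA; the mapping of 9.0 to Hopper and 8.9 to Ada-Lovelace is background knowledge about NVIDIA hardware, not something stated on this page.

```python
def cuda_archs(capabilities):
    """Build a CUDA_ARCHS-style string such as "90-real;89-real".

    `capabilities` is an iterable of (major, minor) compute-capability pairs.
    This helper is illustrative only; it is not provided by Optimum-NVIDIA.
    """
    return ";".join(f"{major}{minor}-real" for major, minor in capabilities)


# Hopper (9.0) and Ada-Lovelace (8.9), matching the TARGET_SM value used above.
target_sm = cuda_archs([(9, 0), (8, 9)])
print(target_sm)
```

The resulting string can be exported as TARGET_SM before running the make command.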

Quickstart Guide

Pipelines

Hugging Face pipelines provide a simple yet powerful abstraction to quickly set up inference. If you already have a pipeline from transformers, you can unlock the performance benefits of Optimum-NVIDIA by just changing one line.

- from transformers.pipelines import pipeline
+ from optimum.nvidia.pipelines import pipeline

pipe = pipeline('text-generation', 'meta-llama/Llama-2-7b-chat-hf', use_fp8=True)
pipe("Describe a real-world application of AI in sustainable energy.")
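Because the swap is a single import line, code that must also run on machines without Optimum-NVIDIA can choose the import at runtime. The following is a sketch under that assumption: pick_pipeline_module is a hypothetical helper, and it only reports which module is importable; it does not load a model.

```python
import importlib.util


def pick_pipeline_module():
    """Return the module path to import `pipeline` from, or None if neither is installed.

    Prefers optimum.nvidia.pipelines and falls back to transformers.pipelines.
    Hypothetical helper for illustration; not part of either library.
    """
    for name in ("optimum.nvidia.pipelines", "transformers.pipelines"):
        try:
            # find_spec imports the parent package, so a missing or broken
            # install surfaces here as ImportError.
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            continue
    return None


module_name = pick_pipeline_module()
print(module_name)
```

With the module name in hand, importlib.import_module(module_name).pipeline gives the factory to call. Note that the use_fp8 argument shown above comes from the Optimum-NVIDIA example, so it should only be passed when that backend was selected.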

Generate

If you want control over advanced features like quantization and token selection strategies, we recommend using the generate() API. Just like with pipelines, switching from existing transformers code is super simple.

- from transformers import AutoModelForCausalLM
+ from optimum.nvidia import AutoModelForCausalLM
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", padding_side="left")

model = AutoModelForCausalLM.from_pretrained(
  "meta-llama/Llama-2-7b-chat-hf",
+ use_fp8=True,
+ max_prompt_length=1024,
+ max_output_length=2048, # Must be at least size of max_prompt_length + max_new_tokens
+ max_batch_size=8,
)

model_inputs = tokenizer(["How is autonomous vehicle technology transforming the future of transportation and urban planning?"], return_tensors="pt").to("cuda")

generated_ids = model.generate(
    **model_inputs, 
    top_k=40, 
    top_p=0.7, 
    repetition_penalty=10,
)

tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
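The comment on max_output_length states a constraint that is easy to check before building the model: the output budget must cover the prompt plus the newly generated tokens. A minimal sketch of that arithmetic follows; the helper name max_new_tokens_budget is hypothetical and not part of the Optimum-NVIDIA API.

```python
def max_new_tokens_budget(max_prompt_length, max_output_length):
    """Return how many new tokens fit, given the two length limits.

    Encodes the rule: max_output_length >= max_prompt_length + max_new_tokens.
    Illustrative helper only; not provided by Optimum-NVIDIA.
    """
    if max_output_length < max_prompt_length:
        raise ValueError(
            "max_output_length must be at least max_prompt_length "
            f"({max_output_length} < {max_prompt_length})"
        )
    return max_output_length - max_prompt_length


# Values from the example above: a 1024-token prompt and a 2048-token output limit.
budget = max_new_tokens_budget(max_prompt_length=1024, max_output_length=2048)
print(budget)  # 1024
```

With the settings in the example, a full-length prompt leaves room for at most 1024 new tokens.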

To learn more about text generation with LLMs, check out this guide!

Support Matrix

We test Optimum-NVIDIA on 4090, L40S, and H100 Tensor Core GPUs, though it is expected to work on any GPU based on the following architectures:

Ampere (A100/A30 are supported. Experimental support for A10, A40, RTX Ax000)
Hopper
Ada-Lovelace

Note that FP8 support is only available on GPUs based on Hopper and Ada-Lovelace architectures.
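One way to act on this note is to decide whether to pass use_fp8=True based on the GPU's compute capability. The sketch below assumes the usual mapping of architectures to compute capabilities (Ampere 8.0 and 8.6, Ada-Lovelace 8.9, Hopper 9.0), which is background knowledge rather than something stated here; supports_fp8 is a hypothetical helper, not an Optimum-NVIDIA function.

```python
# Compute capabilities assumed for the architectures named in the support matrix.
# Assumption: Ada-Lovelace is 8.9 and Hopper is 9.0.
FP8_CAPABLE = {(8, 9), (9, 0)}


def supports_fp8(capability):
    """Return True if a (major, minor) compute capability is Ada-Lovelace or Hopper."""
    return tuple(capability) in FP8_CAPABLE


print(supports_fp8((9, 0)))  # Hopper, e.g. H100
print(supports_fp8((8, 9)))  # Ada-Lovelace, e.g. 4090 or L40S
print(supports_fp8((8, 0)))  # Ampere, e.g. A100
```

On a machine with PyTorch and a CUDA device, torch.cuda.get_device_capability() returns such a (major, minor) tuple, so the result can feed directly into the use_fp8 argument.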

Optimum-NVIDIA works on Linux and will support Windows soon.

Optimum-NVIDIA currently accelerates text generation with LlamaForCausalLM, and we are actively working to expand support to more model architectures and tasks.

Contributing

Check out our Contributing Guide

Release history

0.1.0b9 (pre-release, this version): Jan 21, 2025
0.1.0b7 (pre-release): May 24, 2024
0.1.0b6 (pre-release): Apr 11, 2024
0.1.0b5 (pre-release): Apr 8, 2024

Download files

Source Distribution

optimum_nvidia-0.1.0b9.tar.gz (41.3 kB), uploaded Jan 21, 2025, Source

Built Distribution

optimum_nvidia-0.1.0b9-py3-none-any.whl (57.2 kB), uploaded Jan 21, 2025, Python 3

File details

Details for the file optimum_nvidia-0.1.0b9.tar.gz.

File metadata

Download URL: optimum_nvidia-0.1.0b9.tar.gz
Upload date: Jan 21, 2025
Size: 41.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for optimum_nvidia-0.1.0b9.tar.gz
Algorithm	Hash digest
SHA256	`ee9fc53380128cded728de6a3b83d0e4c9484d72b28faccce761e0884441d3e2`
MD5	`b9713241643570061cfd7a46e6edc6ac`
BLAKE2b-256	`555aad4b159c9bd162be15fd7c2285d881584c8dca35c39fa9153b08ca0e7163`

File details

Details for the file optimum_nvidia-0.1.0b9-py3-none-any.whl.

File metadata

Download URL: optimum_nvidia-0.1.0b9-py3-none-any.whl
Upload date: Jan 21, 2025
Size: 57.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for optimum_nvidia-0.1.0b9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`79a42efc4e3f28ea6e60a012e925f3bd0ec76acd41bb09c256df0c94a6a03e0a`
MD5	`b5099cfc2d9e71ff96f800c2e68738bf`
BLAKE2b-256	`9ad8fcfe2a42e957d4e4cf145abb7283fc2dab8efb2d1c9615c36e281397ebe5`
