Package for applying ao techniques to GPU models

Project description

torchao: PyTorch Architecture Optimization

Note: This repository is currently under heavy development - if you have suggestions on the API or use cases you'd like covered, please open a GitHub issue.

Introduction

torchao is a PyTorch native library for optimizing your models using lower precision dtypes, techniques like quantization and sparsity, and performant kernels.

Get Started

To try out our APIs, check out the API examples for quantization (including autoquant), sparsity, and dtypes.
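
The core idea behind autoquant - benchmarking several candidate implementations on a representative input and keeping the fastest - can be sketched in plain Python. This is an illustration of the technique only, not the torchao API; all names here are hypothetical.

```python
import time

def fastest_variant(variants, x, repeats=100):
    """Time each candidate implementation on a sample input and
    return the name of the fastest one (illustrative only)."""
    best_name, best_time = None, float("inf")
    for name, fn in variants.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn(x)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

# Two toy "kernels" standing in for, e.g., an fp32 vs. a quantized matmul path.
variants = {
    "baseline": lambda x: [v * 2.0 for v in x],
    "fused":    lambda x: [v + v for v in x],
}
choice = fastest_variant(variants, list(range(256)))
```

In the real library this per-shape tuning happens per layer, which is why results vary across model shapes and GPUs.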

Installation

Note: this library makes liberal use of several new features in PyTorch; it's recommended to use it with the current nightly or the latest stable version of PyTorch.

  1. From PyPI:
pip install torchao
  2. From source:
git clone https://github.com/pytorch-labs/ao
cd ao
pip install -e .

Key Features

The library provides:

  1. Support for lower-precision dtypes such as nf4 and uint4 that are torch.compile friendly
  2. Quantization algorithms such as dynamic quantization, SmoothQuant, and GPTQ that run on CPU, GPU, and mobile:
  • Int8 dynamic activation quantization
  • Int8 and int4 weight-only quantization
  • Int8 dynamic activation quantization with int4 weight quantization
  • GPTQ and SmoothQuant
  • A high-level autoquant API and kernel auto-tuner targeting SOTA performance across varying model shapes on consumer/enterprise GPUs
  3. Sparsity algorithms such as Wanda that help improve the accuracy of sparse networks
  4. Integration with other PyTorch native libraries like torchtune and ExecuTorch
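
To make the quantization schemes above concrete, here is a plain-Python sketch of the arithmetic behind int8 weight-only quantization (an illustration of the math, not the torchao implementation): weights are scaled into the signed int8 range, stored as integers, and dequantized when used.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= q * scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard against all-zero weights
    q = [round(w / scale) for w in weights]            # integers in [-127, 127]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from integers and one scale."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Weight-only schemes like this shrink memory traffic (often the bottleneck in inference) while keeping activations in floating point; the dynamic-activation variants additionally quantize activations at runtime.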

Our Goals

torchao embodies PyTorch’s design philosophy, especially "usability over everything else". Our vision for this repository is the following:

  • Composability: Native solutions for optimization techniques that compose with both torch.compile and FSDP
    • For example, composing QLoRA with support for new dtypes
  • Interoperability: Work with the rest of the PyTorch ecosystem such as torchtune, gpt-fast and ExecuTorch
  • Transparent Benchmarks: Regularly run performance benchmarking of our APIs across a suite of Torchbench models and across hardware backends
  • Heterogeneous Hardware: Efficient kernels that can run on CPU/GPU based server (w/ torch.compile) and mobile backends (w/ ExecuTorch).
  • Infrastructure Support: Release packaging solution for kernels and a CI/CD setup that runs these kernels on different backends.

Interoperability with PyTorch Libraries

torchao has been integrated with other repositories to ease adoption:

  • torchtune is integrated with 8- and 4-bit weight-only quantization techniques, with and without GPTQ.
  • ExecuTorch is integrated with GPTQ for both 8da4w (int8 dynamic activation with int4 weight) and int4 weight-only quantization.
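
The 8da4w scheme combines two of the ideas above: activations are quantized to int8 at runtime, weights to int4 ahead of time, and the accumulation happens in integers with a single rescale at the end. A rough plain-Python sketch of that arithmetic (assuming symmetric per-tensor scales; this is not the actual kernel):

```python
def quant(vals, n_bits):
    """Symmetric quantization to a signed n-bit integer range."""
    qmax = 2 ** (n_bits - 1) - 1                       # 127 for int8, 7 for int4
    scale = max(abs(v) for v in vals) / qmax or 1.0    # guard against all-zero input
    return [round(v / scale) for v in vals], scale

def int_dot_8da4w(acts, weights):
    """Dot product with int8 dynamic activations and int4 weights:
    integer multiply-accumulate, then one float rescale at the end."""
    qa, sa = quant(acts, 8)      # activations quantized dynamically, at runtime
    qw, sw = quant(weights, 4)   # weights quantized once, ahead of time
    return sum(a * w for a, w in zip(qa, qw)) * sa * sw

approx = int_dot_8da4w([0.2, -0.5, 1.0], [0.1, 0.4, -0.2])
exact = 0.2 * 0.1 + (-0.5) * 0.4 + 1.0 * (-0.2)
```

The int4 weights keep the model small on mobile backends, while quantizing activations per call adapts the scale to each input's actual range.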

Success stories

Our kernels have been used to achieve SOTA inference performance on:

  1. Image segmentation models with sam-fast
  2. Language models with gpt-fast
  3. Diffusion models with sd-fast

License

torchao is released under the BSD 3-Clause license.

Download files

Source distribution: torchao_nightly-2024.4.26.tar.gz (98.8 kB)

Built distribution: torchao_nightly-2024.4.26-py3-none-any.whl (120.1 kB)
