
The Hourglass Diffusion Transformer (HDiT) is an image generative model whose computational cost scales linearly with pixel count, supporting training at high resolution directly in pixel space.


Hourglass Diffusion Transformers (HDiT)

License: MIT

Overview

This repository provides an unofficial implementation of the Hourglass Diffusion Transformer (HDiT) model, proposed in the paper "Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers" by Crowson et al. (ICML 2024).

The original work introduces a diffusion transformer architecture that generates high-resolution images directly in pixel space, with computational cost that scales linearly with pixel count. This package extracts the core HDiT model from the original repository (k-diffusion).

Key Features:

  • Scalable: Supports high-resolution image synthesis.
  • Efficient: Achieves significantly lower computational cost compared to the traditional Diffusion Transformer (DiT).
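
The linear-scaling claim can be illustrated with a back-of-the-envelope count of attention interactions. The sketch below is not part of this package and does not measure real FLOPs. It assumes a patch size of 4 and an illustrative 7×7 local attention window, and the helper names are hypothetical. It compares global self-attention, whose pair count grows quadratically with the number of tokens, against local (neighborhood) attention, whose pair count grows linearly.

```python
def num_tokens(resolution: int, patch_size: int = 4) -> int:
    """Number of tokens for a square image split into square patches."""
    side = resolution // patch_size
    return side * side


def global_attention_pairs(n_tokens: int) -> int:
    """Every token attends to every token: quadratic in n_tokens."""
    return n_tokens * n_tokens


def local_attention_pairs(n_tokens: int, window: int = 7) -> int:
    """Every token attends to a fixed window x window neighborhood: linear in n_tokens.

    The 7x7 window is an assumption chosen for illustration.
    """
    return n_tokens * window * window


for res in (256, 512, 1024):
    n = num_tokens(res)
    print(
        f"{res}x{res}: tokens={n}, "
        f"global={global_attention_pairs(n)}, local={local_attention_pairs(n)}"
    )
```

Doubling the resolution quadruples the pixel count, so the local pair count grows 4× while the global pair count grows 16×.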

Installation

Install the package using pip:

pip install hdit

Usage

Example Code:

Initialize the model backbone, then use it as the denoising network in a diffusion model.

from hdit import HDiT

model = HDiT(
    in_channels=3,
    out_channels=3,
    patch_size=[4, 4],
    widths=[128, 256],
    middle_width=512,
    depths=[2, 2],
    middle_depth=4,
    mapping_width=256,
    mapping_depth=2,
)
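
To make the configuration above more concrete, the following sketch estimates the token grid at each level of the hourglass. It does not call the hdit package: level_grids is a hypothetical helper, the 256×256 input size is chosen only for illustration, and a single integer stands in for the square patch_size=[4, 4]. It assumes that patch_size sets the initial patching and that each successive level merges tokens 2×2, as in the paper's hourglass design. Whether this package follows exactly that rule is an assumption.

```python
def level_grids(image_size, patch_size, widths, middle_width, merge=2):
    """Return (name, grid_side, width) for each level of a square-image hourglass.

    Assumes each level divides the token grid side by `merge` (2x2 token merging).
    """
    side = image_size // patch_size
    levels = []
    for i, width in enumerate(widths):
        levels.append((f"level {i}", side, width))
        side //= merge  # assumed 2x2 merge before the next, coarser level
    levels.append(("middle", side, middle_width))
    return levels


for name, side, width in level_grids(256, 4, widths=[128, 256], middle_width=512):
    print(f"{name}: {side}x{side} tokens, width {width}")
```

Under these assumptions, the widest layers operate on the smallest token grid, which is what keeps the cost of the inner levels low.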

Citation

Please cite the original paper. For more details, see the official arXiv page.

@InProceedings{crowson2024hourglass,
    title = {Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers},
    author = {Crowson, Katherine and Baumann, Stefan Andreas and Birch, Alex and Abraham, Tanishq Mathew and Kaplan, Daniel Z and Shippole, Enrico},
    booktitle = {Proceedings of the 41st International Conference on Machine Learning},
    pages = {9550--9575},
    year = {2024},
    editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
    volume = {235},
    series = {Proceedings of Machine Learning Research},
    month = {21--27 Jul},
    publisher = {PMLR},
    pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/crowson24a/crowson24a.pdf},
    url = {https://proceedings.mlr.press/v235/crowson24a.html},
    abstract = {We present the Hourglass Diffusion Transformer (HDiT), an image-generative model that exhibits linear scaling with pixel count, supporting training at high resolution (e.g. $1024 \times 1024$) directly in pixel-space. Building on the Transformer architecture, which is known to scale to billions of parameters, it bridges the gap between the efficiency of convolutional U-Nets and the scalability of Transformers. HDiT trains successfully without typical high-resolution training techniques such as multiscale architectures, latent autoencoders or self-conditioning. We demonstrate that HDiT performs competitively with existing models on ImageNet $256^2$, and sets a new state-of-the-art for diffusion models on FFHQ-$1024^2$. Code is available at https://github.com/crowsonkb/k-diffusion.}
}
