Ultra Zoom

A fast single image super-resolution (SISR) model for upscaling images without loss of detail. Ultra Zoom uses a two-stage "zoom in and enhance" strategy that first applies a deterministic upscaling algorithm to the image and then uses a deep neural network to fill in the details. As such, Ultra Zoom requires fewer resources than upscalers that must predict every new pixel de novo, making it well suited to real-time image processing.

Key Features

  • Fast and scalable: Instead of directly predicting the individual pixels of the upscaled image, Ultra Zoom uses a fast deterministic upscaling algorithm and then enhances the image through a residual pathway that operates primarily within the low-resolution subspace of a deep neural network.

  • Next-gen architecture: Ultra Zoom employs a next-generation convolutional neural network architecture that improves on previous generations through spatial attention, wide non-linear activations, and sub-pixel convolution.
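The two-stage idea can be sketched in a few lines of NumPy. This is a toy illustration, not the project's network: nearest-neighbor zoom stands in for the deterministic upscaler, and a fixed high-pass filter stands in for the learned residual.

```python
import numpy as np

def deterministic_upscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Stage 1: fast nearest-neighbor zoom (stand-in for e.g. bicubic)."""
    return np.kron(img, np.ones((factor, factor)))

def residual_enhance(img: np.ndarray) -> np.ndarray:
    """Stage 2: stand-in for the network's predicted residual.
    Here we extract a crude high-frequency "detail" term; the real
    model learns to predict this residual instead."""
    blurred = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
               + np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 4.0
    return img - blurred

def ultra_zoom_sketch(img: np.ndarray, factor: int = 2) -> np.ndarray:
    # Deterministic zoom first, then add the (here: hand-crafted) residual.
    base = deterministic_upscale(img, factor)
    return base + residual_enhance(base)

low_res = np.random.rand(8, 8)
high_res = ultra_zoom_sketch(low_res, factor=2)
print(high_res.shape)  # (16, 16)
```

The point is that the network only has to predict the missing detail on top of a cheap deterministic base, rather than every output pixel from scratch.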

Pretrained Models

The following pretrained models are available on HuggingFace Hub.

| Name | Zoom | Num Channels | Hidden Ratio | Encoder Layers | Total Parameters |
|---|---|---|---|---|---|
| andrewdalpino/UltraZoom-2X | 2X | 48 | 2X | 20 | 1.8M |
| andrewdalpino/UltraZoom-4X | 4X | 96 | 2X | 28 | 10M |
| andrewdalpino/UltraZoom-8X | 8X | 192 | 2X | 36 | 54M |

Clone the Repository

You'll need the code in the repository to load the pretrained weights or to train new models.

git clone https://github.com/andrewdalpino/UltraZoom

Install Project Dependencies

Project dependencies are specified in the requirements.txt file. You can install them with pip using the following command from the project root. We recommend using a virtual environment such as venv to keep package dependencies on your system tidy.

python -m venv ./.venv

source ./.venv/bin/activate

pip install -r requirements.txt

Training

To start training with the default settings, add your training and testing images to the ./dataset/train and ./dataset/test folders respectively and run the training script as in the example below. If you are looking for good training sets to start with, we recommend the DIV2K and/or Flickr2K datasets.
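Assuming the default --train_images_path and --test_images_path, the expected folder layout can be created up front:

```shell
# Create the default training and testing image folders used by train.py
mkdir -p dataset/train dataset/test
```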

python train.py

You can customize the upscaler model by adjusting the num_channels, hidden_ratio, and num_encoder_layers hyper-parameters like in the example below.

python train.py --num_channels=64 --hidden_ratio=2 --num_encoder_layers=24

You can also adjust the batch_size, learning_rate, and gradient_accumulation_steps to suit your training setup.

python train.py --batch_size=16 --learning_rate=5e-4 --gradient_accumulation_steps=8
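Conceptually, gradient accumulation trades wall-clock time for memory: with the settings above, the effective batch is 16 × 8 = 128 images per weight update. A toy NumPy sketch of the mechanic (a plain SGD regression, not the project's actual Adafactor training loop):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=128)
y = 3.0 * x                       # true weight is 3

w, lr = 0.0, 0.5
batch_size, accum_steps = 16, 8   # effective batch = 16 * 8 = 128
grad_sum, seen = 0.0, 0

for epoch in range(50):
    for i in range(0, len(x), batch_size):
        xb, yb = x[i:i + batch_size], y[i:i + batch_size]
        grad_sum += np.mean(2 * (w * xb - yb) * xb)  # dL/dw on this micro-batch
        seen += 1
        if seen == accum_steps:                      # step only every N batches
            w -= lr * grad_sum / accum_steps
            grad_sum, seen = 0.0, 0

print(round(w, 3))  # 3.0
```

Each optimizer step sees the averaged gradient of all accumulated micro-batches, so the update behaves like one large batch while only one micro-batch is ever in memory.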

In addition, you can control various training data augmentation arguments such as the brightness, contrast, hue, and saturation jitter.

python train.py --brightness_jitter=0.5 --contrast_jitter=0.4 --hue_jitter=0.3 --saturation_jitter=0.2
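These jitter arguments correspond to standard photometric augmentation. As a rough illustration (not the project's actual pipeline, which is likely torchvision-based), brightness and contrast jitter on a float image in [0, 1] might look like:

```python
import numpy as np

def jitter(img: np.ndarray, brightness: float, contrast: float,
           rng: np.random.Generator) -> np.ndarray:
    """Randomly perturb brightness and contrast, mimicking
    --brightness_jitter / --contrast_jitter style augmentation."""
    b = 1.0 + rng.uniform(-brightness, brightness)   # brightness factor
    c = 1.0 + rng.uniform(-contrast, contrast)       # contrast factor
    mean = img.mean()
    out = (img - mean) * c + mean                    # stretch around the mean
    out = out * b                                    # scale overall intensity
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(42)
img = rng.random((32, 32))
aug = jitter(img, brightness=0.5, contrast=0.4, rng=rng)
print(aug.shape)  # (32, 32)
```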

Training Dashboard

We use TensorBoard to capture and display training events such as loss and gradient norm updates. To launch the dashboard server, run the following command from the terminal.

tensorboard --logdir=./runs

Then navigate to the dashboard using your favorite web browser.

Training Arguments

| Argument | Default | Type | Description |
|---|---|---|---|
| --train_images_path | "./dataset/train" | str | The path to the folder containing your training images. |
| --test_images_path | "./dataset/test" | str | The path to the folder containing your testing images. |
| --num_dataset_processes | 4 | int | The number of CPU processes used to preprocess the dataset. |
| --target_resolution | 256 | int | The number of pixels in the height and width dimensions of the training images. |
| --upscale_ratio | 2 | (2, 4, 8) | The upscaling or zoom factor. |
| --blur_amount | 0.5 | float | The amount of Gaussian blur applied to the degraded low-resolution image. |
| --compression_amount | 0.2 | float | The amount of JPEG compression applied to the degraded low-resolution image. |
| --noise_amount | 0.02 | float | The amount of Gaussian noise added to the degraded low-resolution image. |
| --brightness_jitter | 0.1 | float | The amount of jitter applied to the brightness of the training images. |
| --contrast_jitter | 0.1 | float | The amount of jitter applied to the contrast of the training images. |
| --saturation_jitter | 0.1 | float | The amount of jitter applied to the saturation of the training images. |
| --hue_jitter | 0.1 | float | The amount of jitter applied to the hue of the training images. |
| --batch_size | 32 | int | The number of training images to pass through the network at a time. |
| --gradient_accumulation_steps | 4 | int | The number of batches to pass through the network before updating the model weights. |
| --num_epochs | 50 | int | The number of epochs to train for. |
| --learning_rate | 5e-4 | float | The learning rate of the Adafactor optimizer. |
| --max_gradient_norm | 2.0 | float | Clip gradients above this threshold norm before stepping. |
| --num_channels | 48 | int | The number of channels within each encoder block. |
| --hidden_ratio | 2 | (1, 2, 4) | The ratio of hidden channels to num_channels within the activation portion of each encoder block. |
| --num_encoder_layers | 20 | int | The number of layers within the body of the encoder. |
| --activation_checkpointing | False | bool | Use activation checkpointing, which drastically reduces memory utilization during training at the cost of recomputing the forward pass. |
| --eval_interval | 2 | int | Evaluate the model on the testing set after this many epochs. |
| --checkpoint_interval | 2 | int | Save a model checkpoint to disk after this many epochs. |
| --checkpoint_path | "./checkpoints/checkpoint.pt" | str | The path to the base checkpoint file on disk. |
| --resume | False | bool | Resume training from the last checkpoint. |
| --run_dir_path | "./runs" | str | The path to the TensorBoard run directory for this training session. |
| --device | "cuda" | str | The device to run the computation on. |
| --seed | None | int | The seed for the random number generator. |

Upscaling

You can use the provided upscale.py script to generate upscaled images from the trained model at the default checkpoint, as in the example below. You can also build your own inference pipeline around the same model, leveraging batch processing for large-scale production systems.

python upscale.py --image_path="./example.jpg"

To generate images using a different checkpoint, pass the checkpoint_path argument as in the example below.

python upscale.py --checkpoint_path="./checkpoints/fine-tuned.pt" --image_path="./example.jpg"
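If you build your own batch-processing pipeline, the core is chunking the work list into fixed-size batches and feeding each batch to the model. A framework-agnostic sketch, where upscale_batch is a hypothetical stand-in for loading a batch of images and running the model over it:

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator, List

def batched(items: Iterable[Path], batch_size: int) -> Iterator[List[Path]]:
    """Yield successive fixed-size batches from a stream of image paths."""
    batch: List[Path] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                 # flush the final, possibly smaller, batch
        yield batch

def run_pipeline(paths: Iterable[Path], batch_size: int,
                 upscale_batch: Callable[[List[Path]], None]) -> int:
    """Drive the (hypothetical) model over the images, batch by batch."""
    n = 0
    for batch in batched(paths, batch_size):
        upscale_batch(batch)  # e.g. load, stack, forward pass, save results
        n += len(batch)
    return n

paths = [Path(f"img_{i}.jpg") for i in range(10)]
processed = run_pipeline(paths, batch_size=4, upscale_batch=lambda b: None)
print(processed)  # 10
```

In production you would replace the lambda with code that loads the checkpoint once, stacks each batch into a tensor, and runs a single forward pass per batch.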

Upscaling Arguments

| Argument | Default | Type | Description |
|---|---|---|---|
| --image_path | None | str | The path to the image file to be upscaled by the model. |
| --checkpoint_path | "./checkpoints/fine-tuned.pt" | str | The path to the base checkpoint file on disk. |
| --device | "cuda" | str | The device to run the computation on. |

References

  • Z. Liu, et al. A ConvNet for the 2020s, 2022.
  • J. Yu, et al. Wide Activation for Efficient and Accurate Image Super-Resolution, 2018.
  • J. Johnson, et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016.
  • W. Shi, et al. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network, 2016.
  • T. Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, OpenAI, 2016.
