A fast single image super-resolution (SISR) model for upscaling images with ultra high-quality.

These details have not been verified by PyPI

Project links

Project description

Ultra Zoom

A fast single image super-resolution (SISR) model for upscaling images with ultra high-quality. Ultra Zoom uses a two-stage "zoom in and enhance" mechanism that utilizes a fast deterministic upscaling algorithm to upscale the image and then enhances it through a steerable residual pathway that operates primarily in the low-resolution subspace of a deep neural network.

Key Features

Fast and scalable: Ultra Zoom uses a unique "zoom in and enhance" mechanism that combines the speed of deterministic bicubic interpolation with the power of a deep neural network.
Controllable enhancements: Ultra Zoom integrates channel-wise control modules directly into the architecture, allowing you to finely adjust the amount of denoising, deblurring, and deartifacting to suite your image source.
Full RGB: Unlike many efficient SR models that only operate in the luminance domain, Ultra Zoom operates within the full RGB color domain enhancing both luminance and chrominance for the best possible image quality.

Demo

View at full resolution for best results. More comparisons can be found here.

UltraZoom 2X Comparison UltraZoom 3X Comparison UltraZoom 4X Comparison

Pretrained Models

The following pretrained models are available on HuggingFace Hub.

Name	Zoom	Num Channels	Encoder Layers	Parameters	Library Version
andrewdalpino/UltraZoom-2X-Ctrl	2X	48	20	1.8M	0.2.x
andrewdalpino/UltraZoom-2X	2X	48	20	1.8M	0.1.x
andrewdalpino/UltraZoom-3X	3X	54	30	3.5M	0.1.x
andrewdalpino/UltraZoom-4X	4X	96	40	14M	0.1.x

Pretrained Example

If you'd just like to load the pretrained weights and do inference, getting started is as simple as in the example below. First, you'll need the ultrazoom and torchvision Python packages installed into your project.

pip install ultrazoom torchvision

Next, load the model weights from HuggingFace Hub and feed the network some images. Note that the input to the upscale() method is a normalized (between 0 and 1) 4D tensor with shape [b, 3, w, h] where b is the batch dimension, and w and height are the width and height respectively. The input images should be in RGB format with exactly 3 channels.

import torch

from torchvision.io import decode_image, ImageReadMode
from torchvision.transforms.v2 import ToDtype, ToPILImage

from ultrazoom.model import UltraZoom
from ultrazoom.control import ControlVector


model_name = "andrewdalpino/UltraZoom-V2-2X-Ctrl"
image_path = "./dataset/bird.png"

model = UltraZoom.from_pretrained(model_name)

image_to_tensor = ToDtype(torch.float32, scale=True)
tensor_to_pil = ToPILImage()

image = decode_image(image_path, mode=ImageReadMode.RGB)

x = image_to_tensor(image).unsqueeze(0)

c = ControlVector(
    gaussian_blur=0.5,
    gaussian_noise=0.5,
    jpeg_compression=0.5
).to_tensor()

y_pred = model.upscale(x, c)

pil_image = tensor_to_pil(y_pred.squeeze(0))

pil_image.show()

Clone the Repository

You'll need the code in the repository to train new models and export them for production.

git clone https://github.com/andrewdalpino/UltraZoom

Install Project Dependencies

Project dependencies are specified in the requirements.txt file. You can install them with pip using the following command from the project root. We recommend using a virtual environment such as venv to keep package dependencies on your system tidy.

python -m venv ./.venv

source ./.venv/bin/activate

pip install -r requirements.txt

Pretraining

Ultra Zoom is trained in two stages. The first stage focuses on building a foundation model for fine-tuning. It aims to jointly minimize the Pixel Loss with high and low frequency perceptual losses from the perspective of a pretrained VGG19 image classifier. To start training with the default settings, add your training and testing images to the ./dataset/train and ./dataset/test folders respectively and call the pretraining script like in the example below. If you are looking for good training sets to start with we recommend the DIV2K and/or Flicker2K datasets.

python pretrain.py

You can customize the upscaler model by adjusting the num_channels, hidden_ratio, and num_encoder_layers hyper-parameters like in the example below.

python pretrain.py --num_channels=64 --hidden_ratio=2 --num_encoder_layers=24

You can also adjust the batch_size, learning_rate, and gradient_accumulation_steps to suite your training setup.

python pretrain.py --batch_size=16 --learning_rate=5e-4 --gradient_accumulation_steps=8

In addition, you can control various training data augmentation arguments such as the brightness, contrast, hue, and saturation jitter.

python pretrain.py --brightness_jitter=0.5 --contrast_jitter=0.4 --hue_jitter=0.3 --saturation_jitter=0.2

Training Dashboard

We use TensorBoard to capture and display training events such as loss and gradient norm updates. To launch the dashboard server run the following command from the terminal.

tensorboard --logdir=./runs

Then navigate to the dashboard using your favorite web browser.

Pretraining Arguments

Argument	Default	Type	Description
--train_images_path	"./dataset/train"	str	The path to the folder containing your training images.
--test_images_path	"./dataset/test"	str	The path to the folder containing your testing images.
--num_dataset_processes	8	int	The number of CPU processes to use to preprocess the dataset.
--target_resolution	256	int	The number of pixels in the height and width dimensions of the training images.
--upscale_ratio	2	(1, 2, 3, 4)	The upscaling or zoom factor.
--min_gaussian_blur	0.0	float	The minimum amount of Gaussian blur to apply to the degraded low-resolution image.
--max_gaussian_blur	1.0	float	The maximum amount of Gaussian blur to apply to the degraded low-resolution image.
--min_gaussian_noise	0.0	float	The minimum amount of Gaussian noise to add to the degraded low-resolution image.
--max_gaussian_noise	0.1	float	The maximum amount of Gaussian noise to add to the degraded low-resolution image.
--min_compression	0.0	float	The minimum amount of JPEG compression to apply to the degraded low-resolution image.
--max_compression	0.8	float	The maximum amount of JPEG compression to apply to the degraded low-resolution image.
--brightness_jitter	0.1	float	The amount of jitter applied to the brightness of the training images.
--contrast_jitter	0.1	float	The amount of jitter applied to the contrast of the training images.
--saturation_jitter	0.1	float	The amount of jitter applied to the saturation of the training images.
--hue_jitter	0.1	float	The amount of jitter applied to the hue of the training images.
--batch_size	32	int	The number of training images to pass through the network at a time.
--gradient_accumulation_steps	4	int	The number of batches to pass through the network before updating the model weights.
--num_epochs	100	int	The number of epochs to train for.
--learning_rate	5e-4	float	The learning rate of the AdamW optimizer.
--max_gradient_norm	2.0	float	Clip gradients above this threshold norm before stepping.
--num_channels	48	int	The number of channels within each encoder block.
--hidden_ratio	2	(1, 2, 4)	The ratio of hidden channels to `num_channels` within the activation portion of each encoder block.
--num_encoder_layers	20	int	The number of layers within the body of the encoder.
--activation_checkpointing	False	bool	Should we use activation checkpointing? This will drastically reduce memory utilization during training at the cost of recomputing the forward pass.
--eval_interval	2	int	Evaluate the model after this many epochs on the testing set.
--checkpoint_interval	2	int	Save the model checkpoint to disk every this many epochs.
--checkpoint_path	"./checkpoints/checkpoint.pt"	str	The path to the base checkpoint file on disk.
--resume	False	bool	Should we resume training from the last checkpoint?
--run_dir_path	"./runs"	str	The path to the TensorBoard run directory for this training session.
--device	"cpu"	str	The device to run the computation on.
--seed	None	int	The seed for the random number generator.

Fine-tuning

This next stage focuses on squeezing extra performance out of the model using an adversarial training framework. Step 2 of training takes the pretrained checkpoint and fine-tunes the model using feedback from an adversarial critic model. The critic is specially optimized to detect slight differences between real images and images generated by Ultra Zoom. It uses feedback from the upscaler to improve its detection rate and in turn the upscaler uses feedback from the critic to improve its fool rate. This stage can be considered fully optimized when the critic can no longer reliably detect fake images i.e. the F1 score is pegged near 0.5. To start fine-tuning your pretrained checkpoint see the example below.

python fine-tune.py --base_checkpoint_path=./checkpoints/2X-100.pt

To adjust the size of the critic model use the critic_model_size argument.

python fine-tune.py --base_checkpoint_path="./checkpoints/2X-100.pt" --critic_model_size=medium

Fine-tuning Arguments

Argument	Default	Type	Description
--base_checkpoint_path	None	str	The path to the pretrained checkpoint.
--train_images_path	"./dataset/train"	str	The path to the folder containing your training images.
--test_images_path	"./dataset/test"	str	The path to the folder containing your testing images.
--num_dataset_processes	8	int	The number of CPU processes to use to preprocess the dataset.
--target_resolution	512	int	The number of pixels in the height and width dimensions of the training images.
--min_gaussian_blur	0.0	float	The minimum amount of Gaussian blur to apply to the degraded low-resolution image.
--max_gaussian_blur	1.0	float	The maximum amount of Gaussian blur to apply to the degraded low-resolution image.
--min_gaussian_noise	0.0	float	The minimum amount of Gaussian noise to add to the degraded low-resolution image.
--max_gaussian_noise	0.1	float	The maximum amount of Gaussian noise to add to the degraded low-resolution image.
--min_compression	0.0	float	The minimum amount of JPEG compression to apply to the degraded low-resolution image.
--max_compression	0.8	float	The maximum amount of JPEG compression to apply to the degraded low-resolution image.
--brightness_jitter	0.1	float	The amount of jitter applied to the brightness of the training images.
--contrast_jitter	0.1	float	The amount of jitter applied to the contrast of the training images.
--saturation_jitter	0.1	float	The amount of jitter applied to the saturation of the training images.
--hue_jitter	0.1	float	The amount of jitter applied to the hue of the training images.
--batch_size	8	int	The number of training images to pass through the network at a time.
--gradient_accumulation_steps	16	int	The number of batches to pass through the network before updating the model weights.
--upscaler_learning_rate	1e-4	float	The learning rate of the AdamW optimizer.
--upscaler_max_gradient_norm	1.0	float	Clip gradients above this threshold norm before stepping.
--critic_learning_rate	5e-4	float	The learning rate of the AdamW optimizer.
--critic_max_gradient_norm	5.0	float	Clip gradients above this threshold norm before stepping.
--num_epochs	100	int	The number of epochs to train for.
--critic_warmup_epochs	1	int	Train the critic model for this many epochs before using it to train the upscaler.
--critic_model_size	"small"	str	The size of the critic model. Choice of small, medium, and large.
--activation_checkpointing	False	bool	Should we use activation checkpointing? This will drastically reduce memory utilization during training at the cost of recomputing the forward pass.
--eval_interval	2	int	Evaluate the model after this many epochs on the testing set.
--checkpoint_interval	2	int	Save the model checkpoint to disk every this many epochs.
--checkpoint_path	"./checkpoints/checkpoint.pt"	str	The path to the base checkpoint file on disk.
--resume	False	bool	Should we resume training from the last checkpoint?
--run_dir_path	"./runs"	str	The path to the TensorBoard run directory for this training session.
--device	"cpu"	str	The device to run the computation on.
--seed	None	int	The seed for the random number generator.

Test Comparison

You can use the provided test-compare.py script to generate upscaled images from the trained model at the default checkpoint like in the example below.

python test-compare.py --image_path="./example.jpg"

To generate images using a different checkpoint you can use the checkpoint_path argument like in the example below.

python test-compare.py --checkpoint_path="./checkpoints/fine-tuned.pt" --image_path="./example.jpg"

You can adjust the level of enhancements applied to the image by setting the gaussian_blur, gaussian_noise, and jpeg_compression arguments like in the example below. Each value has been normalized such that 0 means no enhancement and 1 means full enhancement.

python test-compare.py --guassian_blur=0.5 --gaussian_noise=0.7 --jpeg_compression=0.9 --image_path="./example.jpg"

Test compare Arguments

Argument	Default	Type	Description
--image_path	None	str	The path to the image file to be upscaled by the model.
--checkpoint_path	"./checkpoints/fine-tuned.pt"	str	The path to the base checkpoint file on disk.
--gaussian_blur	0.5	float	The strength of gaussian blur removal from the image, between 0 and 1.
--gaussian_noise	0.5	float	The strength of gaussian noise removal from the image, between 0 and 1.
--jpeg_compression	0.5	float	The strength of JPEG compression artifact removal from the image, between 0 and 1.
--device	"cpu"	str	The device to run the computation on.

References

S. Park, et. al. NeXtSRGAN: enhancing super-resolution GAN with ConvNeXt discriminator for superior realism, The Visual Computer, 2025.

J. Song, et. al Gram-GAN: Image Super-Resolution Based on Gram Matrix and Discriminator Perceptual Loss, Sensors, 2023.

Z. Liu, et al. A ConvNet for the 2020s, 2022.

A. Jolicoeur-Martineau. The Relativistic Discriminator: A Key Element Missing From Standard GAN, 2018.

J. Yu, et al. Wide Activation for Efficient and Accurate Image Super-Resolution, 2018.

J. Johnson, et al. Perceptual Losses for Real-time Style Transfer and Super-Resolution, 2016.

W. Shi, et al. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network, 2016.

T. Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, OpenAI, 2016.

T. Miyato, et al. Spectral Normalization for Generative Adversarial Networks, ICLR, 2018.

E. Perez, et. al. FiLM: Visual Reasoning with a General Conditioning Layer, Association for the Advancement of Artificial Intelligence, 2018.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.2.1

Nov 14, 2025

0.2.0

Nov 14, 2025

0.1.2

Oct 17, 2025

0.1.1

Oct 17, 2025

0.1.0

Aug 12, 2025

0.0.1

Jul 14, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ultrazoom-0.2.1.tar.gz (16.7 kB view details)

Uploaded Nov 14, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

ultrazoom-0.2.1-py3-none-any.whl (10.6 kB view details)

Uploaded Nov 14, 2025 Python 3

File details

Details for the file ultrazoom-0.2.1.tar.gz.

File metadata

Download URL: ultrazoom-0.2.1.tar.gz
Upload date: Nov 14, 2025
Size: 16.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for ultrazoom-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`69cf79818bdaeef628c65f31c1fbe4a07b8c6e20f377a83756fb132771d803a8`
MD5	`21ff0dfef26ddad8bcdf5922e70a3ad5`
BLAKE2b-256	`4ba70ed077d5c086572cef71ecc215443e3f22b0978f7067c8bb5c0abfb876c6`

See more details on using hashes here.

File details

Details for the file ultrazoom-0.2.1-py3-none-any.whl.

File metadata

Download URL: ultrazoom-0.2.1-py3-none-any.whl
Upload date: Nov 14, 2025
Size: 10.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for ultrazoom-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b3b8da2ee4828c1d5093185a5f59a7241fba605c488bddc9aa83c78dbf08a9f5`
MD5	`b649308ea8ef1a81a7beaea8691efd28`
BLAKE2b-256	`3620dc4b4a9cc5629db616d92c2db5815f596c81606507dc71022e228aa3ff31`

See more details on using hashes here.

ultrazoom 0.2.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

Ultra Zoom

Key Features

Demo

Pretrained Models

Pretrained Example

Clone the Repository

Install Project Dependencies

Pretraining

Training Dashboard

Pretraining Arguments

Fine-tuning

Fine-tuning Arguments

Test Comparison

Test compare Arguments

References

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes