Fast and reliable HR-LR image pair generator for ML/DL datasets

pairgen ✂️

pairgen is a fast, reliable, and dependency-light CLI tool designed to generate High-Resolution (HR) and Low-Resolution (LR) image pairs for Super-Resolution Machine Learning and Deep Learning datasets.

Whether you are preparing standard academic benchmarks (like Set5, Set14, DIV2K) or generating complex Real-World SR validation sets, pairgen handles exact interpolation, patch extraction, and advanced degradations out of the box.

✨ Features

  • 🔬 Academic Reproducibility: Includes a pure NumPy implementation of MATLAB's imresize (bicubic interpolation with antialiasing), the downsampling that standard SR benchmarks conventionally use, so evaluation results stay comparable with published numbers.
  • 🌪️ BSRGAN-like Degradation Pipeline: Simulates real-world image degradation by applying random Gaussian/Sinc blur, Gaussian/Poisson noise, and JPEG compression in a randomized order (Random Shuffle Strategy).
  • ✂️ Smart Patch Extraction: Easily extract multiple random crops from large images to maximize your dataset's utility.
  • Multiprocessing: Built on ProcessPoolExecutor, so work can be spread across CPU cores (set with --workers) when processing thousands of high-resolution images.
  • 🪶 Lightweight: No heavy dependencies like OpenCV or SciPy. All complex FFT math and kernels are implemented using numpy and Pillow.
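
The antialiased bicubic resampling named in the first bullet can be illustrated in one dimension. The block below is a minimal sketch, not pairgen's actual code: it assumes the Keys cubic kernel with a = -0.5 and the rule that the kernel is stretched by the scale factor when shrinking, which is how MATLAB's imresize antialiases. The names `cubic` and `antialiased_weights` are hypothetical, and border handling (clamping out-of-range indices) is left out for brevity.

```python
import math


def cubic(x):
    """Keys cubic convolution kernel with a = -0.5."""
    ax = abs(x)
    if ax <= 1.0:
        return 1.5 * ax**3 - 2.5 * ax**2 + 1.0
    if ax <= 2.0:
        return -0.5 * ax**3 + 2.5 * ax**2 - 4.0 * ax + 2.0
    return 0.0


def antialiased_weights(out_index, scale):
    """Return {input_index: weight} for one output pixel (0-based indices).

    scale < 1 means downsampling, e.g. 0.25 for an x4 LR image.
    Indices may fall outside the image; a real resizer would clamp them.
    """
    # Center of the output pixel expressed in input coordinates.
    u = (out_index + 0.5) / scale - 0.5
    # When shrinking, widen the kernel so it also acts as a low-pass filter.
    stretch = min(scale, 1.0)
    support = 4.0 / stretch
    left = math.floor(u - support / 2.0)
    taps = math.ceil(support) + 2
    raw = {}
    for i in range(left, left + taps):
        w = stretch * cubic(stretch * (u - i))
        if w != 0.0:
            raw[i] = w
    # Normalize so the weights sum to one and brightness is preserved.
    total = sum(raw.values())
    return {i: w / total for i, w in raw.items()}


weights = antialiased_weights(out_index=0, scale=0.25)
print(len(weights), "input pixels contribute to the first LR pixel")
```

For a 2-D image the same weights are applied separably, first along one axis and then along the other.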

🎯 Motivation

While working on Image Super-Resolution projects, preparing testing and validation datasets is always a bottleneck. Academic benchmarks require exact MATLAB-like downsampling, while Real-World SR evaluation requires complex degradation pipelines.

Applying degradations on the fly during training is standard practice, but validation and test sets need fixed, pre-generated LR images so that models can be compared reliably and PSNR/SSIM metrics are computed on identical inputs. pairgen was created to standardize this offline generation step into a single CLI command, and it complements tools like manigen.

📦 Installation

You can install pairgen directly from PyPI using pip:

pip install pairgen

Or, if you use uv (recommended for CLI tools):

uv tool install pairgen

🚀 Quick Start

Generate a standard benchmark dataset with MATLAB bicubic downsampling (x4 scale):

pairgen -i data/Set14 -o data/Set14_pairs -s 4

💡 Advanced Usage Examples

1. Generating Real-World SR Validation Sets

Create a complex, degraded test set by enabling the BSRGAN-like pipeline (blur, noise, and JPEG compression applied in random order):

pairgen -i data/validation -o data/validation_degraded -s 4 --blur --noise --jpeg
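
The random-order idea can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not pairgen's pipeline: the "image" is a single row of pixel values, a 3-tap box blur stands in for Gaussian/Sinc blur, coarse quantization stands in for JPEG compression, and every function name and parameter range is hypothetical. Seeding the random generator shows why a pre-generated set stays fixed between runs.

```python
import random


def blur(px, rng):
    # Stand-in blur: 3-tap box filter with clamped edges.
    n = len(px)
    return [(px[max(i - 1, 0)] + px[i] + px[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]


def noise(px, rng):
    # Additive Gaussian noise with a randomly drawn strength (assumed range).
    sigma = rng.uniform(1.0, 10.0)
    return [p + rng.gauss(0.0, sigma) for p in px]


def quantize(px, rng):
    # Stand-in for JPEG: snap values to a randomly chosen step size.
    step = rng.choice([4, 8, 16])
    return [round(p / step) * step for p in px]


def degrade(px, seed):
    """Apply the three degradations in a shuffled order; return (order, pixels)."""
    rng = random.Random(seed)
    ops = [("blur", blur), ("noise", noise), ("jpeg_stand_in", quantize)]
    rng.shuffle(ops)  # the random shuffle strategy
    out = list(px)
    for _, op in ops:
        out = op(out, rng)
    # Clip back to the valid 8-bit range.
    out = [min(255.0, max(0.0, v)) for v in out]
    return [name for name, _ in ops], out


row = [0, 32, 64, 128, 255, 128, 64, 32]
order, degraded = degrade(row, seed=7)
print(order)
```

Calling `degrade` twice with the same seed yields the same order and the same pixels, which is the property a fixed validation set relies on.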

2. Extracting Multiple Patches

To build a larger dataset from 2K/4K images, extract 50 random 256x256 patches from each image:

pairgen -i data/DIV2K_train -o data/DIV2K_patches -s 4 -p 256 -np 50
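
As a rough sketch of what random patch extraction involves, the block below picks crop boxes that always fit inside the image and computes the matching LR patch size. It assumes patches are cut from the HR image and then downsampled, as the --patch-size description suggests; the function names and the seed parameter are hypothetical and not part of pairgen.

```python
import random


def random_patch_boxes(width, height, patch_size, num_patches, seed=None):
    """Return (left, top, right, bottom) boxes that lie fully inside the image."""
    if patch_size > width or patch_size > height:
        raise ValueError("patch_size exceeds image dimensions")
    rng = random.Random(seed)
    boxes = []
    for _ in range(num_patches):
        left = rng.randint(0, width - patch_size)   # inclusive bounds
        top = rng.randint(0, height - patch_size)
        boxes.append((left, top, left + patch_size, top + patch_size))
    return boxes


def lr_patch_size(patch_size, scale):
    """Side length of the LR patch produced from an HR patch."""
    if patch_size % scale != 0:
        raise ValueError("patch_size should be a multiple of the scaling factor")
    return patch_size // scale


boxes = random_patch_boxes(2040, 1356, patch_size=256, num_patches=50, seed=0)
print(len(boxes), "HR boxes, each giving an LR patch of side", lr_patch_size(256, 4))
```

The box tuples use the same (left, upper, right, lower) convention as Pillow's `Image.crop`, so they could be passed to it directly.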

3. Combining with manigen

pairgen natively supports reading file manifests (.txt files containing lists of paths). You can use manigen to index your dataset and split it, then pass the manifest to pairgen:

pairgen -i train_manifest.txt -o data/train_set -s 4
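
To make the "directory or manifest" input concrete, here is a minimal sketch of how such an input could be resolved into a list of image paths. It assumes a manifest holds one path per line and that blank lines are ignored; the extension set and the `collect_images` name are hypothetical, and pairgen's own parsing rules may differ.

```python
from pathlib import Path

# Assumed set of image extensions; the formats pairgen accepts are not stated.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def collect_images(input_path, recursive=False):
    """Resolve a directory or a .txt manifest into a list of image paths."""
    path = Path(input_path)
    if path.is_file() and path.suffix.lower() == ".txt":
        # Manifest: one path per line, blank lines skipped (assumption).
        lines = path.read_text(encoding="utf-8").splitlines()
        return [Path(line.strip()) for line in lines if line.strip()]
    # Directory: optionally descend into subdirectories.
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in path.glob(pattern)
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Sorting the directory listing keeps the processing order deterministic, which matters when outputs are compared between runs.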

🛠️ CLI Reference

| Argument | Short | Description | Default |
|---|---|---|---|
| --input-path | -i | (Required) Input directory or manifest file to scan. | - |
| --output-dir | -o | (Required) Output directory where HR and LR folders will be created. | - |
| --scaling-factor | -s | (Required) Scaling factor for LR images (e.g., 2, 4). | - |
| --recursive | -r | Scan subdirectories recursively. | False |
| --workers | -w | Number of CPU cores to use. Use 1 for strict sequential order. | 1 |
| --interpolation | -im | Interpolation: matlab_bicubic, bilinear, bicubic, lanczos, nearest. | matlab_bicubic |
| --patch-size | -p | If > 0, extracts square patches of this size from HR. | 0 |
| --num-patches | -np | Number of random patches to extract per image. | 1 |
| --augment | | Apply random flips and rotations to HR. | False |
| --blur | | Apply random Gaussian/Sinc blur to LR. | False |
| --noise | | Apply random Gaussian/Poisson noise to LR. | False |
| --jpeg | | Apply random JPEG compression to LR. | False |
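
When the same dataset is needed at several scales, the flags listed above can be assembled from a script. The helper below is a hypothetical wrapper, not part of pairgen: it only builds an argument list from flags documented in this reference and prints the resulting commands, leaving execution to the caller.

```python
import shlex


def build_command(input_path, output_dir, scale, *, workers=1,
                  patch_size=0, num_patches=1,
                  blur=False, noise=False, jpeg=False):
    """Assemble a pairgen argument list from the documented flags."""
    cmd = ["pairgen", "-i", str(input_path), "-o", str(output_dir),
           "-s", str(scale), "-w", str(workers)]
    if patch_size > 0:
        cmd += ["-p", str(patch_size), "-np", str(num_patches)]
    for enabled, flag in ((blur, "--blur"), (noise, "--noise"), (jpeg, "--jpeg")):
        if enabled:
            cmd.append(flag)
    return cmd


# Print one command per scale; the output directory names are illustrative.
for scale in (2, 4):
    print(shlex.join(build_command("data/Set14", f"data/Set14_x{scale}", scale)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run the tool, provided pairgen is installed and on the PATH.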

🤝 Contributing

1. Clone the repository

git clone https://github.com/ash1ra/pairgen
cd pairgen

2. Install dependencies using uv

uv sync
# On Windows
.venv\Scripts\activate
# On Unix or macOS
source .venv/bin/activate

3. Format and lint the code

uv run ruff format .
uv run ruff check .

4. Run the tests

uv run pytest tests/ -v

5. Submit a pull request

If you'd like to contribute, please fork the repository and open a pull request to the main branch.
