Fast and reliable HR-LR image pair generator for ML/DL datasets
pairgen ✂️
pairgen is a fast, reliable, and dependency-light CLI tool designed to generate High-Resolution (HR) and Low-Resolution (LR) image pairs for Super-Resolution Machine Learning and Deep Learning datasets.
Whether you are preparing standard academic benchmarks (like Set5, Set14, DIV2K) or generating complex Real-World SR validation sets, pairgen handles exact interpolation, patch extraction, and advanced degradations out of the box.
✨ Features
- 🔬 Academic Reproducibility: Includes a pure NumPy implementation of MATLAB's `imresize` (bicubic interpolation with antialiasing), which is strictly required for evaluating standard SR models.
- 🌪️ BSRGAN-like Degradation Pipeline: Simulates real-world image degradation by applying random Gaussian/Sinc blur, Gaussian/Poisson noise, and JPEG compression in a randomized order (Random Shuffle Strategy).
- ✂️ Smart Patch Extraction: Easily extract multiple random crops from large images to maximize your dataset's utility.
- ⚡ Multiprocessing: Built with `ProcessPoolExecutor` to utilize all CPU cores, effortlessly processing thousands of high-resolution images.
- 🪶 Lightweight: No heavy dependencies like OpenCV or SciPy. All complex FFT math and kernels are implemented using `numpy` and `Pillow`.
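What sets MATLAB's `imresize` apart from a plain bicubic filter is antialiasing: when downsampling, the cubic kernel is stretched by the scale factor so that it also acts as a low-pass filter. Below is a minimal 1-D NumPy sketch of that weight computation. It is not pairgen's actual code, and the names `cubic`, `resize_weights`, and `resize_1d` are hypothetical.

```python
import numpy as np


def cubic(x):
    """Keys cubic convolution kernel with a = -0.5 (support [-2, 2])."""
    ax = np.abs(x)
    ax2, ax3 = ax**2, ax**3
    near = (1.5 * ax3 - 2.5 * ax2 + 1.0) * (ax <= 1)
    far = (-0.5 * ax3 + 2.5 * ax2 - 4.0 * ax + 2.0) * ((ax > 1) & (ax <= 2))
    return near + far


def resize_weights(in_len, scale):
    """Return (indices, weights) mapping input samples to each output sample."""
    out_len = int(np.ceil(in_len * scale))
    # Antialiasing: stretch the kernel only when downscaling (scale < 1).
    stretch = scale if scale < 1 else 1.0
    kernel_width = 4.0 / stretch
    # Centres of the output samples in 1-based input coordinates.
    x = np.arange(1, out_len + 1)
    u = x / scale + 0.5 * (1 - 1 / scale)
    left = np.floor(u - kernel_width / 2)
    taps = int(np.ceil(kernel_width)) + 2
    idx = left[:, None] + np.arange(taps)[None, :]
    w = stretch * cubic(stretch * (u[:, None] - idx))
    w /= w.sum(axis=1, keepdims=True)  # each output's weights sum to 1
    # Replicate border samples, then convert to 0-based indices.
    idx = np.clip(idx, 1, in_len).astype(int) - 1
    return idx, w


def resize_1d(signal, scale):
    signal = np.asarray(signal, dtype=np.float64)
    idx, w = resize_weights(len(signal), scale)
    return (signal[idx] * w).sum(axis=1)
```

A 2-D resize applies the same weights along one axis and then the other. For x4 downscaling the stretched kernel spans 16 input samples instead of 4, which is why the result differs from non-antialiased bicubic filters.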
🎯 Motivation
In Image Super-Resolution projects, preparing test and validation datasets is a recurring bottleneck. Academic benchmarks require exact MATLAB-like downsampling, while Real-World SR evaluation requires complex degradation pipelines.
Applying degradations on the fly during training is standard practice, but Validation and Test sets need fixed, pre-generated LR images so that models can be compared reliably and PSNR/SSIM metrics are reproducible. pairgen was created to standardize this offline generation process into one CLI command, complementing tools like manigen.
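To illustrate why fixed LR images matter, here is a small NumPy sketch of PSNR and of how a fresh random degradation shifts the score. Gaussian noise stands in for any random degradation, and the score is measured on the degraded image itself rather than on a model's output. The helper names `psnr` and `noisy_copy` are hypothetical and not part of pairgen.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value**2 / mse)


def noisy_copy(image, seed, sigma=10.0):
    """Stand-in degradation: additive Gaussian noise with a seeded generator."""
    rng = np.random.default_rng(seed)
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0, 255)


hr = np.full((64, 64), 128.0)
fixed_a = psnr(hr, noisy_copy(hr, seed=0))
fixed_b = psnr(hr, noisy_copy(hr, seed=0))  # same seed: identical score
fresh = psnr(hr, noisy_copy(hr, seed=1))    # new random draw: score shifts
```

With pre-generated files the degradation is drawn once, so every model is scored against exactly the same inputs.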
📦 Installation
You can install pairgen directly from PyPI using pip:
pip install pairgen
Or, if you use uv (recommended for CLI tools):
uv tool install pairgen
🚀 Quick Start
Generate a standard benchmark dataset with MATLAB bicubic downsampling (x4 scale):
pairgen -i data/Set14 -o data/Set14_pairs -s 4
💡 Advanced Usage Examples
1. Generating Real-World SR Validation Sets
Create a complex, degraded test set by enabling the BSRGAN-like pipeline (blur, noise, and JPEG compression applied in random order):
pairgen -i data/validation -o data/validation_degraded -s 4 --blur --noise --jpeg
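The random-order idea can be sketched with NumPy and Pillow as follows. This is a simplified illustration, not pairgen's implementation: it uses only Gaussian blur and Gaussian noise (no Sinc blur or Poisson noise), and the parameter ranges and the function names `blur`, `noise`, `jpeg`, and `degrade` are assumptions.

```python
import io
import random

import numpy as np
from PIL import Image, ImageFilter


def blur(img, rng):
    # Assumed radius range; Gaussian only.
    return img.filter(ImageFilter.GaussianBlur(radius=rng.uniform(0.2, 2.0)))


def noise(img, rng):
    # Assumed sigma range; additive Gaussian noise only.
    arr = np.asarray(img, dtype=np.float32)
    sigma = rng.uniform(1.0, 15.0)
    np_rng = np.random.default_rng(rng.getrandbits(32))
    noisy = arr + np_rng.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))


def jpeg(img, rng):
    # Round-trip through an in-memory JPEG at a random (assumed) quality.
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=rng.randint(30, 95))
    buf.seek(0)
    return Image.open(buf).convert("RGB")


def degrade(img, seed):
    """Apply blur, noise, and JPEG compression in a seeded random order."""
    rng = random.Random(seed)
    ops = [blur, noise, jpeg]
    rng.shuffle(ops)
    for op in ops:
        img = op(img, rng)
    return img
```

Seeding the generator per image keeps the output reproducible, which is the property a fixed validation set needs.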
2. Extracting Multiple Patches
To extract 50 random 256x256 patches from each 2K/4K image and build a richer dataset:
pairgen -i data/DIV2K_train -o data/DIV2K_patches -s 4 -p 256 -np 50
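As a rough picture of what random patch extraction involves, the sketch below crops square HR patches at random positions and downscales each one. It is an illustration under assumptions: `extract_patches` and `box_downscale` are hypothetical names, and the block-average downscaler is a simple stand-in for the real interpolation (such as `matlab_bicubic`), not what pairgen does.

```python
import numpy as np


def extract_patches(hr, patch_size, num_patches, seed=0):
    """Crop num_patches random square patches from an H x W x C array."""
    h, w = hr.shape[:2]
    if patch_size > min(h, w):
        raise ValueError("patch_size is larger than the image")
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(num_patches):
        top = int(rng.integers(0, h - patch_size + 1))
        left = int(rng.integers(0, w - patch_size + 1))
        patches.append(hr[top:top + patch_size, left:left + patch_size].copy())
    return patches


def box_downscale(patch, scale):
    """Block-average stand-in; patch sides must be divisible by scale."""
    h, w, c = patch.shape
    blocks = patch.reshape(h // scale, scale, w // scale, scale, c)
    return blocks.mean(axis=(1, 3))


hr = np.random.default_rng(1).integers(0, 256, (512, 768, 3), dtype=np.uint8)
hr_patches = extract_patches(hr, patch_size=256, num_patches=50)
lr_patches = [box_downscale(p, 4) for p in hr_patches]
```

With `-p 256` and `-s 4`, each 256x256 HR patch pairs with a 64x64 LR patch.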
3. Combining with manigen
pairgen natively supports reading file manifests (.txt files containing lists of paths). You can use manigen to index your dataset and split it, then pass the manifest to pairgen:
pairgen -i train_manifest.txt -o data/train_set -s 4
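For orientation, a manifest can be read with a few lines of Python. The sketch below assumes one path per line with blank lines ignored; the exact format pairgen accepts, the `IMAGE_SUFFIXES` set, and the function names are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; pairgen's actual list may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def parse_manifest(text):
    """Turn manifest text into a list of paths, skipping blank lines."""
    return [Path(line.strip()) for line in text.splitlines() if line.strip()]


def collect_images(input_path, recursive=False):
    """Accept either a .txt manifest or a directory of images."""
    path = Path(input_path)
    if path.is_file() and path.suffix.lower() == ".txt":
        return parse_manifest(path.read_text(encoding="utf-8"))
    pattern = "**/*" if recursive else "*"
    return sorted(
        f for f in path.glob(pattern) if f.suffix.lower() in IMAGE_SUFFIXES
    )
```

Treating both input kinds behind one function mirrors how a single `-i` option can take a directory or a manifest.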
🛠️ CLI Reference
| Argument | Short | Description | Default |
|---|---|---|---|
| `--input-path` | `-i` | (Required) Input directory or manifest file to scan. | - |
| `--output-dir` | `-o` | (Required) Output directory where HR and LR folders will be created. | - |
| `--scaling-factor` | `-s` | (Required) Scaling factor for LR images (e.g., 2, 4). | - |
| `--recursive` | `-r` | Scan subdirectories recursively. | False |
| `--workers` | `-w` | Number of CPU cores to use. Use 1 for strict sequential order. | 1 |
| `--interpolation` | `-im` | Interpolation: matlab_bicubic, bilinear, bicubic, lanczos, nearest. | matlab_bicubic |
| `--patch-size` | `-p` | If > 0, extracts square patches of this size from HR. | 0 |
| `--num-patches` | `-np` | Number of random patches to extract per image. | 1 |
| `--augment` | | Apply random flips and rotations to HR. | False |
| `--blur` | | Apply random Gaussian/Sinc blur to LR. | False |
| `--noise` | | Apply random Gaussian/Poisson noise to LR. | False |
| `--jpeg` | | Apply random JPEG compression to LR. | False |
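The `--workers` behaviour can be pictured with the sketch below: a plain loop when one worker is requested, and a `ProcessPoolExecutor` otherwise. This is an assumption about the general pattern, not pairgen's source; `process_one` is a hypothetical placeholder for the per-image work.

```python
from concurrent.futures import ProcessPoolExecutor


def process_one(path):
    """Hypothetical placeholder for loading, degrading, and saving one image."""
    return path.upper()


def process_all(paths, workers=1):
    """Process paths sequentially, or in a process pool when workers > 1."""
    if workers <= 1:
        # Strictly sequential: images are handled one after another.
        return [process_one(p) for p in paths]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, but the work itself runs
        # concurrently, so side effects such as file writes may interleave.
        return list(pool.map(process_one, paths))


if __name__ == "__main__":
    print(process_all(["a.png", "b.png"], workers=2))
```

The `__main__` guard matters on platforms that start worker processes by re-importing the module, such as Windows.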
🤝 Contributing
1. Clone the repository
git clone https://github.com/ash1ra/pairgen
cd pairgen
2. Install dependencies using uv
uv sync
# On Windows
.venv\Scripts\activate
# On Unix or macOS
source .venv/bin/activate
3. Format and lint the code
uv run ruff format .
uv run ruff check .
4. Run the tests
uv run pytest tests/ -v
5. Submit a pull request
If you'd like to contribute, please fork the repository and open a pull request to the main branch.