Smart image downsampling for image classification datasets
smartdownsample
Efficient downsampling for image classification datasets
SmartDownsample selects the most diverse images from a large collection, reducing dataset size while preserving visual variability.
Installation
pip install smartdownsample
Usage
from smartdownsample import select_distinct

# Example list of image paths
my_image_list = [
    "path/to/img1.jpg",
    "path/to/img2.jpg",
    "path/to/img3.jpg",
    "path/to/img4.jpg"
]

# Simple selection - get 100 most diverse images
selected = select_distinct(
    image_paths=my_image_list,
    target_count=100
)

# With visual verification to see excluded images in context
selected = select_distinct(
    image_paths=my_image_list,
    target_count=100,
    show_verification=True
)

print(f"Selected {len(selected)} images")
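In practice the path list usually comes from a folder rather than being typed by hand. The following is a minimal sketch of one way to build it, not part of smartdownsample itself: `collect_image_paths`, `capped_target`, and the `IMAGE_EXTENSIONS` set are hypothetical names chosen for illustration. Because this page does not say what happens when `target_count` exceeds the number of available images, the sketch caps the request defensively.

```python
from pathlib import Path

# Assumption for this sketch: which suffixes count as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def collect_image_paths(root):
    """Recursively gather image files under `root` in a stable, sorted order."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def capped_target(paths, requested):
    """Never request more images than the list actually contains."""
    return min(requested, len(paths))
```

The resulting list can then be passed as `image_paths`, with `target_count=capped_target(paths, 100)`, since the parameter table states that both `str` and `Path` objects are accepted.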
Parameters
| Parameter | Default | Description |
|---|---|---|
| image_paths | Required | List of image file paths (str or Path objects) |
| target_count | Required | Exact number of images to select |
| window_size | 100 | Rolling window size (larger = better quality, slower) |
| random_seed | 42 | Random seed for reproducible results |
| show_progress | True | Whether to display progress bars |
| show_verification | False | Show visual verification comparing excluded vs included images |
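As an illustration of the parameters listed above, the sketch below gathers the optional settings in one dictionary and forwards them to `select_distinct`. The values are arbitrary examples rather than recommendations, and `OPTIONS` and `run_selection` are hypothetical names introduced here. The import is deferred into the function so the snippet can be loaded even where the package is not installed.

```python
# Keyword names come from the parameter table; the values are illustrative only.
OPTIONS = {
    "window_size": 200,          # larger window: better quality, slower
    "random_seed": 7,            # fixed seed for reproducible results
    "show_progress": False,      # silence progress bars, e.g. in batch jobs
    "show_verification": False,  # skip the visual check
}


def run_selection(image_paths, target_count, **overrides):
    """Call select_distinct with OPTIONS, letting callers override any entry."""
    from smartdownsample import select_distinct  # deferred import

    settings = {**OPTIONS, **overrides}
    return select_distinct(
        image_paths=image_paths,
        target_count=target_count,
        **settings,
    )
```

A call such as `run_selection(paths, 500, window_size=50)` would trade some selection quality for speed, following the description of `window_size` in the table.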
Step by Step
- Sort paths by directory. Within each folder, files are naturally ordered (e.g., img1.jpg, img2.jpg, img10.jpg) so related images remain grouped.
- Compute perceptual hashes for all valid image paths.
- Apply rolling window selection on the hash array to choose indices of the most diverse images. This runs in O(n) time, scales to large classes of 100k+ images, and compares each candidate only to a sliding window of recent selections.
- Return results as [valid_paths[i] for i in selected_indices].
- Optional verification plot: If show_verification=True, the algorithm displays a visual check of 18 randomly selected excluded images and their included counterpart. The visualization opens automatically in your default image viewer without saving files to disk.
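The steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's implementation: hashes are taken as ready-made integers instead of being computed from image files, a candidate is kept when its Hamming distance to every hash in the window of recent selections reaches a hypothetical `min_distance` threshold, and the pass may therefore return fewer than `target_count` items, whereas the package is documented to return the exact count. The names `natural_key`, `hamming`, and `rolling_window_select` are introduced here for illustration.

```python
import re
from collections import deque
from pathlib import Path


def natural_key(path):
    """Sort by directory, then by file name with digit runs compared as numbers."""
    p = Path(path)
    # re.split with a capturing group alternates text, digits, text, ...
    parts = re.split(r"(\d+)", p.name)
    name_key = [int(s) if i % 2 else s.lower() for i, s in enumerate(parts)]
    return (str(p.parent), name_key)


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def rolling_window_select(hashes, target_count, window_size=100, min_distance=8):
    """Single pass: keep a hash if it differs enough from recent selections."""
    selected = []
    recent = deque(maxlen=window_size)  # hashes of the latest selections only
    for i, h in enumerate(hashes):
        if len(selected) == target_count:
            break
        if all(hamming(h, r) >= min_distance for r in recent):
            selected.append(i)
            recent.append(h)
    return selected


paths = ["b/img10.jpg", "a/img2.jpg", "b/img2.jpg", "a/img1.jpg"]
valid_paths = sorted(paths, key=natural_key)
# valid_paths: a/img1.jpg, a/img2.jpg, b/img2.jpg, b/img10.jpg

toy_hashes = [0b0000, 0b0001, 0b1111, 0b1110]  # made-up 4-bit hashes
selected_indices = rolling_window_select(toy_hashes, target_count=2, min_distance=2)
result = [valid_paths[i] for i in selected_indices]
# result: ["a/img1.jpg", "b/img2.jpg"]
```

Each candidate is compared with at most `window_size` earlier selections, so the work grows linearly with the number of images for a fixed window, which matches the O(n) behaviour described in the list.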
License
MIT License – see LICENSE file.
File details
Details for the file smartdownsample-0.1.1.tar.gz.
File metadata
- Download URL: smartdownsample-0.1.1.tar.gz
- Upload date:
- Size: 10.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0c61d1e2a34ff9d3888820f53525924368895a9df0772840a4f2f295584c94d4 |
| MD5 | 5d0b06e29986790d4ebdf1db00080869 |
| BLAKE2b-256 | 73b218f7e968ccdb22a3959d6c813fcf2d8473bb12b49d64a4df7dfe245d2f7f |
File details
Details for the file smartdownsample-0.1.1-py3-none-any.whl.
File metadata
- Download URL: smartdownsample-0.1.1-py3-none-any.whl
- Upload date:
- Size: 8.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 551efcfca43812abd2e1b190bc087032c4f6f34271b82c0fe9a594db25a40377 |
| MD5 | 0205816d29e60939e558ef060aecc655 |
| BLAKE2b-256 | 5046b606332be76ec3708bda0f28d77831a36d1711b5c48bcff9ebdad1a512ef |