Extract ImageNet image paths by category keywords
ParseImageNet
Extract image file paths from ImageNet by matching category keywords. Useful for creating custom subsets of ImageNet for training or evaluation.
Kaggle Competition Dataset
Prerequisites
- Python 3.8+
- ImageNet dataset (or a subset) with the standard ILSVRC directory structure:
ImageNet-Subset/
├── LOC_synset_mapping.txt
├── LOC_val_solution.csv
└── ILSVRC/
    ├── ImageSets/
    │   └── CLS-LOC/
    │       ├── train_cls.txt
    │       └── val.txt
    └── Data/
        └── CLS-LOC/
            ├── train/
            │   ├── n01440764/
            │   │   ├── n01440764_10026.JPEG
            │   │   └── ...
            │   └── ...
            └── val/
                ├── ILSVRC2012_val_00000001.JPEG
                └── ...
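Before pointing the library at a dataset, it can help to confirm that the directory matches the layout above. The following is a minimal sketch using only the standard library. `check_imagenet_layout` and `REQUIRED_PATHS` are hypothetical names, not part of parseimagenet, and the list of required entries is simply copied from the tree shown above.

```python
from pathlib import Path
from typing import List

# Entries (relative to the dataset root) copied from the directory tree above.
REQUIRED_PATHS = [
    "LOC_synset_mapping.txt",
    "LOC_val_solution.csv",
    "ILSVRC/ImageSets/CLS-LOC/train_cls.txt",
    "ILSVRC/ImageSets/CLS-LOC/val.txt",
    "ILSVRC/Data/CLS-LOC/train",
    "ILSVRC/Data/CLS-LOC/val",
]


def check_imagenet_layout(base_path: Path) -> List[str]:
    """Return the required relative paths that are missing under base_path.

    Hypothetical helper for illustration; an empty list means every
    expected file and folder was found.
    """
    return [rel for rel in REQUIRED_PATHS if not (base_path / rel).exists()]
```

Calling `check_imagenet_layout(Path('/path/to/your/ImageNet-Subset'))` and printing the result gives a quick list of anything that still needs to be downloaded or moved.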
Installation
pip install parseimagenet
For local development:
git clone https://github.com/MrT3313/Parse-ImageNet.git
pip install -e /path/to/ParseImageNet
# ex: pip install -e /Users/mrt/Documents/MrT/code/computer-vision/ParseImageNet
Usage
[!NOTE]
Params
| Parameter | Type | Default | Alternatives | Description |
|---|---|---|---|---|
| base_path | Path | - | Any valid directory path | Root path to the ImageNet dataset |
| preset | str or None | None | "birds", "dogs", ... via get_available_presets() | Predefined keyword list. None selects all categories |
| keywords | list or None | None | Any list of strings | Custom keyword list. Overrides preset when provided |
| num_images | int | 200 | Any positive integer | Max images to return (capped by availability) |
| source | str | "train" | "val" | Data split to sample from |
| silent | bool | True | False | Suppresses print output when enabled |
Base Example
from pathlib import Path
from parseimagenet import get_image_paths_by_keywords
# Set the path to your ImageNet directory
base_path = Path('/path/to/your/ImageNet-Subset')
# ex: /Users/mrt/Documents/MrT/code/computer-vision/image-bank/ImageNet-Subset
# Default: no preset, selects from all categories
image_paths = get_image_paths_by_keywords(base_path=base_path)
# image_paths is a list of Path objects
print(f"Found {len(image_paths)} images")
print(image_paths[:5])
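Because the function returns a list of Path objects, the result can be passed straight to standard file operations. Below is a minimal sketch, assuming you want to copy the selected images into a new folder to build a custom subset. `copy_images` and the destination directory are hypothetical, not part of parseimagenet.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_images(image_paths: Iterable[Path], dest_dir: Path) -> List[Path]:
    """Copy each image into dest_dir and return the new paths.

    Hypothetical helper: files are placed flat in dest_dir under their
    original file names, so two sources with the same name would overwrite
    each other.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in image_paths:
        target = dest_dir / src.name
        shutil.copy2(src, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```

For example, `copy_images(image_paths, Path('my-subset'))` would place the returned images in a `my-subset` folder next to your script.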
Using Presets
[!NOTE]
Presets are predefined keyword lists for common categories:
from parseimagenet import get_image_paths_by_keywords # main function
from parseimagenet import get_available_presets, KEYWORD_PRESETS # helpers
# See available presets
print(get_available_presets()) # ['birds', 'dogs', 'wild_canids', 'snakes']
# Access preset keywords directly
print(KEYWORD_PRESETS["birds"])
# Use a specific preset
image_paths = get_image_paths_by_keywords(
base_path=base_path,
preset="birds",
num_images=200
)
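Since the keyword list behind a preset can be read directly, several presets can be combined into one custom list and passed as `keywords`. The sketch below assumes `KEYWORD_PRESETS` behaves like a dict mapping preset names to lists of strings, which is what the indexing above suggests but is not confirmed here. `merge_presets` is a hypothetical helper, and the example data is a stand-in, not the real preset contents.

```python
from typing import Dict, Iterable, List


def merge_presets(presets: Dict[str, List[str]], names: Iterable[str]) -> List[str]:
    """Concatenate the keyword lists of several presets.

    Duplicates are dropped and first-seen order is kept.
    """
    merged: List[str] = []
    seen = set()
    for name in names:
        for keyword in presets[name]:
            if keyword not in seen:
                seen.add(keyword)
                merged.append(keyword)
    return merged


# Stand-in data for illustration only; the real KEYWORD_PRESETS may differ.
example_presets = {
    "dogs": ["dog", "hound", "terrier"],
    "wild_canids": ["wolf", "fox", "dog"],
}
print(merge_presets(example_presets, ["dogs", "wild_canids"]))
# ['dog', 'hound', 'terrier', 'wolf', 'fox']
```

With the library installed, the same helper could be called as `merge_presets(KEYWORD_PRESETS, ["dogs", "wild_canids"])` and the result passed via the `keywords` parameter.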
Using Keywords
[!NOTE]
Custom keywords override the preset:
[!IMPORTANT]
You can find all applicable category keywords in the LOC_synset_mapping.txt file.
image_paths = get_image_paths_by_keywords(
base_path=base_path,
keywords=['dog', 'puppy', 'hound'],
num_images=100
)
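To preview which categories a keyword list is likely to hit, you can scan LOC_synset_mapping.txt yourself. The sketch below assumes the usual Kaggle format, where each line is a synset id followed by a space and a comma-separated list of names, and it uses a case-insensitive substring match. The library's actual matching rule is not described here and may differ. `find_matching_synsets` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, Iterable


def find_matching_synsets(mapping_file: Path, keywords: Iterable[str]) -> Dict[str, str]:
    """Return {synset_id: description} for every line whose description
    contains at least one keyword (case-insensitive substring match)."""
    lowered = [k.strip().lower() for k in keywords if k.strip()]
    matches = {}
    with mapping_file.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            # Assumed line shape: "n01440764 tench, Tinca tinca"
            synset_id, _, description = line.partition(" ")
            if any(k in description.lower() for k in lowered):
                matches[synset_id] = description
    return matches
```

Note that plain substring matching is broad: under this rule a keyword such as 'dog' would also match a description like "hot dog", so it is worth reviewing the matches before sampling.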
Using Sources
By default, images are sourced from the training set. Use source="val" to pull from the validation set instead:
[!IMPORTANT]
Sampling from the test split is not supported because the Kaggle Competition Dataset does not provide ground truth labels for the test data.
image_paths = get_image_paths_by_keywords(
base_path=base_path,
preset="birds",
num_images=100,
source="val"
)
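Validation images all sit in one flat folder, so their categories cannot be read from directory names and have to come from LOC_val_solution.csv instead. Here is a minimal sketch of reading that file, assuming the Kaggle layout with an `ImageId` column and a `PredictionString` column whose first token is the synset id. `load_val_labels` is a hypothetical helper and not necessarily how parseimagenet resolves validation labels.

```python
import csv
from pathlib import Path
from typing import Dict


def load_val_labels(solution_csv: Path) -> Dict[str, str]:
    """Map each validation image id to its synset id.

    Assumes rows shaped like:
        ILSVRC2012_val_00048981,n03995372 85 1 499 272
    where the first token of PredictionString is the synset id and the
    remaining tokens are bounding-box coordinates.
    """
    labels = {}
    with solution_csv.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            tokens = row["PredictionString"].split()
            if tokens:
                labels[row["ImageId"]] = tokens[0]
    return labels
```

Combined with a synset-to-name lookup from LOC_synset_mapping.txt, this mapping is enough to decide which validation files belong to a given keyword.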
Command Line
# Use default preset (birds)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset
# Use a specific preset
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --num_images 100
# Use custom keywords (overrides preset)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --keywords "dog, puppy" --num_images 100
# Use validation data instead of training data
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --source val --num_images 100