Extract ImageNet image paths by category keywords

Maintainer: MrT3313

Project description

ParseImageNet

Extract image file paths from ImageNet by matching category keywords. Useful for creating custom subsets of ImageNet for training or evaluation.

Built for the Kaggle Competition Dataset (ImageNet Object Localization Challenge).

Prerequisites

Python 3.8+

ImageNet dataset (or a subset) with the standard ILSVRC directory structure:

ImageNet-Subset/
├── LOC_synset_mapping.txt
├── LOC_val_solution.csv
└── ILSVRC/
    ├── ImageSets/
    │   └── CLS-LOC/
    │       ├── train_cls.txt
    │       └── val.txt
    └── Data/
        └── CLS-LOC/
            ├── train/
            │   ├── n01440764/
            │   │   ├── n01440764_10026.JPEG
            │   │   └── ...
            │   └── ...
            └── val/
                ├── ILSVRC2012_val_00000001.JPEG
                └── ...
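Before calling the library, it can help to confirm that a dataset root actually matches the tree above. The following is a minimal sketch using only the standard library; `missing_layout_entries` and `EXPECTED_ENTRIES` are hypothetical names, not part of parseimagenet, and the check covers only the files and folders shown in the tree.

```python
from pathlib import Path
from typing import List

# Entries from the tree above, relative to the dataset root.
# A trailing "/" marks a directory; everything else must be a regular file.
EXPECTED_ENTRIES = [
    "LOC_synset_mapping.txt",
    "LOC_val_solution.csv",
    "ILSVRC/ImageSets/CLS-LOC/train_cls.txt",
    "ILSVRC/ImageSets/CLS-LOC/val.txt",
    "ILSVRC/Data/CLS-LOC/train/",
    "ILSVRC/Data/CLS-LOC/val/",
]


def missing_layout_entries(base_path: Path) -> List[str]:
    """Return the expected entries that are absent under base_path (hypothetical helper)."""
    missing = []
    for entry in EXPECTED_ENTRIES:
        target = base_path / entry.rstrip("/")
        present = target.is_dir() if entry.endswith("/") else target.is_file()
        if not present:
            missing.append(entry)
    return missing
```

An empty list from `missing_layout_entries(Path('/path/to/your/ImageNet-Subset'))` means every entry in the tree was found.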

Installation

pip install parseimagenet

For local development:

git clone https://github.com/MrT3313/Parse-ImageNet.git
cd Parse-ImageNet
pip install -e .

Usage

[!NOTE]

See the Example Notebook for an end-to-end walkthrough.

Params

| Parameter | Type | Default | Alternatives | Description |
| --- | --- | --- | --- | --- |
| `base_path` | `Path` | required | Any valid directory path | Root path to the ImageNet dataset |
| `preset` | `str` or `None` | `None` | `"birds"`, `"dogs"`, ... via `get_available_presets()` | Predefined keyword list. `None` selects all categories |
| `keywords` | `list` or `None` | `None` | Any list of strings | Custom keyword list. Overrides `preset` when provided |
| `num_images` | `int` | `200` | Any positive integer | Maximum number of images to return (fewer if not enough images match) |
| `source` | `str` | `"train"` | `"val"` | Data split to sample from |
| `silent` | `bool` | `True` | `False` | Suppresses print output when `True` |
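The `preset` and `keywords` rows imply a precedence order: custom keywords win, then a preset, and with neither every category is eligible. The sketch below restates that rule as code under that reading of the table; `resolve_keywords` is a hypothetical function, the presets dict is a placeholder rather than the package's `KEYWORD_PRESETS`, and treating any non-`None` `keywords` value as "provided" is an assumption.

```python
from typing import Dict, List, Optional


def resolve_keywords(
    preset: Optional[str],
    keywords: Optional[List[str]],
    presets: Dict[str, List[str]],
) -> Optional[List[str]]:
    """Pick the effective keyword list (hypothetical; mirrors the parameter table)."""
    if keywords is not None:
        # Custom keywords override any preset.
        return keywords
    if preset is not None:
        if preset not in presets:
            raise ValueError(f"Unknown preset {preset!r}; choose from {sorted(presets)}")
        return presets[preset]
    # No preset and no keywords: no filter, so all categories are eligible.
    return None
```

For example, `resolve_keywords("birds", ["dog"], presets)` returns `["dog"]` because the custom list takes priority.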

Base Example

from pathlib import Path
from parseimagenet import get_image_paths_by_keywords

# Set the path to your ImageNet directory
base_path = Path('/path/to/your/ImageNet-Subset')
# ex: /Users/mrt/Documents/MrT/code/computer-vision/image-bank/ImageNet-Subset

# Default: no preset, selects from all categories
image_paths = get_image_paths_by_keywords(base_path=base_path)

# image_paths is a list of Path objects
print(f"Found {len(image_paths)} images")
print(image_paths[:5])
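Because training images sit in one folder per synset (for example `train/n01440764/n01440764_10026.JPEG`), the returned `Path` objects can be grouped by their parent folder to see how the sample is spread across categories. A minimal sketch; `count_by_synset` is a hypothetical helper, and it only makes sense for `source="train"`, since validation images live in a single flat folder.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_by_synset(paths: Iterable[Path]) -> Counter:
    """Count training images per synset, using each file's parent folder name."""
    return Counter(Path(p).parent.name for p in paths)
```

With the example above, `count_by_synset(image_paths).most_common(5)` lists the five best-represented synsets in the sample.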

Using Presets

[!NOTE]

Presets are predefined keyword lists for common categories:

from parseimagenet import get_image_paths_by_keywords # main function
from parseimagenet import get_available_presets, KEYWORD_PRESETS # helpers

# See available presets
print(get_available_presets())  # ['birds', 'dogs', 'wild_canids', 'snakes']

# Access preset keywords directly
print(KEYWORD_PRESETS["birds"])

# Use a specific preset
image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    preset="birds",
    num_images=200
)

Using Keywords

[!NOTE]

Custom keywords override the preset.

[!IMPORTANT]

You can find all applicable category keywords in the `LOC_synset_mapping.txt` file.

image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    keywords=['dog', 'puppy', 'hound'],
    num_images=100
)
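To preview which categories a keyword list would touch, you can search `LOC_synset_mapping.txt` directly. Each line holds a synset ID followed by its comma-separated names (for example `n01440764 tench, Tinca tinca`). The sketch below assumes case-insensitive substring matching, which may differ from the rule parseimagenet applies internally; `find_synsets` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, Iterable


def find_synsets(mapping_file: Path, keywords: Iterable[str]) -> Dict[str, str]:
    """Return {synset_id: names} for mapping lines whose names contain any keyword.

    Case-insensitive substring matching is an assumption made for browsing,
    not necessarily how parseimagenet matches keywords.
    """
    needles = [k.lower() for k in keywords]
    matches = {}
    with open(mapping_file, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            # Split "n01440764 tench, Tinca tinca" into the ID and the names.
            synset_id, _, names = line.partition(" ")
            if any(needle in names.lower() for needle in needles):
                matches[synset_id] = names
    return matches
```

Substring matching is broad: a short keyword can also hit unrelated categories whose names merely contain it, so it is worth reviewing the matches before building a subset.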

Using Sources

By default, images are sourced from the training set. Use `source="val"` to pull from the validation set instead:

[!IMPORTANT]

Fetching from the test set is not supported because the Kaggle Competition Dataset does not provide ground-truth labels for the test data.

image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    preset="birds",
    num_images=100,
    source="val"
)
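Validation images are stored in one flat folder, so their categories come from `LOC_val_solution.csv` rather than from the directory tree. The sketch below assumes the Kaggle column names `ImageId` and `PredictionString`, where the prediction string starts with a synset ID followed by four box coordinates (repeated per box); it labels each image with the synset of its first box. `load_val_labels` is a hypothetical helper, not the package's implementation.

```python
import csv
from pathlib import Path
from typing import Dict


def load_val_labels(base_path: Path) -> Dict[Path, str]:
    """Map each validation image path to the synset of its first annotated box."""
    val_dir = base_path / "ILSVRC" / "Data" / "CLS-LOC" / "val"
    labels = {}
    with open(base_path / "LOC_val_solution.csv", newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            # Assumed format: "synset c1 c2 c3 c4 [synset c1 c2 c3 c4 ...]".
            synset = row["PredictionString"].split()[0]
            labels[val_dir / f"{row['ImageId']}.JPEG"] = synset
    return labels
```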

Command Line

# Use default preset (birds)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset

# Use a specific preset
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --num_images 100

# Use custom keywords (overrides preset)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --keywords "dog, puppy" --num_images 100

# Use validation data instead of training data
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --source val --num_images 100
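The `--keywords` flag takes a single comma-separated string such as `"dog, puppy"`. The sketch below shows one way such a value can be turned into a list with `argparse`; it is a hypothetical parser that mirrors the flags above for illustration, not the module's own code, and it deliberately leaves the CLI's preset default out.

```python
import argparse
from typing import List


def comma_separated(value: str) -> List[str]:
    """Turn 'dog, puppy' into ['dog', 'puppy'], dropping blanks and stray spaces."""
    return [part.strip() for part in value.split(",") if part.strip()]


# Hypothetical parser mirroring the flags shown above (not parseimagenet's code).
parser = argparse.ArgumentParser(description="Sketch of the CLI's argument handling")
parser.add_argument("--base_path", required=True)
parser.add_argument("--preset")
parser.add_argument("--keywords", type=comma_separated)
parser.add_argument("--num_images", type=int, default=200)
parser.add_argument("--source", choices=["train", "val"], default="train")
```

Parsing `["--base_path", "/data", "--keywords", "dog, puppy"]` with this sketch yields `args.keywords == ["dog", "puppy"]`.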

Release history

- 1.5.0 (this version): Feb 16, 2026
- 1.2.0: Feb 10, 2026
- 1.0.7: Feb 9, 2026
- 1.0.5: Feb 7, 2026
- 1.0.4: Feb 6, 2026
- 1.0.3: Feb 6, 2026

Download files

Source Distribution

parseimagenet-1.5.0.tar.gz (20.4 kB), uploaded Feb 16, 2026 (Source)

Built Distribution

parseimagenet-1.5.0-py3-none-any.whl (26.0 kB), uploaded Feb 16, 2026 (Python 3)

File details

Details for the file parseimagenet-1.5.0.tar.gz.

File metadata

Download URL: parseimagenet-1.5.0.tar.gz
Upload date: Feb 16, 2026
Size: 20.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for parseimagenet-1.5.0.tar.gz
Algorithm	Hash digest
SHA256	`579109edaed794fa0ae192b691e9e109787878ef46a560437e242093ffc64953`
MD5	`aeacc569e3161cb65d2e5fa604530c35`
BLAKE2b-256	`599472a9f08cd9dd40608f98fe07859177c846c2856464546914fedd6b179e58`
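To check a downloaded archive against the SHA256 digest listed above, hash the file locally and compare. A minimal standard-library sketch; `sha256_of` and `EXPECTED_SDIST_SHA256` are illustrative names, and the file is read in chunks so memory use stays flat regardless of size.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for parseimagenet-1.5.0.tar.gz.
EXPECTED_SDIST_SHA256 = "579109edaed794fa0ae192b691e9e109787878ef46a560437e242093ffc64953"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 of a file, streaming it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After downloading, `sha256_of(Path("parseimagenet-1.5.0.tar.gz")) == EXPECTED_SDIST_SHA256` should be `True`; a mismatch means the file differs from the published release.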


Provenance

The following attestation bundles were made for parseimagenet-1.5.0.tar.gz:

Publisher: publish.yml on MrT3313/Parse-ImageNet

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: parseimagenet-1.5.0.tar.gz
- Subject digest: 579109edaed794fa0ae192b691e9e109787878ef46a560437e242093ffc64953
- Sigstore transparency entry: 955950099
- Sigstore integration time: Feb 16, 2026
Source repository:
- Permalink: MrT3313/Parse-ImageNet@61f0debac2c2f4c3d7c534ee335617edf2653d4c
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/MrT3313
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@61f0debac2c2f4c3d7c534ee335617edf2653d4c
- Trigger Event: push

File details

Details for the file parseimagenet-1.5.0-py3-none-any.whl.

File metadata

Download URL: parseimagenet-1.5.0-py3-none-any.whl
Upload date: Feb 16, 2026
Size: 26.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for parseimagenet-1.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b0147a2dda6da15968b9e9fb4019675d959c7f317a4e154cec3abfcd435c077e`
MD5	`41fee355fdfc579871ce992b30041bc2`
BLAKE2b-256	`39aecb91b6933309f680bba878785a05a94b91cc3204b67a7b91fb5d6e518025`


Provenance

The following attestation bundles were made for parseimagenet-1.5.0-py3-none-any.whl:

Publisher: publish.yml on MrT3313/Parse-ImageNet

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: parseimagenet-1.5.0-py3-none-any.whl
- Subject digest: b0147a2dda6da15968b9e9fb4019675d959c7f317a4e154cec3abfcd435c077e
- Sigstore transparency entry: 955950108
- Sigstore integration time: Feb 16, 2026
Source repository:
- Permalink: MrT3313/Parse-ImageNet@61f0debac2c2f4c3d7c534ee335617edf2653d4c
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/MrT3313
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@61f0debac2c2f4c3d7c534ee335617edf2653d4c
- Trigger Event: push
