Extract ImageNet image paths by category keywords

ParseImageNet

Extract image file paths from ImageNet by matching category keywords. Useful for creating custom subsets of ImageNet for training or evaluation.

Kaggle Competition Dataset

Prerequisites

  • Python 3.8+
  • ImageNet dataset (or a subset) with the standard ILSVRC directory structure:
    ImageNet-Subset/
    ├── LOC_synset_mapping.txt
    ├── LOC_val_solution.csv
    └── ILSVRC/
        ├── ImageSets/
        │   └── CLS-LOC/
        │       ├── train_cls.txt
        │       └── val.txt
        └── Data/
            └── CLS-LOC/
                ├── train/
                │   ├── n01440764/
                │   │   ├── n01440764_10026.JPEG
                │   │   └── ...
                │   └── ...
                └── val/
                    ├── ILSVRC2012_val_00000001.JPEG
                    └── ...
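
Before running the extractor, it can help to confirm that a dataset root actually matches the layout above. The following is a minimal sketch using only the standard library; `EXPECTED_ENTRIES` and `find_missing_entries` are hypothetical names introduced here for illustration and are not part of the parseimagenet package.

```python
from pathlib import Path

# Relative paths copied from the directory tree shown above.
EXPECTED_ENTRIES = [
    "LOC_synset_mapping.txt",
    "LOC_val_solution.csv",
    "ILSVRC/ImageSets/CLS-LOC/train_cls.txt",
    "ILSVRC/ImageSets/CLS-LOC/val.txt",
    "ILSVRC/Data/CLS-LOC/train",
    "ILSVRC/Data/CLS-LOC/val",
]


def find_missing_entries(base_path):
    """Return the expected relative paths that do not exist under base_path."""
    base_path = Path(base_path)
    return [entry for entry in EXPECTED_ENTRIES if not (base_path / entry).exists()]
```

Calling `find_missing_entries(Path('/path/to/your/ImageNet-Subset'))` returns an empty list when every expected file and directory is present.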
    

Installation

pip install parseimagenet

For local development:

git clone https://github.com/MrT3313/Parse-ImageNet.git
pip install -e ./Parse-ImageNet
# or pass the path where you cloned the repository, ex: pip install -e /path/to/Parse-ImageNet

Usage

Note: see the Example Notebook.

Params

| Parameter | Type | Default | Alternatives | Description |
| --- | --- | --- | --- | --- |
| base_path | Path | - | Any valid directory path | Root path to the ImageNet dataset |
| preset | str or None | None | "birds", "dogs", ... via get_available_presets() | Predefined keyword list. None selects all categories |
| keywords | list or None | None | Any list of strings | Custom keyword list. Overrides preset when provided |
| num_images | int | 200 | Any positive integer | Max images to return (capped by availability) |
| source | str | "train" | "val" | Data split to sample from |
| silent | bool | True | False | Suppresses print output when enabled |
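
To make "capped by availability" concrete: a request for more images than match simply yields every match. The sketch below illustrates that behaviour with the standard library. `sample_capped` is a hypothetical helper, and the use of random sampling is an assumption; the package may select images differently.

```python
import random


def sample_capped(items, num_images=200, seed=None):
    """Return at most num_images items; fewer if not enough are available."""
    rng = random.Random(seed)  # seed is optional and only aids reproducibility
    items = list(items)
    count = min(num_images, len(items))  # the cap: never ask for more than exist
    return rng.sample(items, count)
```

With 5 candidates and `num_images=10`, the result contains all 5; with 500 candidates and the default of 200, it contains 200.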

Base Example

from pathlib import Path
from parseimagenet import get_image_paths_by_keywords

# Set the path to your ImageNet directory
base_path = Path('/path/to/your/ImageNet-Subset')
# ex: /Users/mrt/Documents/MrT/code/computer-vision/image-bank/ImageNet-Subset

# Default: no preset, selects from all categories
image_paths = get_image_paths_by_keywords(base_path=base_path)

# image_paths is a list of Path objects
print(f"Found {len(image_paths)} images")
print(image_paths[:5])
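
Because the result is a plain list of Path objects, it can be post-processed with ordinary Python. As a small sketch, the hypothetical helper below (not part of the package) groups paths by parent directory name. In the training layout shown under Prerequisites, that name is the synset ID (for example n01440764); validation images all share the single parent val, so the grouping is only informative for source="train". It assumes the returned paths point into that layout.

```python
from collections import defaultdict
from pathlib import Path


def group_by_parent_dir(paths):
    """Group paths by the name of their immediate parent directory."""
    groups = defaultdict(list)
    for path in paths:
        path = Path(path)
        groups[path.parent.name].append(path)
    return dict(groups)
```

For example, `group_by_parent_dir(image_paths)` gives a dict whose keys are the directory names and whose values are the paths found under each.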

Using Presets

Presets are predefined keyword lists for common categories:

from parseimagenet import get_image_paths_by_keywords # main function
from parseimagenet import get_available_presets, KEYWORD_PRESETS # helpers

# See available presets
print(get_available_presets())  # ['birds', 'dogs', 'wild_canids', 'snakes']

# Access preset keywords directly
print(KEYWORD_PRESETS["birds"])

# Use a specific preset
image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    preset="birds",
    num_images=200
)

Using Keywords

Important: all applicable category keywords are listed in the LOC_synset_mapping.txt file.

Custom keywords override the preset:

image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    keywords=['dog', 'puppy', 'hound'],
    num_images=100
)
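
To explore which categories a keyword would hit before calling the function, LOC_synset_mapping.txt can be inspected directly. The sketch below assumes each line has the form `<synset_id> <comma-separated names>` (for example `n01440764 tench, Tinca tinca`) and uses case-insensitive substring matching. Both helper names are hypothetical, and the matching rule is an assumption that may differ from what parseimagenet does internally.

```python
def load_synset_mapping(mapping_file):
    """Parse lines of the form '<synset_id> <names>' into a dict."""
    mapping = {}
    with open(mapping_file, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            synset_id, _, names = line.partition(" ")
            mapping[synset_id] = names
    return mapping


def match_synsets(mapping, keywords):
    """Return synset IDs whose names contain any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        synset_id
        for synset_id, names in mapping.items()
        if any(keyword in names.lower() for keyword in lowered)
    ]
```

Substring matching is permissive: a keyword such as 'dog' would also match names like 'hot dog', so it is worth reviewing the matched names before relying on a keyword list.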

Using Sources

Important: fetching from the test set is not supported, because the Kaggle Competition Dataset does not provide ground truth for the test data.

By default, images are sourced from the training set. Use source="val" to pull from the validation set instead:

image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    preset="birds",
    num_images=100,
    source="val"
)
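
Unlike the training images, validation images sit in one flat directory, so their categories have to come from LOC_val_solution.csv. The sketch below shows one way to read that file. It assumes the columns are named ImageId and PredictionString, with the synset ID as the first token of PredictionString; check this against your copy of the file. `load_val_labels` is a hypothetical helper, not part of the package.

```python
import csv


def load_val_labels(solution_csv):
    """Map each validation ImageId to the synset ID leading its PredictionString."""
    labels = {}
    with open(solution_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            tokens = row["PredictionString"].split()
            if tokens:  # ignore rows with an empty prediction string
                labels[row["ImageId"]] = tokens[0]
    return labels
```

The resulting dict can be used to check which synset a returned validation path belongs to, by looking up the file name without its .JPEG extension.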

Command Line

# Use default preset (birds)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset

# Use a specific preset
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --num_images 100

# Use custom keywords (overrides preset)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --keywords "dog, puppy" --num_images 100

# Use validation data instead of training data
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --source val --num_images 100
