Extract ImageNet image paths by category keywords

ParseImageNet

Extract image file paths from ImageNet by matching category keywords. Useful for creating custom subsets of ImageNet for training or evaluation.

Kaggle Dataset

Prerequisites

  • Python 3.8+
  • ImageNet dataset (or a subset) with the standard ILSVRC directory structure:
    ImageNet-Subset/
    ├── LOC_synset_mapping.txt
    └── ILSVRC/
        ├── ImageSets/
        │   └── CLS-LOC/
        │       └── train_cls.txt
        └── Data/
            └── CLS-LOC/
                └── train/
                    ├── n01440764/
                    │   ├── n01440764_10026.JPEG
                    │   └── ...
                    └── ...
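
Before running the parser, it can help to confirm that the dataset folder matches the layout above. The following is a minimal sketch, not part of the package: the helper name `missing_entries` and the `EXPECTED_ENTRIES` list are hypothetical, and the relative paths are copied from the tree shown here.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_ENTRIES = [
    "LOC_synset_mapping.txt",
    "ILSVRC/ImageSets/CLS-LOC/train_cls.txt",
    "ILSVRC/Data/CLS-LOC/train",
]


def missing_entries(base_path):
    """Return the expected entries that do not exist under base_path."""
    base = Path(base_path)
    return [entry for entry in EXPECTED_ENTRIES if not (base / entry).exists()]


if __name__ == "__main__":
    # Placeholder path; replace it with your own dataset location.
    missing = missing_entries("/path/to/your/ImageNet-Subset")
    print("Layout looks complete" if not missing else f"Missing: {missing}")
```

An empty result means all three entries were found; anything listed points at the part of the download that still needs attention.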
    

Installation

Clone the repository:

git clone https://github.com/MrT3313/Parse-ImageNet.git

Then install the package into the environment where you run Jupyter:

# Using pip
pip install -e /path/to/ParseImageNet
# ex: pip install -e /Users/mrt/Documents/MrT/code/computer-vision/ParseImageNet

The -e flag installs in "editable" mode, so code changes are immediately available without reinstalling. However, changes to package metadata (version, dependencies) in pyproject.toml still require running pip install -e . again.
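
One way to confirm that the install landed in the environment Jupyter uses is to query the distribution metadata from a notebook cell. This is a minimal sketch using the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is hypothetical, and the distribution name `parseimagenet` is assumed to match the import name used in this README.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Prints a version string when the package is visible to this interpreter,
# or None when it is not installed here.
print(installed_version("parseimagenet"))
```

If this prints `None` inside Jupyter but a version in your terminal, the notebook kernel is probably running a different environment from the one you installed into.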

Usage

Note: Example Notebook

In Jupyter Lab / Jupyter Notebook

from pathlib import Path
from parseimagenet import get_image_paths_by_keywords

# Set the path to your ImageNet directory
base_path = Path('/path/to/your/ImageNet-Subset')
# ex: /Users/mrt/Documents/MrT/code/computer-vision/image-bank/ImageNet-Subset

# Use the default "birds" preset
image_paths = get_image_paths_by_keywords(base_path=base_path)

# image_paths is a list of Path objects
print(f"Found {len(image_paths)} images")
print(image_paths[:5])
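
Since the result is a plain list of Path objects, it can be summarised with the standard library. The sketch below counts images per category, assuming each returned path sits inside a directory named after its synset ID (as in the `train/` layout under Prerequisites). The helper name `count_by_synset` and the example file names are illustrative only.

```python
from collections import Counter
from pathlib import Path


def count_by_synset(image_paths):
    """Count images per synset ID, using each file's parent directory name."""
    return Counter(Path(p).parent.name for p in image_paths)


# Illustrative paths only; in practice, pass the list returned by
# get_image_paths_by_keywords.
example_paths = [
    Path("train/n01440764/n01440764_10026.JPEG"),
    Path("train/n01440764/n01440764_10027.JPEG"),
    Path("train/n01530575/n01530575_10007.JPEG"),
]
print(count_by_synset(example_paths).most_common())
# [('n01440764', 2), ('n01530575', 1)]
```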

Using Preset Keywords

Presets are predefined keyword lists for common categories:

from parseimagenet import get_image_paths_by_keywords  # main function
from parseimagenet import get_available_presets, KEYWORD_PRESETS  # helpers

# See available presets
print(get_available_presets())  # ['birds']

# Use a specific preset
image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    preset="birds",
    num_images=200
)

# Access preset keywords directly
print(KEYWORD_PRESETS["birds"])

Using Custom Keywords

Custom keywords override the preset:

image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    keywords=['dog', 'puppy', 'hound'],
    num_images=100
)

Note: You can find all applicable categories in the LOC_synset_mapping.txt file.
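
To preview which categories a keyword would touch, you can search the mapping file yourself. The sketch below assumes each line of LOC_synset_mapping.txt holds a synset ID, a space, and a comma-separated description (for example `n01440764 tench, Tinca tinca`). The helper `find_categories` is hypothetical, and its case-insensitive substring match is an assumption, so it may not reproduce the package's own matching exactly.

```python
from pathlib import Path


def find_categories(mapping_file, keywords):
    """Return (synset_id, description) pairs whose description contains any keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    matches = []
    for line in Path(mapping_file).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first space: the ID comes first, the description follows.
        synset_id, _, description = line.partition(" ")
        if any(keyword in description.lower() for keyword in lowered):
            matches.append((synset_id, description.strip()))
    return matches


# Example usage (placeholder path):
# for synset_id, description in find_categories(
#     "/path/to/ImageNet-Subset/LOC_synset_mapping.txt", ["dog", "hound"]
# ):
#     print(synset_id, description)
```

Substring matching is deliberately loose, so a short keyword can match unrelated descriptions; reviewing the printed list before extracting images is a cheap safeguard.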

Command Line

# Use default preset (birds)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset

# Use a specific preset
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --num_images 100

# Use custom keywords (overrides preset)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --keywords dog puppy --num_images 100
