Extract ImageNet image paths by category keywords
ParseImageNet
Extract image file paths from ImageNet by matching category keywords. Useful for creating custom subsets of ImageNet for training or evaluation.
Kaggle Dataset
Prerequisites
- Python 3.8+
- ImageNet dataset (or a subset) with the standard ILSVRC directory structure:
ImageNet-Subset/
├── LOC_synset_mapping.txt
└── ILSVRC/
    ├── ImageSets/
    │   └── CLS-LOC/
    │       └── train_cls.txt
    └── Data/
        └── CLS-LOC/
            └── train/
                ├── n01440764/
                │   ├── n01440764_10026.JPEG
                │   └── ...
                └── ...
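Before calling the package, it can help to confirm that the directory matches this layout. The following is a minimal sketch, not part of parseimagenet: `EXPECTED_ENTRIES` and `find_missing_entries` are hypothetical names, and the entries are simply the paths shown in the tree above.

```python
from pathlib import Path

# Relative paths copied from the directory tree above (illustrative constant).
EXPECTED_ENTRIES = [
    "LOC_synset_mapping.txt",
    "ILSVRC/ImageSets/CLS-LOC/train_cls.txt",
    "ILSVRC/Data/CLS-LOC/train",
]


def find_missing_entries(base_path):
    """Return the expected entries that do not exist under base_path."""
    base = Path(base_path)
    return [rel for rel in EXPECTED_ENTRIES if not (base / rel).exists()]


if __name__ == "__main__":
    missing = find_missing_entries("/path/to/your/ImageNet-Subset")
    if missing:
        print("Missing entries:", missing)
    else:
        print("Directory layout looks complete")
```

An empty result means all three entries are present; it does not check the contents of the files.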
Installation
pip install parseimagenet
For local development:
git clone https://github.com/MrT3313/Parse-ImageNet.git
pip install -e ./Parse-ImageNet
Usage
[!NOTE]
In Jupyter Lab / Jupyter Notebook
from pathlib import Path
from parseimagenet import get_image_paths_by_keywords
# Set the path to your ImageNet directory
base_path = Path('/path/to/your/ImageNet-Subset')
# ex: /Users/mrt/Documents/MrT/code/computer-vision/image-bank/ImageNet-Subset
# Use the default "birds" preset
image_paths = get_image_paths_by_keywords(base_path=base_path)
# image_paths is a list of Path objects
print(f"Found {len(image_paths)} images")
print(image_paths[:5])
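Since the result is a plain list of Path objects, it can be post-processed with standard Python. As a sketch, the hypothetical helper below groups paths by synset ID, assuming each returned path sits directly inside its synset folder (for example `train/n01440764/n01440764_10026.JPEG`), as in the directory layout under Prerequisites.

```python
from collections import defaultdict
from pathlib import Path


def group_by_synset(image_paths):
    """Group image paths by the name of their parent directory (the synset ID)."""
    groups = defaultdict(list)
    for path in image_paths:
        path = Path(path)
        groups[path.parent.name].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Hand-written example paths; real input would come from get_image_paths_by_keywords.
    example = [
        Path("train/n01440764/n01440764_10026.JPEG"),
        Path("train/n01530575/n01530575_1.JPEG"),
    ]
    for synset_id, paths in group_by_synset(example).items():
        print(synset_id, len(paths))
```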
Using Preset Keywords
Presets are predefined keyword lists for common categories:
from parseimagenet import get_image_paths_by_keywords # main function
from parseimagenet import get_available_presets, KEYWORD_PRESETS # helpers
# See available presets
print(get_available_presets()) # ['birds']
# Use a specific preset
image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    preset="birds",
    num_images=200
)
# Access preset keywords directly
print(KEYWORD_PRESETS["birds"])
Using Custom Keywords
Custom keywords override the preset:
image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    keywords=['dog', 'puppy', 'hound'],
    num_images=100
)
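Because custom keywords replace the preset rather than extend it, getting both requires building one combined list yourself. The helper below is a hypothetical sketch (`merge_keywords` is not part of parseimagenet) and assumes that each preset in `KEYWORD_PRESETS` is an iterable of strings.

```python
def merge_keywords(*keyword_lists):
    """Concatenate keyword lists, dropping case-insensitive duplicates and keeping order."""
    seen = set()
    merged = []
    for keywords in keyword_lists:
        for keyword in keywords:
            key = keyword.lower()
            if key not in seen:
                seen.add(key)
                merged.append(keyword)
    return merged


# Possible usage with the package (assumes preset values are lists of strings):
# keywords = merge_keywords(KEYWORD_PRESETS["birds"], ["dog", "puppy"])
# image_paths = get_image_paths_by_keywords(base_path=base_path, keywords=keywords)
```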
[!NOTE]
You can find all applicable categories in the LOC_synset_mapping.txt file.
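To check which categories a keyword would touch before extracting images, the mapping file can be searched directly. This is a sketch under two assumptions: each line of LOC_synset_mapping.txt has the form `n01440764 tench, Tinca tinca` (synset ID, a space, then comma-separated names), and matching is a case-insensitive substring test. The package's own matching rules may differ, and both function names are hypothetical.

```python
from pathlib import Path


def load_synset_mapping(mapping_file):
    """Map each synset ID to its list of category names."""
    mapping = {}
    for line in Path(mapping_file).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at the first space: the ID comes first, the names follow.
        synset_id, _, names = line.partition(" ")
        mapping[synset_id] = [name.strip() for name in names.split(",")]
    return mapping


def find_synsets(mapping, keywords):
    """Return sorted synset IDs that have a name containing any of the keywords."""
    lowered = [keyword.lower() for keyword in keywords]
    return sorted(
        synset_id
        for synset_id, names in mapping.items()
        if any(keyword in name.lower() for keyword in lowered for name in names)
    )


if __name__ == "__main__":
    mapping = load_synset_mapping("/path/to/your/ImageNet-Subset/LOC_synset_mapping.txt")
    for synset_id in find_synsets(mapping, ["dog", "puppy", "hound"]):
        print(synset_id, mapping[synset_id])
```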
Command Line
# Use default preset (birds)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset
# Use a specific preset
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --num_images 100
# Use custom keywords (overrides preset)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --keywords dog puppy --num_images 100
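The same commands can be launched from a Python script. The sketch below only assembles the argument list from the flags shown above (`--base_path`, `--preset`, `--keywords`, `--num_images`); `build_command` is a hypothetical helper. It omits `--preset` whenever keywords are given, since keywords override the preset anyway.

```python
import subprocess
import sys


def build_command(base_path, preset=None, keywords=None, num_images=None):
    """Build the argument list for running the parseimagenet CLI as a module."""
    cmd = [
        sys.executable, "-m", "parseimagenet.ParseImageNetSubset",
        "--base_path", str(base_path),
    ]
    if keywords:
        # Keywords override the preset, so the preset flag is left out.
        cmd += ["--keywords", *keywords]
    elif preset:
        cmd += ["--preset", preset]
    if num_images is not None:
        cmd += ["--num_images", str(num_images)]
    return cmd


if __name__ == "__main__":
    command = build_command("/path/to/ImageNet-Subset", keywords=["dog", "puppy"], num_images=100)
    subprocess.run(command, check=True)
```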
Download files
File details
Details for the file parseimagenet-1.0.5.tar.gz.
File metadata
- Download URL: parseimagenet-1.0.5.tar.gz
- Upload date:
- Size: 6.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba0788fbe8ca238d0808214875ab13cff3ef9a0c21c420c9a8b06a8c84b56664 |
| MD5 | 7f1a8226afa9a003520ac2b39b0c46c4 |
| BLAKE2b-256 | c730520276f44d4c1f44326cbb0fa2d7fd832422489509d5a77585182b6b23b8 |
Provenance
The following attestation bundles were made for parseimagenet-1.0.5.tar.gz:
Publisher: publish.yml on MrT3313/Parse-ImageNet
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: parseimagenet-1.0.5.tar.gz
- Subject digest: ba0788fbe8ca238d0808214875ab13cff3ef9a0c21c420c9a8b06a8c84b56664
- Sigstore transparency entry: 926923557
- Sigstore integration time:
- Permalink: MrT3313/Parse-ImageNet@a3d9233b667f75f0022b48b68c0210707a718ab6
- Branch / Tag: refs/tags/v1.0.5
- Owner: https://github.com/MrT3313
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a3d9233b667f75f0022b48b68c0210707a718ab6
- Trigger Event: push
File details
Details for the file parseimagenet-1.0.5-py3-none-any.whl.
File metadata
- Download URL: parseimagenet-1.0.5-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 55037bb1808d624d3647af986e4f456bc1639d19f3e25b0bb7a7b511dd8e6dab |
| MD5 | 619b3e4f8ac1e9271aad82cb61dc7c08 |
| BLAKE2b-256 | a94eb5387edb3dac23939c9053b019e1e05817c4b4812aaa4d3f48c2762b9737 |
Provenance
The following attestation bundles were made for parseimagenet-1.0.5-py3-none-any.whl:
Publisher: publish.yml on MrT3313/Parse-ImageNet
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: parseimagenet-1.0.5-py3-none-any.whl
- Subject digest: 55037bb1808d624d3647af986e4f456bc1639d19f3e25b0bb7a7b511dd8e6dab
- Sigstore transparency entry: 926923558
- Sigstore integration time:
- Permalink: MrT3313/Parse-ImageNet@a3d9233b667f75f0022b48b68c0210707a718ab6
- Branch / Tag: refs/tags/v1.0.5
- Owner: https://github.com/MrT3313
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a3d9233b667f75f0022b48b68c0210707a718ab6
- Trigger Event: push