prompt2dataset
Build labeled image datasets from a plain-English prompt.
$ cd my-dataset
$ p2d add
What image dataset do you want to build? > bird species native to the Pacific Northwest
prompt2dataset resolves your description into subjects via Claude, fetches images from one or more sources, deduplicates, downloads, and writes a manifest.
Installation
pip install prompt2dataset
The base install covers p2d add, p2d review, and p2d info. p2d train also
requires PyTorch. Install the CPU or CUDA extra depending on your hardware:
pip install "prompt2dataset[train]" # CPU
pip install "prompt2dataset[train-cuda]" # CUDA (installs matching torch/torchvision)
Setup
prompt2dataset needs an Anthropic API key. On first run it prompts for the key and saves
it to a local .env file. You can also create the file yourself:
# .env
ANTHROPIC_API_KEY=sk-ant-...
P2D_CONTACT=you@example.com # included in API request headers per Wikimedia's policy
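To confirm the key is in place before a long run, a small check like the one below may help. It is only a sketch: how prompt2dataset itself reads the .env file is not documented here, so the parsing rules (one KEY=VALUE per line, `#` starting a comment) and the helper names `parse_env` and `missing_settings` are assumptions made for illustration.

```python
import os


def parse_env(text):
    """Parse simple KEY=VALUE lines; '#' starts a comment (assumed format)."""
    values = {}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop a trailing inline comment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def missing_settings(env_text, required=("ANTHROPIC_API_KEY",)):
    """Return required names set neither in the .env text nor in os.environ."""
    found = parse_env(env_text)
    return [name for name in required
            if not (found.get(name) or os.environ.get(name))]
```

Calling `missing_settings(open(".env").read())` in the dataset directory returns an empty list when the key is available from either place.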
Usage
All commands operate on the current directory.
p2d add
Prompts for a dataset description, resolves subjects, and downloads images. Run it again in the same directory to fetch additional subjects without re-downloading what's already there.
$ mkdir pacific-northwest-birds && cd pacific-northwest-birds
$ p2d add
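One common way to make repeat runs skip files that already exist is to name each file after a hash of its content. The sketch below illustrates that idea only. Whether prompt2dataset derives the suffix in names like american-robin_a3f1c8d2e9b4.jpg from the image bytes, the URL, or something else is not stated, so SHA-256 and the helpers `content_name` and `select_new` are assumptions.

```python
import hashlib


def content_name(subject, data, digits=12):
    """Build '<subject>_<hash prefix>.jpg' from the image bytes (assumed scheme)."""
    digest = hashlib.sha256(data).hexdigest()[:digits]
    return f"{subject}_{digest}.jpg"


def select_new(subject, images, existing):
    """Return {filename: bytes} for images whose derived name is not yet present."""
    fresh = {}
    for data in images:
        name = content_name(subject, data)
        # Skip anything already on disk or already seen in this batch.
        if name not in existing and name not in fresh:
            fresh[name] = data
    return fresh
```

With this scheme, identical bytes always map to the same filename, so a second run only needs to compare names against the files already in the subject directory.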
p2d review
Step through downloaded images and mark them valid or delete them.
$ p2d review
$ p2d review --misclassified # only images that a trained model got wrong
Keys: A accept, D delete, S skip, Q quit.
p2d info
Print dataset statistics and the subject list.
p2d train
Fine-tune a pretrained image classifier on the dataset. Uses torch-lr-finder to find a good learning rate automatically, then trains for N epochs and exports a TorchScript model.
$ p2d train
$ p2d train --model resnet50 --epochs 10
Options: --epochs, --val-split, --img-size, --model (mobilenet_v2, resnet18, resnet50).
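Once a model is exported, its output has to be mapped back to class names, which is what the labels.json file listed under Output layout is for. The following sketch shows that mapping in plain Python. It assumes the model emits one raw score per class and that labels.json is a JSON array of names in the same order. Neither detail is confirmed here, and `softmax` and `top_class` are illustrative helpers, not part of prompt2dataset.

```python
import json
import math


def softmax(scores):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(scores, labels_json):
    """Return (class_name, probability) for the highest score."""
    names = json.loads(labels_json)  # assumed: a JSON array of class names
    if len(names) != len(scores):
        raise ValueError("score count does not match class count")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return names[best], probs[best]
```

The scores themselves would come from running the exported model on a preprocessed image. The preprocessing the model expects (input size, normalization) is not described here and should be checked before relying on predictions.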
Data sources
| Source | Best for |
|---|---|
| DuckDuckGo | Broad or niche subjects, recent events, pop culture |
| Wikimedia Commons | Well-documented subjects with Wikipedia articles |
| iNaturalist | Animals, plants, fungi - research-grade, taxonomy-tagged |
| Openverse | General subjects, scenes, cultural content |
None require an API key. Sources are selected interactively when you run p2d add.
Output layout
my-dataset/
  american-robin/
    american-robin_a3f1c8d2e9b4.jpg
    ...
  stellers-jay/
    ...
  .p2d/
    manifest.json        dataset metadata and item list
    labels.csv           filename, subject, source
    subjects.json        resolved subject list (cached)
    model.pt             TorchScript model (after p2d train)
    labels.json          class names in output order
    report.json          per-class precision/recall/F1
    misclassified.json   validation images the model got wrong
manifest.json is the authoritative record. Everything else in .p2d/ is
generated and can be reconstructed.
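Because labels.csv is a plain CSV file, the dataset can be inspected without prompt2dataset. The sketch below counts rows per subject or per source. It assumes the file begins with a header row naming the columns filename, subject, and source. If the real file has no header, the column names would need to be passed to the reader explicitly. `count_by` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_by(csv_text, column):
    """Count rows per value of `column` in CSV text that has a header row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in rows)
```

For example, `count_by(open(".p2d/labels.csv", newline="").read(), "subject")` gives the number of images per class, which is a quick way to spot imbalance before training.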
Global flag
--debug enables verbose logging for all commands:
p2d --debug add
License
MIT