Download and access JUMP image datasets and metadata.
jump-image-datasets
jump-image-datasets provides packaged JUMP pilot metadata and utilities for downloading image files from metadata tables.
Install
Install from PyPI
pip install jump-image-datasets
Install from PyPI for stable, versioned releases.
Local development with uv
uv venv
uv sync --group test
Editable install
uv pip install -e .
Install from the GitHub repo with pip
pip install "git+https://github.com/WayScience/jump_image_data_downloader.git"
Install from GitHub if you want the latest unreleased changes.
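To confirm which release an install route above actually produced, the installed version can be read with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "jump-image-datasets") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "jump-image-datasets is not installed")
```

An editable or GitHub install may report a different version string than the PyPI release, depending on how the project defines its version.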
Usage
from jump_image_datasets.jump_pilot import image_downloader, image_metadata

# Load packaged metadata parquet as a DataFrame.
metadata_df = image_metadata.load_metadata()

# Download a small subset.
summary = image_downloader.download_images_with_metadata(
    df=metadata_df.head(10),
    url_column="Metadata_FileUrl",
    default_output_dir="downloaded_jump_pilot_images",
    parallel=True,
    workers=8,
)
print(summary)
For a full runnable example, see docs/download_images_examples.ipynb.
Packaged metadata provenance
This repository ships a packaged metadata table at:
src/jump_image_datasets/jump_pilot/data/2020_11_04_CPJUMP1_all_plates.parquet
Why this file exists
The file is included so users can immediately load a stable JUMP pilot metadata table (via jump_image_datasets.jump_pilot.image_metadata) without requiring a separate data-fetch or preprocessing step.
How it was created
This parquet was generated from the JUMP Cell Painting Gallery using the notebook 2.download_image_metadata.ipynb.
Upstream source pattern used by that notebook:
s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/workspace/load_data_csv/2020_11_04_CPJUMP1/*/load_data.csv
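The `*` in the pattern stands for one directory per plate. As a rough sketch of how a plate ID could be recovered from a matching path (the function name `plate_id_from_path` and the plate name `EXAMPLE_PLATE` are hypothetical, and the notebook may parse paths differently):

```python
# Pattern quoted from the text above; the single "*" is the per-plate directory.
LOAD_DATA_PATTERN = (
    "s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/workspace/"
    "load_data_csv/2020_11_04_CPJUMP1/*/load_data.csv"
)


def plate_id_from_path(s3_path: str) -> str:
    """Return the directory name that fills the "*" slot of the pattern."""
    prefix, suffix = LOAD_DATA_PATTERN.split("*")
    if not (s3_path.startswith(prefix) and s3_path.endswith(suffix)):
        raise ValueError(f"path does not match the load_data pattern: {s3_path}")
    plate = s3_path[len(prefix):len(s3_path) - len(suffix)]
    # The wildcard must cover exactly one non-empty path segment.
    if not plate or "/" in plate:
        raise ValueError(f"expected a single plate directory in: {s3_path}")
    return plate


if __name__ == "__main__":
    # EXAMPLE_PLATE is a placeholder, not a real plate ID.
    example = LOAD_DATA_PATTERN.replace("*", "EXAMPLE_PLATE")
    print(plate_id_from_path(example))
```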
Transform summary
The generation workflow in 2.download_image_metadata.ipynb:
- Lists all per-plate load_data.csv files for run 2020_11_04_CPJUMP1 (51 files in the captured run) from public S3 (anon=True).
- Reads each plate CSV, appends provenance columns:
  - source_plate (plate ID parsed from path)
  - source_s3_path (full S3 CSV path)
- Concatenates all plate tables into one DataFrame.
- Reshapes channel URL columns from wide to long using melt:
  - URL columns become Metadata_ChannelURLName
  - URL values become Metadata_FileUrl
- Adds normalized channel/stain annotations by mapping URL column names:
  - Metadata_ChannelName: ER, AGP, Mito, DNA, RNA, BF, HZ_BF, LZ_BF
  - Metadata_StainName: corresponding stain labels (or NA for brightfield channels)
- Derives Metadata_Filename from the final path component of Metadata_FileUrl.
- Writes parquet with index=False as data/2020_11_04_CPJUMP1_all_plates.parquet (captured shape: (1495400, 32)).
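The reshape and filename steps can be illustrated without pandas. The sketch below is a dependency-free approximation, not the notebook's code: the notebook uses melt, whereas this loops over rows, so the output row order may differ. The URL column names (`URL_OrigDNA`, `URL_OrigER`), the mapping dictionary, and the example URLs are assumptions made for illustration.

```python
from posixpath import basename

# Hypothetical URL column names and channel mapping, for illustration only.
URL_COLUMN_TO_CHANNEL = {"URL_OrigDNA": "DNA", "URL_OrigER": "ER"}


def melt_url_columns(rows, url_columns):
    """Turn each wide row into one long row per URL column."""
    long_rows = []
    for row in rows:
        # Columns that are not URL columns are carried through unchanged.
        id_values = {k: v for k, v in row.items() if k not in url_columns}
        for column in url_columns:
            url = row[column]
            long_rows.append({
                **id_values,
                "Metadata_ChannelURLName": column,  # former column name
                "Metadata_FileUrl": url,            # former cell value
                "Metadata_ChannelName": URL_COLUMN_TO_CHANNEL.get(column),
                "Metadata_Filename": basename(url),  # final path component
            })
    return long_rows


if __name__ == "__main__":
    wide = [{
        "source_plate": "EXAMPLE_PLATE",  # placeholder plate ID
        "URL_OrigDNA": "s3://example-bucket/images/a_dna.tiff",
        "URL_OrigER": "s3://example-bucket/images/a_er.tiff",
    }]
    for long_row in melt_url_columns(wide, ["URL_OrigDNA", "URL_OrigER"]):
        print(long_row)
```

Each wide row yields one long row per URL column, which is why the packaged table has far more rows than the per-plate CSVs have combined.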