
Download and access JUMP image datasets and metadata.


jump-image-datasets

jump-image-datasets provides packaged JUMP pilot metadata and utilities for downloading image files from metadata tables.

Install

Install from PyPI

pip install jump-image-datasets

Install from PyPI for stable, versioned releases.

Local development with uv

uv venv
uv sync --group test

Editable install

uv pip install -e .

Install from the GitHub repo with pip

pip install "git+https://github.com/WayScience/jump_image_data_downloader.git"

Install from GitHub if you want the latest unreleased changes.

Usage

from jump_image_datasets.jump_pilot import image_downloader, image_metadata

# Load packaged metadata parquet as a DataFrame.
metadata_df = image_metadata.load_metadata()

# Download a small subset.
summary = image_downloader.download_images_with_metadata(
    df=metadata_df.head(10),
    url_column="Metadata_FileUrl",
    default_output_dir="downloaded_jump_pilot_images",
    parallel=True,
    workers=8,
)
print(summary)

For a full runnable example, see docs/download_images_examples.ipynb.
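Because the downloader takes an ordinary DataFrame, a subset can be chosen with standard pandas filtering before the call. The sketch below uses a small synthetic table in place of image_metadata.load_metadata(); the column names come from the packaged metadata described under "Transform summary", while the plate IDs, URLs, and the select_images helper are hypothetical and not part of the package.

```python
import pandas as pd

# Synthetic stand-in for the packaged metadata table (all values are made up).
metadata_df = pd.DataFrame(
    {
        "source_plate": ["PLATE_A", "PLATE_A", "PLATE_B", "PLATE_B"],
        "Metadata_ChannelName": ["DNA", "ER", "DNA", "Mito"],
        "Metadata_FileUrl": [
            "s3://example-bucket/PLATE_A/a_dna.tiff",
            "s3://example-bucket/PLATE_A/a_er.tiff",
            "s3://example-bucket/PLATE_B/b_dna.tiff",
            "s3://example-bucket/PLATE_B/b_mito.tiff",
        ],
    }
)


def select_images(df, channels, plates=None):
    """Hypothetical helper: keep rows matching the channels (and plates, if given)."""
    mask = df["Metadata_ChannelName"].isin(channels)
    if plates is not None:
        mask = mask & df["source_plate"].isin(plates)
    return df.loc[mask].reset_index(drop=True)


dna_only = select_images(metadata_df, channels=["DNA"])
plate_a_dna = select_images(metadata_df, channels=["DNA"], plates=["PLATE_A"])
print(dna_only[["source_plate", "Metadata_FileUrl"]])
```

The filtered frame can then be passed as df= to download_images_with_metadata in the same way as metadata_df.head(10) in the example above.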

Packaged metadata provenance

This repository ships a packaged metadata table at:

  • src/jump_image_datasets/jump_pilot/data/2020_11_04_CPJUMP1_all_plates.parquet

Why this file exists

The file is included so users can immediately load a stable JUMP pilot metadata table (via jump_image_datasets.jump_pilot.image_metadata) without requiring a separate data-fetch or preprocessing step.

How it was created

This parquet was generated from the JUMP Cell Painting Gallery using the notebook 2.download_image_metadata.ipynb.

Upstream source pattern used by that notebook:

  • s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/workspace/load_data_csv/2020_11_04_CPJUMP1/*/load_data.csv
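Since each load_data.csv sits in its own per-plate directory, the plate ID can be read from the path segment matched by the wildcard. The following is a minimal standard-library sketch of that idea; the plate_from_path helper and the plate name PLATE_X are hypothetical, and the notebook may parse paths differently.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

PATTERN = (
    "s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/workspace/"
    "load_data_csv/2020_11_04_CPJUMP1/*/load_data.csv"
)


def plate_from_path(s3_path: str) -> str:
    """Hypothetical helper: return the directory name directly above load_data.csv."""
    # fnmatch's "*" also matches "/", so this is a loose sanity check,
    # not strict validation of the path depth.
    if not fnmatchcase(s3_path, PATTERN):
        raise ValueError(f"unexpected path: {s3_path}")
    return PurePosixPath(s3_path).parent.name


# PLATE_X is a made-up plate directory used only for illustration.
example_path = PATTERN.replace("*", "PLATE_X")
print(plate_from_path(example_path))
```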

Transform summary

The generation workflow in 2.download_image_metadata.ipynb:

  • Lists all per-plate load_data.csv files for run 2020_11_04_CPJUMP1 (51 files in the captured run) from public S3 (anon=True).
  • Reads each plate CSV and appends provenance columns:
    • source_plate (plate ID parsed from path)
    • source_s3_path (full S3 CSV path)
  • Concatenates all plate tables into one DataFrame.
  • Reshapes channel URL columns from wide to long using melt:
    • URL columns become Metadata_ChannelURLName
    • URL values become Metadata_FileUrl
  • Adds normalized channel/stain annotations by mapping URL column names:
    • Metadata_ChannelName: ER, AGP, Mito, DNA, RNA, BF, HZ_BF, LZ_BF
    • Metadata_StainName: corresponding stain labels (or NA for brightfield channels)
  • Derives Metadata_Filename from the final path component of Metadata_FileUrl.
  • Writes parquet with index=False as data/2020_11_04_CPJUMP1_all_plates.parquet (captured shape: (1495400, 32)).
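The reshaping steps above can be sketched on a toy table. This is an illustrative approximation, not the notebook's code: the wide column names (URL_OrigDNA, URL_OrigER), the Metadata_Well column, the URLs, the plate ID, and the two-entry channel map are assumptions. Only the output column names are taken from the list above.

```python
from pathlib import PurePosixPath

import pandas as pd

# Toy "wide" plate table: one row per imaged site, one URL column per channel.
# Column names and URLs here are assumed for illustration.
wide_df = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02"],
        "URL_OrigDNA": [
            "s3://example-bucket/p1/a01_dna.tiff",
            "s3://example-bucket/p1/a02_dna.tiff",
        ],
        "URL_OrigER": [
            "s3://example-bucket/p1/a01_er.tiff",
            "s3://example-bucket/p1/a02_er.tiff",
        ],
    }
)
wide_df["source_plate"] = "p1"  # provenance column (made-up plate ID)

url_columns = [c for c in wide_df.columns if c.startswith("URL_")]
id_columns = [c for c in wide_df.columns if c not in url_columns]

# Wide to long: each URL column name becomes a value of Metadata_ChannelURLName.
long_df = wide_df.melt(
    id_vars=id_columns,
    value_vars=url_columns,
    var_name="Metadata_ChannelURLName",
    value_name="Metadata_FileUrl",
)

# Map URL column names to normalized channel names (assumed mapping).
channel_map = {"URL_OrigDNA": "DNA", "URL_OrigER": "ER"}
long_df["Metadata_ChannelName"] = long_df["Metadata_ChannelURLName"].map(channel_map)

# The filename is the final path component of the URL.
long_df["Metadata_Filename"] = long_df["Metadata_FileUrl"].map(
    lambda url: PurePosixPath(url).name
)

print(long_df)
```

Each wide row yields one long row per URL column, so the long table carries one row per image file.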

