A tiny open bioimage dataset catalog with fetch and verify workflows.
OME-IRIS
OME-IRIS is an open bioimage dataset catalog for benchmarking image input/output (IO), transformations, metadata management, and bioimage-linked workflows.
We also provide a small Python package by the same name (ome_iris) to help fetch and validate the datasets in the catalog.
Inspired by both the classic iris.csv dataset and the iris of the eye that brings images into focus, OME-IRIS aims to provide a collection of reference datasets for evaluating interoperable bioimage data formats, tools, and workflows.
What this is
- A lightweight manifest catalog for small benchmark datasets
- A fetch + verify workflow with a single CLI
- LinkML-based schema definitions for dataset manifests
What this is not
- Not a data portal
- Not DVC-based
- Not a large-file git storage approach
- Not a full ontology or end-to-end benchmark system yet
Quick start
uv run ome-iris fetch --tier small
uv run ome-iris verify
uv run ome-iris export-rocrate --dataset nf1-cellpainting-shrunken
Fetch output modes:
uv run ome-iris fetch --tier small --verbose # show per-file labels + downloader progress
uv run ome-iris fetch --tier small --silent # suppress downloader progress output
What fetch does
High-level flow when you run ome-iris fetch:
- Loads dataset manifests from --manifests-dir.
- Applies optional filters (--dataset, --tier).
- Creates local dataset roots under --data-dir/<source_identifier>/.
- Writes ro-crate-metadata.json into each dataset root.
- Iterates over each files entry:
  - for kind: file: downloads the file URL (or skips if already present)
  - for kind: directory: traverses/downloads directory contents (or extracts archive sources)
- Reports a summary:
  - downloaded count + item list
  - skipped count + item list
  - missing URLs
  - failed downloads
Output layout example:
data/
  NF1_cellpainting_data_shrunken/
    ro-crate-metadata.json
    profiles.parquet
    images/
    masks/
Local files are stored under ./data/ by default.
Each dataset directory also gets ro-crate-metadata.json with source/provenance metadata from the manifest.
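The layout and skip behavior described above can be sketched in a few lines of Python. This is an illustrative sketch, not the package's implementation: the helper names resolve_destination and plan_fetch are hypothetical, and the code assumes only what is stated above (files land under <data-dir>/<source_identifier>/<path>, existing files are skipped, and entries without a URL are reported as missing).

```python
from pathlib import Path


def resolve_destination(data_dir: str, source_identifier: str, rel_path: str) -> Path:
    """Return <data_dir>/<source_identifier>/<rel_path> (hypothetical helper)."""
    return Path(data_dir) / source_identifier / rel_path


def plan_fetch(data_dir: str, manifest: dict) -> dict:
    """Sort a manifest's file entries into download / skipped / missing_url buckets."""
    plan = {"download": [], "skipped": [], "missing_url": []}
    for entry in manifest.get("files", []):
        dest = resolve_destination(data_dir, manifest["source_identifier"], entry["path"])
        if dest.exists():
            # Already present locally: nothing to download.
            plan["skipped"].append(entry["path"])
        elif not entry.get("url"):
            # No download location recorded in the manifest.
            plan["missing_url"].append(entry["path"])
        else:
            plan["download"].append(entry["path"])
    return plan
```

Separating the planning step from the actual download keeps the summary (downloaded, skipped, missing URLs) easy to compute and to test without network access.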
To use another data directory:
uv run ome-iris fetch --data-dir /tmp/ome-iris-data
uv run ome-iris verify --data-dir /tmp/ome-iris-data
Add a dataset
- Add or update a dataset manifest and catalog metadata.
- Include source, formats, and file-level metadata.
- Run:
uv run ome-iris verify
Starter scaffolding command:
uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006
uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006 --append-csv
uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006 --include-directory-entry --directory-path images --archive-format zip
The command guesses a dataset id/name/formats, writes a starter YAML manifest, and prints a suggested datasets.csv row.
File entry patterns
- source_identifier is required at the top level of each manifest.
- All files[].path values are relative to data/<source_identifier>/.
- sha256 is optional for file entries.
- Use kind: directory to fetch everything under a directory source.
  - For GitHub tree URLs (https://github.com/<owner>/<repo>/tree/<ref>/<path>), OME-IRIS traverses files under that subtree.
  - For local directory paths, OME-IRIS recursively copies files.
  - For archive URLs, set archive_format (zip or tar) to extract an archive into the destination directory.
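To make the GitHub tree URL pattern above concrete, the following sketch splits such a URL into its parts. It is not taken from OME-IRIS: the function name parse_github_tree_url is hypothetical, and the code assumes the ref is a single path segment (a branch or tag name containing "/" cannot be separated from the path by string parsing alone).

```python
from urllib.parse import urlparse


def parse_github_tree_url(url: str) -> dict:
    """Split https://github.com/<owner>/<repo>/tree/<ref>/<path> into its parts.

    Assumption: <ref> is one path segment, such as "main" or "v0.0.3".
    """
    parsed = urlparse(url)
    parts = [segment for segment in parsed.path.split("/") if segment]
    if parsed.netloc != "github.com" or len(parts) < 4 or parts[2] != "tree":
        raise ValueError(f"not a GitHub tree URL: {url}")
    owner, repo, _, ref, *rest = parts
    # An empty path means the URL points at the repository root for that ref.
    return {"owner": owner, "repo": repo, "ref": ref, "path": "/".join(rest)}
```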
Relationships
Use an optional top-level relationships list to describe links between dataset components.
- from: source file path (must match a files[].path)
- to: target file path (must match a files[].path)
- type: relationship label (for example links_to_images_by, links_to_masks_by, references_metadata)
- rocrate_predicate: explicit RO-Crate/JSON-LD predicate URI for export (required)
- via_columns (optional): explicit table columns used for linking
- filename_patterns (optional): standardized filename templates used by the relationship
- derived_from_columns (optional): columns used when deriving one component from another (for example images -> masks)
Example:
files:
  - path: profiles.parquet
  - path: images
    kind: directory
relationships:
  - from: profiles.parquet
    to: images
    type: links_to_images_by
    rocrate_predicate: http://schema.org/associatedMedia
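The constraints on relationships (from and to must match a files[].path, and rocrate_predicate is required) lend themselves to a simple consistency check. The sketch below operates on a manifest already loaded into a Python dict. The function name check_relationships is hypothetical, and treating from, to, and type as required is an assumption; the text above only marks rocrate_predicate as required explicitly.

```python
REQUIRED_KEYS = ("from", "to", "type", "rocrate_predicate")


def check_relationships(manifest: dict) -> list:
    """Return human-readable problems found in manifest['relationships']."""
    known_paths = {entry["path"] for entry in manifest.get("files", [])}
    problems = []
    for index, rel in enumerate(manifest.get("relationships", [])):
        # Every relationship needs its core keys (assumed required, see lead-in).
        for key in REQUIRED_KEYS:
            if not rel.get(key):
                problems.append(f"relationships[{index}]: missing '{key}'")
        # Both endpoints must refer to a declared files[].path.
        for key in ("from", "to"):
            target = rel.get(key)
            if target and target not in known_paths:
                problems.append(
                    f"relationships[{index}]: '{key}' path {target!r} is not in files[].path"
                )
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a caller report every issue in a manifest in a single pass.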
Example directory entry:
files:
  - path: jump-plate/images
    kind: directory
    archive_format: zip
    url: https://example.org/jump-plate-images.zip
    sha256: "" # optional
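Because sha256 is optional, a checksum check has three possible outcomes: no checksum recorded, match, or mismatch. How ome-iris verify handles these cases is not described here; the following is a generic sketch using Python's standard hashlib, with hypothetical function names sha256_of and checksum_status.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_status(path: Path, expected: Optional[str]) -> str:
    """Return 'unchecked' when no checksum is recorded, else 'ok' or 'mismatch'."""
    if not expected:
        # Covers both a missing key and an empty string such as sha256: "".
        return "unchecked"
    return "ok" if sha256_of(path) == expected.lower() else "mismatch"
```

Reading in chunks keeps memory use flat for large image files.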
Custom metadata (first-class)
OME-IRIS supports custom metadata as a first-class field via custom_metadata objects at manifest, source, and file levels.
Rules:
- custom_metadata must be an object/map.
- Keys must be strings.
- Values may be strings, numbers, booleans, null, lists, or nested objects.
Example:
id: jump-plate
source_identifier: JUMP_plate_BR00117006
name: JUMP plate BR00117006 (JUMP_plate_BR00117006) example
description: Plate-level cell painting benchmark subset.
tier: small
license: CC-BY-4.0
custom_metadata:
  study: jump-cp
  species: human
source:
  repository: https://example.org/repo
  path: datasets/JUMP_plate_BR00117006
  url: https://example.org/repo/tree/main/datasets/JUMP_plate_BR00117006
formats: [csv, tiff]
files:
  - path: profiles.csv
    url: https://example.org/files/profiles.csv
    sha256: "..."
    custom_metadata:
      role: profile_table
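The three custom_metadata rules listed above can be expressed as a small recursive check. This is a sketch under stated assumptions rather than the package's validator (the schema itself is LinkML-based): the function names are hypothetical, and requiring string keys inside nested objects as well is an assumption.

```python
def check_custom_metadata(value, where="custom_metadata"):
    """Return a list of rule violations for a custom_metadata value."""
    if not isinstance(value, dict):
        return [f"{where}: must be an object/map"]
    return _check_map(value, where)


def _check_map(mapping, where):
    problems = []
    for key, item in mapping.items():
        if not isinstance(key, str):
            problems.append(f"{where}: key {key!r} is not a string")
            continue
        problems.extend(_check_value(item, f"{where}.{key}"))
    return problems


def _check_value(item, where):
    # Scalars allowed by the rules: strings, numbers, booleans, null.
    if item is None or isinstance(item, (str, int, float, bool)):
        return []
    if isinstance(item, list):
        problems = []
        for index, element in enumerate(item):
            problems.extend(_check_value(element, f"{where}[{index}]"))
        return problems
    if isinstance(item, dict):
        return _check_map(item, where)
    return [f"{where}: unsupported type {type(item).__name__}"]
```

The same function could be applied at the manifest, source, and file levels, passing a different where label each time so that messages point at the offending location.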
Why large files are not committed
Large image/profile files make repositories slow and fragile for contributors and CI. OME-IRIS tracks metadata and download locations, while actual data is fetched locally when needed.
Documentation
Build docs locally:
uv sync --group docs
uv run --frozen sphinx-build docs/src docs/build