BioCodes - ISCC Processing for BioImaging Data
iscc-bio - ISCC Processing for Bioimage Data
ISCC Processing for Multi-Dimensional Bioimage Data
Generate ISO 24138:2024 International Standard Content Codes (ISCC) for bioimage data across multiple formats using deterministic IMAGEWALK plane traversal.
Project Status
Version 0.1.0
[!WARNING] This package is a proof of concept, and breaking changes may be released at any time.
Overview
iscc-bio bridges bioimage formats and ISCC-CODE processing by implementing the IMAGEWALK specification,
a deterministic algorithm for traversing and canonicalizing pixel data from multi-dimensional bioimages.
This produces consistent, reproducible content identifiers regardless of source format or storage platform.
Documentation: https://bio.iscc.codes
Key Features
- Format-Agnostic Hashing: Generate reproducible ISCCs at the level of pixel data across OME-TIFF, OME-Zarr, OMERO, CZI, ND2, LIF, and other formats
- IMAGEWALK Implementation: Deterministic Z→C→T plane traversal with canonical byte representation
- Multi-Source Support: Process local files (via BioIO), OME-Zarr archives, and OMERO remote servers
- Memory Efficient: Lazy loading with Dask for processing large multi-dimensional images
- Multi-Scene Processing: Handle complex multi-scene/multi-series bioimage files
- Command-Line Tools: CLI commands for code generation
Installation
Basic Installation
# Using uv (recommended)
uv tool install iscc-bio
# Using pip
pip install iscc-bio
Installation with Format Support
# Install with all bioimage reader plugins
uv tool install "iscc-bio[readers]"
# Install with specific format support
uv tool install "iscc-bio[czi,nd2,lif]"
# Install everything (all readers)
uv tool install "iscc-bio[all]"
OMERO Support
OMERO support requires platform-specific prebuilt zeroc-ice wheels that are not available on PyPI. Install them separately:
pip install -r requirements-omero.txt
Available Optional Dependencies
- readers / all: All BioIO reader plugins (BioFormats, CZI, OME-TIFF, OME-Zarr, ND2, LIF, etc.)
- bioformats: BioFormats reader for broad format support
- czi, nd2, lif, ome-tiff, ome-zarr-plugin, dv, tifffile: Individual format readers
Quick Start
CLI Commands
Generate Biocode (ISCC-SUM)
# Generate biocode (ISCC-SUM) for bioimage scenes
iscc-bio biocode myimage.ome.tiff
# Works with multiple sources:
iscc-bio biocode local/file.czi # Local bioimage file
iscc-bio biocode data.zarr # OME-Zarr/NGFF
iscc-bio biocode --host omero.server.com --iid 123 # OMERO server
# With per-plane simprints for similarity search:
iscc-bio biocode myimage.czi --simprints
Generate Imagecode (Experimental)
# Generate comprehensive fingerprint with ISCC-SUM + ISCC-IMAGE + ISCC-MIXED
iscc-bio imagecode myimage.czi
# Output includes:
# - ISCC-SUM hash over normalized pixel content
# - Representative view extraction (~5 views per scene)
# - ISCC-IMAGE codes for each view
# - ISCC-MIXED global descriptor
Extract Representative Views
# Extract intelligent 2D views for perceptual hashing
iscc-bio views myimage.nd2 --output-dir ./views/
# Extraction strategies:
# - Maximum intensity projections (MIP)
# - Best focus planes
# - Representative sampling
# - Multi-channel composites
IMAGEWALK Specification
IMAGEWALK is a deterministic algorithm for traversing multi-dimensional bioimage data to produce format-agnostic, reproducible hash digests.
Core Principles
- Z→C→T Traversal Order: Planes are processed in deterministic order:
  - Outermost loop: Z dimension (depth/focal plane)
  - Middle loop: C dimension (channel)
  - Innermost loop: T dimension (time)
- Canonical Byte Representation: Each 2D plane is:
  - Flattened in row-major order (Y then X)
  - Encoded as big-endian bytes
  - Fed to a hash processor
- Multi-Scene Independence: Each scene/series is processed separately, producing one hash per scene
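The canonical byte representation can be illustrated with a minimal, standard-library-only sketch. It assumes a plane given as a list of rows of unsigned 16-bit pixels; the helper name `canonical_plane_bytes` is hypothetical, and `hashlib.sha256` is only a stand-in for the hash processor, not the ISCC-SUM processor used by iscc-bio.

```python
import hashlib
import struct


def canonical_plane_bytes(plane, fmt="H"):
    """Serialize a 2D plane (list of rows) as row-major, big-endian bytes.

    `fmt` is a struct type code for one pixel, e.g. "H" for uint16.
    """
    # Row-major: emit every pixel of row 0, then row 1, and so on.
    pixels = [value for row in plane for value in row]
    # ">" forces big-endian output regardless of the host byte order.
    return struct.pack(f">{len(pixels)}{fmt}", *pixels)


plane = [[1, 2], [3, 4]]  # tiny 2x2 uint16 plane
data = canonical_plane_bytes(plane)

hasher = hashlib.sha256()  # stand-in only; not the ISCC-SUM processor
hasher.update(data)
print(data.hex())  # 0001000200030004
```

Because the byte order is fixed explicitly, the same pixel values yield the same bytes on little-endian and big-endian hosts.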
Example Traversal
For an image with Z=2, C=3, T=2 dimensions (12 total planes):
Plane 1: z=0, c=0, t=0 Plane 7: z=1, c=0, t=0
Plane 2: z=0, c=0, t=1 Plane 8: z=1, c=0, t=1
Plane 3: z=0, c=1, t=0 Plane 9: z=1, c=1, t=0
Plane 4: z=0, c=1, t=1 Plane 10: z=1, c=1, t=1
Plane 5: z=0, c=2, t=0 Plane 11: z=1, c=2, t=0
Plane 6: z=0, c=2, t=1 Plane 12: z=1, c=2, t=1
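The listing above can be reproduced with a short sketch. The function name `imagewalk_order` is hypothetical and is not part of the iscc-bio API; it only illustrates the loop nesting described in the core principles.

```python
from itertools import product


def imagewalk_order(size_z, size_c, size_t):
    """Yield (z, c, t) index tuples with Z outermost and T innermost."""
    # itertools.product varies the last iterable fastest, so T is the inner loop.
    yield from product(range(size_z), range(size_c), range(size_t))


for number, (z, c, t) in enumerate(imagewalk_order(2, 3, 2), start=1):
    print(f"Plane {number}: z={z}, c={c}, t={t}")
```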
Implementation Modules
- iw_bioio.py: BioIO-based implementation for local files
- iw_ngff.py: OME-NGFF/Zarr implementation using ome-zarr-py
- iw_blitz.py: OMERO Blitz implementation for remote servers
All implementations produce identical hashes for identical pixel data, conforming to the IMAGEWALK specification.
Command-Line Interface
biocode - Generate Biocode (ISCC-SUM)
Generate biocode (ISCC-SUM) for bioimage scenes:
iscc-bio biocode INPUT [OPTIONS]
Options:
  -s, --source [auto|bioio|omero|zarr]  Data source type
  --simprints                           Generate per-plane simprints
  --host TEXT                           OMERO server hostname
  --iid INTEGER                         OMERO image ID
  --fid INTEGER                         OMERO fileset ID
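For batch scripts, the command line can be assembled from Python. The sketch below uses only the options listed above; the helper `build_biocode_command` is hypothetical and not part of iscc-bio.

```python
import shlex


def build_biocode_command(input_path=None, source=None, simprints=False,
                          host=None, iid=None):
    """Assemble an argument list for `iscc-bio biocode` from documented options."""
    cmd = ["iscc-bio", "biocode"]
    if input_path is not None:
        cmd.append(str(input_path))
    if source is not None:
        cmd += ["--source", source]
    if simprints:
        cmd.append("--simprints")
    if host is not None:
        cmd += ["--host", host]
    if iid is not None:
        cmd += ["--iid", str(iid)]
    return cmd


cmd = build_biocode_command("myimage.czi", source="bioio", simprints=True)
print(shlex.join(cmd))  # iscc-bio biocode myimage.czi --source bioio --simprints
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once iscc-bio is installed.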
imagecode - Generate Imagecode (Experimental)
Create comprehensive bioimage fingerprints with ISCC-SUM + ISCC-IMAGE + ISCC-MIXED codes:
iscc-bio imagecode INPUT [OPTIONS]
Options:
  -o, --output-dir PATH    Save extracted view PNGs
  -n, --max-views INTEGER  Maximum views per scene (default: 5)
views - Extract Representative Views
Extract intelligent 2D views for perceptual hashing:
iscc-bio views INPUT [OPTIONS]
Options:
  -s, --strategies TEXT    View strategies (mip, best_focus, representative, composite)
  -n, --max-views INTEGER  Maximum views to extract (default: 8)
  -o, --output-dir PATH    Directory to save thumbnails
  --host TEXT              OMERO server hostname
  --iid INTEGER            OMERO image ID
scenes - Extract Scene Thumbnails
Extract thumbnails from all scenes in a multi-scene file:
iscc-bio scenes INPUT
thumb - Extract Thumbnail
Extract a single representative thumbnail from a bioimage:
iscc-bio thumb INPUT
Python API
IMAGEWALK Plane Iteration
from iscc_bio.imagewalk.iw_bioio import iter_planes_bioio
from iscc_bio.imagewalk.iw_ngff import iter_planes_ngff
from iscc_bio.imagewalk.iw_blitz import iter_planes_blitz
# Iterate over planes using BioIO
for plane in iter_planes_bioio("image.czi"):
    print(f"Scene {plane.scene_idx}, Z={plane.z_depth}, "
          f"C={plane.c_channel}, T={plane.t_time}")
    print(f"Shape: {plane.xy_array.shape}, dtype: {plane.xy_array.dtype}")
# Iterate over OME-Zarr planes
for plane in iter_planes_ngff("data.zarr"):
    # Process plane.xy_array (2D numpy array)
    pass
# Iterate over OMERO planes
from omero.gateway import BlitzGateway
conn = BlitzGateway("user", "pass", host="omero.server.com")
conn.connect()
image = conn.getObject("Image", 123)
for plane in iter_planes_blitz(image):
    # Process plane.xy_array
    pass
conn.close()
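One way to sanity-check a plane stream is to confirm that it arrives in Z→C→T order. The sketch below uses a hypothetical `PlaneStub` dataclass that only mirrors the attribute names shown above (it omits `xy_array`) and a hypothetical helper `follows_imagewalk_order`; neither is part of iscc-bio.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class PlaneStub:
    """Hypothetical stand-in exposing the attribute names used above."""
    scene_idx: int
    z_depth: int
    c_channel: int
    t_time: int


def follows_imagewalk_order(planes):
    """Return True if each consecutive run of one scene is sorted by (Z, C, T)."""
    for _, scene_planes in groupby(planes, key=lambda p: p.scene_idx):
        keys = [(p.z_depth, p.c_channel, p.t_time) for p in scene_planes]
        if keys != sorted(keys):
            return False
    return True


planes = [PlaneStub(0, z, c, t)
          for z in range(2) for c in range(3) for t in range(2)]
print(follows_imagewalk_order(planes))  # True
```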
Generate Biocode (ISCC-SUM)
from iscc_bio.biocode import generate_biocode
from iscc_bio.imagewalk.iw_bioio import iter_planes_bioio
# Generate biocode for all scenes
planes = iter_planes_bioio("image.lif")
results = generate_biocode(planes)
print(results[0]["iscc_code"]) # ISCC-SUM for first scene
Generate Imagecode (Experimental)
from iscc_bio.imagecode import generate_imagecode, format_output
# Generate comprehensive fingerprints (ISCC-SUM + ISCC-IMAGE + ISCC-MIXED)
fingerprints = generate_imagecode("image.nd2", max_views=5)
# Format output
output = format_output(fingerprints, "image.nd2")
print(output)
Supported Formats
Via the BioIO plugin ecosystem and the dedicated OME-Zarr and OMERO implementations:
- OME-TIFF/TIFF: Multi-page TIFF with OME-XML metadata
- OME-Zarr/NGFF: Next-generation file format
- OMERO: Remote server access via Blitz gateway
- CZI: Carl Zeiss Image format
- ND2: Nikon NIS-Elements format
- LIF: Leica Image File format
- DV: DeltaVision format
- BioFormats: 150+ formats via Bio-Formats Java library
Development
Setup Development Environment
# Clone repository
git clone https://github.com/bio-codes/iscc-bio.git
cd iscc-bio
# Install with all dependencies
uv sync --extra all
# Run CLI during development
uv run iscc-bio --help
Development Commands
This project uses poethepoet for task automation:
# Format markdown files
uv run poe format-md
# Format code files
uv run poe format-code
# Build documentation
uv run poe docs-build
# Run all formatting and docs
uv run poe all
Architecture
Core Modules
- iscc_bio.imagewalk: IMAGEWALK plane traversal implementations
  - iw_bioio.py: BioIO implementation
  - iw_ngff.py: OME-Zarr/NGFF implementation
  - iw_blitz.py: OMERO Blitz implementation
  - models.py: Plane data model
- iscc_bio.biocode: Biocode (ISCC-SUM) generation from IMAGEWALK plane iterators
- iscc_bio.imagecode: Imagecode generation (ISCC-SUM + ISCC-IMAGE + ISCC-MIXED)
- iscc_bio.views: Intelligent view extraction strategies
- iscc_bio.cli: Command-line interface
Design Principles
- Lazy Loading: Uses Dask arrays for memory-efficient processing of large images
- Format Agnostic: Identical processing logic across all formats via IMAGEWALK
- Deterministic: Reproducible hashes across platforms and formats
- Modular: Clean separation between traversal, canonicalization, and hashing
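The separation between traversal, canonicalization, and hashing can be sketched with plain generators. This is an illustration under stated assumptions, not the package's implementation: a generator stands in for Dask-backed lazy loading, pixels are assumed to be uint16, `hashlib.sha256` stands in for the ISCC-SUM processor, and all function names are hypothetical.

```python
import hashlib
import struct


def iter_planes(scenes):
    """Traversal: lazily yield (scene_idx, plane) pairs, one plane at a time."""
    for scene_idx, planes in enumerate(scenes):
        for plane in planes:
            yield scene_idx, plane


def canonicalize(plane):
    """Canonicalization: row-major, big-endian uint16 bytes for one 2D plane."""
    pixels = [value for row in plane for value in row]
    return struct.pack(f">{len(pixels)}H", *pixels)


def hash_scenes(plane_stream):
    """Hashing: feed each scene into its own hasher; return one digest per scene."""
    hashers = {}
    for scene_idx, plane in plane_stream:
        if scene_idx not in hashers:
            hashers[scene_idx] = hashlib.sha256()  # stand-in hash processor
        hashers[scene_idx].update(canonicalize(plane))
    return [hashers[idx].hexdigest() for idx in sorted(hashers)]


scenes = [
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],  # scene 0: two 2x2 planes
    [[[9, 9], [9, 9]]],                    # scene 1: one 2x2 plane
]
digests = hash_scenes(iter_planes(scenes))
print(len(digests))  # 2
```

Because each stage only consumes the previous stage's output, a different traversal source can be swapped in without touching canonicalization or hashing.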
Funding
This work was supported through the Open Science Clusters’ Action for Research and Society (OSCARS) European project under grant agreement Nº101129751.
See: BIO-CODES project (Enhancing AI-Readiness of Bioimaging Data with Content-Based Identifiers).
License
Apache License 2.0 - See LICENSE file for details.
Citation
If you use iscc-bio in your research, please cite:
@software{iscc_bio,
title = {bio-codes/iscc-bio: ISCC Processing for Bioimage Data},
author = {Pan, Titusz},
year = 2025,
url = {https://github.com/bio-codes/iscc-bio},
note = {Supported by OSCARS (Open Science Clusters' Action for Research and Society) under European Commission grant agreement Nº101129751},
version = {0.1.0}
}