BioCodes - ISCC Processing for BioImaging Data

iscc-bio - ISCC Processing for Bioimage Data

ISCC Processing for Multi-Dimensional Bioimage Data

Generate ISO 24138:2024 International Standard Content Codes (ISCC) for bioimage data across multiple formats using deterministic IMAGEWALK plane traversal.

Project Status

Version 0.1.0

Warning: This package is a proof of concept; breaking changes may be released at any time.

Overview

iscc-bio bridges bioimage formats and ISCC-CODE processing by implementing the IMAGEWALK specification, a deterministic algorithm for traversing and canonicalizing the pixel data of multi-dimensional bioimages. The result is a consistent, reproducible content identifier regardless of source format or storage platform.

Documentation: https://bio.iscc.codes

Key Features

  • Format-Agnostic Hashing: Generate reproducible ISCCs from pixel data across OME-TIFF, OME-Zarr, OMERO, CZI, ND2, LIF, and other formats
  • IMAGEWALK Implementation: Deterministic Z→C→T plane traversal with canonical byte representation
  • Multi-Source Support: Process local files (via BioIO), OME-Zarr archives, and OMERO remote servers
  • Memory Efficient: Lazy loading with Dask for processing large multi-dimensional images
  • Multi-Scene Processing: Handle complex multi-scene/multi-series bioimage files
  • Command-Line Tools: CLI commands for code generation

Installation

Basic Installation

# Using uv (recommended)
uv tool install iscc-bio

# Using pip
pip install iscc-bio

Installation with Format Support

# Install with all bioimage reader plugins
uv tool install "iscc-bio[readers]"

# Install with specific format support
uv tool install "iscc-bio[czi,nd2,lif]"

# Install everything (all readers)
uv tool install "iscc-bio[all]"

OMERO Support

OMERO support requires platform-specific prebuilt zeroc-ice wheels that are not available on PyPI, so install them separately:

pip install -r requirements-omero.txt

Available Optional Dependencies

  • readers / all: All BioIO reader plugins (BioFormats, CZI, OME-TIFF, OME-Zarr, ND2, LIF, etc.)
  • bioformats: BioFormats reader for broad format support
  • czi, nd2, lif, ome-tiff, ome-zarr-plugin, dv, tifffile: Individual format readers

Quick Start

CLI Commands

Generate Biocode (ISCC-SUM)

# Generate biocode (ISCC-SUM) for bioimage scenes
iscc-bio biocode myimage.ome.tiff

# Works with multiple sources:
iscc-bio biocode local/file.czi           # Local bioimage file
iscc-bio biocode data.zarr                # OME-Zarr/NGFF
iscc-bio biocode --host omero.server.com --iid 123  # OMERO server

# With per-plane simprints for similarity search:
iscc-bio biocode myimage.czi --simprints

Generate Imagecode (Experimental)

# Generate comprehensive fingerprint with ISCC-SUM + ISCC-IMAGE + ISCC-MIXED
iscc-bio imagecode myimage.czi

# Output includes:
# - ISCC-SUM hash over normalized pixel content
# - Representative view extraction (~5 views per scene)
# - ISCC-IMAGE codes for each view
# - ISCC-MIXED global descriptor

Extract Representative Views

# Extract intelligent 2D views for perceptual hashing
iscc-bio views myimage.nd2 --output-dir ./views/

# Extraction strategies:
# - Maximum intensity projections (MIP)
# - Best focus planes
# - Representative sampling
# - Multi-channel composites

IMAGEWALK Specification

IMAGEWALK is a deterministic algorithm for traversing multi-dimensional bioimage data to produce format-agnostic, reproducible hash digests.

Core Principles

  1. Z→C→T Traversal Order: Planes are processed in deterministic order:

    • Outermost loop: Z dimension (depth/focal plane)
    • Middle loop: C dimension (channel)
    • Innermost loop: T dimension (time)
  2. Canonical Byte Representation: Each 2D plane is:

    • Flattened in row-major order (Y then X)
    • Encoded as big-endian bytes
    • Fed to a hash processor
  3. Multi-Scene Independence: Each scene/series is processed separately, producing one hash per scene
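
The canonicalization and per-scene hashing steps above can be sketched with the standard library alone. This is a minimal illustration, not the iscc-bio implementation: the helper names `plane_to_canonical_bytes` and `hash_scenes` are hypothetical, planes are plain nested lists assumed to hold unsigned 16-bit integers, and SHA-256 stands in for the ISCC-SUM hash processor.

```python
import hashlib
import struct


def plane_to_canonical_bytes(plane):
    """Flatten a 2D plane (list of rows) row-major and encode it big-endian.

    Assumes unsigned 16-bit pixel values; a real implementation would
    derive the item width from the image dtype.
    """
    flat = [value for row in plane for value in row]  # Y outer, X inner
    return struct.pack(f">{len(flat)}H", *flat)


def hash_scenes(scenes):
    """Return one hex digest per scene; scenes never share hasher state."""
    digests = []
    for planes in scenes:
        hasher = hashlib.sha256()  # stand-in for the ISCC-SUM processor
        for plane in planes:  # planes are expected in Z, C, T traversal order
            hasher.update(plane_to_canonical_bytes(plane))
        digests.append(hasher.hexdigest())
    return digests


scene_a = [[[1, 2], [3, 256]]]                    # one 2x2 plane
scene_b = [[[1, 2], [3, 256]], [[0, 0], [0, 0]]]  # two 2x2 planes
digests = hash_scenes([scene_a, scene_b])
```

Each scene gets a fresh hasher, so adding or removing one scene does not change the digest of any other.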

Example Traversal

For an image with Z=2, C=3, T=2 dimensions (12 total planes):

Plane 1:  z=0, c=0, t=0    Plane 7:  z=1, c=0, t=0
Plane 2:  z=0, c=0, t=1    Plane 8:  z=1, c=0, t=1
Plane 3:  z=0, c=1, t=0    Plane 9:  z=1, c=1, t=0
Plane 4:  z=0, c=1, t=1    Plane 10: z=1, c=1, t=1
Plane 5:  z=0, c=2, t=0    Plane 11: z=1, c=2, t=0
Plane 6:  z=0, c=2, t=1    Plane 12: z=1, c=2, t=1
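
The listing above can be reproduced in a few lines. The sketch below follows the loop nesting stated under Core Principles and is not code from iscc-bio; `zct_order` and `plane_number` are hypothetical helper names.

```python
from itertools import product


def zct_order(size_z, size_c, size_t):
    """List (z, c, t) indices with Z outermost and T innermost."""
    # product() advances its rightmost argument fastest, so t varies first.
    return list(product(range(size_z), range(size_c), range(size_t)))


def plane_number(z, c, t, size_c, size_t):
    """1-based position of plane (z, c, t) in the traversal."""
    return z * size_c * size_t + c * size_t + t + 1


order = zct_order(2, 3, 2)
for number, (z, c, t) in enumerate(order, start=1):
    print(f"Plane {number}: z={z}, c={c}, t={t}")
```

The closed-form `plane_number` is handy for locating a single plane without enumerating the whole traversal.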

Implementation Modules

  • iw_bioio.py: BioIO-based implementation for local files
  • iw_ngff.py: OME-NGFF/Zarr implementation using ome-zarr-py
  • iw_blitz.py: OMERO Blitz implementation for remote servers

All implementations produce identical hashes for identical pixel data, conforming to the IMAGEWALK specification.
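
To illustrate why canonicalization makes the result independent of storage details, the sketch below decodes the same pixel values from two differently ordered byte layouts and re-encodes both as big-endian. It uses only the standard library; `canonicalize` is a hypothetical helper, and unsigned 16-bit pixels are assumed.

```python
import struct

pixels = [1, 2, 3, 256]

# The same pixel values as two source formats might store them.
stored_little = struct.pack("<4H", *pixels)
stored_big = struct.pack(">4H", *pixels)


def canonicalize(raw, byte_order):
    """Decode uint16 values in the given byte order, re-encode big-endian."""
    count = len(raw) // 2
    values = struct.unpack(f"{byte_order}{count}H", raw)
    return struct.pack(f">{count}H", *values)


canonical_from_little = canonicalize(stored_little, "<")
canonical_from_big = canonicalize(stored_big, ">")
```

Because both layouts reduce to the same byte string, any hash computed over the canonical bytes is identical as well.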

Command-Line Interface

biocode - Generate Biocode (ISCC-SUM)

Generate biocode (ISCC-SUM) for bioimage scenes:

iscc-bio biocode INPUT [OPTIONS]

Options:
  -s, --source [auto|bioio|omero|zarr]  Data source type
  --simprints                           Generate per-plane simprints
  --host TEXT                           OMERO server hostname
  --iid INTEGER                         OMERO image ID
  --fid INTEGER                         OMERO fileset ID
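
For batch jobs it can be convenient to assemble `iscc-bio biocode` invocations programmatically. The helper below is a hypothetical sketch that only builds the argument list from the options documented above; it does not run the tool, and the function name and keyword arguments are illustrative.

```python
def build_biocode_command(input_path=None, source=None, simprints=False,
                          host=None, iid=None, fid=None):
    """Build an argv list for `iscc-bio biocode` from the documented options."""
    command = ["iscc-bio", "biocode"]
    if input_path is not None:
        command.append(str(input_path))
    if source is not None:
        command += ["--source", source]
    if simprints:
        command.append("--simprints")
    # The OMERO-related options each take a value.
    for flag, value in (("--host", host), ("--iid", iid), ("--fid", fid)):
        if value is not None:
            command += [flag, str(value)]
    return command


local_cmd = build_biocode_command("myimage.czi", simprints=True)
omero_cmd = build_biocode_command(host="omero.server.com", iid=123)
```

Once iscc-bio is installed, the resulting list can be passed to `subprocess.run(command, check=True)`.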

imagecode - Generate Imagecode (Experimental)

Create comprehensive bioimage fingerprints with ISCC-SUM + ISCC-IMAGE + ISCC-MIXED codes:

iscc-bio imagecode INPUT [OPTIONS]

Options:
  -o, --output-dir PATH    Save extracted view PNGs
  -n, --max-views INTEGER  Maximum views per scene (default: 5)

views - Extract Representative Views

Extract intelligent 2D views for perceptual hashing:

iscc-bio views INPUT [OPTIONS]

Options:
  -s, --strategies TEXT    View strategies (mip, best_focus, representative, composite)
  -n, --max-views INTEGER  Maximum views to extract (default: 8)
  -o, --output-dir PATH    Directory to save thumbnails
  --host TEXT              OMERO server hostname
  --iid INTEGER            OMERO image ID

scenes - Extract Scene Thumbnails

Extract thumbnails from all scenes in a multi-scene file:

iscc-bio scenes INPUT

thumb - Extract Thumbnail

Extract a single representative thumbnail from a bioimage:

iscc-bio thumb INPUT

Python API

IMAGEWALK Plane Iteration

from iscc_bio.imagewalk.iw_bioio import iter_planes_bioio
from iscc_bio.imagewalk.iw_ngff import iter_planes_ngff
from iscc_bio.imagewalk.iw_blitz import iter_planes_blitz

# Iterate over planes using BioIO
for plane in iter_planes_bioio("image.czi"):
    print(f"Scene {plane.scene_idx}, Z={plane.z_depth}, "
          f"C={plane.c_channel}, T={plane.t_time}")
    print(f"Shape: {plane.xy_array.shape}, dtype: {plane.xy_array.dtype}")

# Iterate over OME-Zarr planes
for plane in iter_planes_ngff("data.zarr"):
    # Process plane.xy_array (2D numpy array)
    pass

# Iterate over OMERO planes
from omero.gateway import BlitzGateway

conn = BlitzGateway("user", "pass", host="omero.server.com")
if not conn.connect():
    raise ConnectionError("Could not connect to the OMERO server")
try:
    image = conn.getObject("Image", 123)
    for plane in iter_planes_blitz(image):
        # Process plane.xy_array
        pass
finally:
    conn.close()  # Always release the server session

Generate Biocode (ISCC-SUM)

from iscc_bio.biocode import generate_biocode
from iscc_bio.imagewalk.iw_bioio import iter_planes_bioio

# Generate biocode for all scenes
planes = iter_planes_bioio("image.lif")
results = generate_biocode(planes)
print(results[0]["iscc_code"])  # ISCC-SUM for first scene

Generate Imagecode (Experimental)

from iscc_bio.imagecode import generate_imagecode, format_output

# Generate comprehensive fingerprints (ISCC-SUM + ISCC-IMAGE + ISCC-MIXED)
fingerprints = generate_imagecode("image.nd2", max_views=5)

# Format output
output = format_output(fingerprints, "image.nd2")
print(output)

Supported Formats

Via the BioIO plugin ecosystem, with OME-Zarr and OMERO also supported directly:

  • OME-TIFF/TIFF: Multi-page TIFF with OME-XML metadata
  • OME-Zarr/NGFF: Next-generation file format
  • OMERO: Remote server access via Blitz gateway
  • CZI: Carl Zeiss Image format
  • ND2: Nikon NIS-Elements format
  • LIF: Leica Image File format
  • DV: DeltaVision format
  • BioFormats: 150+ formats via Bio-Formats Java library

Development

Setup Development Environment

# Clone repository
git clone https://github.com/bio-codes/iscc-bio.git
cd iscc-bio

# Install with all dependencies
uv sync --extra all

# Run CLI during development
uv run iscc-bio --help

Development Commands

This project uses poethepoet for task automation:

# Format markdown files
uv run poe format-md

# Format code files
uv run poe format-code

# Build documentation
uv run poe docs-build

# Run all formatting and docs
uv run poe all

Architecture

Core Modules

  • iscc_bio.imagewalk: IMAGEWALK plane traversal implementations

    • iw_bioio.py: BioIO implementation
    • iw_ngff.py: OME-Zarr/NGFF implementation
    • iw_blitz.py: OMERO Blitz implementation
    • models.py: Plane data model
  • iscc_bio.biocode: Biocode (ISCC-SUM) generation from IMAGEWALK plane iterators

  • iscc_bio.imagecode: Imagecode generation (ISCC-SUM + ISCC-IMAGE + ISCC-MIXED)

  • iscc_bio.views: Intelligent view extraction strategies

  • iscc_bio.cli: Command-line interface

Design Principles

  1. Lazy Loading: Uses Dask arrays for memory-efficient processing of large images
  2. Format Agnostic: Identical processing logic across all formats via IMAGEWALK
  3. Deterministic: Reproducible hashes across platforms and formats
  4. Modular: Clean separation between traversal, canonicalization, and hashing
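
The lazy-loading principle relies on hashing being incremental: planes can be fed to the hasher one at a time without holding the whole image in memory. The sketch below shows that property with a plain Python generator standing in for Dask-backed plane loading and SHA-256 standing in for the ISCC hash; `lazy_planes` is a hypothetical helper that yields synthetic bytes.

```python
import hashlib


def lazy_planes(plane_count, plane_size):
    """Yield one plane's canonical bytes at a time (stand-in for lazy loading)."""
    for index in range(plane_count):
        yield bytes([index % 256]) * plane_size


# Streaming: only one plane is held in memory at any moment.
streaming = hashlib.sha256()
for plane_bytes in lazy_planes(plane_count=4, plane_size=1024):
    streaming.update(plane_bytes)

# Eager: materialize every plane first, then hash in one call.
eager = hashlib.sha256(b"".join(lazy_planes(plane_count=4, plane_size=1024)))

same_digest = streaming.hexdigest() == eager.hexdigest()
```

Both paths yield the same digest, which is why traversal can stay lazy without affecting reproducibility.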

Funding

This work was supported through the Open Science Clusters’ Action for Research and Society (OSCARS) European project under grant agreement Nº101129751.

See: BIO-CODES project (Enhancing AI-Readiness of Bioimaging Data with Content-Based Identifiers).

License

Apache License 2.0 - See LICENSE file for details.

Citation

If you use iscc-bio in your research, please cite:

@software{iscc_bio,
  title        = {bio-codes/iscc-bio: ISCC Processing for Bioimage Data},
  author       = {Pan, Titusz},
  year         = 2025,
  url          = {https://github.com/bio-codes/iscc-bio},
  note         = {Supported by OSCARS (Open Science Clusters' Action for Research and Society) under European Commission grant agreement Nº101129751},
  version      = {0.1.0}
}

Related Projects

  • iscc-sum - Fast ISCC Data-Code and Instance-Code hashing
  • iscc-core - ISCC Core Algorithms
  • BioIO - Bioimage reading library
  • OME-Zarr - Next-generation file format implementation


