BioCodes - ISCC Processing for BioImaging Data

iscc-bio - ISCC Processing for Bioimage Data

ISCC Processing for Multi-Dimensional Bioimage Data

Generate ISO 24138:2024 International Standard Content Codes (ISCC) for bioimage data across multiple formats using deterministic IMAGEWALK plane traversal.

Project Status

Version 0.1.0

Warning: This package is a proof of concept; breaking changes may be released at any time.

Overview

iscc-bio bridges bioimage formats with ISCC-CODE processing by implementing the IMAGEWALK specification - a deterministic algorithm for traversing and canonicalizing pixel data from multi-dimensional bioimaging data. This produces consistent, reproducible content identifiers regardless of source format or storage platform.

Documentation: https://bio.iscc.codes

Key Features

Format-Agnostic Hashing: Generate reproducible ISCCs from pixel data, independent of the container format (OME-TIFF, OME-Zarr, OMERO, CZI, ND2, LIF, and others)
IMAGEWALK Implementation: Deterministic Z→C→T plane traversal with canonical byte representation
Multi-Source Support: Process local files (via BioIO), OME-Zarr archives, and OMERO remote servers
Memory Efficient: Lazy loading with Dask for processing large multi-dimensional images
Multi-Scene Processing: Handle complex multi-scene/multi-series bioimage files
Command-Line Tools: CLI commands for generating biocodes and imagecodes and for extracting views and thumbnails

Installation

Basic Installation

# Using uv (recommended)
uv tool install iscc-bio

# Using pip
pip install iscc-bio

Installation with Format Support

# Install with all bioimage reader plugins
uv tool install "iscc-bio[readers]"

# Install with specific format support
uv tool install "iscc-bio[czi,nd2,lif]"

# Install everything (all readers)
uv tool install "iscc-bio[all]"

OMERO Support

OMERO support depends on platform-specific prebuilt zeroc-ice wheels that are not available on PyPI. Install them separately:

pip install -r requirements-omero.txt

Available Optional Dependencies

readers / all: All BioIO reader plugins (BioFormats, CZI, OME-TIFF, OME-Zarr, ND2, LIF, etc.)
bioformats: BioFormats reader for broad format support
czi, nd2, lif, ome-tiff, ome-zarr-plugin, dv, tifffile: Individual format readers
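
To see which reader plugins are importable in the current environment, a small check like the following may help. It is only a sketch: the mapping from extras to module names is an assumption based on BioIO's plugin naming (for example `bioio-czi` importing as `bioio_czi`), not something this README specifies, and `installed_readers` is a hypothetical helper.

```python
from importlib.util import find_spec

# ASSUMPTION: extras map to BioIO plugin modules with these import names.
ASSUMED_READER_MODULES = {
    "bioformats": "bioio_bioformats",
    "czi": "bioio_czi",
    "nd2": "bioio_nd2",
    "lif": "bioio_lif",
    "ome-tiff": "bioio_ome_tiff",
    "ome-zarr-plugin": "bioio_ome_zarr",
    "dv": "bioio_dv",
    "tifffile": "bioio_tifffile",
}


def installed_readers(modules=ASSUMED_READER_MODULES):
    """Return {extra_name: True/False} depending on whether the module is importable."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


for extra, available in installed_readers().items():
    print(f"{extra:16s} {'installed' if available else 'missing'}")
```

A missing entry suggests installing the matching extra, for example `uv tool install "iscc-bio[czi]"`.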

Quick Start

CLI Commands

Generate Biocode (ISCC-SUM)

# Generate biocode (ISCC-SUM) for bioimage scenes
iscc-bio biocode myimage.ome.tiff

# Works with multiple sources:
iscc-bio biocode local/file.czi           # Local bioimage file
iscc-bio biocode data.zarr                # OME-Zarr/NGFF
iscc-bio biocode --host omero.server.com --iid 123  # OMERO server

# With per-plane simprints for similarity search:
iscc-bio biocode myimage.czi --simprints

Generate Imagecode (Experimental)

# Generate comprehensive fingerprint with ISCC-SUM + ISCC-IMAGE + ISCC-MIXED
iscc-bio imagecode myimage.czi

# Output includes:
# - ISCC-SUM hash over normalized pixel content
# - Representative view extraction (~5 views per scene)
# - ISCC-IMAGE codes for each view
# - ISCC-MIXED global descriptor

Extract Representative Views

# Extract intelligent 2D views for perceptual hashing
iscc-bio views myimage.nd2 --output-dir ./views/

# Extraction strategies:
# - Maximum intensity projections (MIP)
# - Best focus planes
# - Representative sampling
# - Multi-channel composites
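
Of the strategies listed, maximum intensity projection is the easiest to illustrate. The sketch below is not the package's implementation; it uses plain nested lists rather than arrays so that it runs without dependencies, and `max_intensity_projection` is a hypothetical name.

```python
def max_intensity_projection(z_stack):
    """Collapse a Z-stack (list of 2D planes) into one plane of per-pixel maxima."""
    if not z_stack:
        raise ValueError("z_stack must contain at least one plane")
    return [
        # zip(*rows) groups the pixels that share the same (y, x) across all Z planes
        [max(pixels) for pixels in zip(*rows)]
        # zip(*z_stack) groups the rows that share the same y across all Z planes
        for rows in zip(*z_stack)
    ]


stack = [
    [[1, 5], [0, 2]],  # z = 0
    [[3, 4], [7, 1]],  # z = 1
]
print(max_intensity_projection(stack))  # [[3, 5], [7, 2]]
```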

IMAGEWALK Specification

IMAGEWALK is a deterministic algorithm for traversing multi-dimensional bioimage data to produce format-agnostic, reproducible hash digests.

Core Principles

Z→C→T Traversal Order: Planes are processed in deterministic order:
- Outermost loop: Z dimension (depth/focal plane)
- Middle loop: C dimension (channel)
- Innermost loop: T dimension (time)
Canonical Byte Representation: Each 2D plane is:
- Flattened in row-major order (Y then X)
- Encoded as big-endian bytes
- Fed to a hash processor
Multi-Scene Independence: Each scene/series is processed separately, producing one hash per scene
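
The canonicalization and per-scene hashing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the reference implementation: pixels are taken to be unsigned 16-bit integers, planes are plain nested lists, and SHA-256 stands in for the ISCC hash processor. The names `plane_to_bytes` and `hash_scenes` are hypothetical.

```python
import hashlib
import struct


def plane_to_bytes(plane, fmt="H"):
    """Flatten a 2D plane (list of rows) row-major and encode it big-endian.

    fmt is a struct format character; "H" (uint16) is an assumption here.
    """
    flat = [value for row in plane for value in row]  # Y outer, X inner
    return struct.pack(f">{len(flat)}{fmt}", *flat)


def hash_scenes(scenes, fmt="H"):
    """Return one hex digest per scene; each scene is an iterable of 2D planes."""
    digests = []
    for planes in scenes:
        hasher = hashlib.sha256()  # stand-in for the ISCC hash processor
        for plane in planes:
            hasher.update(plane_to_bytes(plane, fmt))
        digests.append(hasher.hexdigest())
    return digests


plane_a = [[1, 2], [3, 4]]
plane_b = [[5, 6], [7, 8]]
print(plane_to_bytes(plane_a).hex())  # 0001000200030004
print(len(hash_scenes([[plane_a, plane_b], [plane_b]])))  # 2
```

Each scene gets a fresh hasher, so reordering or removing one scene does not change the digest of another.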

Example Traversal

For an image with Z=2, C=3, T=2 dimensions (12 total planes):

Plane 1:  z=0, c=0, t=0    Plane 7:  z=1, c=0, t=0
Plane 2:  z=0, c=0, t=1    Plane 8:  z=1, c=0, t=1
Plane 3:  z=0, c=1, t=0    Plane 9:  z=1, c=1, t=0
Plane 4:  z=0, c=1, t=1    Plane 10: z=1, c=1, t=1
Plane 5:  z=0, c=2, t=0    Plane 11: z=1, c=2, t=0
Plane 6:  z=0, c=2, t=1    Plane 12: z=1, c=2, t=1
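
The table above can be reproduced with three nested loops, Z outermost and T innermost. A minimal sketch using `itertools.product` (the function name `imagewalk_order` is hypothetical):

```python
from itertools import product


def imagewalk_order(size_z, size_c, size_t):
    """Yield (z, c, t) index tuples with Z outermost, C in the middle, T innermost."""
    yield from product(range(size_z), range(size_c), range(size_t))


order = list(imagewalk_order(2, 3, 2))
for number, (z, c, t) in enumerate(order, start=1):
    print(f"Plane {number}: z={z}, c={c}, t={t}")
```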

Implementation Modules

iw_bioio.py: BioIO-based implementation for local files
iw_ngff.py: OME-NGFF/Zarr implementation using ome-zarr-py
iw_blitz.py: OMERO Blitz implementation for remote servers

All implementations produce identical hashes for identical pixel data, conforming to the IMAGEWALK specification.
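
Why a fixed traversal makes the result independent of storage layout can be shown with a toy example. The two layouts below are hypothetical stand-ins for different file formats, pixels are assumed to be unsigned 16-bit integers, and SHA-256 stands in for the ISCC hash processor; none of this is the package's actual code.

```python
import hashlib
import struct
from itertools import product

SIZE_Z, SIZE_C, SIZE_T, SIZE_Y, SIZE_X = 2, 3, 2, 2, 2


def make_plane(z, c, t):
    """Synthetic plane whose pixel values encode their own coordinates."""
    return [
        [z * 1000 + c * 100 + t * 10 + y * 2 + x for x in range(SIZE_X)]
        for y in range(SIZE_Y)
    ]


# Hypothetical layout A: nested lists indexed [t][c][z]
layout_tcz = [
    [[make_plane(z, c, t) for z in range(SIZE_Z)] for c in range(SIZE_C)]
    for t in range(SIZE_T)
]
# Hypothetical layout B: dict keyed by (z, c, t), filled in reverse order
layout_dict = {
    (z, c, t): make_plane(z, c, t)
    for t, c, z in product(
        reversed(range(SIZE_T)), reversed(range(SIZE_C)), reversed(range(SIZE_Z))
    )
}


def walk_digest(get_plane):
    """Hash all planes in Z, C, T order, regardless of how they are stored."""
    hasher = hashlib.sha256()  # stand-in for the ISCC hash processor
    for z, c, t in product(range(SIZE_Z), range(SIZE_C), range(SIZE_T)):
        flat = [value for row in get_plane(z, c, t) for value in row]
        hasher.update(struct.pack(f">{len(flat)}H", *flat))
    return hasher.hexdigest()


digest_a = walk_digest(lambda z, c, t: layout_tcz[t][c][z])
digest_b = walk_digest(lambda z, c, t: layout_dict[(z, c, t)])
print(digest_a == digest_b)  # True
```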

Command-Line Interface

`biocode` - Generate Biocode (ISCC-SUM)

Generate biocode (ISCC-SUM) for bioimage scenes:

iscc-bio biocode INPUT [OPTIONS]

Options:
  -s, --source [auto|bioio|omero|zarr]  Data source type
  --simprints                           Generate per-plane simprints
  --host TEXT                           OMERO server hostname
  --iid INTEGER                         OMERO image ID
  --fid INTEGER                         OMERO fileset ID

`imagecode` - Generate Imagecode (Experimental)

Create comprehensive bioimage fingerprints with ISCC-SUM + ISCC-IMAGE + ISCC-MIXED codes:

iscc-bio imagecode INPUT [OPTIONS]

Options:
  -o, --output-dir PATH    Save extracted view PNGs
  -n, --max-views INTEGER  Maximum views per scene (default: 5)

`views` - Extract Representative Views

Extract intelligent 2D views for perceptual hashing:

iscc-bio views INPUT [OPTIONS]

Options:
  -s, --strategies TEXT    View strategies (mip, best_focus, representative, composite)
  -n, --max-views INTEGER  Maximum views to extract (default: 8)
  -o, --output-dir PATH    Directory to save thumbnails
  --host TEXT              OMERO server hostname
  --iid INTEGER            OMERO image ID

`scenes` - Extract Scene Thumbnails

Extract thumbnails from all scenes in a multi-scene file:

iscc-bio scenes INPUT

`thumb` - Extract Thumbnail

Extract a single representative thumbnail from a bioimage:

iscc-bio thumb INPUT

Python API

IMAGEWALK Plane Iteration

from iscc_bio.imagewalk.iw_bioio import iter_planes_bioio
from iscc_bio.imagewalk.iw_ngff import iter_planes_ngff
from iscc_bio.imagewalk.iw_blitz import iter_planes_blitz

# Iterate over planes using BioIO
for plane in iter_planes_bioio("image.czi"):
    print(f"Scene {plane.scene_idx}, Z={plane.z_depth}, "
          f"C={plane.c_channel}, T={plane.t_time}")
    print(f"Shape: {plane.xy_array.shape}, dtype: {plane.xy_array.dtype}")

# Iterate over OME-Zarr planes
for plane in iter_planes_ngff("data.zarr"):
    # Process plane.xy_array (2D numpy array)
    pass

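# Iterate over OMERO planes (requires the separately installed OMERO dependencies)
from omero.gateway import BlitzGateway

conn = BlitzGateway("user", "pass", host="omero.server.com")
if not conn.connect():
    raise ConnectionError("Could not connect to the OMERO server")
try:
    image = conn.getObject("Image", 123)
    for plane in iter_planes_blitz(image):
        # Process plane.xy_array
        pass
finally:
    # Always release the server session, even if processing fails
    conn.close()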
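
A quick sanity check on any of these iterators is to confirm that planes arrive in Z→C→T order. The sketch below relies only on the attribute names used above (`scene_idx`, `z_depth`, `c_channel`, `t_time`) and additionally assumes that scenes are yielded in ascending index order. `follows_imagewalk_order` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real planes.

```python
from types import SimpleNamespace


def follows_imagewalk_order(planes):
    """Return True if planes arrive scene by scene in strictly ascending (z, c, t) order."""
    previous = None
    for plane in planes:
        key = (plane.scene_idx, plane.z_depth, plane.c_channel, plane.t_time)
        # Tuple comparison is lexicographic, which matches Z outer, C middle, T inner
        if previous is not None and key <= previous:
            return False
        previous = key
    return True


# Stand-in planes for a single scene with Z=2, C=3, T=2
good = [
    SimpleNamespace(scene_idx=0, z_depth=z, c_channel=c, t_time=t)
    for z in range(2)
    for c in range(3)
    for t in range(2)
]
print(follows_imagewalk_order(good))                  # True
print(follows_imagewalk_order(list(reversed(good))))  # False
```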

Generate Biocode (ISCC-SUM)

from iscc_bio.biocode import generate_biocode
from iscc_bio.imagewalk import iter_planes_bioio

# Generate biocode for all scenes
planes = iter_planes_bioio("image.lif")
results = generate_biocode(planes)
print(results[0]["iscc_code"])  # ISCC-SUM for first scene
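
Because the codes are derived from pixel data rather than file bytes, one possible use is spotting the same scene stored in different formats. The sketch below assumes, as the snippet above suggests, that `generate_biocode` returns one dict per scene with an `"iscc_code"` key. The helper `group_by_code` is hypothetical, and the codes in the example are placeholders, not real ISCCs.

```python
from collections import defaultdict


def group_by_code(results_by_source):
    """Map each ISCC code to the (source, scene index) pairs that produced it."""
    groups = defaultdict(list)
    for source, results in results_by_source.items():
        for scene_index, result in enumerate(results):
            groups[result["iscc_code"]].append((source, scene_index))
    return dict(groups)


# Placeholder results; in practice each list would come from generate_biocode(...)
example = {
    "image.lif": [{"iscc_code": "ISCC:PLACEHOLDER-A"}, {"iscc_code": "ISCC:PLACEHOLDER-B"}],
    "image.ome.tiff": [{"iscc_code": "ISCC:PLACEHOLDER-A"}],
}
for code, members in group_by_code(example).items():
    print(code, members)
```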

Generate Imagecode (Experimental)

from iscc_bio.imagecode import generate_imagecode, format_output

# Generate comprehensive fingerprints (ISCC-SUM + ISCC-IMAGE + ISCC-MIXED)
fingerprints = generate_imagecode("image.nd2", max_views=5)

# Format output
output = format_output(fingerprints, "image.nd2")
print(output)

Supported Formats

Via the BioIO plugin ecosystem, ome-zarr-py, and the OMERO Blitz gateway:

OME-TIFF/TIFF: Multi-page TIFF with OME-XML metadata
OME-Zarr/NGFF: Next-generation file format
OMERO: Remote server access via Blitz gateway
CZI: Carl Zeiss Image format
ND2: Nikon NIS-Elements format
LIF: Leica Image File format
DV: DeltaVision format
BioFormats: 150+ formats via Bio-Formats Java library

Development

Setup Development Environment

# Clone repository
git clone https://github.com/bio-codes/iscc-bio.git
cd iscc-bio

# Install with all dependencies
uv sync --extra all

# Run CLI during development
uv run iscc-bio --help

Development Commands

This project uses poethepoet for task automation:

# Format markdown files
uv run poe format-md

# Format code files
uv run poe format-code

# Build documentation
uv run poe docs-build

# Run all formatting and docs
uv run poe all

Architecture

Core Modules

iscc_bio.imagewalk: IMAGEWALK plane traversal implementations
- iw_bioio.py: BioIO implementation
- iw_ngff.py: OME-Zarr/NGFF implementation
- iw_blitz.py: OMERO Blitz implementation
- models.py: Plane data model
iscc_bio.biocode: Biocode (ISCC-SUM) generation from IMAGEWALK plane iterators
iscc_bio.imagecode: Imagecode generation (ISCC-SUM + ISCC-IMAGE + ISCC-MIXED)
iscc_bio.views: Intelligent view extraction strategies
iscc_bio.cli: Command-line interface

Design Principles

Lazy Loading: Uses Dask arrays for memory-efficient processing of large images
Format Agnostic: Identical processing logic across all formats via IMAGEWALK
Deterministic: Reproducible hashes across platforms and formats
Modular: Clean separation between traversal, canonicalization, and hashing

Funding

This work was supported through the Open Science Clusters’ Action for Research and Society (OSCARS) European project under grant agreement Nº101129751.

See: BIO-CODES project (Enhancing AI-Readiness of Bioimaging Data with Content-Based Identifiers).

License

Apache License 2.0 - See LICENSE file for details.

Citation

If you use iscc-bio in your research, please cite:

@software{iscc_bio,
  title        = {bio-codes/iscc-bio: ISCC Processing for Bioimage Data},
  author       = {Pan, Titusz},
  year         = 2025,
  url          = {https://github.com/bio-codes/iscc-bio},
  note         = {Supported by OSCARS (Open Science Clusters' Action for Research and Society) under European Commission grant agreement Nº101129751},
  version      = {0.1.0}
}

Related Projects

iscc-sum - Fast ISCC Data-Code and Instance-Code hashing
iscc-core - ISCC Core Algorithms
BioIO - Bioimage reading library
OME-Zarr - Next-generation file format implementation

Release history

Version 0.1.0: May 12, 2026

Download files

Source Distribution

iscc_bio-0.1.0.tar.gz (50.4 kB view details)

Uploaded May 12, 2026 Source

Built Distribution

iscc_bio-0.1.0-py3-none-any.whl (50.0 kB view details)

Uploaded May 12, 2026 Python 3

File details

Details for the file iscc_bio-0.1.0.tar.gz.

File metadata

Download URL: iscc_bio-0.1.0.tar.gz
Upload date: May 12, 2026
Size: 50.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for iscc_bio-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`9f62fbf658ac4c2ab29082e896e6d15bc06df9007b6891fde59bcdb58523f808`
MD5	`9ed935d5038e5275f723914958dd0a9f`
BLAKE2b-256	`590b100d59074eb209ff4cb7f0209574ff93c0921071e05518d968108bb2b065`

File details

Details for the file iscc_bio-0.1.0-py3-none-any.whl.

File metadata

Download URL: iscc_bio-0.1.0-py3-none-any.whl
Upload date: May 12, 2026
Size: 50.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for iscc_bio-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b16e14c7aca5d60efb2d0a863b408756c57de4986de83e65145679b0fa1bc315`
MD5	`34a7f181f50d3ecd9a56a11d17fa9464`
BLAKE2b-256	`1433913e822c76016cb149975df2b8ce620f0adc59f179a5d7024c870fb40ec5`
