Tiled processing of arbitrarily large images with globally consistent labels

patchworks

Tiled processing of arbitrarily large images — any image, any function.

┌──────┬──────┬──────┐     fn(tile) → labels      ┌──────┬──────┬──────┐
│ tile │ tile │ tile │  ─────────────────────►    │  1   │  2   │  3   │
├──────┼──────┼──────┤                            ├──────┼──────┼──────┤
│ tile │ tile │ tile │                            │  4   │  5   │  6   │   globally
├──────┼──────┼──────┤                            ├──────┼──────┼──────┤   consistent
│ tile │ tile │ tile │                            │  7   │  8   │  9   │   labels
└──────┴──────┴──────┘                            └──────┴──────┴──────┘

patchworks splits a large image into tiles, runs any callable on each tile in parallel, and merges the results into a globally consistent label array. It handles terabyte-scale images without ever loading the whole image into memory.


Installation

pip install patchworks

Optional extras:

pip install "patchworks[gpu]"      # GPU VRAM querying (nvidia-ml-py)
pip install "patchworks[cellpose]" # Cellpose plugin
pip install "patchworks[bioio]"    # convert any image format to OME-ZARR
pip install "patchworks[imaris]"   # convert Imaris .ims files to OME-ZARR
pip install "patchworks[napari]"   # interactive napari viewer plugin
pip install "patchworks[all]"      # Everything (except napari GUI)

bioio reads CZI/LIF/ND2/OME-TIFF/… The [bioio] extra bundles the common native readers (bioio-nd2, bioio-ome-tiff, bioio-czi, bioio-tifffile, bioio-lif) plus bioio-bioformats, the Bio-Formats catch-all reader (JVM). [imaris] adds native .ims support (HDF5, no JVM). Physical pixel calibration is read from the input and written into the OME-ZARR.


Quick start — 5 lines

from patchworks import tile_process


def my_fn(tile):
    from skimage.filters import threshold_otsu
    from skimage.measure import label

    return label(tile > threshold_otsu(tile)).astype("int32")


result = tile_process("image.zarr", my_fn, compute=True)

Done. result is a NumPy array of integer labels, same spatial shape as the input, with globally unique IDs across all tiles.


With Cellpose

from patchworks import tile_process
from patchworks.plugins.cellpose import cellpose_fn

fn = cellpose_fn("cyto3", gpu=True, diameter=30)

tile_process(
    "image.zarr",
    fn,
    tile_shape=(1, 2048, 2048),  # one z-slice per tile
    overlap=20,  # gives boundary cells enough context
    write_to="labels.zarr",  # stream directly to disk — no RAM accumulation
    progress=True,
)

With StarDist

from stardist.models import StarDist2D
from patchworks import tile_process

model = StarDist2D.from_pretrained("2D_versatile_fluo")


def stardist_fn(tile):
    img = tile[0] if tile.ndim == 3 and tile.shape[0] == 1 else tile
    norm = img.astype("float32") / (img.max() or 1)
    labels, _ = model.predict_instances(norm)
    return labels.astype("int32")[None] if tile.ndim == 3 else labels.astype("int32")


tile_process(
    "image.zarr",
    stardist_fn,
    tile_shape=(1, 1024, 1024),
    overlap=32,
    write_to="labels.zarr",
    progress=True,
)

With any function

import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.measure import label
from patchworks import tile_process


def my_custom_fn(tile: np.ndarray) -> np.ndarray:
    smoothed = gaussian_filter(tile.astype("float32"), sigma=1.5)
    binary = smoothed > smoothed.mean()
    return label(binary).astype("int32")


tile_process("image.zarr", my_custom_fn, tile_shape=(1, 512, 512))
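
Before running a custom function on a full image, it can help to check it on a small synthetic tile. The sketch below is an illustration only: check_tile_fn is a hypothetical helper, not part of patchworks, and the checks it makes (same shape as the input, integer dtype, non-negative IDs with 0 as background) are assumptions based on the examples above. It uses scipy.ndimage.label so that it runs without scikit-image.

```python
import numpy as np
from scipy import ndimage as ndi


def threshold_label_fn(tile: np.ndarray) -> np.ndarray:
    """Example tile function: mean threshold, then connected-component labelling."""
    binary = tile > tile.mean()
    labels, _ = ndi.label(binary)
    return labels.astype("int32")


def check_tile_fn(fn, tile_shape=(1, 64, 64), seed=0):
    """Hypothetical helper (not a patchworks API): sanity-check a tile function."""
    rng = np.random.default_rng(seed)
    tile = rng.random(tile_shape).astype("float32")
    out = fn(tile)
    assert isinstance(out, np.ndarray), "tile function must return an ndarray"
    assert out.shape == tile.shape, f"shape changed: {tile.shape} -> {out.shape}"
    assert np.issubdtype(out.dtype, np.integer), f"expected integer labels, got {out.dtype}"
    assert out.min() >= 0, "labels are assumed non-negative (0 = background)"
    return out


checked = check_tile_fn(threshold_label_fn)
print(checked.shape, checked.dtype)
```

A function that passes this check at the tile shape you intend to use is less likely to fail partway through a long run.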

Common patterns

Auto-size tiles from available memory

from patchworks import tile_process

tile_process("image.zarr", fn, tile_shape="auto", use_gpu=True)
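
To give a feel for what sizing tiles from memory involves, here is a minimal sketch of one possible rule. It is not the algorithm patchworks uses: square_tile_side, the copies safety factor, and the square YX assumption are all hypothetical. The rule picks the largest square tile whose working set (input plus an assumed number of intermediate copies) fits a byte budget.

```python
import math


def square_tile_side(budget_bytes, itemsize, z_depth=1, copies=4, max_side=None):
    """Hypothetical sizing rule: largest square YX tile that fits the budget.

    copies is an assumed safety factor for intermediate arrays created
    while a tile is being processed.
    """
    voxels = budget_bytes // (itemsize * copies * z_depth)
    side = math.isqrt(max(voxels, 1))
    if max_side is not None:
        side = min(side, max_side)
    return (z_depth, side, side)


# 2 GiB budget, uint16 input (2 bytes per voxel), one z-slice per tile
auto_shape = square_tile_side(2 * 1024**3, itemsize=2)
print(auto_shape)  # (1, 16384, 16384)
```

In practice the budget could come from psutil.virtual_memory().available divided by the number of workers, which is why psutil is listed as an optional dependency for accurate RAM sizing.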

Skip empty tiles (sparse volumes)

from patchworks import estimate_empty_tiles, tile_process

info = estimate_empty_tiles("image.zarr", tile_shape=(120, 697, 697))
print(f"{info['empty_fraction']:.0%} of tiles are background and will be skipped")

tile_process(
    "image.zarr",
    fn,
    tile_shape=(120, 697, 697),
    skip_empty=True,
    empty_threshold=info["threshold"],
    write_to="labels.zarr",
)
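
The idea behind skipping empty tiles can be illustrated with plain NumPy. This is a sketch under a stated assumption: a tile counts as empty when no voxel exceeds the threshold. The exact criterion behind empty_threshold in patchworks may differ, and is_empty and empty_fraction are hypothetical names.

```python
import numpy as np


def is_empty(tile, threshold):
    """Assumed criterion: no voxel in the tile exceeds the threshold."""
    return tile.max() <= threshold


def empty_fraction(image, tile_shape, threshold):
    """Fraction of non-overlapping tiles that the assumed criterion marks empty."""
    # Ceiling division: number of tiles along each axis
    counts = [-(-s // t) for s, t in zip(image.shape, tile_shape)]
    n_empty = 0
    n_total = 0
    for idx in np.ndindex(*counts):
        sl = tuple(
            slice(i * t, min((i + 1) * t, s))
            for i, t, s in zip(idx, tile_shape, image.shape)
        )
        n_total += 1
        n_empty += bool(is_empty(image[sl], threshold))
    return n_empty / n_total


volume = np.zeros((4, 100, 100), dtype="uint16")
volume[:, :50, :50] = 500  # signal in one quadrant only
frac = empty_fraction(volume, (4, 50, 50), threshold=10)
print(frac)  # 0.75
```

For sparse volumes, a high empty fraction means most tiles never reach the segmentation function at all.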

Distributed cluster for GPU

from patchworks import make_local_cluster, tile_process

client, cluster = make_local_cluster(use_gpu=True)
try:
    tile_process("image.zarr", fn, write_to="labels.zarr", progress=True)
finally:
    client.close()
    cluster.close()

Contiguous label numbering

# Labels are globally unique by default, but may be gappy (block-encoded IDs).
# sequential_labels=True does a linear relabel O(voxels) — not O(n_tiles²).
tile_process("image.zarr", fn, write_to="labels.zarr", sequential_labels=True)
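
The relabel pass mentioned in the comment above can be sketched as follows. This is not the patchworks implementation: relabel_sequential is a hypothetical helper, and the sample IDs only imitate a block-encoded scheme. It uses np.unique plus np.searchsorted as a sparse lookup table, so the cost is O(voxels · log n_labels) rather than strictly linear, but it never builds anything quadratic in the number of tiles.

```python
import numpy as np


def relabel_sequential(labels):
    """Map gappy non-negative label IDs to 1..N, keeping 0 as background."""
    ids = np.unique(labels)
    ids = ids[ids != 0]
    # searchsorted gives each voxel the rank of its ID among the sorted IDs
    out = np.searchsorted(ids, labels) + 1
    out[labels == 0] = 0
    return out.astype(labels.dtype)


# Gappy IDs, as might result from an (assumed) per-tile offset encoding
gappy = np.array(
    [[0, 1000001, 1000001],
     [2000003, 0, 2000007]],
    dtype="int64",
)
compact = relabel_sequential(gappy)
print(compact)
# [[0 1 1]
#  [2 0 3]]
```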

Use only the merge step (bring your own tiling)

If you already have per-tile labels from your own pipeline, just call the merge step directly:

import dask.array as da
import numpy as np
from patchworks import merge_tile_labels

# Your own tiling + segmentation
image = da.from_zarr("image.zarr").rechunk((1, 1024, 1024))
labeled = image.map_blocks(
    my_segment_fn, dtype="int32", meta=np.empty((0,) * image.ndim, dtype="int32")
)

merged = merge_tile_labels(labeled, write_to="labels.zarr", progress=True)

Or merge from a zarr store your pipeline already wrote:

from patchworks import merge_tile_labels

merged = merge_tile_labels(
    "my_staged_labels.zarr",
    input_component="raw_labels",
    write_to="merged.zarr",
    sequential_labels=True,
)

How tiling and merging work

See docs/how-it-works.md for a full explanation. Short version:

  1. The image is split into tiles (with optional overlap for boundary context).
  2. Your function is called independently on each tile. Dask handles parallelism and streaming, so tiles are never all in memory at once.
  3. Each tile's labels are written to a temporary zarr store exactly once. This staging step prevents your function from being called 3-4× per tile during the merge.
  4. Thin slabs at each tile boundary are scanned for touching label pairs.
  5. scipy connected components on the graph of touching pairs yield a relabeling lookup table (LUT).
  6. The LUT is applied to every tile in parallel, producing globally consistent labels.

The merge is zarr-native (no dask task graph), so it scales to thousands of tiles where the dask-image approach stalls.
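
Steps 4 to 6 can be illustrated on two small 2D tiles that sit side by side. This is a simplified sketch, not the patchworks code: it assumes no overlap, assumes the two tiles already carry distinct IDs, builds a dense LUT (only reasonable for small IDs), and touching_pairs and build_lut are hypothetical names. The only library call it relies on is scipy.sparse.csgraph.connected_components.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def touching_pairs(left_edge, right_edge):
    """Unique (a, b) label pairs that face each other across a tile boundary."""
    both = (left_edge > 0) & (right_edge > 0)
    pairs = np.stack([left_edge[both], right_edge[both]], axis=1)
    return np.unique(pairs, axis=0)


def build_lut(pairs, max_label):
    """Lookup table mapping every label to the smallest label in its group."""
    n = max_label + 1
    graph = coo_matrix(
        (np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n)
    )
    _, component = connected_components(graph, directed=False)
    # Smallest label in each component becomes the representative ID
    rep = np.full(component.max() + 1, n, dtype=np.int64)
    np.minimum.at(rep, component, np.arange(n))
    return rep[component]


# Two adjacent tiles, labelled independently, IDs already made distinct
tile_a = np.array([[1, 1, 1], [0, 0, 0], [2, 2, 2]])
tile_b = np.array([[3, 0, 5], [0, 0, 5], [4, 4, 0]])

pairs = touching_pairs(tile_a[:, -1], tile_b[:, 0])  # boundary columns
lut = build_lut(pairs, max_label=5)
merged = np.hstack([lut[tile_a], lut[tile_b]])
print(lut)     # [0 1 2 1 2 5]
print(merged)
```

Labels 1 and 3 touch across the boundary and collapse to 1, labels 2 and 4 collapse to 2, and label 5 touches nothing and keeps its ID. Applying the LUT is an independent per-tile indexing operation, which is what makes the final step easy to parallelize.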


Known pitfalls (and how patchworks avoids them)

| Pitfall | Symptom | How patchworks handles it |
|---|---|---|
| In-process Dask client | FutureCancelledError: lost dependencies | Detected at startup, raises immediately with fix instructions |
| 3-4× fn recompute during merge | Cellpose runs 3× per tile | Staging writes labels once, merge reads from disk |
| O(n²) sequential relabelling | Graph construction hangs at 1000+ tiles | Linear post-pass O(voxels) via np.unique + LUT |
| Wrong overlap boundary | Output shape mismatch | Always uses boundary="none" |
| Persisting large arrays | Worker OOM | Never persists; keeps dask graph lazy and streams |

Requirements

  • Python ≥ 3.9
  • dask[array], numpy, zarr, scipy

Optional:

  • psutil — accurate RAM sizing for tile_shape="auto"
  • nvidia-ml-py — accurate GPU VRAM sizing
  • tqdm — progress bars
  • cellpose — Cellpose plugin

License

MIT

Download files

Source Distribution

patchworks-0.5.0.tar.gz (49.5 kB view details)

Uploaded Source

Built Distribution

patchworks-0.5.0-py3-none-any.whl (36.4 kB view details)

Uploaded Python 3

File details

Details for the file patchworks-0.5.0.tar.gz.

File metadata

  • Download URL: patchworks-0.5.0.tar.gz
  • Upload date:
  • Size: 49.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for patchworks-0.5.0.tar.gz
Algorithm Hash digest
SHA256 7fe832dde3c8f6d7c9b0ea13f5dcefdadbee22d07ea08daae998e192e5f1a799
MD5 65397eae95a2c12714e9ec58fc4efb65
BLAKE2b-256 2fab90b97fad6f6bd629407777a0620f5ead5ec49e047edf960f9b068ad61bbb

Provenance

The following attestation bundles were made for patchworks-0.5.0.tar.gz:

Publisher: release.yml on imcf/patchworks

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file patchworks-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: patchworks-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 36.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for patchworks-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9546e45faa3b75bae2204cb580d1aa2c37bcfae48541f4e88fe7e8bd854d37ef
MD5 a8feeb8c233ef9b49163c50d6658c5ad
BLAKE2b-256 809b1cd87272421b9cddb6018b2dee9d1bfae3ec4d415a723e0d8bebc04408a8

Provenance

The following attestation bundles were made for patchworks-0.5.0-py3-none-any.whl:

Publisher: release.yml on imcf/patchworks
