Tiled processing of arbitrarily large images with globally consistent labels
patchworks
Tiled processing of arbitrarily large images — any image, any function.
┌──────┬──────┬──────┐    fn(tile) → labels     ┌──────┬──────┬──────┐
│ tile │ tile │ tile │ ─────────────────────► │  1   │  2   │  3   │
├──────┼──────┼──────┤                        ├──────┼──────┼──────┤
│ tile │ tile │ tile │                        │  4   │  5   │  6   │  globally
├──────┼──────┼──────┤                        ├──────┼──────┼──────┤  consistent
│ tile │ tile │ tile │                        │  7   │  8   │  9   │  labels
└──────┴──────┴──────┘                        └──────┴──────┴──────┘
patchworks splits a large image into tiles, runs any callable on each tile in parallel, and merges the results into a globally consistent label array. It handles terabyte-scale images without loading them into memory.
Installation
pip install patchworks
Optional extras:
pip install "patchworks[gpu]" # GPU VRAM querying (nvidia-ml-py)
pip install "patchworks[cellpose]" # Cellpose plugin
pip install "patchworks[bioio]" # convert any image format to OME-ZARR
pip install "patchworks[imaris]" # convert Imaris .ims files to OME-ZARR
pip install "patchworks[napari]" # interactive napari viewer plugin
pip install "patchworks[all]" # Everything (except napari GUI)
bioio reads CZI/LIF/ND2/OME-TIFF/… The [bioio] extra bundles the common native readers (bioio-nd2, bioio-ome-tiff, bioio-czi, bioio-tifffile, bioio-lif) plus bioio-bioformats, the Bio-Formats catch-all reader (JVM). [imaris] adds native .ims support (HDF5, no JVM). Physical pixel calibration is read from the input and written into the OME-ZARR.
Quick start — 5 lines
from patchworks import tile_process
def my_fn(tile):
    from skimage.filters import threshold_otsu
    from skimage.measure import label
    return label(tile > threshold_otsu(tile)).astype("int32")
result = tile_process("image.zarr", my_fn, compute=True)
Done. result is a NumPy array of integer labels, same spatial shape as the
input, with globally unique IDs across all tiles.
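Before launching a long run, it can help to try the callable on one small synthetic tile. The sketch below is not part of patchworks: `check_tile_fn` and `threshold_fn` are hypothetical helpers, and the checks (same shape, integer dtype, no negative IDs) are assumptions drawn from the description above, with 0 assumed to mean background.

```python
import numpy as np
from scipy import ndimage as ndi


def threshold_fn(tile):
    """Stand-in segmentation: label everything brighter than the tile mean."""
    labels, _ = ndi.label(tile > tile.mean())
    return labels.astype("int32")


def check_tile_fn(fn, tile_shape=(1, 64, 64), dtype="uint16", seed=0):
    """Run `fn` on a random tile and verify the output looks like labels.

    Hypothetical helper; the three checks are assumptions, not a
    contract documented by patchworks.
    """
    rng = np.random.default_rng(seed)
    tile = rng.integers(0, 1000, size=tile_shape).astype(dtype)
    out = fn(tile)
    assert out.shape == tile.shape, f"shape {out.shape} != {tile.shape}"
    assert np.issubdtype(out.dtype, np.integer), f"non-integer dtype {out.dtype}"
    assert out.min() >= 0, "negative label IDs"
    return out


out = check_tile_fn(threshold_fn)
```

A function that passes this check on a small tile is at least returning something label-shaped, which is cheaper to find out here than after the first thousand tiles.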
With Cellpose
from patchworks import tile_process
from patchworks.plugins.cellpose import cellpose_fn
fn = cellpose_fn("cyto3", gpu=True, diameter=30)
tile_process(
    "image.zarr",
    fn,
    tile_shape=(1, 2048, 2048),  # one z-slice per tile
    overlap=20,                  # gives boundary cells enough context
    write_to="labels.zarr",      # stream directly to disk, no RAM accumulation
    progress=True,
)
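To make the role of `overlap` concrete, here is a NumPy-only sketch of how a tile can be grown by a margin for context and then cropped back to the region it owns. It is an illustration under assumptions, not patchworks' implementation: `padded_tile_slices` is a hypothetical name, the margin is applied to every axis, and margins are clipped at the image border instead of padded.

```python
import numpy as np


def padded_tile_slices(shape, tile_shape, overlap):
    """Yield (core, padded, inner) slice tuples for every tile.

    core   : region of the image the tile owns
    padded : core grown by `overlap` per side, clipped at the image border
    inner  : where core sits inside the padded tile (used to crop back)
    """
    grids = [range(0, size, tile) for size, tile in zip(shape, tile_shape)]
    for idx in np.ndindex(*[len(g) for g in grids]):
        core, padded, inner = [], [], []
        for axis, i in enumerate(idx):
            lo = grids[axis][i]
            hi = min(lo + tile_shape[axis], shape[axis])
            p_lo = max(lo - overlap, 0)
            p_hi = min(hi + overlap, shape[axis])
            core.append(slice(lo, hi))
            padded.append(slice(p_lo, p_hi))
            inner.append(slice(lo - p_lo, hi - p_lo))
        yield tuple(core), tuple(padded), tuple(inner)


image = np.arange(100 * 100).reshape(100, 100)
tiles = list(padded_tile_slices(image.shape, (50, 50), overlap=8))
core, padded, inner = tiles[3]   # bottom-right tile
tile = image[padded]             # what the function would see
cropped = tile[inner]            # what is kept after processing
```

The function sees the larger `tile`, so objects near the edge have surrounding pixels to work with, while only the `cropped` part contributes to the output.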
With StarDist
from stardist.models import StarDist2D
from patchworks import tile_process
model = StarDist2D.from_pretrained("2D_versatile_fluo")
def stardist_fn(tile):
    # StarDist2D expects a 2D image, so drop a singleton z-axis if present.
    squeeze = tile.ndim == 3 and tile.shape[0] == 1
    img = tile[0] if squeeze else tile
    norm = img.astype("float32") / (img.max() or 1)
    labels, _ = model.predict_instances(norm)
    labels = labels.astype("int32")
    # Restore the z-axis only if it was dropped, so the output shape matches the input.
    return labels[None] if squeeze else labels

tile_process(
    "image.zarr",
    stardist_fn,
    tile_shape=(1, 1024, 1024),
    overlap=32,
    write_to="labels.zarr",
    progress=True,
)
With any function
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.measure import label
from patchworks import tile_process
def my_custom_fn(tile: np.ndarray) -> np.ndarray:
    smoothed = gaussian_filter(tile.astype("float32"), sigma=1.5)
    binary = smoothed > smoothed.mean()
    return label(binary).astype("int32")

tile_process("image.zarr", my_custom_fn, tile_shape=(1, 512, 512))
Common patterns
Auto-size tiles from available memory
from patchworks import tile_process
tile_process("image.zarr", fn, tile_shape="auto", use_gpu=True)
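How `tile_shape="auto"` chooses a size is not described here. As a rough mental model only, the sketch below derives a tile shape from a memory budget. Everything in it is an assumption: `auto_tile_shape` is a hypothetical helper, the `safety` factor of 4 working copies per tile is a guess, and the choice of square tiles on the last two axes is arbitrary.

```python
import math


def auto_tile_shape(image_shape, itemsize, budget_bytes, n_workers=1, safety=4):
    """Pick a tile shape whose working set fits a per-worker memory budget.

    Hypothetical heuristic: leading axes are set to 1 and the last two axes
    form a square, assuming the function needs about `safety` copies of
    the tile in memory at once.
    """
    per_worker = budget_bytes / n_workers
    max_voxels = per_worker / (itemsize * safety)
    side = max(1, int(math.sqrt(max_voxels)))
    yx = tuple(min(side, size) for size in image_shape[-2:])
    return (1,) * (len(image_shape) - 2) + yx


# A budget could come from psutil.virtual_memory().available (RAM);
# here it is a fixed 2 GiB shared by 4 workers, for a uint16 volume.
shape = auto_tile_shape((200, 40000, 40000), itemsize=2,
                        budget_bytes=2 * 1024**3, n_workers=4)
```

With these numbers each worker gets 512 MiB, which allows 2^26 voxels per tile and therefore a side length of 8192.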
Skip empty tiles (sparse volumes)
from patchworks import estimate_empty_tiles, tile_process
info = estimate_empty_tiles("image.zarr", tile_shape=(120, 697, 697))
print(f"{info['empty_fraction']:.0%} tiles are background — will be skipped")
tile_process(
    "image.zarr",
    fn,
    tile_shape=(120, 697, 697),
    skip_empty=True,
    empty_threshold=info["threshold"],
    write_to="labels.zarr",
)
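The criterion `estimate_empty_tiles` uses to call a tile empty is not spelled out above. The sketch below shows one plausible version in plain NumPy, assuming a tile counts as background when its maximum intensity is at or below a threshold. `empty_tile_fraction` is a hypothetical function, not the library's.

```python
import numpy as np


def empty_tile_fraction(image, tile_shape, threshold):
    """Fraction of tiles whose maximum intensity is at or below `threshold`.

    Assumed criterion for illustration only.
    """
    grids = [range(0, size, tile) for size, tile in zip(image.shape, tile_shape)]
    n_tiles = n_empty = 0
    for idx in np.ndindex(*[len(g) for g in grids]):
        region = tuple(
            slice(g[i], g[i] + t) for g, i, t in zip(grids, idx, tile_shape)
        )
        n_tiles += 1
        n_empty += int(image[region].max() <= threshold)
    return n_empty / n_tiles


# Synthetic sparse volume: signal only in one quadrant.
volume = np.zeros((4, 64, 64), dtype="uint16")
volume[:, :32, :32] = 500
fraction = empty_tile_fraction(volume, tile_shape=(4, 32, 32), threshold=10)
```

Here three of the four tiles contain only zeros, so `fraction` is 0.75. On a real volume you would sample tiles from the zarr store instead of scanning an in-memory array.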
Distributed cluster for GPU
from patchworks import make_local_cluster, tile_process
client, cluster = make_local_cluster(use_gpu=True)
try:
    tile_process("image.zarr", fn, write_to="labels.zarr", progress=True)
finally:
    client.close()
    cluster.close()
Contiguous label numbering
# Labels are globally unique by default, but may be gappy (block-encoded IDs).
# sequential_labels=True does a linear relabel O(voxels) — not O(n_tiles²).
tile_process("image.zarr", fn, write_to="labels.zarr", sequential_labels=True)
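The idea behind a linear relabel can be sketched in a few lines of NumPy. This is an illustration, not the library's code: `relabel_sequential` is a hypothetical name, 0 is assumed to be background, and `np.searchsorted` stands in for an explicit lookup table so that very large gappy IDs do not require a huge array.

```python
import numpy as np


def relabel_sequential(labels):
    """Map sparse label IDs to 1..N while keeping 0 as background."""
    ids = np.unique(labels)
    ids = ids[ids != 0]
    # Position of each voxel's ID in the sorted ID list, shifted to start at 1.
    out = np.searchsorted(ids, labels) + 1
    out[labels == 0] = 0
    return out.astype(labels.dtype)


# Gappy IDs of the kind a per-tile offset scheme might produce.
gappy = np.array([[0, 1000003, 1000003],
                  [2000001, 0, 3000007]], dtype="int64")
compact = relabel_sequential(gappy)
```

The cost depends on the number of voxels and distinct labels, not on how many tiles produced them.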
Use only the merge step (bring your own tiling)
If you already have per-tile labels from your own pipeline, just call the merge step directly:
import dask.array as da
import numpy as np
from patchworks import merge_tile_labels
# Your own tiling + segmentation
image = da.from_zarr("image.zarr").rechunk((1, 1024, 1024))
labeled = image.map_blocks(
    my_segment_fn, dtype="int32", meta=np.empty((0,) * image.ndim, dtype="int32")
)
merged = merge_tile_labels(labeled, write_to="labels.zarr", progress=True)
Or merge from a zarr store your pipeline already wrote:
from patchworks import merge_tile_labels
merged = merge_tile_labels(
    "my_staged_labels.zarr",
    input_component="raw_labels",
    write_to="merged.zarr",
    sequential_labels=True,
)
How tiling and merging work
See docs/how-it-works.md for a full explanation. Short version:
- Image is split into tiles (with optional overlap for boundary context).
- Your function is called independently on each tile. Dask handles parallelism and streaming — tiles are never all in memory at once.
- Each tile's labels are written to a temp zarr exactly once (the staging step — this prevents your function being called 3-4× per tile during merge).
- Thin slabs at each tile boundary are scanned for touching label pairs.
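- SciPy connected components over the touching pairs produce a relabeling lookup table (LUT): labels that touch across a boundary map to the same ID.
- The LUT is applied to every tile in parallel, which yields globally consistent labels.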
The merge is zarr-native (no dask task graph), so it scales to thousands of tiles where the dask-image approach stalls.
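The pair-scan, connected-components, and LUT steps can be sketched on two small in-memory tiles. This is a simplified illustration, not the patchworks implementation: `touching_pairs` and `merge_lut` are hypothetical names, the tiles abut without overlap, only the single column on each side of the boundary is compared, and the second tile's IDs are made unique by a plain offset.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def touching_pairs(left, right):
    """Label pairs that face each other across a vertical tile boundary."""
    a, b = left[:, -1], right[:, 0]
    both = (a != 0) & (b != 0)          # ignore background on either side
    return np.stack([a[both], b[both]], axis=1)


def merge_lut(pairs, n_labels):
    """Lookup table mapping every label to the smallest ID it is connected to."""
    pairs = np.asarray(pairs).reshape(-1, 2)
    graph = coo_matrix(
        (np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])),
        shape=(n_labels + 1, n_labels + 1),
    )
    _, component = connected_components(graph, directed=False)
    ids = np.arange(n_labels + 1)
    representative = np.full(component.max() + 1, n_labels + 1)
    np.minimum.at(representative, component, ids)
    return representative[component]


left = np.array([[1, 1], [0, 0], [2, 2]])
right_local = np.array([[1, 1, 0], [0, 0, 0], [0, 0, 2]])
# Offset the second tile so its IDs do not collide with the first.
right = np.where(right_local > 0, right_local + left.max(), 0)

pairs = touching_pairs(left, right)                  # label 1 touches label 3
lut = merge_lut(pairs, n_labels=int(right.max()))
merged = lut[np.hstack([left, right])]
```

Label 3 is folded into label 1 because the two touch at the boundary, while labels 2 and 4 keep their IDs. The result is globally consistent but gappy (1, 2, 4), which is the situation `sequential_labels=True` is meant to tidy up.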
Known pitfalls (and how patchworks avoids them)
| Pitfall | Symptom | How patchworks handles it |
|---|---|---|
| In-process Dask client | FutureCancelledError: lost dependencies | Detected at startup, raises immediately with fix instructions |
| 3-4× fn recompute during merge | Cellpose runs 3× per tile | Staging writes labels once, merge reads from disk |
| O(n²) sequential relabelling | Graph construction hangs at 1000+ tiles | Linear post-pass O(voxels) via np.unique + LUT |
| Wrong overlap boundary | Output shape mismatch | Always uses boundary="none" |
| Persisting large arrays | Worker OOM | Never persists; keeps dask graph lazy and streams |
Requirements
- Python ≥ 3.9
- dask[array], numpy, zarr, scipy
Optional:
- psutil: accurate RAM sizing for tile_shape="auto"
- nvidia-ml-py: accurate GPU VRAM sizing
- tqdm: progress bars
- cellpose: Cellpose plugin
License
MIT