A fast 3D binary thinning implementation using CUDA and PyTorch.

Binary Thinning 3D CUDA

This package provides a fast, memory-efficient GPU implementation of 3D binary thinning (skeletonization) using CUDA and PyTorch.

It is based on the 3D thinning algorithm by Lee, Kashyap and Chu (1994), which uses Euler characteristic invariance and 26-connectivity checks to erode a 3D binary volume down to a one-voxel-wide skeleton without altering its topology.
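To make the 26-connectivity check concrete, here is a minimal pure-Python sketch of one condition of that kind: whether the foreground voxels in a 3x3x3 neighborhood remain a single 26-connected group once the center voxel is ignored. The function name `neighbors_stay_connected` and the offset-set representation are assumptions made for illustration. This is not the package's CUDA kernel, and it does not cover the Euler characteristic test.

```python
from itertools import product

# The 26 offsets around a center voxel (the center itself is excluded)
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def neighbors_stay_connected(foreground_offsets):
    """Return True if the given foreground neighbors form one 26-connected group.

    `foreground_offsets` is a set of (dz, dy, dx) tuples taken from OFFSETS.
    Hypothetical helper for illustration only.
    """
    fg = set(foreground_offsets)
    if not fg:
        # Sketch-level choice: a voxel with no neighbors is never reported as connected
        return False
    start = next(iter(fg))
    seen = {start}
    stack = [start]
    while stack:
        cur = stack.pop()
        for d in OFFSETS:
            nxt = tuple(c + s for c, s in zip(cur, d))
            # The center (0, 0, 0) is never in `fg`, so paths cannot pass through it
            if nxt in fg and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == fg
```

For example, two neighbors on opposite sides of the center, `{(0, 0, -1), (0, 0, 1)}`, are only joined through the center, so removing the center would split them and the function returns `False`.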

Features

This implementation provides two topologically safe operating modes to suit your needs:

  1. Mode 0: GPU Subgrid 8-Color Parallel (mode=0, Default)
    • Speed: Extremely Fast (~300x speedup over CPU)
    • Behavior: Operates entirely on the GPU. It avoids race conditions by partitioning the volume into an 8-color 3D checkerboard. Voxels of the same color are re-checked and deleted in parallel because they are guaranteed not to touch each other.
    • Topology: Topologically Safe. Produces a topologically valid skeleton. Note: Because the deletion order differs from a strict CPU raster scan, the exact voxel placement may differ very slightly from ITK (e.g. 0.003% difference), but the global topology is preserved.
  2. Mode 1: Hybrid CPU-GPU Sequential (mode=1)
    • Speed: Fast (~100x speedup over CPU)
    • Behavior: Calculates Euler invariance on the GPU in parallel, but performs the final 26-connectivity re-checks strictly sequentially on the CPU (using zero-overhead memory compaction and host-side sorting).
    • Topology: 100% Identical to ITK. Guaranteed to produce exactly the same voxel output as standard sequential CPU implementations such as itk.BinaryThinningImageFilter3D.
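The 8-color checkerboard used by Mode 0 can be illustrated with a short pure-Python sketch. It assumes the color of a voxel is derived from the parity of its three coordinates. The package's actual indexing scheme is not documented here, and `subgrid_color` is a hypothetical name. The sketch checks that two distinct voxels of the same color are always at least two steps apart along some axis, so neither lies in the other's 26-neighborhood.

```python
from itertools import product


def subgrid_color(z, y, x):
    """Map a voxel coordinate to one of 8 colors using coordinate parity (assumed scheme)."""
    return ((z & 1) << 2) | ((y & 1) << 1) | (x & 1)


def chebyshev(a, b):
    """Largest per-axis distance; 26-neighbors are exactly the voxels at distance 1."""
    return max(abs(p - q) for p, q in zip(a, b))


# Group every voxel of a small 4x4x4 volume by color
shape = (4, 4, 4)
groups = {}
for voxel in product(*(range(n) for n in shape)):
    groups.setdefault(subgrid_color(*voxel), []).append(voxel)

# Smallest distance between any two distinct voxels that share a color
min_same_color_distance = min(
    chebyshev(a, b)
    for members in groups.values()
    for i, a in enumerate(members)
    for b in members[i + 1:]
)
```

Same-color coordinates differ by an even amount on every axis, so the minimum distance is 2 and one color can be processed in a single parallel pass without two threads deciding on adjacent voxels.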

Installation

Prerequisites

  • Python 3.10+
  • PyTorch (with CUDA support)
  • A CUDA-capable GPU

Install from PyPI (Recommended)

You can install the package directly from PyPI. Note that since this contains CUDA C++ extensions, it will be compiled on your machine during installation.

pip install binary-thinning-3d-cuda

Install from Source (Advanced Users)

For development or to run benchmarks, you can install from the source:

git clone https://github.com/sychen52/binary_thinning_3d_cuda.git
cd binary_thinning_3d_cuda

# Standard install (the path must directly follow -e)
pip install --no-build-isolation -e .

# Install with development dependencies (for running benchmarks)
pip install --no-build-isolation -e ".[dev]"

(Note: itk-thickness3d and SimpleITK are not hard dependencies. They are only included in the [dev] extras for the purpose of benchmarking and validating against the CPU implementation).

Usage

The input can be a 3D PyTorch uint8 (Byte) tensor located on either a CPU or CUDA device.

  • If the tensor is on a CUDA device, the operation is performed in-place.
  • If the tensor is on the CPU, it is automatically moved to the GPU for processing and copied back to the original CPU tensor in-place.

All non-zero values are treated as foreground (0 for background, >0 for foreground).

import torch
from binary_thinning_3d import binary_thinning

# Create or load a 3D binary mask (CPU or GPU)
tensor = torch.zeros((100, 100, 100), dtype=torch.uint8)
tensor[25:75, 25:75, 25:75] = 1  # Solid block
original = tensor.clone()  # Keep a copy, because thinning overwrites its input

# 1. GPU Subgrid (Default, Max Speed, Topologically Safe)
# Modifies the tensor in-place (handles CPU<->GPU transfer automatically)
binary_thinning(tensor, mode=0)

# 2. Hybrid CPU-GPU (Exact ITK Match), run on a fresh copy of the original mask
tensor = original.clone()
binary_thinning(tensor, mode=1)
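Because the call works in-place and treats every non-zero value as foreground, a small wrapper can be handy when the input is a multi-label volume that must be preserved. The sketch below is one possible way to do that. `thin_copy` is a hypothetical helper, not part of the package, and the `.contiguous()` call is a precaution rather than a documented requirement.

```python
def thin_copy(volume, mode=0):
    """Return a thinned 0/1 copy of `volume`, leaving the input untouched.

    Hypothetical convenience wrapper; only `binary_thinning(tensor, mode=...)`
    comes from the package.
    """
    # Imported lazily so the helper can be defined without a GPU stack installed
    import torch
    from binary_thinning_3d import binary_thinning

    if volume.dim() != 3:
        raise ValueError("expected a 3D tensor")
    # The comparison allocates a new tensor, so `volume` itself is never modified
    mask = (volume != 0).to(torch.uint8).contiguous()
    binary_thinning(mask, mode=mode)  # overwrites `mask` with the skeleton
    return mask
```

With this wrapper, `skeleton = thin_copy(labels, mode=1)` would leave `labels` intact while returning the skeleton as a separate uint8 tensor.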

Benchmark

The following benchmark was run on a (767, 512, 512) NIfTI volume (CT Airways Label) containing 451,530 foreground voxels.

The benchmark compares this CUDA implementation against itk.BinaryThinningImageFilter3D (which is run sequentially on the CPU). The CUDA timings include the time for CPU-to-GPU and GPU-to-CPU data transfers.

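| Method               | Output Voxel Count | Time (Seconds) | Speedup vs ITK | Matches ITK CPU?          |
|----------------------|--------------------|----------------|----------------|---------------------------|
| Mode 0 (GPU Subgrid) | 4,286              | 0.42 s         | 331x           | Topologically equivalent  |
| Mode 1 (Hybrid CPU)  | 4,281              | 1.38 s         | 101x           | Yes (100% Identical)      |
| ITK (CPU Baseline)   | 4,281              | 139.90 s       | 1x             | Baseline                  |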
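The speedup column is the ITK time divided by each method's time. The short sketch below recomputes it from the rounded timings in the table. With these rounded inputs, Mode 0 comes out at roughly 333x rather than the reported 331x, presumably because the published figure was derived from unrounded timings (an assumption, since the raw timings are not given).

```python
# Timings in seconds, copied from the benchmark table (rounded values)
timings_s = {
    "Mode 0 (GPU Subgrid)": 0.42,
    "Mode 1 (Hybrid CPU)": 1.38,
    "ITK (CPU Baseline)": 139.90,
}

baseline = timings_s["ITK (CPU Baseline)"]

# Speedup = baseline time / method time
speedups = {name: baseline / t for name, t in timings_s.items()}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.1f}x")
```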

To reproduce these benchmarks yourself:

# Ensure you installed with dev dependencies: pip install --no-build-isolation -e ".[dev]"
python examples/process_nifti.py

(The script will cache the slow ITK result to disk on the first run, so subsequent runs finish instantly).
