High-performance Macenko PCA stain deconvolution powered by Rust

Macenko PCA


High-performance stain matrix estimation and colour deconvolution for histology images using the Macenko PCA method, with the compute-intensive core written in Rust via PyO3.

Supported platforms: Linux and macOS only. Windows is not supported: differences between BLAS/LAPACK (OpenBLAS) builds on that platform can cause numerical and linking inconsistencies, so the package is built and tested only on Linux and macOS to keep numerical results reproducible.

This implements the method described in:

Macenko, M. et al. "A method for normalizing histology slides for quantitative analysis." ISBI 2009.

Features

Feature | Detail
Performance | Core SVD, PCA projection, and angle binning run in compiled Rust with Rayon parallelism
Simple API | Six functions cover the full workflow: estimate stain vectors, decompose, and reconstruct
NumPy native | Accepts and returns standard NumPy arrays; no special data structures required
Precision-aware | Pass float32 arrays to halve RAM usage; the dtype of your input controls which Rust code path runs
Platform support | Built with maturin for Linux and macOS (x86_64 + arm64). Windows is not supported.

Quick Start

Installation

pip install macenko-pca

Full Workflow Example

import numpy as np
from macenko_pca import (
    rgb_separate_stains_macenko_pca,
    rgb_color_deconvolution,
    reconstruct_rgb,
)

# Load or create an RGB image (H×W×3, values in [0, 255])
im_rgb = np.random.rand(256, 256, 3) * 255.0

# 1. Estimate the 3×3 stain matrix from the image
stain_matrix = rgb_separate_stains_macenko_pca(im_rgb)
print("Stain matrix:\n", stain_matrix)

# 2. Decompose the image into per-stain concentration channels
concentrations = rgb_color_deconvolution(im_rgb, stain_matrix)

hematoxylin = concentrations[:, :, 0]
eosin = concentrations[:, :, 1]
residual = concentrations[:, :, 2]

# 3. Modify concentrations (e.g. isolate hematoxylin only)
concentrations_h_only = concentrations.copy()
concentrations_h_only[:, :, 1] = 0.0  # zero-out eosin
concentrations_h_only[:, :, 2] = 0.0  # zero-out residual

# 4. Reconstruct back to RGB
im_hematoxylin_only = reconstruct_rgb(concentrations_h_only, stain_matrix)

Step-by-Step API

from macenko_pca import (
    rgb_to_sda,
    separate_stains_macenko_pca,
    color_deconvolution,
    reconstruct_rgb,
)

# im_rgb is the RGB array defined in the full workflow example above.
# Convert RGB to SDA (Stain Density Absorbance) space
im_sda = rgb_to_sda(im_rgb)

# Estimate stain vectors from the SDA image
stain_matrix = separate_stains_macenko_pca(im_sda)

# Decompose SDA image into stain concentrations
concentrations = color_deconvolution(im_sda, stain_matrix)

# Reconstruct RGB from (possibly modified) concentrations
im_reconstructed = reconstruct_rgb(concentrations, stain_matrix)

Half the RAM — Use float32

# Simply cast your input to float32 — the Rust backend will use f32 throughout
im_rgb_f32 = im_rgb.astype(np.float32)
stain_matrix = rgb_separate_stains_macenko_pca(im_rgb_f32)       # f32
concentrations = rgb_color_deconvolution(im_rgb_f32, stain_matrix)  # f32
reconstructed = reconstruct_rgb(concentrations, stain_matrix)        # f32
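
The memory saving can be checked with NumPy alone. The sketch below does not call macenko_pca; it only compares the buffer sizes of the same synthetic image at both precisions and measures the rounding error introduced by the cast.

```python
import numpy as np

# Same kind of synthetic image as in the examples above, stored at two precisions.
im_f64 = np.random.rand(256, 256, 3) * 255.0   # NumPy defaults to float64
im_f32 = im_f64.astype(np.float32)

# Buffer sizes in bytes: 8 bytes per float64 element, 4 per float32 element.
bytes_f64 = im_f64.nbytes
bytes_f32 = im_f32.nbytes

# Largest absolute rounding error caused by the float32 cast.
max_abs_err = float(np.abs(im_f64 - im_f32).max())

print(bytes_f64, bytes_f32, max_abs_err)
```

Whether that rounding error matters depends on the downstream analysis, so it is worth measuring on representative images before switching.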

Precision / dtype rules

The dtype of your input array controls which Rust code path is taken:

Input dtype | Computation dtype | Notes
float64 | f64 | Full precision (default for plain Python floats)
float32 | f32 | ≈ half the RAM; recommended when full precision is unnecessary
float16 | f32 | Promoted to f32 (no f16 LAPACK exists)
integer types | f64 | Promoted to f64 for backward compatibility

The return array's dtype always matches the computation dtype.
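
The rules above can be restated as a small lookup. The helper below is hypothetical (computation_dtype is not part of the package) and only mirrors the table; raising an error for other dtypes is an assumption, since the table does not say how they are handled.

```python
import numpy as np

def computation_dtype(arr: np.ndarray) -> np.dtype:
    """Hypothetical helper: the dtype the table above says would be used."""
    if arr.dtype == np.float16 or arr.dtype == np.float32:
        return np.dtype(np.float32)          # f16 is promoted, f32 stays f32
    if arr.dtype == np.float64 or np.issubdtype(arr.dtype, np.integer):
        return np.dtype(np.float64)          # integers are promoted to f64
    # Assumption: dtypes not listed in the table are rejected.
    raise TypeError(f"unsupported dtype: {arr.dtype}")

for name in ("float64", "float32", "float16", "uint8"):
    print(name, "->", computation_dtype(np.zeros(1, dtype=name)))
```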

API Reference

Stain Matrix Estimation

rgb_separate_stains_macenko_pca(im_rgb, ...)

End-to-end: takes an RGB image (H, W, 3) and returns a (3, 3) stain matrix.

separate_stains_macenko_pca(im_sda, ...)

Lower-level: operates on an image already in SDA space.
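
For readers unfamiliar with the method, here is a simplified NumPy sketch of the general Macenko idea: find the plane spanned by the two dominant directions, measure each pixel's angle within that plane, and take the extreme angles as stain directions. It is not the package's Rust implementation; the function name, the (N, 3) pixel layout, and the parameters angle_percentile and min_magnitude are assumptions made for illustration.

```python
import numpy as np

def macenko_sketch(sda_pixels, angle_percentile=1.0, min_magnitude=1e-6):
    """Illustrative estimate; returns a 3x3 matrix whose columns are unit stain vectors."""
    # Drop pixels with (almost) no stain, whose direction is ill-defined.
    keep = np.linalg.norm(sda_pixels, axis=1) > min_magnitude
    pixels = sda_pixels[keep]

    # The two leading right singular vectors span the dominant plane.
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    plane = vt[:2].T.copy()                      # shape (3, 2)

    # Orient the first axis so projections onto it are positive on average.
    if (pixels @ plane[:, 0]).mean() < 0:
        plane[:, 0] *= -1.0

    # Angle of every pixel inside the plane.
    proj = pixels @ plane                        # shape (N, 2)
    angles = np.arctan2(proj[:, 1], proj[:, 0])

    # Robust extremes of the angle distribution give the two stain directions.
    lo, hi = np.percentile(angles, [angle_percentile, 100.0 - angle_percentile])
    v_lo = plane @ np.array([np.cos(lo), np.sin(lo)])
    v_hi = plane @ np.array([np.cos(hi), np.sin(hi)])

    # Complete the matrix with a unit vector orthogonal to both.
    v3 = np.cross(v_lo, v_hi)
    v3 /= np.linalg.norm(v3)
    return np.column_stack([v_lo, v_hi, v3])

# Synthetic check: mix two known unit vectors with random non-negative weights.
rng = np.random.default_rng(0)
true_h = np.array([0.65, 0.70, 0.29])
true_h /= np.linalg.norm(true_h)
true_e = np.array([0.07, 0.99, 0.11])
true_e /= np.linalg.norm(true_e)
weights = rng.uniform(0.0, 1.0, size=(5000, 2))
pixels = weights @ np.vstack([true_h, true_e])   # shape (5000, 3)

w_est = macenko_sketch(pixels)
print(w_est)
```

Using percentiles rather than the absolute minimum and maximum angle makes the estimate less sensitive to a handful of outlier pixels.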

Colour Conversion

rgb_to_sda(im_rgb, ...)

Converts an RGB image or matrix to SDA (stain-density-absorbance) space.
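
As a rough illustration of what such a transform does, the sketch below uses one common convention, SDA = -log(I / I_0) * 255 / log(I_0), with intensities shifted by 1 to avoid log(0). This formula, the function name, and the default background intensity of 256 are assumptions; the package's exact scaling and background handling may differ.

```python
import numpy as np

def rgb_to_sda_sketch(im_rgb, bg_int=256.0):
    """Illustrative only: maps bright (background) pixels to ~0 and dark pixels to ~255."""
    # Shift by 1 so that a zero intensity does not produce log(0).
    im = np.asarray(im_rgb, dtype=np.float64) + 1.0
    # Clip to the background intensity so the result is never negative.
    im = np.minimum(im, bg_int)
    return -np.log(im / bg_int) * 255.0 / np.log(bg_int)

white = rgb_to_sda_sketch(np.full((1, 1, 3), 255.0))
black = rgb_to_sda_sketch(np.zeros((1, 1, 3)))
print(white.ravel(), black.ravel())
```

The logarithm matters because stain concentration is roughly linear in absorbance rather than in transmitted intensity, which is what makes the later linear unmixing meaningful.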

Colour Deconvolution (Applying Stain Vectors)

color_deconvolution(im_sda, stain_matrix)

Decomposes an SDA image into per-stain concentration channels using the inverse of the stain matrix. Each output channel i holds the concentration of stain i.
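
In matrix terms, this kind of decomposition amounts to multiplying every pixel by the inverse of the stain matrix. The NumPy sketch below assumes the stain vectors are the columns of the matrix and uses made-up values; deconvolve_sketch is a hypothetical name, not the package function.

```python
import numpy as np

def deconvolve_sketch(im_sda, stain_matrix):
    """Illustrative unmixing: one matrix product with the inverse stain matrix."""
    pixels = im_sda.reshape(-1, 3)                       # (N, 3), one row per pixel
    conc = pixels @ np.linalg.inv(stain_matrix).T        # (N, 3), one column per stain
    return conc.reshape(im_sda.shape)

# Made-up stain matrix; each column is one stain vector (assumed layout).
w = np.array([[0.65, 0.07, 0.27],
              [0.70, 0.99, 0.57],
              [0.29, 0.11, 0.78]])

# Build an SDA image from known concentrations, then recover them.
rng = np.random.default_rng(0)
true_conc = rng.uniform(0.0, 2.0, size=(4, 5, 3))
im_sda = true_conc @ w.T
recovered = deconvolve_sketch(im_sda, w)
print(np.abs(recovered - true_conc).max())
```

The inverse only exists when the three stain vectors are linearly independent, which is why a third, complementary vector is needed even for two-stain images.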

rgb_color_deconvolution(im_rgb, stain_matrix, bg_int=None)

Convenience wrapper that converts RGB → SDA → concentrations in a single call.

Reconstruction

reconstruct_rgb(concentrations, stain_matrix, bg_int=None)

Reconstructs an RGB image from stain concentrations and a stain matrix. Inverts the deconvolution: SDA = concentrations × Wᵀ, then converts SDA back to RGB. Useful for stain normalisation workflows where you modify concentration channels and then reconstruct.
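
One way to modify concentration channels for normalisation is to rescale each stain so that a high percentile of its concentrations matches a reference value. The helper below is a hypothetical NumPy sketch of that step, not a function provided by the package; the target values are arbitrary.

```python
import numpy as np

def rescale_concentrations(concentrations, target, percentile=99.0):
    """Scale each stain channel so its chosen percentile equals the target value."""
    flat = concentrations.reshape(-1, concentrations.shape[-1])
    current = np.percentile(flat, percentile, axis=0)    # one value per stain
    # Guard against division by zero for an empty channel.
    scale = np.asarray(target, dtype=np.float64) / np.maximum(current, 1e-12)
    return concentrations * scale                        # broadcasts over H and W

rng = np.random.default_rng(0)
conc = rng.uniform(0.1, 3.0, size=(8, 8, 3))
target = np.array([1.0, 0.5, 0.1])                       # arbitrary reference levels
normalised = rescale_concentrations(conc, target)
print(np.percentile(normalised.reshape(-1, 3), 99.0, axis=0))
```

The rescaled array would then be passed to reconstruct_rgb together with a stain matrix, as in the workflow example.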

See the full API documentation for parameter details.

Development Setup

Prerequisites: Python 3.9+, a Rust toolchain, and Hatch.

On Linux you also need OpenBLAS development headers (libopenblas-dev on Debian/Ubuntu). On macOS: brew install openblas pkg-config.

git clone https://github.com/LavLabInfrastructure/macenko-pca.git
cd macenko-pca
pip install maturin hatch
maturin develop --release

Optionally install pre-commit hooks:

pip install pre-commit
pre-commit install

Common Commands

Run these directly or use the provided Makefile shortcuts (e.g. make test, make lint).

Task | Command
Build Rust extension in-place | maturin develop --release
Run tests | make test
Tests + coverage | make cov
Lint Python | hatch run lint:check
Format Python | hatch run lint:format
Auto-fix lint | hatch run lint:fix
Format + fix + lint | hatch run lint:all
Type check | hatch run types:check
Build docs | hatch run docs:build-docs
Serve docs | hatch run docs:serve-docs
Build wheel | maturin build --release
Clean artifacts | make clean
Rust lints | make cargo-clippy
Rust tests | make cargo-test

Docker

# Run tests via Docker
docker build --target maturin -t macenko-pca:maturin .
docker run --rm -e HATCH_ENV=test macenko-pca:maturin cov

# Production image (just the installed wheel)
docker build --target prod -t macenko-pca:prod .

Publishing to PyPI

This project uses trusted publishing (OIDC) — no API tokens or secrets are needed. The publish.yml workflow handles everything automatically.

One-Time Setup (PyPI)

  1. Go to https://pypi.org/manage/account/publishing/ (logged in as a maintainer of the lavlab org).
  2. Click "Add a new pending publisher" and fill in:
    • PyPI project name: macenko-pca
    • Owner: lavlab (the GitHub organisation)
    • Repository: macenko-pca
    • Workflow name: publish.yml
    • Environment name: pypi
  3. (Optional) Repeat on TestPyPI with environment name testpypi to enable dry-run publishes.

One-Time Setup (GitHub)

  1. In the repository settings, go to Environments.
  2. Create an environment called pypi.
    • Optionally add a protection rule requiring manual approval before publishing.
  3. (Optional) Create an environment called testpypi for test publishes.

How to Release

# 1. Bump the version in src/macenko_pca/__about__.py and Cargo.toml
# 2. Commit and tag
git add -A
git commit -m "release: v0.2.0"
git tag v0.2.0
git push && git push --tags

# 3. Create a GitHub Release from the tag (via the web UI or `gh` CLI)
gh release create v0.2.0 --generate-notes
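
Because the version has to be bumped in two files, a small check before tagging can catch a mismatch. The sketch below is a hypothetical helper, not part of the repository; it assumes __about__.py contains a line of the form __version__ = "..." and that Cargo.toml has a version key under [package]. The sample file contents are made up.

```python
import re

def about_version(text: str) -> str:
    """Extract the version from the text of an __about__.py file (assumed format)."""
    match = re.search(r'^__version__\s*=\s*["\']([^"\']+)["\']', text, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)

def cargo_version(text: str) -> str:
    """Extract the version key from the [package] table of a Cargo.toml text."""
    in_package = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            in_package = stripped == "[package]"      # track the current table
        elif in_package:
            match = re.match(r'version\s*=\s*"([^"]+)"', stripped)
            if match:
                return match.group(1)
    raise ValueError("no version key found under [package]")

def versions_match(about_text: str, cargo_text: str) -> bool:
    return about_version(about_text) == cargo_version(cargo_text)

# Made-up sample contents for illustration.
sample_about = '__version__ = "0.2.0"\n'
sample_cargo = '[package]\nname = "example"\nversion = "0.2.0"\n\n[dependencies]\n'
print(versions_match(sample_about, sample_cargo))
```

In practice the two texts would be read from the files named in step 1 before running git tag.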

Creating the release triggers publish.yml, which:

  1. Builds wheels on Linux (manylinux) and macOS (x86_64 + arm64). Windows builds are intentionally omitted due to platform LAPACK/BLAS inconsistencies that affect numerical reproducibility.
  2. Builds a source distribution.
  3. Publishes everything to PyPI via trusted publishing.

Testing a Publish (Without a Release)

You can manually trigger the workflow against TestPyPI:

  1. Go to Actions → Publish to PyPI → Run workflow.
  2. Select testpypi as the target.
  3. Verify the package at https://test.pypi.org/project/macenko-pca/.

Project Structure

macenko-pca/
├── src/
│   └── macenko_pca/            # Python package source
│       ├── __init__.py          # Public API & version export
│       ├── __about__.py         # Version string
│       ├── deconvolution.py     # Pythonic wrappers with dtype dispatch
│       └── py.typed             # PEP 561 marker
├── rust/                        # Rust source (compiled via PyO3 + maturin)
│   ├── lib.rs                   # PyO3 module entry point (f32 + f64 variants)
│   ├── float_trait.rs           # MacenkoFloat supertrait (f32/f64)
│   ├── color_conversion.rs      # RGB → SDA transform (generic)
│   ├── color_deconvolution.rs   # SDA → concentrations, RGB reconstruction
│   ├── complement_stain_matrix.rs
│   ├── linalg.rs                # SVD, magnitude, normalisation (generic)
│   ├── rgb_separate_stains_macenko_pca.rs
│   ├── separate_stains_macenko_pca.rs
│   └── utils.rs                 # Image ↔ matrix helpers
├── tests/
│   ├── conftest.py              # Shared pytest fixtures
│   └── test_deconvolution.py    # Library function tests (104 tests)
├── docs/                        # MkDocs source files
├── .github/
│   ├── workflows/
│   │   ├── build.yml            # CI: build wheels on every push/PR
│   │   ├── pytest.yml           # CI: run tests across Python versions
│   │   ├── lint.yml             # CI: ruff lint + format check
│   │   └── publish.yml          # CD: publish to PyPI on release
│   └── dependabot.yml           # Auto-update deps, actions, Docker, Cargo
├── Cargo.toml                   # Rust crate configuration
├── pyproject.toml               # Python project & tool config (maturin backend)
├── Dockerfile                   # Multi-stage build (maturin / dev / prod)
├── Makefile                     # Dev shortcuts
├── mkdocs.yml                   # Docs config
├── .pre-commit-config.yaml      # Pre-commit hooks
├── .editorconfig                # Editor consistency
└── .gitignore

Design Philosophy

This project follows a library-first approach:

  1. All logic lives in importable modules under src/macenko_pca/.
  2. Heavy computation is delegated to Rust (rust/) for maximum throughput — SVD via ndarray-linalg, parallelism via rayon. All Rust functions are generic over f32/f64 via the MacenkoFloat trait.
  3. The Python layer (deconvolution.py) detects the input array's dtype and dispatches to the appropriate typed Rust function, providing input validation and rich docstrings.
  4. Tests call library functions directly.

This keeps your code reusable whether it's called from another package, a Jupyter notebook, or a web API.
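
The validate-then-dispatch pattern from point 3 might look roughly like the sketch below. The two kernel functions are stand-ins for the typed Rust entry points, whose real names are not shown in this README, and the shape check is an assumed example of input validation.

```python
import numpy as np

# Stand-ins for typed backend functions (hypothetical names).
def _kernel_f32(arr):
    return arr.copy()

def _kernel_f64(arr):
    return arr.copy()

def dispatch_by_dtype(im):
    """Validate an (H, W, 3) image and route it to the backend matching its dtype."""
    arr = np.asarray(im)
    if arr.ndim != 3 or arr.shape[2] != 3:
        raise ValueError(f"expected shape (H, W, 3), got {arr.shape}")
    if arr.dtype == np.float32 or arr.dtype == np.float16:
        # Half and single precision both take the f32 path.
        return _kernel_f32(np.ascontiguousarray(arr, dtype=np.float32))
    # Everything else, including integers, takes the f64 path.
    return _kernel_f64(np.ascontiguousarray(arr, dtype=np.float64))

out_small = dispatch_by_dtype(np.zeros((2, 2, 3), dtype=np.float16))
out_int = dispatch_by_dtype(np.zeros((2, 2, 3), dtype=np.uint8))
print(out_small.dtype, out_int.dtype)
```

Keeping validation in Python means errors surface as ordinary exceptions with readable messages before any compiled code runs.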

Contributing

See CONTRIBUTING.md for development guidelines.

License

macenko-pca is distributed under the terms of the MIT license.
