Adaptive spatial tessellation for sub-cellular resolution transcriptomics

Kintsugi

Named after the Japanese art of repairing broken ceramics with gold — honouring boundaries rather than erasing them.

Kintsugi builds adaptive spatial regions from Visium HD and other regular-grid spatial transcriptomics data. It starts from a 2D bin lattice and returns larger spatial regions whose boundaries follow local changes in molecule density. The package sits between raw binned counts and downstream biological analysis: it produces region labels, region-level Pearson residuals, region sizes, depths, centroids, and a spatial adjacency graph.

Kintsugi does not do clustering, marker testing, plotting, or manuscript-specific analysis. Those choices stay downstream.

Installation

From PyPI (available upon publication)

pip install kintsugi-st

From GitHub

pip install git+https://github.com/cafferychen777/kintsugi.git

With AnnData support

pip install "kintsugi-st[anndata]"

Conda / Mamba (from source)

git clone https://github.com/cafferychen777/kintsugi.git
cd kintsugi
mamba env create -f environment.yml
mamba activate kintsugi

Docker

docker build -t kintsugi .
docker run --rm kintsugi                          # run demo
docker run --rm -it kintsugi python               # interactive
docker run --rm -v "$(pwd)/data:/data" kintsugi \
    python -c "import kintsugi; ..."              # mount data

Singularity / Apptainer (HPC)

singularity build kintsugi.sif Singularity.def
singularity exec kintsugi.sif python -c "import kintsugi"

# or, on systems that provide Apptainer:
apptainer build kintsugi.sif Singularity.def
apptainer exec kintsugi.sif python -c "import kintsugi"

System requirements

  • OS: Linux, macOS, Windows (any platform with Python support).
  • Python: 3.10, 3.11, or 3.12.
  • Dependencies: NumPy (>=1.24), SciPy (>=1.11), h5py (>=3.10), pandas (>=2.0), PyArrow (>=14.0). Kintsugi itself is pure Python and ships no compiled extension modules.
  • Install time: < 30 seconds on a standard machine with pip.
  • Hardware: No GPU required. 16 GB RAM is sufficient for most Visium HD samples; 64 GB recommended for very large grids (> 500k bins).

Quick start (30 seconds)

import kintsugi

grid = kintsugi.load_visium_hd_from_dir("sample_dir")
result = grid.tessellate()

print(kintsugi.tessellation_report(result, grid))

That's it. Three steps after the import: load, tessellate, report.

One-command demo

Run the bundled demo on a synthetic dataset (no data download needed):

kintsugi-demo           # CLI entry point (after pip install)
python -m kintsugi.demo # module invocation

Expected output (timings will vary by machine):

Kintsugi v0.1.0
=======================================================

1. Generating toy dataset...
   Grid: 60×60, 100 genes, 2,472 tissue bins
   Time: 0.02s

2. Running tessellation...
   Regions: 75
   Time: 0.12s

3. Diagnostic report:
── Kintsugi Tessellation Report ──────────────────────
  Regions              : 75
  Median UMI / region  : 167.0
  Density CV           : 0.3488
  Stationarity pass    : 84.0%
  Composition holdout  : -30.3192 nats/bin
  Density holdout      : -2.4633 nats/bin
  Composition ΔLL      : +3.3670 nats/bin
  Density ΔLL          : +0.4569 nats/bin
  Composition dominates: False
─────────────────────────────────────────────────────

4. Determinism check:
   Label hash (sha256[:16]): 3653373eec97b137

Total runtime: 0.16s
Done.

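If your run prints the same label hash (3653373eec97b137), the region labels match the reference output exactly. The hash is expected to be identical across platforms, which is how deterministic output is verified.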
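A label hash of this kind can be sketched with the standard library alone. The example below is an illustration of the general idea (hash the bytes of the label grid, keep the first 16 hex characters), not Kintsugi's own code: the helper name `label_hash`, the shape header, and the little-endian int32 encoding are all assumptions, so the digest it prints is not expected to match the demo's.

```python
import hashlib
import struct


def label_hash(labels, n_chars=16):
    """Return the first n_chars hex digits of a SHA-256 over a 2D label grid.

    Hypothetical helper: the grid shape and the labels (row-major order) are
    packed as little-endian int32, so the bytes do not depend on the platform.
    """
    n_rows, n_cols = len(labels), len(labels[0])
    flat = [int(v) for row in labels for v in row]
    payload = struct.pack(f"<2i{len(flat)}i", n_rows, n_cols, *flat)
    return hashlib.sha256(payload).hexdigest()[:n_chars]


labels_a = [[0, 0, 1], [0, 1, 1], [-1, 2, 2]]
labels_b = [[0, 0, 1], [0, 1, 1], [-1, 2, 1]]  # one bin relabelled

print(label_hash(labels_a))
print(label_hash(labels_a) == label_hash(labels_a))  # same input, same hash
print(label_hash(labels_a) == label_hash(labels_b))  # any change alters the hash
```

Fixing the byte order and integer width explicitly is what makes such a fingerprint comparable between machines.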

10x Space Ranger to Kintsugi tutorial

If your data comes from 10x Genomics Space Ranger, the directory typically looks like:

sample/
├── filtered_feature_bc_matrix.h5
└── spatial/
    └── tissue_positions.parquet   (or .csv)

Step 1: Load

import kintsugi

grid = kintsugi.load_visium_hd_from_dir("sample/")
print(f"Grid: {grid.rows}×{grid.cols}, {grid.n_genes} genes, {grid.mask.sum()} tissue bins")

If your files are in non-standard locations:

grid = kintsugi.load_visium_hd(
    "path/to/filtered_feature_bc_matrix.h5",
    "path/to/tissue_positions.parquet",
)

Step 2: Tessellate

result = grid.tessellate()

With custom parameters:

result = grid.tessellate(
    lag=2,                # variogram lag in bins (2 bins = 4 µm at 2 µm resolution)
    kappa=2.0,            # stationarity tolerance
    min_seed_distance=4,  # minimum seed separation in bins
    smooth_sigma=4.0,     # Gaussian smoothing for seed detection
)

Step 3: Inspect the result

result.labels      # (rows, cols) int32 — region labels, -1 outside tissue
result.residuals   # (K, G)  float64 — Pearson residuals
result.areas       # (K,)    float64 — bins per region
result.depths      # (K,)    float64 — total UMI per region
result.centroids   # (K, 2)  float64 — (row, col) centroids
result.adjacency   # (K, K)  sparse   — spatial adjacency graph
result.trace       # (rows, cols) float64 — boundary-tensor trace
result.n_regions   # int — number of regions
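To make the relationship between these fields concrete, the sketch below derives areas, centroids, and an adjacency list from a small label grid using plain Python lists. It is an illustration only: the helper names are hypothetical, and the assumption that two regions are adjacent when they share a bin edge (4-connectivity) may not match the rule Kintsugi uses.

```python
from collections import defaultdict


def region_summaries(labels):
    """Return (areas, centroids) dicts keyed by region label; -1 is skipped."""
    areas = defaultdict(int)
    row_sum = defaultdict(float)
    col_sum = defaultdict(float)
    for r, row in enumerate(labels):
        for c, k in enumerate(row):
            if k < 0:
                continue  # outside tissue
            areas[k] += 1
            row_sum[k] += r
            col_sum[k] += c
    centroids = {k: (row_sum[k] / areas[k], col_sum[k] / areas[k]) for k in areas}
    return dict(areas), centroids


def adjacent_pairs(labels):
    """Return the set of (i, j) with i < j for regions that share a bin edge."""
    pairs = set()
    n_rows, n_cols = len(labels), len(labels[0])
    for r in range(n_rows):
        for c in range(n_cols):
            a = labels[r][c]
            if a < 0:
                continue
            # Look only right and down so each shared edge is visited once.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < n_rows and cc < n_cols:
                    b = labels[rr][cc]
                    if b >= 0 and b != a:
                        pairs.add((min(a, b), max(a, b)))
    return pairs


labels = [
    [0, 0, 1],
    [0, 1, 1],
    [-1, 2, 2],
]
areas, centroids = region_summaries(labels)
print(areas)                           # {0: 3, 1: 3, 2: 2}
print(centroids[2])                    # (2.0, 1.5)
print(sorted(adjacent_pairs(labels)))  # [(0, 1), (1, 2)]
```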

Step 4: Diagnostic report

report = kintsugi.tessellation_report(result, grid)
print(report)

The report automatically computes:

| Metric | What it measures |
| --- | --- |
| Region count | Number of tessellated regions |
| Median UMI/region | Depth distribution |
| Density CV | Coefficient of variation of per-bin UMI density |
| Stationarity pass rate | Fraction passing Poisson stationarity test |
| Composition holdout LL | Held-out log-likelihood for multinomial composition |
| Density holdout LL | Held-out log-likelihood for Poisson density |
| Composition/Density ΔLL | Improvement over single-global-model null |
| Composition dominates | Warning if boundaries are driven by composition, not density |
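Two of these metrics are easy to illustrate by hand. The sketch below computes a coefficient of variation and a Poisson ΔLL (region-specific rates versus one global rate) on made-up numbers. The function names, the data, the use of the population standard deviation, and the reading of ΔLL as "regional log-likelihood minus global log-likelihood" are assumptions for illustration; the report's exact estimators may differ.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unitless)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def poisson_mean_loglik(counts, rate):
    """Mean Poisson log-likelihood, in nats per observation, at a fixed rate."""
    total = sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)
    return total / len(counts)


# Hypothetical per-bin UMI densities for eight regions.
densities = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(densities)
print(f"Density CV: {cv:.4f}")  # Density CV: 0.4000

# Hypothetical held-out counts from a dense and a sparse region, with rates
# assumed to have been fitted beforehand on separate training bins.
held_out = {"dense": [9, 11, 10, 12], "sparse": [1, 0, 2, 1]}
region_rates = {"dense": 10.0, "sparse": 1.0}
global_rate = 5.5

n_bins = sum(len(v) for v in held_out.values())
ll_regions = sum(
    poisson_mean_loglik(v, region_rates[name]) * len(v) for name, v in held_out.items()
) / n_bins
ll_global = sum(
    poisson_mean_loglik(v, global_rate) * len(v) for v in held_out.values()
) / n_bins
delta_ll = ll_regions - ll_global
print(f"Density delta LL: {delta_ll:+.4f} nats/bin")
```

A positive ΔLL in this toy setting means that region-specific rates predict unseen bins better than a single global rate does.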

Step 5: Export to AnnData (scverse integration)

adata = kintsugi.to_anndata(result, grid=grid, use_raw_counts=True)

# adata.X         = Pearson residuals
# adata.obs       = area, depth
# adata.obsm      = spatial centroids
# adata.obsp      = adjacency graph
# adata.layers    = raw aggregated counts

adata.write("tessellation.h5ad")

The AnnData object is directly usable with Scanpy, Squidpy, and other scverse tools for clustering, visualisation, and spatial analysis.

Parameters

All parameters have fixed defaults and are documented. There is no hidden tuning.

| Parameter | Default | How to think about it |
| --- | --- | --- |
| lag | 2 | Grid offset for directional semivariance. On a 2 µm Visium HD grid, lag=2 is a 4 µm offset. Larger values look at broader spatial variation. |
| kappa | 2.0 | Stationarity tolerance during region refinement (in SE units). Larger values allow broader regions. |
| min_seed_distance | 4 | Minimum distance between seed points in grid bins. Controls the spatial scale: larger = fewer, larger regions. |
| smooth_sigma | 4.0 | Gaussian sigma for smoothing the trace field before seed detection. Larger values favor smoother boundaries. |

For very small toy examples, use smaller min_seed_distance and smooth_sigma values (the defaults target real Visium HD grids).
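To show what a lag means in practice, here is a minimal sketch of the classical semivariance estimator, half the mean squared difference between bins a fixed offset apart, evaluated along one grid axis. It is not the package's `directional_semivariance`: the function name `semivariance_along_axis`, the use of `None` for out-of-tissue bins, and the toy field are assumptions for illustration.

```python
def semivariance_along_axis(field, lag, axis):
    """Classical semivariance of a 2D field at a given lag along one axis.

    axis=0 compares bins `lag` rows apart; axis=1 compares bins `lag` columns
    apart. Bins set to None (outside tissue) are skipped.
    """
    dr, dc = (lag, 0) if axis == 0 else (0, lag)
    n_rows, n_cols = len(field), len(field[0])
    total, n_pairs = 0.0, 0
    for r in range(n_rows - dr):
        for c in range(n_cols - dc):
            a, b = field[r][c], field[r + dr][c + dc]
            if a is None or b is None:
                continue
            total += (a - b) ** 2
            n_pairs += 1
    return total / (2 * n_pairs) if n_pairs else float("nan")


# Toy density field with a sharp vertical boundary between columns 2 and 3.
field = [[1, 1, 1, 9, 9, 9] for _ in range(4)]

print(semivariance_along_axis(field, lag=2, axis=0))  # 0.0  (no change down the rows)
print(semivariance_along_axis(field, lag=2, axis=1))  # 16.0 (pairs straddle the boundary)
```

The contrast between the two directions is the kind of signal a boundary detector can exploit: semivariance is high only in the direction that crosses the density change.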

Performance

Measured on an Apple M1 Max (single thread, pure Python).

| Dataset | Grid | Genes | Tissue bins | Regions | Time | Peak memory |
| --- | --- | --- | --- | --- | --- | --- |
| Toy (synthetic) | 60 × 60 | 100 | 2,472 | 75 | 0.3 s | 4 MB |
| Medium (synthetic) | 200 × 200 | 500 | 40,000 | ~1,900 | 0.4 s | 54 MB |
| Large (synthetic) | 500 × 500 | 1,000 | 250,000 | ~12,000 | 2.7 s | 385 MB |

Memory scales primarily with n_regions × n_genes (the dense residual matrix). Upstream gene filtering is the main memory lever for large datasets.
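As a back-of-the-envelope check, the footprint of the dense residual matrix alone is n_regions × n_genes × 8 bytes for float64 values. The sketch below applies that arithmetic to the region and gene counts from the table; the helper name is hypothetical, and the figures cover only the residual matrix, not total peak memory.

```python
def dense_residual_bytes(n_regions, n_genes, bytes_per_value=8):
    """Bytes needed for a dense (n_regions, n_genes) float64 matrix."""
    return n_regions * n_genes * bytes_per_value


datasets = [("Toy", 75, 100), ("Medium", 1_900, 500), ("Large", 12_000, 1_000)]
for name, n_regions, n_genes in datasets:
    megabytes = dense_residual_bytes(n_regions, n_genes) / 1e6
    print(f"{name}: {megabytes:.2f} MB")
# Toy: 0.06 MB
# Medium: 7.60 MB
# Large: 96.00 MB
```

Halving the number of genes before tessellation halves this term, which is why gene filtering is the most direct way to reduce it.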

Input contract

Kintsugi operates on a normalized grid:

  • counts: SciPy sparse matrix with shape (rows * cols, genes).
  • rows, cols: dimensions of the 2D grid.
  • mask: optional boolean array with shape (rows, cols); True marks in-tissue bins.
  • Matrix rows follow row-major grid order: matrix row r * cols + c holds the counts for grid bin (r, c).
  • Count values must be finite and non-negative.

GridData is the package container for this contract.
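The row-major rule is a two-line conversion in each direction. The helper names below are hypothetical and not part of the package; they only restate the index formula from the contract.

```python
def flat_index(r, c, cols):
    """Matrix row that holds the counts for grid bin (r, c)."""
    return r * cols + c


def grid_position(index, cols):
    """Grid bin (r, c) stored at a given matrix row."""
    return divmod(index, cols)


rows, cols = 3, 5
print(flat_index(2, 3, cols))   # 13
print(grid_position(13, cols))  # (2, 3)

# Every bin of the grid round-trips through its matrix row.
round_trip_ok = all(
    grid_position(flat_index(r, c, cols), cols) == (r, c)
    for r in range(rows)
    for c in range(cols)
)
print(round_trip_ok)  # True
```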

Non-Visium inputs

For any regular-grid data where you have occupied-bin counts and coordinates:

grid = kintsugi.build_regular_grid(
    counts,         # sparse (n_occupied, genes)
    row_coords,     # 1D array of row indices
    col_coords,     # 1D array of column indices
    rows=R, cols=C, # grid extent
)
result = grid.tessellate()
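Conceptually, occupied-bin inputs have to be placed onto the full lattice before the contract above is satisfied. The sketch below shows one way to do that bookkeeping with plain lists: it builds a tissue mask and the matrix row each occupied bin would occupy. It is an assumption about the general idea, not a description of what `build_regular_grid` does internally; in particular, treating every occupied bin as in-tissue and the helper name `occupied_to_grid` are hypothetical.

```python
def occupied_to_grid(row_coords, col_coords, rows, cols):
    """Return (mask, flat_rows) for occupied bins on a rows x cols lattice.

    mask[r][c] is True for each occupied bin; flat_rows[i] is the row-major
    matrix row (r * cols + c) for the i-th occupied bin.
    """
    if len(row_coords) != len(col_coords):
        raise ValueError("row_coords and col_coords must have the same length")
    mask = [[False] * cols for _ in range(rows)]
    flat_rows = []
    for r, c in zip(row_coords, col_coords):
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError(f"bin ({r}, {c}) lies outside a {rows}x{cols} grid")
        if mask[r][c]:
            raise ValueError(f"bin ({r}, {c}) appears more than once")
        mask[r][c] = True
        flat_rows.append(r * cols + c)
    return mask, flat_rows


# Four occupied bins on a 3 x 3 lattice.
mask, flat_rows = occupied_to_grid([0, 1, 1, 2], [1, 0, 2, 1], rows=3, cols=3)
print(flat_rows)                    # [1, 3, 5, 7]
print(sum(sum(row) for row in mask))  # 4
```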

API overview

Most users need only:

  • kintsugi.load_visium_hd_from_dir(...) — load from Space Ranger output
  • kintsugi.tessellate(...) — run the full pipeline
  • kintsugi.tessellation_report(...) — diagnostic report
  • kintsugi.to_anndata(...) — export to AnnData

Lower-level functions for advanced users:

  • directional_semivariance — variogram estimation
  • boundary_tensor — tensor eigendecomposition
  • adaptive_tessellation — watershed + stationarity refinement
  • aggregate_counts — region-level Pearson residuals
  • build_spatial_graph — adjacency from labels
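For readers unfamiliar with Pearson residuals, the sketch below computes them for a small region-by-gene count table under a plain Poisson model: the expected count is the region's depth times the gene's overall share, and the residual is (observed − expected) / √expected. This is a textbook formulation chosen for illustration; whether `aggregate_counts` uses a Poisson variance, an overdispersed variance, or clipping is not stated here, and the function name `poisson_pearson_residuals` is hypothetical.

```python
import math


def poisson_pearson_residuals(region_counts):
    """Pearson residuals for a (regions x genes) table of aggregated counts."""
    depths = [sum(row) for row in region_counts]
    gene_totals = [sum(col) for col in zip(*region_counts)]
    grand_total = sum(depths)
    if grand_total == 0:
        return [[0.0] * len(gene_totals) for _ in region_counts]
    residuals = []
    for row, depth in zip(region_counts, depths):
        out = []
        for observed, gene_total in zip(row, gene_totals):
            # Expected count if the region had the global gene composition.
            expected = depth * gene_total / grand_total
            if expected > 0:
                out.append((observed - expected) / math.sqrt(expected))
            else:
                out.append(0.0)
        residuals.append(out)
    return residuals


# Two regions, two genes: each region is enriched for a different gene.
region_counts = [[30, 10], [10, 30]]
residuals = poisson_pearson_residuals(region_counts)
print([[round(v, 3) for v in row] for row in residuals])
# [[2.236, -2.236], [-2.236, 2.236]]
```

Positive values flag genes that are over-represented in a region relative to its depth, negative values flag depletion.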

Reproducibility

  • Deterministic: no random seeds, no stochastic algorithms. The same input always produces the same output (verified by SHA-256 hash in tests).
  • No hidden tuning: all parameters are explicit and documented.
  • Tested package surface: unit tests cover core algorithms, I/O, report, AnnData export, the demo dataset, and edge-case validation.
  • Coverage reporting: use the pytest-cov command below to reproduce the module-level coverage table.
  • CI on Python 3.10, 3.11, 3.12 via GitHub Actions.
  • Docker and Singularity images for containerised reproduction.

Testing

python -m pip install -e ".[dev]"
python -m ruff check kintsugi tests examples
python -m pytest --cov=kintsugi --cov-report=term-missing

License

Kintsugi is released under the MIT License.

Citation

If you use Kintsugi in your research, please cite:

[Citation will be added upon publication]
