Adaptive spatial tessellation for sub-cellular resolution transcriptomics
Kintsugi
Named after the Japanese art of repairing broken ceramics with gold — honouring boundaries rather than erasing them.
Kintsugi builds adaptive spatial regions from Visium HD and other regular-grid spatial transcriptomics data. It starts from a 2D bin lattice and returns larger spatial regions whose boundaries follow local changes in molecule density. The package sits between raw binned counts and downstream biological analysis: it produces region labels, region-level Pearson residuals, region sizes, depths, centroids, and a spatial adjacency graph.
Kintsugi does not do clustering, marker testing, plotting, or manuscript-specific analysis. Those choices stay downstream.
Installation
From PyPI (available upon publication)
pip install kintsugi-st
From GitHub
pip install git+https://github.com/cafferychen777/kintsugi.git
With AnnData support
pip install "kintsugi-st[anndata]"
Conda / Mamba (from source)
git clone https://github.com/cafferychen777/kintsugi.git
cd kintsugi
mamba env create -f environment.yml
mamba activate kintsugi
Docker
docker build -t kintsugi .
docker run --rm kintsugi # run demo
docker run --rm -it kintsugi python # interactive
docker run --rm -v "$(pwd)/data:/data" kintsugi \
python -c "import kintsugi; ..." # mount data
Singularity / Apptainer (HPC)
singularity build kintsugi.sif Singularity.def
singularity exec kintsugi.sif python -c "import kintsugi"
# or, on systems that provide Apptainer:
apptainer build kintsugi.sif Singularity.def
apptainer exec kintsugi.sif python -c "import kintsugi"
System requirements
- OS: Linux, macOS, Windows (any platform with Python support).
- Python: 3.10, 3.11, or 3.12.
- Dependencies: NumPy (>=1.24), SciPy (>=1.11), h5py (>=3.10), pandas (>=2.0), PyArrow (>=14.0). Kintsugi itself is pure Python and ships no compiled extension modules.
- Install time: < 30 seconds on a standard machine with pip.
- Hardware: No GPU required. 16 GB RAM is sufficient for most Visium HD samples; 64 GB recommended for very large grids (> 500k bins).
Quick start (30 seconds)
import kintsugi
grid = kintsugi.load_visium_hd_from_dir("sample_dir")
result = grid.tessellate()
print(kintsugi.tessellation_report(result, grid))
That's it: three calls to load, tessellate, and report.
One-command demo
Run the bundled demo on a synthetic dataset (no data download needed):
kintsugi-demo # CLI entry point (after pip install)
python -m kintsugi.demo # module invocation
Expected output:
Kintsugi v0.1.0
=======================================================
1. Generating toy dataset...
Grid: 60×60, 100 genes, 2,472 tissue bins
Time: 0.02s
2. Running tessellation...
Regions: 75
Time: 0.12s
3. Diagnostic report:
── Kintsugi Tessellation Report ──────────────────────
Regions : 75
Median UMI / region : 167.0
Density CV : 0.3488
Stationarity pass : 84.0%
Composition holdout : -30.3192 nats/bin
Density holdout : -2.4633 nats/bin
Composition ΔLL : +3.3670 nats/bin
Density ΔLL : +0.4569 nats/bin
Composition dominates: False
─────────────────────────────────────────────────────
4. Determinism check:
Label hash (sha256[:16]): 3653373eec97b137
Total runtime: 0.16s
Done.
The label hash is a truncated SHA-256 digest of the region labels. If your run
prints 3653373eec97b137, it reproduced the reference output on your platform.
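A determinism check of this kind can be reproduced by hashing the label grid yourself. The sketch below is a minimal illustration, not Kintsugi's own routine: the helper name label_hash and the serialisation (row-major, little-endian int32) are assumptions, so its digests will not necessarily match the value printed by the demo.

```python
import hashlib
import struct


def label_hash(labels, n_hex=16):
    """Hash a 2D label grid (list of rows) into a short hex digest.

    Assumed serialisation: row-major, little-endian int32. Kintsugi's own
    hashing scheme is not documented here and may differ.
    """
    flat = [int(v) for row in labels for v in row]
    payload = struct.pack(f"<{len(flat)}i", *flat)
    return hashlib.sha256(payload).hexdigest()[:n_hex]


# Two runs that yield identical labels yield identical digests.
run_a = [[0, 0, 1], [0, 1, 1], [-1, 2, 2]]
run_b = [[0, 0, 1], [0, 1, 1], [-1, 2, 2]]
changed = [[0, 0, 1], [0, 1, 1], [-1, 2, 1]]  # one bin relabelled

hash_a = label_hash(run_a)
hash_b = label_hash(run_b)
hash_changed = label_hash(changed)
print(hash_a == hash_b, hash_a == hash_changed)
```

Fixing the byte order and integer width explicitly is what makes a digest like this comparable across operating systems and CPU architectures.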
10x Space Ranger to Kintsugi tutorial
If your data comes from 10x Genomics Space Ranger, the directory typically looks like:
sample/
├── filtered_feature_bc_matrix.h5
└── spatial/
└── tissue_positions.parquet (or .csv)
Step 1: Load
import kintsugi
grid = kintsugi.load_visium_hd_from_dir("sample/")
print(f"Grid: {grid.rows}×{grid.cols}, {grid.n_genes} genes, {grid.mask.sum()} tissue bins")
If your files are in non-standard locations:
grid = kintsugi.load_visium_hd(
"path/to/filtered_feature_bc_matrix.h5",
"path/to/tissue_positions.parquet",
)
Step 2: Tessellate
result = grid.tessellate()
With custom parameters:
result = grid.tessellate(
lag=2, # variogram lag in bins (2 bins = 4 µm at 2 µm resolution)
kappa=2.0, # stationarity tolerance
min_seed_distance=4, # minimum seed separation in bins
smooth_sigma=4.0, # Gaussian smoothing for seed detection
)
Step 3: Inspect the result
result.labels # (rows, cols) int32 — region labels, -1 outside tissue
result.residuals # (K, G) float64 — Pearson residuals
result.areas # (K,) float64 — bins per region
result.depths # (K,) float64 — total UMI per region
result.centroids # (K, 2) float64 — (row, col) centroids
result.adjacency # (K, K) sparse — spatial adjacency graph
result.trace # (rows, cols) float64 — boundary-tensor trace
result.n_regions # int — number of regions
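To make the relationship between the label grid and the adjacency graph concrete, here is a dependency-free sketch that derives region adjacency from a small label array. It assumes that two regions are adjacent when any of their bins share an edge (4-connectivity); the connectivity rule Kintsugi uses is not stated here, and region_adjacency is a hypothetical helper, not part of the package.

```python
def region_adjacency(labels):
    """Return the set of adjacent region pairs (i, j) with i < j.

    Two regions count as adjacent when any of their bins share an edge
    (4-connectivity, an assumption). Bins labelled -1 are outside tissue.
    """
    n_rows = len(labels)
    n_cols = len(labels[0]) if n_rows else 0
    pairs = set()
    for r in range(n_rows):
        for c in range(n_cols):
            a = labels[r][c]
            if a < 0:
                continue
            # Looking only right and down visits every shared edge once.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < n_rows and cc < n_cols:
                    b = labels[rr][cc]
                    if b >= 0 and b != a:
                        pairs.add((min(a, b), max(a, b)))
    return pairs


labels = [
    [0, 0, 1],
    [0, 2, 1],
    [-1, 2, 2],
]
edges = region_adjacency(labels)
print(sorted(edges))
```

In this toy grid all three regions touch one another, so the sketch reports the pairs (0, 1), (0, 2), and (1, 2).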
Step 4: Diagnostic report
report = kintsugi.tessellation_report(result, grid)
print(report)
The report automatically computes:
| Metric | What it measures |
|---|---|
| Region count | Number of tessellated regions |
| Median UMI/region | Depth distribution |
| Density CV | Coefficient of variation of per-bin UMI density |
| Stationarity pass rate | Fraction of regions passing the Poisson stationarity test |
| Composition holdout LL | Held-out log-likelihood for multinomial composition |
| Density holdout LL | Held-out log-likelihood for Poisson density |
| Composition/Density ΔLL | Improvement over single-global-model null |
| Composition dominates | Warning if boundaries are driven by composition, not density |
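As a rough illustration of one of these metrics, the sketch below computes a coefficient of variation from region depths and areas. It assumes that density means UMI per bin within each region (depth divided by area) and that the CV uses the population standard deviation; the exact definition used by the report is not given here, and density_cv is a hypothetical helper.

```python
import statistics


def density_cv(depths, areas):
    """Coefficient of variation of per-region UMI density (UMI per bin).

    Assumes density = depth / area and a population standard deviation.
    """
    densities = [d / a for d, a in zip(depths, areas) if a > 0]
    mean = statistics.fmean(densities)
    return statistics.pstdev(densities) / mean


depths = [120.0, 200.0, 90.0, 150.0]  # total UMI per region
areas = [30.0, 40.0, 30.0, 30.0]      # bins per region
cv = density_cv(depths, areas)
print(round(cv, 4))
```

A low value indicates that regions carry similar UMI density per bin; a high value indicates that density varies widely between regions.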
Step 5: Export to AnnData (scverse integration)
adata = kintsugi.to_anndata(result, grid=grid, use_raw_counts=True)
# adata.X = Pearson residuals
# adata.obs = area, depth
# adata.obsm = spatial centroids
# adata.obsp = adjacency graph
# adata.layers = raw aggregated counts
adata.write("tessellation.h5ad")
The AnnData object is directly usable with Scanpy, Squidpy, and other scverse tools for clustering, visualisation, and spatial analysis.
Parameters
All parameters have fixed defaults and are documented. There is no hidden tuning.
| Parameter | Default | How to think about it |
|---|---|---|
| lag | 2 | Grid offset for directional semivariance. On a 2 µm Visium HD grid, lag=2 is a 4 µm offset. Larger values look at broader spatial variation. |
| kappa | 2.0 | Stationarity tolerance during region refinement (in SE units). Larger values allow broader regions. |
| min_seed_distance | 4 | Minimum distance between seed points in grid bins. Controls the spatial scale: larger = fewer, larger regions. |
| smooth_sigma | 4.0 | Gaussian sigma for smoothing the trace field before seed detection. Larger values favor smoother boundaries. |
For very small toy examples, use smaller min_seed_distance and
smooth_sigma values (the defaults target real Visium HD grids).
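To show what the lag parameter controls, the sketch below computes the textbook empirical semivariance, gamma(h) = 0.5 * mean((z(s + h) - z(s))^2), along one grid axis. It is an illustration under assumptions: Kintsugi's directional_semivariance may treat the tissue mask, count noise, and directions differently, and the function name here is hypothetical.

```python
def directional_semivariance_sketch(field, lag, axis):
    """Empirical semivariance of a 2D field at a given lag along one axis.

    gamma(h) = 0.5 * mean((z[s + h] - z[s]) ** 2) over all valid pairs.
    axis=0 offsets along rows, axis=1 along columns. The lag must be
    smaller than the grid extent along the chosen axis.
    """
    n_rows, n_cols = len(field), len(field[0])
    dr, dc = (lag, 0) if axis == 0 else (0, lag)
    squared_diffs = [
        (field[r + dr][c + dc] - field[r][c]) ** 2
        for r in range(n_rows - dr)
        for c in range(n_cols - dc)
    ]
    return 0.5 * sum(squared_diffs) / len(squared_diffs)


# A density field with a sharp vertical boundary between columns 2 and 3.
field = [[1.0, 1.0, 1.0, 5.0, 5.0, 5.0] for _ in range(4)]
gamma_cols = directional_semivariance_sketch(field, lag=2, axis=1)
gamma_rows = directional_semivariance_sketch(field, lag=2, axis=0)
lag_um = 2 * 2.0  # lag in bins times bin size in µm
print(gamma_cols, gamma_rows, lag_um)
```

The semivariance is large only in the direction that crosses the boundary, which is the directional signal a boundary-following tessellation can exploit.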
Performance
Measured on an Apple M1 Max (single thread, pure Python).
| Dataset | Grid | Genes | Tissue bins | Regions | Time | Peak memory |
|---|---|---|---|---|---|---|
| Toy (synthetic) | 60 × 60 | 100 | 2,472 | 75 | 0.3 s | 4 MB |
| Medium (synthetic) | 200 × 200 | 500 | 40,000 | ~1,900 | 0.4 s | 54 MB |
| Large (synthetic) | 500 × 500 | 1,000 | 250,000 | ~12,000 | 2.7 s | 385 MB |
Memory scales primarily with n_regions × n_genes (the dense residual
matrix). Upstream gene filtering is the main memory lever for large datasets.
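A back-of-the-envelope estimate shows why gene filtering helps. The sketch below sizes only the dense float64 residual matrix, using the approximate region count from the large synthetic row of the table; other allocations contribute to the measured peak, so treat the numbers as a lower bound. The helper name is hypothetical.

```python
def dense_residual_megabytes(n_regions, n_genes, bytes_per_value=8):
    """Size in MB (10**6 bytes) of a dense (n_regions, n_genes) float64 matrix."""
    return n_regions * n_genes * bytes_per_value / 1e6


full = dense_residual_megabytes(12_000, 1_000)  # all 1,000 genes
filtered = dense_residual_megabytes(12_000, 250)  # after keeping 250 genes
print(full, filtered)
```

Keeping a quarter of the genes cuts the residual matrix to a quarter of its size, independent of the number of bins in the grid.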
Input contract
Kintsugi operates on a normalized grid:
- counts: SciPy sparse matrix with shape (rows * cols, genes).
- rows, cols: dimensions of the 2D grid.
- mask: optional boolean array with shape (rows, cols); True marks in-tissue bins.
- Matrix rows are in row-major order: row r * cols + c corresponds to grid bin (r, c).
- Count values must be finite and non-negative.
GridData is the package container for this contract.
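The row-major convention in the contract can be checked with two small helpers. These are illustrative functions, not part of the Kintsugi API.

```python
def bin_to_row(r, c, cols):
    """Matrix row index for grid bin (r, c) in row-major order."""
    return r * cols + c


def row_to_bin(index, cols):
    """Inverse mapping: matrix row index back to grid bin (r, c)."""
    return divmod(index, cols)


rows, cols = 3, 4
index = bin_to_row(2, 1, cols)
bin_rc = row_to_bin(index, cols)
print(index, bin_rc)
```

On a 3 × 4 grid, bin (2, 1) maps to matrix row 9, and row 9 maps back to bin (2, 1).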
Non-Visium inputs
For any regular-grid data where you have occupied-bin counts and coordinates:
grid = kintsugi.build_regular_grid(
counts, # sparse (n_occupied, genes)
row_coords, # 1D array of row indices
col_coords, # 1D array of column indices
rows=R, cols=C, # grid extent
)
result = grid.tessellate()
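Before calling the builder, it can help to check that the coordinates fit the declared extent. The sketch below shows one way occupied-bin coordinates could be turned into a tissue mask and row-major indices. Whether build_regular_grid derives its mask this way, or rejects duplicate bins, is an assumption; occupied_to_grid is a hypothetical helper.

```python
def occupied_to_grid(row_coords, col_coords, rows, cols):
    """Return (mask, flat_index) for occupied bins on a rows x cols grid.

    mask[r][c] is True where a bin is occupied; flat_index[i] is the
    row-major matrix row for the i-th occupied bin.
    """
    if len(row_coords) != len(col_coords):
        raise ValueError("row_coords and col_coords must have equal length")
    mask = [[False] * cols for _ in range(rows)]
    flat_index = []
    for r, c in zip(row_coords, col_coords):
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError(f"bin ({r}, {c}) lies outside the {rows}x{cols} grid")
        if mask[r][c]:
            raise ValueError(f"bin ({r}, {c}) appears more than once")
        mask[r][c] = True
        flat_index.append(r * cols + c)
    return mask, flat_index


mask, flat_index = occupied_to_grid([0, 1, 2], [1, 1, 0], rows=3, cols=2)
print(flat_index)
```

Catching out-of-range or duplicated coordinates at this stage gives a clearer error than a shape mismatch later in the pipeline.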
API overview
Most users need only:
- kintsugi.load_visium_hd_from_dir(...): load from Space Ranger output
- kintsugi.tessellate(...): run the full pipeline
- kintsugi.tessellation_report(...): diagnostic report
- kintsugi.to_anndata(...): export to AnnData
Lower-level functions for advanced users:
- directional_semivariance: variogram estimation
- boundary_tensor: tensor eigendecomposition
- adaptive_tessellation: watershed + stationarity refinement
- aggregate_counts: region-level Pearson residuals
- build_spatial_graph: adjacency from labels
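For readers unfamiliar with region-level Pearson residuals, the sketch below computes them for a tiny dense count table. It assumes a Poisson variance model with expected counts proportional to region depth and gene total; the variance model used by aggregate_counts is not stated here and may differ (for example, by adding overdispersion or clipping). The function name is hypothetical.

```python
import math


def poisson_pearson_residuals(counts):
    """Pearson residuals of a dense (regions x genes) count table.

    Expected count mu[k][g] = depth[k] * gene_total[g] / grand_total,
    residual = (x - mu) / sqrt(mu). Poisson variance is an assumption.
    """
    depth = [sum(row) for row in counts]
    gene_total = [sum(col) for col in zip(*counts)]
    grand_total = sum(depth)
    residuals = []
    for k, row in enumerate(counts):
        out = []
        for g, x in enumerate(row):
            mu = depth[k] * gene_total[g] / grand_total
            # Genes or regions with no counts get a residual of zero.
            out.append((x - mu) / math.sqrt(mu) if mu > 0 else 0.0)
        residuals.append(out)
    return residuals


counts = [
    [30, 10],  # region 0 is enriched for gene 0
    [10, 30],  # region 1 is enriched for gene 1
]
residuals = poisson_pearson_residuals(counts)
print(residuals)
```

Positive residuals mark genes that a region expresses above what its depth alone predicts; negative residuals mark depletion.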
Reproducibility
- Deterministic: no random seeds, no stochastic algorithms. The same input always produces the same output (verified by SHA-256 hash in tests).
- No hidden tuning: all parameters are explicit and documented.
- Tested package surface: unit tests cover core algorithms, I/O, report, AnnData export, the demo dataset, and edge-case validation.
- Coverage reporting: use the pytest-cov command below to reproduce the module-level coverage table.
- CI on Python 3.10, 3.11, 3.12 via GitHub Actions.
- Docker and Singularity images for containerised reproduction.
Testing
python -m pip install -e ".[dev]"
python -m ruff check kintsugi tests examples
python -m pytest --cov=kintsugi --cov-report=term-missing
License
Kintsugi is released under the MIT License.
Citation
If you use Kintsugi in your research, please cite:
[Citation will be added upon publication]