Density-normalized clustering with minimum spanning tree construction for high-dimensional data
densitree
densitree implements an improved SPADE algorithm that combines density-dependent downsampling with consensus overclustering to produce accurate cluster assignments and interpretable tree structures. It works on any high-dimensional dataset with imbalanced density -- cytometry, single-cell RNA-seq, proteomics, or general point-cloud data.
Benchmark results
On the standard Levine_32dim CyTOF benchmark (104k cells, 32 markers, 14 populations):
| Method | ARI | NMI | Runtime |
|---|---|---|---|
| densitree | 0.942 | 0.930 | 4.0s |
| FlowSOM | 0.934 | 0.920 | 0.1s |
| FlowSOM (official Python) | 0.914 | 0.914 | 3.6s |
| PhenoGraph-style | 0.908 | 0.906 | 88.0s |
| KMeans | 0.569 | 0.802 | 1.3s |
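ARI (Adjusted Rand Index) and NMI (Normalized Mutual Information) both compare predicted labels against ground-truth labels, and both ignore how cluster ids happen to be numbered. The snippet below is a minimal sketch of how such scores can be computed, assuming scikit-learn's metric functions and toy labels; it is not taken from the benchmark script, whose exact scoring code is not shown here.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Toy labels (illustrative only): the prediction uses different integer ids
# than the truth and places one sample in the wrong cluster.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [1, 1, 1, 0, 0, 2, 2, 2, 2]

ari = adjusted_rand_score(y_true, y_pred)
nmi = normalized_mutual_info_score(y_true, y_pred)

# A pure renumbering of the true partition still counts as a perfect match.
perfect_ari = adjusted_rand_score(y_true, [2, 2, 2, 0, 0, 0, 1, 1, 1])

print(f"ARI={ari:.3f} NMI={nmi:.3f} perfect ARI={perfect_ari:.3f}")
```

Because neither score depends on label numbering, no label alignment is needed before scoring.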
Installation
pip install densitree
From source:
git clone https://github.com/fuzue/densitree.git
cd densitree
pip install -e ".[dev]"
Quick start
from densitree import SPADE
# X is any (n_samples, n_features) array or DataFrame
spade = SPADE(n_clusters=20, downsample_target=0.1, random_state=42)
spade.fit(X)
# Cluster labels for every sample
print(spade.labels_)
# Per-cluster statistics
print(spade.result_.cluster_stats_)
# Visualize the spanning tree
spade.result_.plot_tree(color_by=0, backend="matplotlib")
With a pandas DataFrame, column names are preserved:
import pandas as pd
df = pd.read_csv("data.csv")
spade = SPADE(n_clusters=30, random_state=42)
spade.fit(df[["feature_a", "feature_b", "feature_c"]])
# Stats include median_feature_a, median_feature_b, etc.
print(spade.result_.cluster_stats_)
Key features
- State-of-the-art accuracy -- consensus overclustering with a mixed-linkage ensemble scores higher ARI and NMI than FlowSOM and a PhenoGraph-style baseline on the Levine_32dim benchmark
- scikit-learn compatible -- `fit()`/`fit_predict()` API, works with NumPy arrays and pandas DataFrames
- Tree output -- minimum spanning tree reveals hierarchical relationships between clusters
- Rare population preservation -- density-dependent downsampling ensures small subgroups are not lost
- Extensible -- swap any pipeline step (density estimation, clustering) via the `BaseStep` interface
- Dual visualization -- static matplotlib and interactive plotly backends
- Reproducible -- deterministic with `random_state`
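To make the tree output in the feature list concrete, here is a minimal sketch of a minimum spanning tree over cluster centroids. It assumes SciPy and Euclidean distances, and the centroid values are made up for illustration; it is not densitree's internal code.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

# Hypothetical cluster centroids (4 clusters, 2 features).
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [5.0, 5.0]])

# Dense matrix of pairwise Euclidean distances between centroids.
dist = squareform(pdist(centroids, metric="euclidean"))

# minimum_spanning_tree returns a sparse matrix whose nonzero entries are
# the retained edges; a tree over k nodes has k - 1 of them.
mst = minimum_spanning_tree(dist).tocoo()
edges = sorted((int(min(i, j)), int(max(i, j))) for i, j in zip(mst.row, mst.col))
total_length = float(mst.data.sum())

print(edges, round(total_length, 3))
```

SciPy treats zero entries as missing edges, so two clusters with identical centroids would need a small offset before this call.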
How it works
- Density estimation -- k-NN local density for every sample
- Consensus clustering -- multiple MiniBatchKMeans overclustering runs with ward and average linkage metaclustering, aligned via the Hungarian algorithm and combined by majority vote
- Density-dependent downsampling -- rare regions are preserved for tree construction
- MST construction -- cluster centroids connected into a minimum spanning tree
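The consensus step is the least obvious of the four, so here is a minimal sketch of the general technique: label vectors from several runs are aligned to a reference run with the Hungarian algorithm, then combined by per-sample majority vote. The helper names `align_labels` and `majority_vote` are hypothetical and the runs are toy data; densitree's own implementation may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_labels(reference, labels, n_clusters):
    """Relabel `labels` so its cluster ids best match `reference`."""
    # overlap[i, j] = number of samples with labels == i and reference == j
    overlap = np.zeros((n_clusters, n_clusters), dtype=int)
    np.add.at(overlap, (labels, reference), 1)
    # The Hungarian algorithm minimizes cost, so negate to maximize overlap.
    rows, cols = linear_sum_assignment(-overlap)
    mapping = np.empty(n_clusters, dtype=int)
    mapping[rows] = cols
    return mapping[labels]


def majority_vote(runs, n_clusters):
    """Align every run to the first one, then vote per sample."""
    reference = runs[0]
    aligned = np.stack([align_labels(reference, r, n_clusters) for r in runs])
    # votes[c, s] = number of runs that assign sample s to cluster c
    votes = np.stack([(aligned == c).sum(axis=0) for c in range(n_clusters)])
    # Ties resolve to the lowest cluster id.
    return votes.argmax(axis=0)


runs = [
    np.array([0, 0, 0, 1, 1, 1, 2, 2, 2]),
    np.array([2, 2, 2, 0, 0, 0, 1, 1, 1]),  # same partition, permuted ids
    np.array([1, 1, 0, 0, 0, 0, 2, 2, 2]),  # one sample disagrees
]
consensus = majority_vote(runs, n_clusters=3)
print(consensus)
```

Alignment is needed because each clustering run numbers its clusters arbitrarily; voting on unaligned ids would compare unrelated clusters.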
Parameters
| Parameter | Default | Description |
|---|---|---|
| `n_clusters` | 50 | Number of output clusters |
| `downsample_target` | 0.1 | Fraction of samples retained for tree construction |
| `knn` | 5 | Neighbors for density estimation |
| `n_consensus` | 10 | Overclustering runs per linkage (total = 2x). Higher = more stable. |
| `transform` | `"arcsinh"` | `"arcsinh"`, `"log"`, or `None` |
| `cofactor` | 150.0 | Arcsinh denominator (5.0 for CyTOF, 150.0 for flow cytometry) |
| `random_state` | `None` | Seed for reproducibility |
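As a rough illustration of what `cofactor`, `knn`, and `downsample_target` control, the NumPy sketch below applies an arcsinh transform, estimates a k-NN density, and draws a density-dependent subsample. The density definition (inverse distance to the k-th neighbor) and the keep-probability rule are assumptions chosen for clarity, and the data are synthetic; the formulas densitree actually uses are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic raw intensities: a dense group of 500 points and a
# sparse, rare group of 20 points.
raw = np.vstack([
    rng.normal(100.0, 30.0, size=(500, 2)),
    rng.normal(2000.0, 600.0, size=(20, 2)),
])

# transform="arcsinh": x -> arcsinh(x / cofactor)
cofactor = 150.0
X = np.arcsinh(raw / cofactor)

# knn: local density taken here as 1 / distance to the k-th nearest neighbor
# (an assumed definition).
knn = 5
diffs = X[:, None, :] - X[None, :, :]
dists = np.sqrt((diffs ** 2).sum(axis=-1))
# After sorting each row, column 0 is the point itself (distance 0).
kth = np.sort(dists, axis=1)[:, knn]
density = 1.0 / (kth + 1e-12)

# downsample_target: keep roughly this fraction overall, with keep
# probability proportional to 1 / density (an assumed rule) so that
# sparse regions are over-represented.
downsample_target = 0.1
weights = 1.0 / density
prob = np.minimum(1.0, downsample_target * len(X) * weights / weights.sum())
keep = rng.random(len(X)) < prob

rare = np.arange(len(X)) >= 500
print(f"kept {keep.sum()} of {len(X)}; "
      f"dense kept: {keep[~rare].mean():.2f}, rare kept: {keep[rare].mean():.2f}")
```

The brute-force distance matrix is only practical for small inputs; a tree-based or approximate neighbor search would be the usual choice at the scale of the benchmark above.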
Documentation
Full documentation with API reference, tutorials, and benchmark details can be built and served locally:
pip install densitree[docs]
mkdocs serve
Running benchmarks
pip install densitree[bench]
cd benchmarks
# Synthetic dataset
python run_benchmark.py synthetic
# Real CyTOF data (downloads Levine_32dim automatically)
python run_benchmark.py Levine_32dim
License
MIT
References
- Qiu, P. et al. (2011). "Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE." Nature Biotechnology, 29(10), 886-891. doi:10.1038/nbt.1991
- Levine, J.H. et al. (2015). "Data-Driven Phenotypic Dissection of AML." Cell, 162(1), 184-197. doi:10.1016/j.cell.2015.05.047
- Samusik, N. et al. (2016). "Automated mapping of phenotype space with single-cell data." Nature Methods, 13(6), 493-496. doi:10.1038/nmeth.3863