DDRTree-python

AI-assisted Python port of the DDRTree R package — Discriminative Dimensionality Reduction via learning a Tree — from the KDD'15 paper by Qi Mao, Li Wang, Steve Goodison and Yijun Sun. DDRTree learns principal graphs via reversed graph embedding; this port tracks the R CRAN release DDRTree 0.1.6 (2026-02-24).
DDRTree simultaneously:
- Reduces high-dimensional data to a low-dimensional latent space Z,
- Learns an explicit smooth principal tree graph embedded in that space, and
- Obtains a soft clustering of points onto the tree nodes.
It is the dimensionality-reduction backbone used by Monocle 2 for single-cell pseudotime / branching-trajectory inference, but it is a general-purpose algorithm for any data with a tree-like intrinsic structure.
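Schematically, the coupled objective from the KDD'15 paper ties these three pieces together; the form below follows the paper's notation and may differ from this port's internals by constant factors:

$$
\min_{\mathbf{W},\mathbf{Z},\mathbf{B},\mathbf{Y},\mathbf{R}}
\sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{W}\mathbf{z}_i \rVert^2
+ \frac{\lambda}{2} \sum_{k,k'} b_{k,k'} \lVert \mathbf{y}_k - \mathbf{y}_{k'} \rVert^2
+ \gamma \sum_{k=1}^{K} \sum_{i=1}^{N} r_{i,k}
\left( \lVert \mathbf{z}_i - \mathbf{y}_k \rVert^2 + \sigma \log r_{i,k} \right)
$$

subject to $\mathbf{W}^{\top}\mathbf{W} = \mathbf{I}$, $\mathbf{B}$ ranging over spanning trees, and $\sum_k r_{i,k} = 1$, $r_{i,k} \ge 0$. The sigma, lambda_, ncenter (= K) and gamma arguments in the quick start below map onto $\sigma$, $\lambda$, $K$ and $\gamma$.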
Installation
```bash
# From PyPI (distribution: ddrtree-python, import: ddrtree)
pip install ddrtree-python            # NumPy backend only
pip install "ddrtree-python[torch]"   # + PyTorch backend (CPU / CUDA)
```
For local development:
```bash
git clone https://github.com/Bio-Babel/DDRTree-python.git
cd DDRTree-python
pip install -e ".[dev]"
```
The core depends on numpy, scipy, and scikit-learn. The optional
torch extra enables the GPU-friendly backend — no C/C++ extensions are
built either way.
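A quick way to check whether the torch backend will be usable is the standard torch introspection below (nothing package-specific is assumed):

```python
# Probe the optional torch dependency; the NumPy backend needs no probe.
try:
    import torch
    print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("torch extra not installed; only the NumPy backend is available")
```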
Quick start
```python
import numpy as np
from ddrtree import DDRTree

# X is a D x N matrix (features x samples), matching the R convention.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 200))

res = DDRTree(X, dimensions=2, max_iter=20,
              sigma=1e-3, lambda_=None, ncenter=50,
              gamma=10.0, tol=1e-3, verbose=False)

res.Z               # 2 x 200 reduced-dimension embedding
res.Y               # 2 x 50 principal-graph node coordinates
res.W               # 10 x 2 orthogonal projection basis
res.stree           # N x N scipy.sparse MST weights (first K x K block populated)
res.objective_vals  # objective at each iteration
```
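To see the result, the tree can be overlaid on the embedding. A sketch (the matplotlib code is mine; it assumes the leading K x K block of res.stree holds the MST edge weights, as noted above):

```python
import matplotlib.pyplot as plt
import scipy.sparse as sp

K = res.Y.shape[1]
mst = sp.triu(res.stree.tocsr()[:K, :K]).tocoo()  # each undirected edge once
plt.scatter(res.Z[0], res.Z[1], s=8, alpha=0.4, label="samples (Z)")
for i, j in zip(mst.row, mst.col):
    plt.plot(res.Y[0, [i, j]], res.Y[1, [i, j]], c="black", lw=1)
plt.scatter(res.Y[0], res.Y[1], s=25, c="red", zorder=3, label="tree nodes (Y)")
plt.legend()
plt.show()
```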
Backends
DDRTree dispatches to one of two computational backends via the
backend argument. The public function signature is otherwise unchanged.
Backend selection is always explicit — there is no auto-detection.
| backend | Executes on | Typical use |
|---|---|---|
| "numpy" | CPU (NumPy) | Default. Reference path, aligned with R. |
| "torch" | CPU / CUDA | GPU acceleration (Borůvka MST, fast BLAS). |
```python
# GPU path — requires torch with CUDA
res = DDRTree(X, ncenter=500, backend="torch", device="cuda")

# Single precision (float32, torch backend only): half the memory of float64,
# ~1.5–2× throughput on CUDA, ~1e-3 relative drift in the converged embedding.
res = DDRTree(X, ncenter=500, backend="torch", device="cuda", dtype="float32")
```
The torch backend runs a pure-torch parallel Borůvka (O(log K) rounds)
for the MST step, with no host round trip per iteration. Passing
mst_algorithm="prim" or "kruskal" explicitly routes the MST through
NumPy / SciPy for strict parity testing against the R gold standard; see
tests/test_boruvka_integration.py.
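For intuition, Borůvka's round structure looks like the serial sketch below (an illustration only; boruvka_mst is a hypothetical helper, and the actual backend vectorises each round in torch on the device):

```python
import numpy as np

def boruvka_mst(dist):
    """Serial Borůvka on a dense distance matrix (illustration only).

    Each round, every component picks its cheapest outgoing edge and the
    touched components merge, so the component count at least halves:
    O(log K) rounds overall.
    """
    K = dist.shape[0]
    comp = np.arange(K)  # component label of each node
    edges = []
    while np.unique(comp).size > 1:
        best = {}  # component -> (i, j), its cheapest outgoing edge
        for i in range(K):
            for j in range(K):
                if comp[i] != comp[j] and (comp[i] not in best
                                           or dist[i, j] < dist[best[comp[i]]]):
                    best[comp[i]] = (i, j)
        for i, j in best.values():
            if comp[i] != comp[j]:  # may already have merged this round
                edges.append((i, j))
                comp[comp == comp[j]] = comp[i]
    return edges
```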
Known differences between backends
- PCA initialisation. NumPy uses scipy.sparse.linalg.svds (iterative, Lanczos, matches R's irlba). Torch uses a direct torch.linalg.svd for the truncated branch — identical subspace, small per-iteration numerical drift.
- K-means. Both backends call sklearn.cluster.KMeans on CPU (small K, negligible overhead). No GPU K-means yet.
- Cholesky fallback. When the tmp_M system drifts non-PSD, both backends fall back to LU and emit a RuntimeWarning. The Y-update Cholesky is strict in both backends — the (λ/γ)L + Γ system is PSD by construction, so any failure there is surfaced rather than masked.
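The fallback pattern is roughly the following sketch (solve_spd_with_fallback is a hypothetical name, not the package's internal API):

```python
import warnings
import numpy as np
import scipy.linalg as la

def solve_spd_with_fallback(A, b):
    try:
        # Fast path: Cholesky, valid while A stays numerically PSD.
        c, low = la.cho_factor(A)
        return la.cho_solve((c, low), b)
    except np.linalg.LinAlgError:
        # Fallback: LU, with the drift surfaced as a RuntimeWarning.
        warnings.warn("tmp_M system drifted non-PSD; falling back to LU",
                      RuntimeWarning)
        return la.lu_solve(la.lu_factor(A), b)
```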
Numerical parity with the R package
The test suite runs the same inputs through the R DDRTree package and compares
the results for both backends. Eigenvector sign flips (inherent to
eigendecompositions) are handled in the tests. See
tests/scripts/generate_gold_standard.R, tests/test_ddrtree.py (NumPy)
and tests/test_backend_torch_gold.py (Torch).
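A sign-invariant comparison can be written along these lines (a conceptual sketch with a hypothetical allclose_up_to_sign; the real assertions live in the test files above):

```python
import numpy as np

def allclose_up_to_sign(A, B, atol=1e-6):
    # Align the sign of each latent dimension (row) before comparing,
    # since eigenvectors are determined only up to sign.
    signs = np.sign(np.sum(A * B, axis=1, keepdims=True))
    signs[signs == 0] = 1.0
    return np.allclose(A, signs * B, atol=atol)
```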
Reference
Qi Mao, Li Wang, Steve Goodison, Yijun Sun. Dimensionality Reduction via Graph Structure Learning. KDD'15. https://dl.acm.org/doi/10.1145/2783258.2783309
File details
Details for the file ddrtree_python-0.1.6.tar.gz.
File metadata
- Download URL: ddrtree_python-0.1.6.tar.gz
- Upload date:
- Size: 22.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e41f66a85aebabe2bd5e08f8b97e4acebdf1c71cd9815c72f87b06d7b53b1e75 |
| MD5 | 66163f2a05c08650cc4366a76c59a5cc |
| BLAKE2b-256 | ea276b23681f6374c3148c65fb1a1c8d8c96c5e4b5a442b06f5bf9b01c10c292 |
File details
Details for the file ddrtree_python-0.1.6-py3-none-any.whl.
File metadata
- Download URL: ddrtree_python-0.1.6-py3-none-any.whl
- Upload date:
- Size: 24.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f9f84b5fb6016c7ecb1074b4e24a2577c41bbeda4082d3c3243e333bc082fc9b |
| MD5 | b252f32952ce3995699d87506e0e24ad |
| BLAKE2b-256 | 281d2c4b2614f230837f3edf95d63192cabc9fbed8809cef7b76b6ea02ab57c0 |