A scalable manifold learning (SUDE) method that can cope with large-scale and high-dimensional data in an efficient manner.
Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data (SUDE)
We propose SUDE, a scalable manifold learning method that handles large-scale, high-dimensional data efficiently. It first selects a set of landmarks to construct a low-dimensional skeleton of the entire dataset, and then incorporates the non-landmarks into this skeleton using constrained locally linear embedding.
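The two-stage idea described above can be illustrated with a minimal NumPy sketch. This is not the SUDE implementation: landmarks are drawn at random here and the skeleton is a plain PCA projection, both stand-ins chosen only to keep the example short. The helper names (`embed_landmarks_pca`, `lle_weights`, `place_non_landmarks`) are hypothetical. The part that mirrors the description is the last step, where each non-landmark is placed using locally linear reconstruction weights constrained to sum to one.

```python
import numpy as np


def embed_landmarks_pca(landmarks, n_components=2):
    """Stand-in skeleton: project landmarks onto their top principal axes."""
    centered = landmarks - landmarks.mean(axis=0)
    # Rows of vt are the principal directions of the centered landmarks.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def lle_weights(x, neighbors, reg=1e-3):
    """Weights minimizing ||x - sum_j w_j * neighbors[j]||^2 with sum(w) == 1."""
    diff = neighbors - x                       # shape (k, d)
    gram = diff @ diff.T                       # local Gram matrix, shape (k, k)
    # Regularize so the linear system stays well conditioned.
    gram = gram + reg * (np.trace(gram) + 1e-12) * np.eye(len(neighbors))
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()                         # enforce the sum-to-one constraint


def place_non_landmarks(X, landmark_idx, Y_landmarks, k=5):
    """Embed every non-landmark as a weighted mix of its nearest landmarks."""
    n = X.shape[0]
    Y = np.zeros((n, Y_landmarks.shape[1]))
    Y[landmark_idx] = Y_landmarks
    L = X[landmark_idx]
    is_landmark = np.zeros(n, dtype=bool)
    is_landmark[landmark_idx] = True
    for i in np.flatnonzero(~is_landmark):
        dist = np.linalg.norm(L - X[i], axis=1)
        nn = np.argsort(dist)[:k]              # k nearest landmarks in input space
        w = lle_weights(X[i], L[nn])
        Y[i] = w @ Y_landmarks[nn]             # reuse the weights in the embedding
    return Y


# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
landmark_idx = rng.choice(200, size=40, replace=False)   # assumption: random landmarks
Y_land = embed_landmarks_pca(X[landmark_idx])
Y = place_non_landmarks(X, landmark_idx, Y_land)
```

Because only the landmarks go through the global embedding step, the cost of that step depends on the number of landmarks rather than on the full sample count, and each remaining point needs just one small linear solve.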
This repository provides the Python version of SUDE. Version 0.2.0 keeps the
public API of the original sude package while improving the runtime of the
probability construction, gradient computation, and non-landmark embedding
steps. The MATLAB version can be found at https://github.com/ZPGuiGroupWhu/sude.
The related paper has been published in Nature Machine Intelligence:
https://www.nature.com/articles/s42256-025-01112-9.
Project layout
The project now follows the structure of the
scikit-learn-contrib/project-template:
.
|-- .github/workflows/
|-- benchmarks/
|-- doc/
|-- examples/
|-- image/
|-- sude/
|   |-- __init__.py
|   |-- _learning_utils.py
|   |-- _numba_kernels.py
|   |-- _sude.py
|   |-- _version.py
|   `-- learning.py
|-- tests/
|-- pyproject.toml
`-- README.md
Installation
Python 3.8 and above are supported.
SUDE is published on PyPI and can be installed directly with pip:
pip install sude
Numba-accelerated kernels are installed by default. SUDE enables them automatically when both the unique input sample count and the landmark count are large enough.
The default thresholds are:
NUMBA_AUTO_MIN_SAMPLES = 3000
NUMBA_AUTO_MIN_LANDMARKS = 512
You can adjust them before fitting:
import sude.learning as sude_learning
sude_learning.NUMBA_AUTO_MIN_SAMPLES = 5000
sude_learning.NUMBA_AUTO_MIN_LANDMARKS = 1024
Both values must be positive integers. If either value is invalid, SUDE falls back to using numba whenever numba is installed.
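The decision rule described above can be sketched as a small standalone function. This is an illustration, not code from the package: the function name `should_use_numba` and its parameters are hypothetical, and treating "large enough" as greater than or equal to the threshold is an assumption.

```python
def should_use_numba(n_unique_samples, n_landmarks,
                     min_samples=3000, min_landmarks=512,
                     numba_installed=True):
    """Hypothetical sketch of the automatic numba switch described above."""
    if not numba_installed:
        return False

    def _is_positive_int(value):
        # bool is a subclass of int, so it is rejected explicitly.
        return isinstance(value, int) and not isinstance(value, bool) and value > 0

    # Invalid thresholds: fall back to using numba whenever it is installed.
    if not (_is_positive_int(min_samples) and _is_positive_int(min_landmarks)):
        return True

    # Assumption: ">=" is the comparison; the README only says "large enough".
    return n_unique_samples >= min_samples and n_landmarks >= min_landmarks


print(should_use_numba(5000, 1024))                  # both counts above the defaults
print(should_use_numba(1000, 1024))                  # too few unique samples
print(should_use_numba(10, 10, min_samples=-1))      # invalid threshold
```

Both conditions must hold, so a large dataset with few landmarks (or the reverse) stays on the non-numba path under this sketch.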
Manual installation
git clone https://github.com/ZPGuiGroupWhu/SUDE-pkg.git
cd SUDE-pkg
pip install -e .
How to run
The package now exposes both a scikit-learn style estimator class and the legacy function wrapper.
Estimator interface
import numpy as np
from sude import SUDE
import time
import matplotlib.pyplot as plt
# Input data
data = np.loadtxt("benchmarks/rice.csv", delimiter=",")
# Obtain data size and true annotations
m = data.shape[1]
X = data[:, :m - 1]
ref = data[:, m - 1]
# Fit a scikit-learn style estimator
start_time = time.time()
model = SUDE(
    n_components=2,
    n_neighbors=10,
    init="pca",
    max_iter=50,
)
Y = model.fit_transform(X)
end_time = time.time()
print("Elapsed time:", end_time - start_time, 's')
plt.scatter(Y[:, 0], Y[:, 1], c=ref, cmap='tab10', s=4)
plt.show()
The estimator provides the familiar API:
model = SUDE(n_components=2, n_neighbors=10, init="spectral")
Y_train = model.fit_transform(X_train)
Y_test = model.transform(X_test)
Function interface
The original function entry point remains available for backwards compatibility:
from sude import sude
Y = sude(X, no_dims=2, k1=10, initialize="le", T_epoch=50)
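When migrating from the function interface to the estimator, it may help to write down how the argument names appear to line up. The mapping below is an inference from the two examples in this README (matching the values 2, 10, and 50), not a documented guarantee, and the helper `translate_legacy_kwargs` is hypothetical. The correspondence between `"le"` and `"spectral"` in particular is an assumption.

```python
# Assumed correspondence, inferred from the README examples only.
LEGACY_TO_ESTIMATOR = {
    "no_dims": "n_components",
    "k1": "n_neighbors",
    "initialize": "init",
    "T_epoch": "max_iter",
}

# Assumption: the legacy "le" initializer corresponds to init="spectral".
LEGACY_INIT_VALUES = {"le": "spectral"}


def translate_legacy_kwargs(**legacy):
    """Rename legacy keyword arguments to their assumed estimator names."""
    translated = {}
    for name, value in legacy.items():
        new_name = LEGACY_TO_ESTIMATOR.get(name, name)  # unknown names pass through
        if new_name == "init":
            value = LEGACY_INIT_VALUES.get(value, value)
        translated[new_name] = value
    return translated


params = translate_legacy_kwargs(no_dims=2, k1=10, initialize="le", T_epoch=50)
print(params)
```

If the mapping holds for your installed version, the resulting dictionary could be passed as `SUDE(**params)`; check the package docstrings before relying on it.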
Run the packaged example with:
uv run python examples/plot_sude_embedding.py
Run the test suite with:
uv run python -m unittest discover -s tests
Citation request
Peng, D., Gui, Z., Wei, W. et al. Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data. Nat. Mach. Intell. (2025). https://doi.org/10.1038/s42256-025-01112-9
License
SUDE is released under the MIT License.
Download files
File details
Details for the file sude-0.2.0.tar.gz.
File metadata
- Download URL: sude-0.2.0.tar.gz
- Upload date:
- Size: 4.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8de163622da1e779a730c0d825de8e02e5be278e403005e21fb1415c0097e716 |
| MD5 | d9f1b6536f32fb011c9a045b2a6dc15f |
| BLAKE2b-256 | eb86866442a99a5fea04419f4b4fec5b9b143b5d522d8af5d295068e28172565 |
File details
Details for the file sude-0.2.0-py3-none-any.whl.
File metadata
- Download URL: sude-0.2.0-py3-none-any.whl
- Upload date:
- Size: 18.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a7b91a3c32872e0aced83c1cfd011e344002e34a27f93963a3ff7964462242b9 |
| MD5 | 51117566f37e64bce9b8e7024a1347b6 |
| BLAKE2b-256 | 33ec148488e9592e5667d1b1b2330aac84fde917a1cb50d459f209f24ea3080f |