A scalable manifold learning (SUDE) method that can cope with large-scale and high-dimensional data in an efficient manner.

Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data (SUDE)

We propose a scalable manifold learning (SUDE) method that can cope with large-scale and high-dimensional data in an efficient manner. It first selects a set of landmarks to construct a low-dimensional skeleton of the entire dataset, and then incorporates the non-landmarks into this skeleton using constrained locally linear embedding.
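To make the second step more concrete, the sketch below places one non-landmark point into an existing landmark embedding using the standard locally linear embedding (LLE) reconstruction weights. It is an illustration only: it uses plain LLE weights rather than the constrained variant described in the paper, and the helper names `lle_weights` and `embed_non_landmark`, the neighbor count `k`, and the regularization value `reg` are assumptions, not part of the sude package.

```python
import numpy as np


def lle_weights(x, neighbors, reg=1e-3):
    """Weights that reconstruct x from its neighbors and sum to one."""
    Z = neighbors - x                       # shift neighbors so x is the origin
    G = Z @ Z.T                             # local Gram matrix, shape (k, k)
    trace = np.trace(G)
    # Regularize so the system stays solvable when k exceeds the dimension
    G = G + np.eye(len(neighbors)) * reg * (trace if trace > 0 else 1.0)
    w = np.linalg.solve(G, np.ones(len(neighbors)))
    return w / w.sum()                      # enforce the sum-to-one constraint


def embed_non_landmark(x, landmarks_X, landmarks_Y, k=3):
    """Place x in the embedding as a weighted mix of its nearest landmarks."""
    dist = np.linalg.norm(landmarks_X - x, axis=1)
    idx = np.argsort(dist)[:k]              # k nearest landmarks in input space
    w = lle_weights(x, landmarks_X[idx])
    return w @ landmarks_Y[idx]             # same weights, low-dimensional coords


# Toy usage: 20 landmarks in 5-D with an already computed 2-D embedding
rng = np.random.default_rng(0)
landmarks_X = rng.normal(size=(20, 5))
landmarks_Y = rng.normal(size=(20, 2))
y_new = embed_non_landmark(landmarks_X[0] + 0.01, landmarks_X, landmarks_Y, k=4)
```

Because only the landmarks go through the full optimization, the cost of this step grows with the number of neighbors per point rather than with the size of the whole dataset.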

This repository provides the Python version of SUDE. Version 0.2.0 keeps the public API of the original sude package while improving the runtime of the probability construction, gradient computation, and non-landmark embedding steps. The MATLAB version can be found at https://github.com/ZPGuiGroupWhu/sude. The related paper has been published in Nature Machine Intelligence: https://www.nature.com/articles/s42256-025-01112-9.

Project layout

The project now follows the structure of the scikit-learn-contrib/project-template:

.
|-- .github/workflows/
|-- benchmarks/
|-- doc/
|-- examples/
|-- image/
|-- sude/
|   |-- __init__.py
|   |-- _learning_utils.py
|   |-- _numba_kernels.py
|   |-- _sude.py
|   |-- _version.py
|   `-- learning.py
|-- tests/
|-- pyproject.toml
`-- README.md

Installation

Python 3.8 or later is supported.

SUDE is published on PyPI and can be installed directly with pip:

pip install sude

Numba-accelerated kernels are installed by default. SUDE enables them automatically when both the unique input sample count and the landmark count are large enough.

The default thresholds are:

NUMBA_AUTO_MIN_SAMPLES = 3000
NUMBA_AUTO_MIN_LANDMARKS = 512

You can adjust them before fitting:

import sude.learning as sude_learning

sude_learning.NUMBA_AUTO_MIN_SAMPLES = 5000
sude_learning.NUMBA_AUTO_MIN_LANDMARKS = 1024

Both values must be positive integers. If either value is invalid, SUDE falls back to using Numba whenever it is installed.
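The rule described above can be sketched as a small decision function. This is a hedged illustration of the documented behavior, not the package's actual code: the helper `should_use_numba`, its `numba_installed` flag, and the use of `>=` for the comparisons are assumptions made for the example.

```python
NUMBA_AUTO_MIN_SAMPLES = 3000
NUMBA_AUTO_MIN_LANDMARKS = 512


def _is_positive_int(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def should_use_numba(X, n_landmarks, numba_installed=True,
                     min_samples=NUMBA_AUTO_MIN_SAMPLES,
                     min_landmarks=NUMBA_AUTO_MIN_LANDMARKS):
    """Hypothetical helper mirroring the documented auto-enable rule."""
    if not numba_installed:
        return False
    # Invalid thresholds: fall back to using Numba whenever it is installed
    if not (_is_positive_int(min_samples) and _is_positive_int(min_landmarks)):
        return True
    # Count unique samples (rows), since duplicates do not add to the workload
    n_unique = len({tuple(row) for row in X})
    return n_unique >= min_samples and n_landmarks >= min_landmarks


# Toy usage: 4000 distinct two-dimensional samples
X_demo = [[i, i] for i in range(4000)]
print(should_use_numba(X_demo, n_landmarks=600))   # both thresholds met
print(should_use_numba(X_demo, n_landmarks=100))   # too few landmarks
```

Raising the thresholds, as in the snippet above this sketch, keeps small problems on the pure NumPy path, where just-in-time compilation overhead would likely outweigh the speedup.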

Manual installation

git clone https://github.com/ZPGuiGroupWhu/SUDE-pkg.git
cd SUDE-pkg
pip install -e .

How to run

The package now exposes both a scikit-learn style estimator class and the legacy function wrapper.

Estimator interface

import time

import matplotlib.pyplot as plt
import numpy as np
from sude import SUDE

# Load the input data (one sample per row)
data = np.loadtxt("benchmarks/rice.csv", delimiter=",")

# The last column holds the true annotations; the rest are features
X = data[:, :-1]
ref = data[:, -1]

# Fit a scikit-learn style estimator and time the embedding
start_time = time.perf_counter()
model = SUDE(
    n_components=2,
    n_neighbors=10,
    init="pca",
    max_iter=50,
)
Y = model.fit_transform(X)
end_time = time.perf_counter()
print("Elapsed time:", end_time - start_time, "s")

# Plot the 2-D embedding, colored by the true annotations
plt.scatter(Y[:, 0], Y[:, 1], c=ref, cmap="tab10", s=4)
plt.show()

The estimator provides the familiar API:

model = SUDE(n_components=2, n_neighbors=10, init="spectral")
Y_train = model.fit_transform(X_train)
Y_test = model.transform(X_test)

Function interface

The original function entry point remains available for backwards compatibility:

from sude import sude

Y = sude(X, no_dims=2, k1=10, initialize="le", T_epoch=50)

Run the packaged example with:

uv run python examples/plot_sude_embedding.py

Run the test suite with:

uv run python -m unittest discover -s tests

Citation request

Peng, D., Gui, Z., Wei, W. et al. Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data. Nat. Mach. Intell. (2025). https://doi.org/10.1038/s42256-025-01112-9

License

SUDE is released under the MIT License.
