
A scalable manifold learning (SUDE) method that can cope with large-scale and high-dimensional data in an efficient manner.


Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data (SUDE)

We propose a scalable manifold learning (SUDE) method that can cope with large-scale and high-dimensional data in an efficient manner. It starts by seeking a set of landmarks to construct the low-dimensional skeleton of the entire data, and then incorporates the non-landmarks into this skeleton based on the constrained locally linear embedding.
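The second stage described above, placing a non-landmark relative to an already embedded skeleton, can be illustrated with the classic locally linear reconstruction step. The sketch below is a pure-Python illustration under stated assumptions, not the package's implementation: the helper names (`solve_linear`, `lle_weights`, `place_point`), the ridge term `reg`, and the nearest-landmark selection are all hypothetical, and SUDE's constrained formulation may differ in detail.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides at once.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lle_weights(x, neighbors, reg=1e-3):
    """Weights summing to 1 that best reconstruct x from its neighbors."""
    k = len(neighbors)
    diffs = [[xi - ni for xi, ni in zip(x, nb)] for nb in neighbors]
    # Local Gram matrix of the offsets between x and each neighbor.
    gram = [
        [sum(p * q for p, q in zip(diffs[i], diffs[j])) for j in range(k)]
        for i in range(k)
    ]
    # Ridge term (hypothetical choice) keeps the system well conditioned.
    trace = sum(gram[i][i] for i in range(k))
    ridge = reg * trace if trace > 0 else reg
    for i in range(k):
        gram[i][i] += ridge
    w = solve_linear(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]


def place_point(x, landmarks, landmark_embedding, k=2):
    """Embed x as the weighted combination of its k nearest landmarks."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(x, landmarks[i]))

    nearest = sorted(range(len(landmarks)), key=sq_dist)[:k]
    w = lle_weights(x, [landmarks[i] for i in nearest])
    dim = len(landmark_embedding[0])
    # The landmark coordinates stay fixed; only x receives a new position.
    return [
        sum(w[j] * landmark_embedding[i][d] for j, i in enumerate(nearest))
        for d in range(dim)
    ]
```

In this toy setting, a point lying midway between two landmarks in the input space lands midway between their low-dimensional positions, which is the intuition behind attaching non-landmarks to a fixed skeleton.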

This repository provides the Python version of SUDE. Version 0.2.1 keeps the public API of the original sude package while improving the runtime of the probability construction, gradient computation, and non-landmark embedding steps. The MATLAB version can be found at https://github.com/ZPGuiGroupWhu/sude. The related paper has been published in Nature Machine Intelligence: https://www.nature.com/articles/s42256-025-01112-9.


Project layout

The project now follows the structure of the scikit-learn-contrib/project-template:

.
|-- .github/workflows/
|-- benchmarks/
|-- doc/
|-- examples/
|-- image/
|-- sude/
|   |-- __init__.py
|   |-- _learning_utils.py
|   |-- _numba_kernels.py
|   |-- _sude.py
|   |-- _version.py
|   `-- learning.py
|-- tests/
|-- pyproject.toml
`-- README.md

Installation

Python 3.8 or later is supported.

The package is published on PyPI and can be installed directly with pip:

pip install sude

Numba-accelerated kernels are installed by default. SUDE enables them automatically when both the unique input sample count and the landmark count are large enough.

The default thresholds are:

NUMBA_AUTO_MIN_SAMPLES = 3000
NUMBA_AUTO_MIN_LANDMARKS = 512

You can adjust them before fitting:

import sude.learning as sude_learning

sude_learning.NUMBA_AUTO_MIN_SAMPLES = 5000
sude_learning.NUMBA_AUTO_MIN_LANDMARKS = 1024

Both values must be positive integers. If either value is invalid, SUDE ignores the thresholds and uses Numba whenever it is installed.
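The decision rule described above can be sketched as a small standalone function. This is a reading of the README text rather than the package's code: `should_use_numba` and `_is_positive_int` are hypothetical names, "large enough" is assumed to mean "at least the threshold", and rejecting `bool` values as thresholds is also an assumption.

```python
def _is_positive_int(value):
    # bool is a subclass of int in Python, so it is excluded explicitly
    # (an assumption about what counts as an invalid threshold).
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def should_use_numba(
    n_unique_samples,
    n_landmarks,
    min_samples=3000,
    min_landmarks=512,
    numba_available=True,
):
    """Return True when the accelerated kernels would be selected."""
    if not numba_available:
        return False
    if not (_is_positive_int(min_samples) and _is_positive_int(min_landmarks)):
        # Invalid thresholds: fall back to Numba whenever it is installed.
        return True
    # Assumption: both counts must reach their thresholds (inclusive).
    return n_unique_samples >= min_samples and n_landmarks >= min_landmarks
```

With the default thresholds, a dataset with 5000 unique samples and 600 landmarks would select the accelerated path, while the same dataset with 100 landmarks would not.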

Manual installation

git clone https://github.com/ZPGuiGroupWhu/SUDE-pkg.git
cd SUDE-pkg
pip install -e .
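After either installation route, it can be useful to confirm what ended up in the environment. The following sketch uses only the standard library; `installed_version` and `has_module` are hypothetical helper names, and the distribution name `sude` and module name `numba` are taken from the instructions above.

```python
from importlib import metadata, util


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_module(module_name):
    """Report whether a top-level module can be found, without importing it."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("sude version:", installed_version("sude"))
    print("numba importable:", has_module("numba"))
```

`importlib.metadata` is available from Python 3.8, which matches the minimum version stated above.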

How to run

The package now exposes both a scikit-learn style estimator class and a function wrapper with matching parameter names.

Estimator interface

import numpy as np
from sude import SUDE
import time
import matplotlib.pyplot as plt

# Input data
data = np.loadtxt("benchmarks/rice.csv", delimiter=",")

# Split the features from the ground-truth labels stored in the last column
X = data[:, :-1]
ref = data[:, -1]

# Fit a scikit-learn style estimator
start_time = time.time()
model = SUDE(
    n_components=2,
    n_neighbors=10,
    init="pca",
    max_iter=50,
)
Y = model.fit_transform(X)
end_time = time.time()
print("Elapsed time:", end_time - start_time, 's')

plt.scatter(Y[:, 0], Y[:, 1], c=ref, cmap='tab10', s=4)
plt.show()

The estimator provides the familiar API:

model = SUDE(n_components=2, n_neighbors=10, init="le")
Y_train = model.fit_transform(X_train)
Y_test = model.transform(X_test)

Function interface

The function entry point uses the same sklearn-style parameter names as the estimator:

from sude import sude

Y = sude(X, n_components=2, n_neighbors=10, init="le", max_iter=50)

For readers comparing with the paper or original function interface, n_components corresponds to no_dims, n_neighbors corresponds to k1, init corresponds to initialize, and max_iter corresponds to T_epoch.
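For code written against the original names, the correspondence above can be captured in a small translation helper. This is an illustrative sketch only: `LEGACY_TO_SKLEARN` and `translate_legacy_kwargs` are hypothetical and are not part of the package.

```python
# Mapping from the paper / original function names to the sklearn-style names.
LEGACY_TO_SKLEARN = {
    "no_dims": "n_components",
    "k1": "n_neighbors",
    "initialize": "init",
    "T_epoch": "max_iter",
}


def translate_legacy_kwargs(kwargs):
    """Rename legacy keyword arguments; unknown names pass through unchanged."""
    translated = {}
    for name, value in kwargs.items():
        new_name = LEGACY_TO_SKLEARN.get(name, name)
        if new_name in translated:
            # Guard against passing both the old and the new spelling.
            raise ValueError(f"parameter {new_name!r} given twice")
        translated[new_name] = value
    return translated


# Possible usage with the function interface shown above:
# Y = sude(X, **translate_legacy_kwargs({"no_dims": 2, "k1": 10}))
```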

Run the packaged example with:

uv run python examples/plot_sude_embedding.py

Run the test suite with:

uv run python -m unittest discover -s tests

Citation request

Peng, D., Gui, Z., Wei, W. et al. Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data. Nat. Mach. Intell. (2025). https://doi.org/10.1038/s42256-025-01112-9

License

SUDE is released under the MIT License.
