Enhancing unsupervised learning with geometry-density interactions via decomposition of data into geometric core-periphery layers and subsequent clustering

Project description

cplearn

cplearn is a Python toolkit for unsupervised learning on data with underlying core–periphery-like structures.
The package includes:

CoreSPECT – identifies layers of the data, ordered from most to least separable with respect to clustering, and returns a clustering.
CoreMAP – visualizes the data with respect to the layered structure derived by CoreSPECT, using a novel anchor-based optimization.
Visualizer – interactive plots of the core structure and the subsequent layers.

Installation

From PyPI:

pip install cplearn

Quickstart

# Generate Gaussian-mixture data for a self-contained example.

import numpy as np

def generate_gmm_highdim(n=1000, d=10, gamma=1.0, seed=42):
    """
    Generate a 2-cluster Gaussian Mixture Model (GMM) in d dimensions.

    Parameters
    ----------
    n : int
        Total number of samples.
    d : int
        Dimensionality of the data (default 10).
    gamma : float
        Cluster separation factor. Lower gamma means the clusters are harder to separate (0.5 is a hard setting).
    seed : int
        Random seed for reproducibility.

    Returns
    -------
    X : (n, d) ndarray
        Generated data points.
    labels : (n,) ndarray
        True cluster labels (0 or 1).
    means : list of ndarray
        The two cluster means.
    """
    np.random.seed(seed)
    pi = [0.5, 0.5]  # equal mixture weights

    # Define means separated along the diagonal direction, scaled by gamma
    base_sep = 1  # per-coordinate offset between the cluster means (Euclidean distance: base_sep * gamma * sqrt(d))
    mu1 = np.zeros(d)
    mu2 = np.ones(d) * base_sep * gamma

    # Slightly correlated covariance matrices
    A = np.eye(d)
    A += 0.2 * np.triu(np.ones((d, d)), 1)  # introduce mild correlation
    cov1 = np.dot(A, A.T) / d
    cov2 = cov1.copy()

    # Assign cluster labels
    labels = np.random.choice([0, 1], size=n, p=pi)

    # Sample from corresponding Gaussians
    X = np.zeros((n, d))
    X[labels == 0] = np.random.multivariate_normal(mu1, cov1, size=(labels == 0).sum())
    X[labels == 1] = np.random.multivariate_normal(mu2, cov2, size=(labels == 1).sum())

    return X, labels, [mu1, mu2]

#Generate data.
gamma=0.5
X, labels, means = generate_gmm_highdim(n=1000, d=10, gamma=gamma)
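
To make the effect of gamma concrete, the sketch below computes the Mahalanobis distance between the two cluster means under the shared covariance that generate_gmm_highdim builds. The helper name mahalanobis_separation is introduced here for illustration only and is not part of cplearn; it rebuilds the same covariance matrix rather than calling the generator.

```python
import numpy as np

def mahalanobis_separation(d=10, gamma=0.5):
    """Mahalanobis distance between the two GMM means used in the example above.

    Illustrative helper (not part of cplearn). It rebuilds the covariance
    exactly as generate_gmm_highdim does: cov = A A^T / d.
    """
    A = np.eye(d) + 0.2 * np.triu(np.ones((d, d)), 1)
    cov = A @ A.T / d
    delta = np.ones(d) * gamma  # mu2 - mu1 with base_sep = 1
    # Solve cov @ x = delta instead of forming an explicit inverse.
    return float(np.sqrt(delta @ np.linalg.solve(cov, delta)))

for g in (0.5, 1.0, 2.0):
    print(f"gamma={g}: separation={mahalanobis_separation(gamma=g):.3f}")
```

Because the means differ by gamma times a fixed vector, this distance scales linearly with gamma, so halving gamma halves the separation between the clusters.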

#---- The algorithm starts from here ----#


#Load CoreSPECT and configuration module
from cplearn.corespect import CorespectModel
from cplearn.corespect.config import CoreSpectConfig

#Initial parameters.
cfg = CoreSpectConfig(
    q=20,               #Determines neighborhood size for the underlying q-NN graph 
    r=10,               #Neighborhood radius parameter for ascending random walk with FlowRank
    core_frac=0.2,      #Fraction of points in the top-layer
    densify=False,      #Densifying different parts of the data to reduce fragmentation
    granularity=0.5,    #Higher granularity finds more local cores but can lead to missing out on weaker clusters.
    resolution=0.5      #Resolution for clustering with Leiden (more clustering methods will be added later)
).configure()

'''
For (q,r), two recommended choices are (40,20) and (20,10). 
(20,10) will lead to more fragmentation compared to (40,20).
'''

# Run **CoreSPECT**
model = CorespectModel(X, **cfg.unpack()).run(fine_grained=True,propagate=True)

'''
Main components:
model.layers_: A list of lists. Each inner list holds a subset of the indices 0..n-1, where n = X.shape[0].
The first list contains the indices that form the cores; the subsequent lists contain the outer layers.

model.labels_: Integer array of length n.
    If propagate==False: Contains cluster labels for the core indices (model.layers_[0]) and -1 everywhere else.
    If propagate==True:  Contains cluster labels for all points.

'''
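
The structure described above can be inspected with a few lines of NumPy. The sketch below uses small hand-written stand-ins for model.layers_ and model.labels_ (they are not real CoreSPECT output), and summarize_layers is a hypothetical helper name, not a cplearn function. It assumes only what is stated above: a list of index lists plus an integer label array in which -1 marks unlabeled points.

```python
import numpy as np

def summarize_layers(layers, labels):
    """Return (layer_index, layer_size, {label: count}) for each layer.

    Points labeled -1 (unlabeled when propagate==False) are left out of
    the per-label counts.
    """
    labels = np.asarray(labels)
    rows = []
    for depth, idx in enumerate(layers):
        layer_labels = labels[np.asarray(idx, dtype=int)]
        assigned = layer_labels[layer_labels != -1]
        values, counts = np.unique(assigned, return_counts=True)
        rows.append((depth, len(idx), {int(v): int(c) for v, c in zip(values, counts)}))
    return rows

# Hand-written stand-ins with the same shape as model.layers_ / model.labels_.
layers = [[0, 1, 2], [3, 4], [5]]
labels = np.array([0, 0, 1, 1, -1, 0])

for depth, size, counts in summarize_layers(layers, labels):
    print(f"layer {depth}: {size} points, label counts {counts}")
```

With a fitted model, the same call would be summarize_layers(model.layers_, model.labels_).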

#Visualizing the outcomes:

#Step 1: Generate UMAP skeleton.
import umap
reducer=umap.UMAP()
X_umap=reducer.fit_transform(X)


#Step 2: Initiate the **coremap** module.
from cplearn.coremap import Coremap
cmap=Coremap(model,global_umap=X_umap,fast_view=True)

'''
If fast_view==True, we reuse the UMAP skeleton and show the visualization layer by layer.
If fast_view==False, we generate our own layer-wise visualization with the CoreMAP algorithm.
'''


#Step 3: Layer-wise visualization (you can use your own labels instead of model.labels_)
from cplearn.coremap.vizualizer import visualize_coremap
fig=visualize_coremap(cmap,model.labels_, use_webgl=True)
fig.show()
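
The quickstart generates ground-truth labels but does not use them. One possible check, sketched below, is a per-layer cluster purity: within each layer, every predicted cluster is credited with its most common true label. The function name layer_purity and the toy arrays are assumptions made for this illustration; this is not an evaluation provided by cplearn, and the numbers shown come from the toy arrays, not from a CoreSPECT run.

```python
import numpy as np

def layer_purity(layers, pred, true):
    """Cluster purity of `pred` against `true`, computed separately per layer.

    Points with predicted label -1 are ignored. A layer with no labeled
    points yields NaN.
    """
    pred = np.asarray(pred)
    true = np.asarray(true)
    purities = []
    for idx in layers:
        idx = np.asarray(idx, dtype=int)
        p, t = pred[idx], true[idx]
        keep = p != -1
        p, t = p[keep], t[keep]
        if p.size == 0:
            purities.append(float("nan"))
            continue
        majority_total = 0
        for cluster in np.unique(p):
            # Count the most frequent true label inside this predicted cluster.
            _, counts = np.unique(t[p == cluster], return_counts=True)
            majority_total += int(counts.max())
        purities.append(majority_total / p.size)
    return purities

# Toy stand-ins: the first layer is perfectly clustered, the second is mixed.
toy_layers = [[0, 1, 2, 3], [4, 5, 6, 7]]
toy_pred = [0, 0, 1, 1, 0, 0, 1, 1]
toy_true = [0, 0, 1, 1, 0, 1, 1, 0]
print(layer_purity(toy_layers, toy_pred, toy_true))  # [1.0, 0.5]
```

For the quickstart data, the analogous call would be layer_purity(model.layers_, model.labels_, labels).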

References

If you use this package in your research, please cite:

CoreSPECT
Chandra Sekhar Mukherjee, Joonyoung Bae, and Jiapeng Zhang.
CoreSPECT: Enhancing Clustering Algorithms via an Interplay of Density and Geometry. https://arxiv.org/abs/2507.08243
CoreMAP – paper coming soon

Other related work

Balanced Ranking
Chandra Sekhar Mukherjee and Jiapeng Zhang.
Balanced Ranking with Relative Centrality: A Multi-Core Periphery Perspective.
ICLR 2025.

License

This package is licensed under the BSD 3-Clause License.
See the LICENSE file for details.

Project details

Release history

0.2.1 (this version): Oct 27, 2025
0.2.0: Oct 27, 2025
0.1.1: Sep 12, 2025
0.1.0: Sep 12, 2025

Download files

Source Distribution

cplearn-0.2.1.tar.gz (30.2 kB view details)

Uploaded Oct 27, 2025 Source

Built Distribution

cplearn-0.2.1-py3-none-any.whl (35.6 kB view details)

Uploaded Oct 27, 2025 Python 3

File details

Details for the file cplearn-0.2.1.tar.gz.

File metadata

Download URL: cplearn-0.2.1.tar.gz
Upload date: Oct 27, 2025
Size: 30.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for cplearn-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`6f69d1eecb5b85edf82e17a000676d609003b89385fd4ad5cbb6f9d988de6ec5`
MD5	`664b0381e3ead2389d47ab4b978521e5`
BLAKE2b-256	`c31e9aad1705f02736bc0b8b0562675aa2524ce35e7b254503d39355a9d15165`

File details

Details for the file cplearn-0.2.1-py3-none-any.whl.

File metadata

Download URL: cplearn-0.2.1-py3-none-any.whl
Upload date: Oct 27, 2025
Size: 35.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for cplearn-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`36ee4d553df36f975fbf6e91f3c5d8941608c516821449b20d3a8e5b00b1997c`
MD5	`b0f23f5ec946d9848eefebede1ebc744`
BLAKE2b-256	`55980eb6072ddddcae6a82ceef83cf0c3f8fd348f055306ae4f3c9aa680f2f29`
