Skip to main content

A novel Clustering algorithm by measuring Direction Centrality (CDC) locally. It adopts a density-independent metric based on the distribution of K-nearest neighbors (KNNs) to distinguish between internal and boundary points. The boundary points generate enclosed cages to bind the connections of internal points.

Project description

Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity (CDC)

We propose a novel Clustering algorithm by measuring Direction Centrality (CDC) locally. It adopts a density-independent metric based on the distribution of K-nearest neighbors (KNNs) to distinguish between internal and boundary points. The boundary points generate enclosed cages to bind the connections of internal points, thereby preventing cross-cluster connections and separating weakly-connected clusters. We present an interactive Demo and a brief introduction to the algorithm at https://zpguigroupwhu.github.io/CDC-Introduction-Website/, and develop a CDC toolkit at https://github.com/ZPGuiGroupWhu/ClusteringDirectionCentrality This paper has been published in Nature Communications, and more details can be seen https://www.nature.com/articles/s41467-022-33136-9.

image

Installation

Supported python versions are 3.8 and above.

This project has been uploaded to PyPI, supporting direct download and installation from pypi

pip install cdc-cluster

Manual Installation

git clone https://github.com/ZPGuiGroupWhu/CDC-pkg.git
cd CDC-pkg
pip install -e .

Usage

The CDC algorithm is refactored to be a scikit-learn compatible estimator. It provides both a class-based interface CDC and a function-based interface cdc_cluster.

Class-based Usage

from cdc import CDC
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# Generate sample data
X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)

# Initialize and fit CDC
# n_neighbors: Number of nearest neighbors to consider (k_num)
# ratio: Ratio for determining the DCM threshold
cdc = CDC(n_neighbors=20, ratio=0.9)
cdc.fit(X)

# Get cluster labels
# Labels start from 0. Noisy samples are labeled as -1.
labels = cdc.labels_

# Plot result
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title("CDC Clustering Result")
plt.show()

Function-based Usage

from cdc import cdc_cluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=42)

# Compute clustering directly
# Returns an array of cluster labels
labels = cdc_cluster(X, n_neighbors=20, ratio=0.9)

print(f"Number of clusters: {len(set(labels)) - (1 if -1 in labels else 0)}")

Citation Request:

Peng, D., Gui, Z.*, Wang, D. et al. Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity. Nat. Commun. 13, 5455 (2022). https://www.nature.com/articles/s41467-022-33136-9

License

This project is covered under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cdc_cluster-0.2.0.tar.gz (6.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cdc_cluster-0.2.0-py3-none-any.whl (6.4 kB view details)

Uploaded Python 3

File details

Details for the file cdc_cluster-0.2.0.tar.gz.

File metadata

  • Download URL: cdc_cluster-0.2.0.tar.gz
  • Upload date:
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for cdc_cluster-0.2.0.tar.gz
Algorithm Hash digest
SHA256 35da45a3b87603e3b0b53257b9b2378b74d242276ef9e1e17c5110691ba049dd
MD5 35e385c26e46459928b8d45510302594
BLAKE2b-256 0b6a682aacbe33bf44325c32336321f69fe9a517f26623415cc131512c9841fc

See more details on using hashes here.

File details

Details for the file cdc_cluster-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: cdc_cluster-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 6.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for cdc_cluster-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 85216ac494a73cbe94952af80c40f4583f13761a2a443f68173cd99e97bb6e13
MD5 f43697c915e260aec186a9ef7a622c80
BLAKE2b-256 00f6571510eeacbb334c4849343145a10015a94f7d4bc271c4ee2233d2e23d0d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page