Project description

Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity (CDC)

We propose CDC, a novel clustering algorithm that measures direction centrality locally. It adopts a density-independent metric based on the distribution of K-nearest neighbors (KNNs) to distinguish internal points from boundary points. The boundary points form enclosed cages that constrain the connections of internal points, thereby preventing cross-cluster connections and separating weakly connected clusters. An interactive demo and a brief introduction to the algorithm are available at https://zpguigroupwhu.github.io/CDC-Introduction-Website/, and a CDC toolkit is available at https://github.com/ZPGuiGroupWhu/ClusteringDirectionCentrality. The paper has been published in Nature Communications; see https://www.nature.com/articles/s41467-022-33136-9 for details.
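To make the idea of a direction-based, density-independent metric more concrete, here is a minimal sketch for the two-dimensional case. It assumes the metric is the variance of the angular gaps between consecutive nearest neighbors around a point, which is one reading of the cited paper. The helper names `knn_indices` and `dcm_2d` are hypothetical and are not part of the cdc-cluster package.

```python
import math

def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i] (brute force)."""
    px, py = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (points[j][0] - px) ** 2 + (points[j][1] - py) ** 2)
    return others[:k]

def dcm_2d(points, i, k):
    """Variance of the angular gaps between the k nearest neighbors of points[i].

    Illustrative assumption: evenly surrounded points give values near 0,
    one-sided neighborhoods give larger values.
    """
    px, py = points[i]
    angles = sorted(
        math.atan2(points[j][1] - py, points[j][0] - px)
        for j in knn_indices(points, i, k)
    )
    # Gaps between consecutive neighbor directions; the last gap wraps around.
    gaps = [angles[(t + 1) % k] - angles[t] for t in range(k)]
    gaps[-1] += 2 * math.pi
    mean_gap = 2 * math.pi / k  # the gaps always sum to a full circle
    return sum((g - mean_gap) ** 2 for g in gaps) / k

# A center point surrounded on four sides, and one of its outer neighbors.
pts = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
print("center:", dcm_2d(pts, 0, 4))
print("outer: ", dcm_2d(pts, 1, 4))
```

Because only directions enter the computation, scaling all distances by a constant leaves the value unchanged, which is the sense in which such a metric does not depend on local density.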

Installation

Python 3.8 and above are supported.

The package is published on PyPI and can be installed directly with pip:

pip install cdc-cluster

Manual Installation

git clone https://github.com/ZPGuiGroupWhu/CDC-pkg.git
cd CDC-pkg
pip install -e .
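
After either installation route, a quick way to confirm that the distribution is visible to the current interpreter is to query its metadata. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `cdc-cluster` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Prints the version if cdc-cluster is installed in this environment, else None.
print(installed_version("cdc-cluster"))
```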

Usage

The CDC algorithm has been refactored as a scikit-learn-compatible estimator. It provides both a class-based interface, CDC, and a function-based interface, cdc_cluster.

Class-based Usage

from cdc_cluster import CDC
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# Generate sample data
X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)

# Initialize and fit CDC
# n_neighbors: Number of nearest neighbors to consider (k_num)
# ratio: Ratio for determining the DCM threshold
cdc = CDC(n_neighbors=20, ratio=0.9)
cdc.fit(X)

# Get cluster labels
# Labels start from 0. Noisy samples are labeled as -1.
labels = cdc.labels_

# Plot result
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title("CDC Clustering Result")
plt.show()
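
The ratio parameter is described above only as the ratio used to determine the DCM threshold. One plausible reading, offered here as an assumption and not as the package's actual implementation, is that ratio is the fraction of points with the lowest DCM values that are treated as internal, with the rest treated as boundary points. The helper `split_by_ratio` below is hypothetical and only illustrates that reading.

```python
import math

def split_by_ratio(dcm_values, ratio):
    """Flag each point as internal (True) or boundary (False).

    Assumption for illustration: the ceil(n * ratio) points with the
    smallest DCM values are internal; all others are boundary points.
    """
    n = len(dcm_values)
    n_internal = min(n, max(0, math.ceil(n * ratio)))
    # Indices ordered from the most evenly surrounded point to the least.
    order = sorted(range(n), key=lambda i: dcm_values[i])
    is_internal = [False] * n
    for i in order[:n_internal]:
        is_internal[i] = True
    return is_internal

# Hypothetical DCM values for four points, half of them kept as internal.
print(split_by_ratio([0.1, 3.0, 0.2, 2.5], ratio=0.5))
```

Under this reading, a larger ratio leaves fewer boundary points, so the cages around clusters become thinner.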

Function-based Usage

from cdc_cluster import cdc_cluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=42)

# Compute clustering directly
# Returns an array of cluster labels
labels = cdc_cluster(X, n_neighbors=20, ratio=0.9)

print(f"Number of clusters: {len(set(labels)) - (1 if -1 in labels else 0)}")
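
Since labels start from 0 and noisy samples are labeled -1, the returned array can be summarized without any further dependency. The sketch below uses a hypothetical helper, `summarize_labels`, that works on any iterable of integer labels, including the array returned by cdc_cluster.

```python
from collections import Counter

def summarize_labels(labels):
    """Count clusters, noise points, and per-cluster sizes from a label sequence."""
    counts = Counter(int(label) for label in labels)
    n_noise = counts.pop(-1, 0)  # -1 marks noisy samples
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "sizes": dict(sorted(counts.items())),
    }

# Example with a hand-written label list; pass the labels from cdc_cluster instead.
print(summarize_labels([0, 0, 1, -1, 2, 2, 2, -1]))
```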

Citation Request

Peng, D., Gui, Z.*, Wang, D. et al. Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity. Nat. Commun. 13, 5455 (2022). https://www.nature.com/articles/s41467-022-33136-9

License

This project is covered under the MIT License.

Download files

Source Distribution

cdc_cluster-0.2.1.tar.gz (6.5 kB view details)

Uploaded Source

Built Distribution

cdc_cluster-0.2.1-py3-none-any.whl (6.4 kB view details)

Uploaded Python 3

File details

Details for the file cdc_cluster-0.2.1.tar.gz.

File metadata

  • Download URL: cdc_cluster-0.2.1.tar.gz
  • Upload date:
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for cdc_cluster-0.2.1.tar.gz
Algorithm Hash digest
SHA256 bcb6c584f1781618611e0b611e6f90f0b6e71b7eac41724250833775efe96235
MD5 1ebfeb46995f1d27ea80166e7d592e20
BLAKE2b-256 92e243b26a2d56360e61c6d22410c42cb1b23bbac4af5884365b9f4abdbaed23

File details

Details for the file cdc_cluster-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: cdc_cluster-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 6.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for cdc_cluster-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0676db55633783ac64bdb700a2420ff4bd3f36eae3bed907dc8b393b305860be
MD5 a976f7eb542f3805e411d7df4f1eb62f
BLAKE2b-256 a364438704c4118c0de025d3706bdd13e05d0be51eab15e7bcd6c00735be82c1
