Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity (CDC)

We propose a novel Clustering algorithm that measures Direction Centrality (CDC) locally. It adopts a density-independent metric based on the distribution of K-nearest neighbors (KNNs) to distinguish between internal and boundary points. The boundary points generate enclosed cages that bind the connections of internal points, thereby preventing cross-cluster connections and separating weakly connected clusters. An interactive demo and a brief introduction to the algorithm are available at https://zpguigroupwhu.github.io/CDC-Introduction-Website/, and a CDC toolkit is available at https://github.com/ZPGuiGroupWhu/ClusteringDirectionCentrality. The paper has been published in Nature Communications; more details are available at https://www.nature.com/articles/s41467-022-33136-9.
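To make the idea of a density-independent, KNN-based metric concrete, here is a minimal sketch for 2D data. It assumes the metric is the variance of the angular gaps between the directions to a point's K nearest neighbors. That formulation is an assumption made for illustration, and the helper name angle_variance_2d is hypothetical; this is not the package's implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def angle_variance_2d(X, n_neighbors=20):
    """Hypothetical helper: variance of the angular gaps between each
    point's K nearest neighbors (2D only, illustrative assumption)."""
    X = np.asarray(X, dtype=float)
    k = n_neighbors
    # Request k + 1 neighbors because each point is returned as its own
    # nearest neighbor (assumes no exact duplicate points).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    offsets = X[idx[:, 1:]] - X[:, None, :]          # shape (n, k, 2)
    # Direction of every neighbor, sorted around the circle.
    angles = np.sort(np.arctan2(offsets[..., 1], offsets[..., 0]), axis=1)
    # Gaps between consecutive directions plus the wrap-around gap;
    # the k gaps of one point always sum to 2*pi.
    gaps = np.diff(angles, axis=1)
    wrap = angles[:, :1] + 2 * np.pi - angles[:, -1:]
    gaps = np.concatenate([gaps, wrap], axis=1)
    # Evenly surrounded points have gaps close to 2*pi / k (low variance);
    # points with neighbors on one side only have one large gap.
    return np.mean((gaps - 2 * np.pi / k) ** 2, axis=1)
```

Under this assumption, a point surrounded evenly by its neighbors scores near zero regardless of how dense the neighborhood is, while a point on the edge of a cluster scores high. Calling angle_variance_2d(X, n_neighbors=20) on a 2D array returns one score per sample.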

Installation

Python 3.8 and above are supported.

The package is published on PyPI and can be installed directly with pip:

pip install cdc-cluster

Manual Installation

git clone https://github.com/ZPGuiGroupWhu/CDC-pkg.git
cd CDC-pkg
pip install -e .

Usage

The CDC algorithm has been refactored as a scikit-learn-compatible estimator. It provides both a class-based interface, CDC, and a function-based interface, cdc_cluster.

Class-based Usage

from cdc_cluster import CDC
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# Generate sample data
X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)

# Initialize and fit CDC
# n_neighbors: Number of nearest neighbors to consider (k_num)
# ratio: Ratio for determining the DCM threshold
cdc = CDC(n_neighbors=20, ratio=0.9)
cdc.fit(X)

# Get cluster labels
# Labels start from 0. Noisy samples are labeled as -1.
labels = cdc.labels_

# Plot result
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title("CDC Clustering Result")
plt.show()
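The label convention noted above (cluster labels start from 0, noisy samples are -1) can be summarized with a small helper. This is a sketch only: summarize_labels is a hypothetical name and is not part of the cdc_cluster package.

```python
import numpy as np


def summarize_labels(labels):
    """Hypothetical helper: count clusters and noise in a label array
    that uses -1 for noisy samples."""
    labels = np.asarray(labels)
    noise_mask = labels == -1
    # Unique cluster ids and their sizes, ignoring noise.
    ids, counts = np.unique(labels[~noise_mask], return_counts=True)
    return {
        "n_clusters": int(ids.size),
        "n_noise": int(noise_mask.sum()),
        "sizes": {int(i): int(c) for i, c in zip(ids, counts)},
    }
```

Passing cdc.labels_ to this helper gives a quick check of how many clusters were found and how many samples were marked as noise.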

Function-based Usage

from cdc_cluster import cdc_cluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=42)

# Compute clustering directly
# Returns an array of cluster labels
labels = cdc_cluster(X, n_neighbors=20, ratio=0.9)

print(f"Number of clusters: {len(set(labels)) - (1 if -1 in labels else 0)}")
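When ground-truth labels are available, the two parameters can be compared with a simple grid search. The sketch below assumes only the call shape shown above, cluster_fn(X, n_neighbors=..., ratio=...), and scores each result with scikit-learn's adjusted Rand index. The helper name sweep_parameters is hypothetical.

```python
from itertools import product

from sklearn.metrics import adjusted_rand_score


def sweep_parameters(cluster_fn, X, y_true, n_neighbors_grid, ratio_grid):
    """Hypothetical helper: score every (n_neighbors, ratio) pair with
    the adjusted Rand index and return the results, best first."""
    results = []
    for k, r in product(n_neighbors_grid, ratio_grid):
        # cluster_fn is assumed to return one label per sample.
        labels = cluster_fn(X, n_neighbors=k, ratio=r)
        results.append((k, r, adjusted_rand_score(y_true, labels)))
    return sorted(results, key=lambda item: item[2], reverse=True)
```

With the package installed, a call such as sweep_parameters(cdc_cluster, X, y, [10, 20, 30], [0.7, 0.8, 0.9]) would return a list of (n_neighbors, ratio, score) tuples. Noisy samples labeled -1 are treated here as one extra group, which is a simplification.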

Citation Request

Peng, D., Gui, Z.*, Wang, D. et al. Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity. Nat. Commun. 13, 5455 (2022). https://www.nature.com/articles/s41467-022-33136-9

License

This project is covered under the MIT License.
