Parallel Delayed Cluster DP-Means

Paper

Introduction

The PDC-DP-Means package provides a highly optimized implementation of the DP-Means algorithm. It introduces a new parallel algorithm, Parallel Delayed Cluster DP-Means (PDC-DP-Means), together with a MiniBatch implementation for additional speed. Both target scalable, efficient clustering when the number of clusters is not known in advance.

Beyond the speed improvements, PDC-DP-Means supports an optional online mode for processing data as it arrives. The package exposes a scikit-learn-like interface, so it integrates easily into existing data workflows. In the experiments reported in the paper, PDC-DP-Means outperforms other nonparametric clustering methods in efficiency and scalability.

See the paper for more details.

Installation

pip install pdc-dp-means

Quick Start

from sklearn.datasets import make_blobs
from pdc_dp_means import DPMeans

# Generate sample data
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Apply DPMeans clustering: start from a single cluster and let delta
# (the paper's \lambda) control when new clusters are created
dpmeans = DPMeans(n_clusters=1, n_init=10, delta=10)
dpmeans.fit(X)

# Predict the cluster for each data point
y_dpmeans = dpmeans.predict(X)

# Plotting clusters and centroids
import matplotlib.pyplot as plt

plt.scatter(X[:, 0], X[:, 1], c=y_dpmeans, s=50, cmap='viridis')
centers = dpmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5)
plt.show()

Note that the \lambda parameter from the paper is named delta in the code, because lambda is a reserved keyword in Python.
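To give some intuition for what delta controls, here is a minimal sketch of the classic serial DP-Means loop in plain Python. It is not the package's parallel PDC-DP-Means implementation, and the function name dp_means_sketch and its helpers are hypothetical. The sketch assumes delta is compared against the squared Euclidean distance, as in the original DP-Means formulation; whether the package applies exactly the same convention should be checked in its documentation.

```python
# Illustrative serial DP-Means; NOT the package's parallel implementation.
# All names here (sq_dist, mean, dp_means_sketch) are hypothetical helpers.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length point tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def dp_means_sketch(X, delta, max_iter=100):
    """Return (centers, labels) for a list of point tuples X.

    Assumption: delta is a threshold on the squared Euclidean distance.
    """
    centers = [mean(X)]  # start from one cluster at the global mean
    labels = [0] * len(X)
    for _ in range(max_iter):
        changed = False
        for i, x in enumerate(X):
            dists = [sq_dist(x, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > delta:
                # Too far from every existing center: open a new cluster at x.
                centers.append(x)
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Group points by label, drop clusters that ended up empty,
        # and move each remaining center to the mean of its members.
        members = {}
        for x, k in zip(X, labels):
            members.setdefault(k, []).append(x)
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centers = [mean(members[old]) for old in kept]
        labels = [remap[k] for k in labels]
        if not changed:
            break
    return centers, labels


# Toy data: two tight 5x5 grids of points, one near (0, 0), one near (10, 10).
X_toy = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
X_toy += [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(5) for j in range(5)]

for d in (25.0, 1e6):
    found_centers, _ = dp_means_sketch(X_toy, delta=d)
    print(f"delta={d}: {len(found_centers)} cluster(s)")
```

On this toy data, delta=25 separates the two groups into two clusters, while a very large delta keeps everything in a single cluster. In general, a smaller delta makes new clusters cheaper to create and so tends to produce more of them.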

Usage

Please refer to the documentation: https://pdc-dp-means.readthedocs.io/en/latest/

Paper Code

Please refer to https://github.com/BGU-CS-VIL/pdc-dp-means/tree/main/paper_code for the code used in the paper.

Citing this work

If you use this code for your work, please cite the following:

@inproceedings{dinari2022revisiting,
  title={Revisiting {DP}-Means: Fast Scalable Algorithms via Parallelism and Delayed Cluster Creation},
  author={Dinari, Or and Freifeld, Oren},
  booktitle={The 38th Conference on Uncertainty in Artificial Intelligence},
  year={2022}
}

License

Our code is licensed under the BSD-3-Clause license.
