Implementation of clustering by fast search and find of density peaks, based on numpy, CUDA and OpenCL.

Image Algorithm for General Purpose

Image Algorithm is a clustering algorithm based on fast search and find of density peaks. Compared with other popular clustering methods, such as DBSCAN, one of its most prominent advantages is that it is highly parallelizable. This repository implements Image Algorithm for general-purpose use, with strong and easy GPU acceleration.

For now, the implementation includes three backends: numpy, CUDA and OpenCL.

Backend | Dependency | Supported Platform | Supported Device
numpy   | None       | Mac/Linux/Windows  | CPU
CUDA    | pycuda     | Linux only         | NVIDIA GPU
OpenCL  | pyopencl   | Mac                | NVIDIA/AMD/Intel GPU, multi-core CPU

All three backends accept two kinds of input data structure: a flat list and KDBin. KDBin is based on a hash map of spatial bins of points and their nearest neighbors. Strong acceleration of the density calculation is observed with KDBin.

Backend | Data structure for rho | Data structure for rhorank and nh
numpy   | List/KDBin             | List/KDBin
CUDA    | List/KDBin             | List
OpenCL  | List/KDBin             | List
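
To illustrate why spatial binning speeds up the density calculation, here is a minimal plain-Python sketch. It is not the package's KDBin code: the function names `build_bins` and `density_with_bins` are hypothetical, the bin width is assumed to equal the kernel radius, and the density is assumed to be a plain count of neighbours within that radius.

```python
import math
from collections import defaultdict
from itertools import product


def build_bins(points, bin_size):
    """Hash each point index into the integer grid cell that contains it."""
    bins = defaultdict(list)
    for idx, p in enumerate(points):
        cell = tuple(math.floor(x / bin_size) for x in p)
        bins[cell].append(idx)
    return bins


def density_with_bins(points, kernel_r):
    """Count neighbours within kernel_r, looking only at adjacent cells.

    Assumption: bin width == kernel_r, so every neighbour closer than
    kernel_r lies in the point's own cell or one of the cells next to it.
    """
    bins = build_bins(points, kernel_r)
    rho = []
    for p in points:
        cell = tuple(math.floor(x / kernel_r) for x in p)
        count = 0
        # Visit the 3^K cells around the point instead of all points.
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbour_cell = tuple(c + o for c, o in zip(cell, offset))
            for j in bins.get(neighbour_cell, ()):
                if math.dist(p, points[j]) < kernel_r:
                    count += 1  # the point itself is counted too
        rho.append(count)
    return rho
```

With a flat list every point is compared against every other point; with bins only the points in neighbouring cells are compared, which is where the saving would come from when the data is spread over many cells.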

It has been tested that all three backends give identical clustering results, so users can choose whichever is faster and easier for their purposes. In terms of speed, CUDA/OpenCL acceleration may give up to a 20x speedup over the CPU when dealing with more than a few thousand data points. A preliminary speed test of the three backends can be found here.

Installation

pip install ImageAlgoKD

No extra dependency is required for the numpy backend, which usually does a good job on small datasets. Users who want GPU acceleration with either the CUDA or the OpenCL backend need one extra package.

# to use the OpenCL backend
pip install pyopencl
# to use the CUDA backend
pip install pycuda
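
Before picking a backend, it can be handy to check which optional packages are importable. The helper below is a hypothetical sketch, not part of ImageAlgoKD; it only assumes the backend names used with `ia.run(...)` in this README and the dependency names listed above, and it does not verify that a working GPU or driver is present.

```python
import importlib.util


def available_backends():
    """Report which backends look usable, based on importable packages only."""
    # Assumption: a backend is considered available when its Python
    # dependency can be found; hardware and drivers are not checked.
    deps = {"numpy": "numpy", "opencl": "pyopencl", "cuda": "pycuda"}
    return {
        backend: importlib.util.find_spec(module) is not None
        for backend, module in deps.items()
    }


print(available_backends())
```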

Quick Start

The primary usage of the module is as follows. First, import the K-dimensional ImageAlgo class, ImageAlgoKD.

import numpy as np
from ImageAlgoKD import *

Declare an instance of ImageAlgoKD with your algorithm parameters. Then give it the input data points.

ia = ImageAlgoKD(MAXDISTANCE=20, KERNEL_R=1.0)
ia.setInputsPoints(Points(np.genfromtxt("../data/basic.csv",delimiter=',')))

Then run the clustering over input data points.

ia.run("numpy")
# or ia.run("opencl") / ia.run("cuda") to run in parallel

Finally, the clustering result can be accessed with

ia.points.clusterID
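
A quick way to inspect the result is to count how many points fall into each cluster. The sketch below assumes that `clusterID` is an iterable of integer labels, one per input point; that layout, the helper name `cluster_sizes`, and the `-1` label in the stand-in data are assumptions made for illustration only.

```python
from collections import Counter


def cluster_sizes(cluster_ids):
    """Return {cluster label: number of points}, sorted by label."""
    counts = Counter(int(c) for c in cluster_ids)
    return dict(sorted(counts.items()))


# Stand-in labels; with the package one would pass ia.points.clusterID instead.
example_ids = [0, 0, 1, 1, 1, -1]
print(cluster_sizes(example_ids))
```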

Algorithm Parameters

Parameter          | Comment                                                           | Default Value
MAXDISTANCE        | the separation distance of the point with the highest density    | 10.0
KERNEL_R           | 'd_c' in the density calculation                                  | 1.0
KERNEL_R_NORM      | 'd_0' in the density calculation                                  | 1.0
KERNEL_R_POWER     | 'k' in the density calculation                                    | 0.0
DECISION_RHO_KAPPA | the ratio of the density threshold of seeds to the highest density | 4.0
DECISION_NHD       | the separation threshold of seeds                                 | 1.0
CONTINUITY_NHD     | the separation threshold of continuous clusters                   | 1.0

where density is defined as (the formula is not reproduced in this text).
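
As a rough illustration of how these parameters could interact, here is a plain-Python sketch of generic density-peak clustering. It is not the package's implementation. It assumes a simple cutoff kernel (every neighbour closer than KERNEL_R adds 1 to the density), so KERNEL_R_NORM and KERNEL_R_POWER are left out, and the seeding and continuity rules are a guess based on the comments in the table. The function name `density_peak_sketch` is hypothetical.

```python
import math


def density_peak_sketch(points, kernel_r=1.0, maxdistance=10.0,
                        rho_kappa=4.0, decision_nhd=1.0, continuity_nhd=1.0):
    """Toy density-peak clustering; returns one label per point (-1 = unassigned)."""
    n = len(points)
    # rho: number of points closer than kernel_r (assumed cutoff kernel).
    rho = [sum(1 for q in points if math.dist(p, q) < kernel_r) for p in points]
    # Visit points from highest to lowest density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # nh: distance to the nearest point that ranks higher in density.
    # Points with no higher neighbour within maxdistance keep nh = maxdistance.
    nh = [maxdistance] * n
    nh_idx = [-1] * n
    for pos, i in enumerate(order):
        for j in order[:pos]:
            d = math.dist(points[i], points[j])
            if d < nh[i]:
                nh[i], nh_idx[i] = d, j

    rho_max = rho[order[0]]
    labels = [-1] * n
    n_clusters = 0
    for i in order:
        if rho[i] > rho_max / rho_kappa and nh[i] > decision_nhd:
            # Seed: dense enough and well separated from denser points.
            labels[i] = n_clusters
            n_clusters += 1
        elif nh_idx[i] >= 0 and nh[i] <= continuity_nhd:
            # Follower: inherit the label of the nearest denser point.
            labels[i] = labels[nh_idx[i]]
    return labels
```

Because every point only needs its own density and its nearest denser neighbour, each of the two loops over points can in principle be distributed across threads, which is consistent with the parallelizability claimed above.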

Examples

I. Basic

Perform IA clustering on 1000 toy 2D points, sampled from two Gaussian distributions plus noise. The toy data is in data/basic.csv, and the corresponding Jupyter notebook can be found in examples/.
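
For readers without the repository at hand, a similar toy dataset can be generated locally. The sketch below is not how data/basic.csv was produced; the blob centres, widths, noise range and the 450/450/100 split are all hypothetical choices that merely match the description of 1000 points from two Gaussians plus noise.

```python
import csv
import random


def make_toy_points(n_per_blob=450, n_noise=100, seed=0):
    """Two 2D Gaussian blobs plus uniform noise (all settings are illustrative)."""
    rng = random.Random(seed)
    centers = [(-2.0, 0.0), (2.0, 0.0)]  # hypothetical blob centres
    points = []
    for cx, cy in centers:
        points += [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
                   for _ in range(n_per_blob)]
    points += [(rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))
               for _ in range(n_noise)]
    return points


def write_csv(points, path):
    """Write one point per row as comma-separated values."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(points)
```

Calling `write_csv(make_toy_points(), "toy.csv")` produces a file that can be read the same way as in the Quick Start, with `np.genfromtxt("toy.csv", delimiter=',')`.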

II. MNIST

Perform IA clustering on 1000 MNIST points, each a 28x28 image (784 dimensions). The MNIST data is in data/mnist.csv, and the corresponding Jupyter notebook can be found in examples/.

III. HGCal

This is an event of 10 pions with 300 GeV energy in CMS HGCal. A 3D interactive visualization can be found here. In addition, for an event with pile-up, here is a 300 GeV pion with PU200 event. A PU200 event typically includes about 200k HGCal reconstructed detector hits, which are the input to IA clustering.
