An Implementation of the Component-wise Peak-Finding Clustering Method

CPFcluster

An implementation of the Component-wise Peak-Finding (CPF) clustering method, presented in 'Scalable and Adaptable Density-Based Clustering using Level Set and Mode-Seeking Methods'.

Dependencies

CPFcluster supports Python 3 and requires numpy, scipy and scikit-learn, along with the standard-library modules itertools and multiprocessing. The numerical libraries should be linked with a BLAS implementation (e.g., OpenBLAS, ATLAS, Intel MKL).

Installation

CPFcluster is available on PyPI, the Python Package Index.

    $ pip install CPFcluster

How To Use

To use CPFcluster, first import the CPFcluster module.

    from CPFcluster import CPFcluster

Clustering a Dataset

A CPFcluster object is constructed with the arguments listed below. Its fit method is then called to cluster a dataset.

    CPF = CPFcluster(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
    CPF.fit(X)

CPFcluster takes 6 arguments:

  • k Number of nearest-neighbors used to create connected components from the dataset and compute the density.
  • rho (Defaults to 0.4) Parameter used in threshold for center selection.
  • alpha (Defaults to 1) Optional parameter used in the threshold of edge weights for center selection; not discussed in the paper.
  • n_jobs (Defaults to 1) Number of cores on which the program executes.
  • remove_duplicates (Defaults to False) Option to remove duplicate rows from data in advance of clustering.
  • cutoff (Defaults to 1) Threshold for removing instances as outliers. Instances with fewer edges than the cutoff value are removed.
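
The cutoff rule can be illustrated with a small stand-alone sketch. It assumes that "edges" means edges of a mutual k-NN graph, where an edge between two points exists only when each is among the other's k nearest neighbors; this README does not state which graph is used. The helper names knn_indices, mutual_knn_degrees and flag_outliers are hypothetical and are not part of CPFcluster.

```python
import math

def knn_indices(points, k):
    """Brute force: for each point, the set of indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in dists[:k]})
    return result

def mutual_knn_degrees(points, k):
    """Number of edges each point keeps when only mutual neighbor pairs are linked."""
    nbrs = knn_indices(points, k)
    return [sum(1 for j in nbrs[i] if i in nbrs[j]) for i in range(len(points))]

def flag_outliers(points, k, cutoff=1):
    """True for every point with fewer edges than cutoff (assumed reading of the rule)."""
    return [degree < cutoff for degree in mutual_knn_degrees(points, k)]

# Two tight groups of three points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (50.0, 50.0)]
outlier_flags = flag_outliers(points, k=2, cutoff=1)
print(outlier_flags)  # only the isolated last point is flagged
```

With the default cutoff of 1, only points that keep no edge at all are flagged under this reading; larger values remove more weakly connected points.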

The CPFcluster object is then fit to a dataset:

  • X An n-by-d numpy.ndarray with training data. The rows correspond to n observations, and the columns correspond to d dimensions.
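
As a minimal sketch of preparing X, the snippet below builds a synthetic two-blob dataset in the n-by-d layout described above. The data, the variable names and the use of numpy.unique to mimic what remove_duplicates is described as doing are all assumptions for illustration; how CPFcluster removes duplicates internally is not stated here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic Gaussian blobs in 2 dimensions (illustrative data only).
blob_a = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(100, 2))

# Rows are observations, columns are dimensions: X is n-by-d with n=200, d=2.
X = np.vstack([blob_a, blob_b])

# Dropping duplicate rows by hand. Note that np.unique also sorts the rows,
# so the original row order is not preserved.
X_with_duplicates = np.vstack([X, X[:3]])
X_unique = np.unique(X_with_duplicates, axis=0)

print(X.shape, X_with_duplicates.shape, X_unique.shape)

# X can then be passed to fit as shown above:
#     CPF = CPFcluster(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
#     CPF.fit(X)
```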

After fitting, the CPFcluster object also contains:

  • CCmat An n-by-n sparse matrix representation of the k-NN graph.
  • components A vector containing the index of the component to which each instance belongs. If the instance is an outlying point, the value will be NaN.
  • ps A list of tuples, one per component, containing the number of instances and the proportion of instances with no point of higher density among their nearest neighbors.
  • peaks A vector containing the index of the peaks selected as cluster centers.
  • memberships The final cluster labelings.
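
Because outlying points are marked with NaN in components, any summary of the result has to treat NaN separately. The sketch below shows one way to do that; the helper summarize_labels and the example values are hypothetical and do not come from an actual CPFcluster run.

```python
import math
from collections import Counter

def summarize_labels(labels):
    """Return (sizes, n_outliers): instances per label, and how many labels are NaN."""
    sizes = Counter()
    n_outliers = 0
    for value in labels:
        if math.isnan(value):
            n_outliers += 1          # NaN marks an outlying point
        else:
            sizes[int(value)] += 1
    return dict(sizes), n_outliers

# Hypothetical component vector for six instances, one of them an outlier.
components = [0.0, 0.0, 1.0, float("nan"), 1.0, 1.0]
sizes, n_outliers = summarize_labels(components)
print(sizes, n_outliers)  # {0: 2, 1: 3} 1
```

The same helper accepts a numpy vector, since math.isnan works on numpy floating-point scalars as well.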

CPFmatch for Multi-Image Matching

CPFmatch is a modified version of CPF for the multi-image matching problem. To use CPFmatch, first import the CPFmatch module.

    from CPFcluster import CPFmatch

Clustering a Dataset

A CPFmatch object is constructed with the arguments listed below. Its fit method is then called to cluster a dataset.

    match = CPFmatch(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
    match.fit(X, img_label)

CPFmatch takes the same 6 arguments as CPFcluster:

  • k Number of nearest-neighbors used to create connected components from the dataset and compute the density.
  • rho (Defaults to 0.4) Parameter used in threshold for center selection.
  • alpha (Defaults to 1) Optional parameter used in the threshold of edge weights for center selection; not discussed in the paper.
  • n_jobs (Defaults to 1) Number of cores on which the program executes.
  • remove_duplicates (Defaults to False) Option to remove duplicate rows from data in advance of clustering.
  • cutoff (Defaults to 1) Threshold for removing instances as outliers. Instances with fewer edges than the cutoff value are removed.

The CPFmatch object is then fit to a dataset together with the image label of each feature:

  • X An n-by-d numpy.ndarray with training data. The rows correspond to n observations, and the columns correspond to d dimensions.
  • img_label An n-by-1 numpy.ndarray with the image label for each feature. The rows correspond to n keypoints, and no two keypoints from the same image will be clustered together.
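
One way to assemble X and img_label from per-image feature arrays is sketched below. The random descriptors stand in for real keypoint features, and the variable names are illustrative. The labels are reshaped to n-by-1 to follow the description above; whether a flat vector of length n is also accepted is not stated here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in descriptors: three images with 4, 3 and 5 keypoints, 8 dimensions each.
descriptors_per_image = [rng.random((4, 8)), rng.random((3, 8)), rng.random((5, 8))]

# Stack all keypoints into one n-by-d array.
X = np.vstack(descriptors_per_image)

# One integer label per keypoint naming the image it came from, as an n-by-1 array.
img_label = np.concatenate(
    [np.full(len(d), image_id) for image_id, d in enumerate(descriptors_per_image)]
).reshape(-1, 1)

print(X.shape, img_label.shape)  # (12, 8) (12, 1)

# These can then be passed to fit as shown above:
#     match = CPFmatch(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
#     match.fit(X, img_label)
```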

As before, the fitted object also contains:

  • CCmat An n-by-n sparse matrix representation of the k-NN graph.
  • components A vector containing the index of the component to which each instance belongs. If the instance is an outlying point, the value will be NaN.
  • ps A list of tuples, one per component, containing the number of instances and the proportion of instances with no point of higher density among their nearest neighbors.
  • peaks A vector containing the index of the peaks selected as cluster centers.
  • memberships The final cluster labelings.
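
The matching constraint, that no two keypoints from the same image share a cluster, can be checked after the fact. The sketch below is one such check; same_image_conflicts is a hypothetical helper and the label vectors are made-up examples, not CPFmatch output.

```python
from collections import defaultdict

def same_image_conflicts(memberships, img_label):
    """List (index, cluster, image) for keypoints whose cluster already holds
    another keypoint from the same image. Expects flat sequences, e.g.
    img_label.ravel().tolist() for an n-by-1 array."""
    images_seen = defaultdict(set)
    conflicts = []
    for idx, (cluster, image) in enumerate(zip(memberships, img_label)):
        if cluster != cluster:       # NaN compares unequal to itself: skip unlabeled points
            continue
        if image in images_seen[cluster]:
            conflicts.append((idx, cluster, image))
        else:
            images_seen[cluster].add(image)
    return conflicts

img_label = [0, 0, 1, 1, 2, 2]
valid_memberships = [0, 1, 0, 1, 0, 1]     # each cluster holds one keypoint per image
invalid_memberships = [0, 0, 1, 1, 0, 1]   # clusters 0 and 1 each repeat an image

print(same_image_conflicts(valid_memberships, img_label))    # []
print(same_image_conflicts(invalid_memberships, img_label))  # [(1, 0, 0), (3, 1, 1)]
```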

Tests

CPFcluster

CPFcluster has an MIT License.

See LICENSE.
