Fast Density Clustering in low-dimension

Fast density clustering (fdc)

Python package for clustering low-dimensional data using kernel density maps and a density graph. Examples for Gaussian mixtures and some benchmarks are provided. Our algorithm solves multiscale problems (multiple variances/densities and population sizes) and works for non-convex clusters. It uses cross-validation and is regularized by two main global parameters: a neighborhood size and a noise threshold. The latter detects spurious cluster centers, while the former guarantees that only local information is used to infer cluster centers.
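To make the role of the neighborhood size more concrete, here is a minimal sketch of the general idea. It is not the package's implementation. It evaluates a Gaussian kernel density at every point by brute force and flags as candidate centers the points whose density is the highest among their nearest neighbors. The function names, the `bandwidth` and `nh_size` parameters, and the fixed bandwidth (instead of a cross-validated one) are all assumptions made for illustration. The noise-threshold step that would prune spurious centers is left out.

```python
import numpy as np


def kernel_density(X, bandwidth):
    """Gaussian kernel density at each sample, by brute force (O(n^2))."""
    n, d = X.shape
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)  # pairwise squared distances
    norm = n * (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    density = np.exp(-0.5 * sq_dist / bandwidth ** 2).sum(axis=1) / norm
    return density, sq_dist


def candidate_centers(X, bandwidth=0.3, nh_size=20):
    """Indices of points that are density maxima within their neighborhood.

    bandwidth and nh_size are hypothetical knobs for this sketch only.
    """
    density, sq_dist = kernel_density(X, bandwidth)
    # nh_size closest points to each sample (the sample itself is included)
    neighbours = np.argsort(sq_dist, axis=1)[:, :nh_size]
    is_center = density >= density[neighbours].max(axis=1)
    return np.flatnonzero(is_center), density


# Two well-separated Gaussian blobs of 100 points each
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(100, 2)),
    rng.normal([3.0, 3.0], 0.3, size=(100, 2)),
])
centers, density = candidate_centers(X)
print("candidate center indices:", centers)
```

Because only the `nh_size` closest points are compared, each blob yields at least one candidate center regardless of what the other blob looks like. Any extra candidates are local density fluctuations, which is the kind of spurious center a noise threshold is meant to remove.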

The underlying code is based on fast KD-trees for nearest-neighbor searches. For low-dimensional spaces, the algorithm has a time complexity of O(n log n), where n is the size of the dataset. It also has a memory complexity of O(n).
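As an illustration of the kind of KD-tree query this relies on, here is a small sketch using `scipy.spatial.cKDTree`. Whether fdc uses this particular class internally is an assumption; the helper name `knn` is made up for the example.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn(X, k):
    """Distances and indices of the k nearest neighbors of every point in X."""
    tree = cKDTree(X)  # building the tree takes roughly O(n log n)
    # Ask for k + 1 neighbors: each point is returned as its own nearest
    # neighbor (distance 0), assuming X contains no duplicate points.
    dist, idx = tree.query(X, k=k + 1)
    return dist[:, 1:], idx[:, 1:]


rng = np.random.default_rng(0)
X = rng.random((500, 2))  # 500 points in 2 dimensions
dist, idx = knn(X, k=5)
print(dist.shape, idx.shape)
```

In low dimensions each query costs about O(log n), so querying all n points stays close to O(n log n) overall. The tree itself takes O(n) memory, and the returned arrays take O(n k).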

Installing

I suggest you install the code using pip from an Anaconda Python 3 environment. From that environment:

pip install fdc

That's it! You can now import the package fdc from your Python scripts. Check out the examples in the file example and see if you can run the scripts provided.
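If you want to confirm that the install worked before running the examples, a quick check like the following can help. It only uses the standard library; the helper name `is_importable` is made up for this sketch.

```python
import importlib.util


def is_importable(package_name):
    """Return True if Python can locate the package in the current environment."""
    return importlib.util.find_spec(package_name) is not None


print("fdc importable:", is_importable("fdc"))
```

If this prints `False`, the most likely cause is that pip installed into a different environment than the one running the script.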

Examples and comparison with other methods

Check out the example for Gaussian mixtures (example.py). You should be able to run it directly. It should produce a plot similar to this:

[Figure: output of example.py]

In another example (example2.py), the algorithm is benchmarked against some sklearn datasets (note that the same parameters are used across all datasets). This is to be compared with other clustering methods easily accessible from sklearn.

[Figure: output of example2.py]
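For readers who want a baseline to compare against, here is a rough sketch of the "same parameters across all datasets" setup using only sklearn. It does not reproduce example2.py. The choice of toy datasets, the use of DBSCAN as the stand-in method, and the values of `eps` and `min_samples` are all assumptions.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs, make_circles, make_moons
from sklearn.preprocessing import StandardScaler


def make_benchmark_datasets(n_samples=500, random_state=0):
    """A few standard sklearn toy datasets (the selection is an assumption)."""
    return {
        "circles": make_circles(n_samples=n_samples, factor=0.5, noise=0.05,
                                random_state=random_state)[0],
        "moons": make_moons(n_samples=n_samples, noise=0.05,
                            random_state=random_state)[0],
        "blobs": make_blobs(n_samples=n_samples, centers=3,
                            random_state=random_state)[0],
    }


def cluster_all(datasets, eps=0.3, min_samples=10):
    """Run one method with identical parameters on every dataset."""
    labels = {}
    for name, X in datasets.items():
        X_std = StandardScaler().fit_transform(X)  # put datasets on a common scale
        labels[name] = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_std)
    return labels


labels = cluster_all(make_benchmark_datasets())
for name, y in labels.items():
    # DBSCAN marks noise points with the label -1
    print(name, "clusters found:", len(set(y) - {-1}))
```

Keeping the parameters fixed across datasets is what makes the comparison meaningful: a method that needs retuning for every dataset is harder to use when the structure of the data is unknown.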

Citation

If you use this code in a scientific publication, I would appreciate a citation or reference to this repository. Also, for further references on clustering and machine learning, check out our machine learning review:

@article{mehta2018high,
  title={A high-bias, low-variance introduction to Machine Learning for physicists},
  author={Mehta, Pankaj and Bukov, Marin and Wang, Ching-Hao and Day, Alexandre GR and Richardson, Clint and Fisher, Charles K and Schwab, David J},
  journal={arXiv preprint arXiv:1803.08823},
  year={2018}
}
