Project Description
## Overview

## Installation and Requirements

## Usage and Example

## References

Release History
## Release History

Download Files
## Download Files

A fast and memory-efficient implementation of DBSCAN (Density-Based Spatial Clustering of Applications with Noise).

It is especially suited for multiple rounds of down-sampling and clustering from a joint dataset: after an initial overhead O(N log(N)), each subsequent run of clustering will have O(N) time complexity.

As illustrated by a doctest embedded in the present module’s docstring, on a dataset of 15,000 samples and 47 features, on a Asus Zenbook laptop with 8 GiB of RAM and an Intel Core M processor, DBSCAN_multiplex processes 50 rounds of sub-sampling and clustering in about 4 minutes, whereas Scikit-learn’s implementation of DBSCAN performs the same task in more than 28 minutes.

Such a test can be performed quite conveniently on your machine: simply
entering `python DBSCAN_multiplex`in your terminal will prompt a
doctest to start comparing the performance of the two afore-mentioned
implementations of DBSCAN.

This 7-fold gain in performance proved critical to the statistical learning application that prompted the design of this algorithm.

DBSCAN_multiplex requires a machine running any member of the Unix-like family of operaating systems, Python 2.7 along with the following packages and a few modules from the Standard Python Library:

- NumPy >= 1.9
- PyTables
- scikit-learn

You can install DBSCAN_multiplex from the official Python Package Index (PyPI) as follows:

- open a terminal window;
- type in
`pip install DBSCAN_multiplex`.

The command listed above should automatically install or upgrade any missing or outdated dependency among those listed at the beginning of this section.

See the docstrings associated to each function of the DBSCAN_multiplex
module for more information; in a Python interpreter console, they can
be viewed by calling the built-in help system, e.g.,
`help(DBSCAN_multiplex.load)`.

The following few lines show how DBSCAN_multiplex can be used for clustering 50 randomly selected subsamples out of a common Gaussian distributed dataset. This situation arises in consensus clustering where one might want to obtain and then combine multiple vectors of cluster labels.

>>> import numpy as np >>> import DBSCAN_multiplex as DB >>> data = np.random.randn(15000, 7) >>> N_iterations = 50 >>> N_sub = 9 * data.shape[0] / 10 >>> subsamples_matrix = np.zeros((N_iterations, N_sub), dtype = int) >>> for i in xrange(N_iterations): subsamples_matrix[i] = np.random.choice(data.shape[0], N_sub, replace = False) >>> eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)

- Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. (1996) “A density-based algorithm for discovering clusters in large spatial databases with noise”. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226-231.
- Kriegel, H.-P., Kroeger, P., Sander, J. and Zimek, A. (2011) “Density-based Clustering”. WIREs Data Mining and Knowledge Discovery 1 (3): 231-240.

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

TODO: Brief introduction on what you do with files - including link to relevant help section.

File Name & Checksum SHA256 Checksum Help | Version | File Type | Upload Date |
---|---|---|---|

DBSCAN_multiplex-1.5.tar.gz (10.0 kB) Copy SHA256 Checksum SHA256 | – | Source | Dec 7, 2015 |