Skip to main content

The package is the implementation of the Burj Khalifa clustering algorithm

Project description

Logo

Burj Khalifa Clustering

The bk_clustering package is a Python realization of the Burj Khalifa clustering method. The Burj Khalifa method can be considered as Agglomerative clustering on steroids: great quality with no parameters required!

The idea is to automatically detect solid clusters, based on the dendrogram. Read more in the publication section.

Installation

To install the package, run:

pip install bk_clustering

Usage

Here's an example of using Burj Khalifa clustering algorithm:

from bk_clustering import BurjKhalifaClustering

# Initialize BurjKhalifaClustering object
bk_model = BurjKhalifaClustering()

# Fit data to the algorithm
bk_model.fit(X)

# Get labels
labels = bk_model.labels_

Examples

Time limitations

The time complexity of distance-based algorithms typically depends on the number of data points, the number of features, and the number of clusters. Building a distance matrix for hierarchical clustering is essential in the clustering process. The size of the distance matrix is N x N, where N is the number of data points in the dataset. The time complexity of building the distance matrix for hierarchical clustering is O(N^2), which means that the time required to compute the pairwise distances between all data points increases quadratically with the size of the dataset. As the data size increases, the time complexity of distance computation snowballs, making these algorithms computationally expensive. Below are time performance matrixes attached for 10 and 100 clusters. Time performance As one can see, even with 10.000^2 data points, the algorithm works with acceptable timings (just a bit longer than 10 minutes Performance notebook).

Further work will be dedicated to optimizing python code and parallelizing some steps.

Publication

The publication is under draft at the moment. Plots and tables for the paper.

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue on the GitHub repository.

License

This package is licensed under the MIT License.

Inspiration

The Burj Khalifa's ladder-view design consists of a succession of terraces or setbacks that, when viewed from the outside, resemble the branches of a tree or a dendrogram and gradually get smaller as the structure rises. The inspiration for the method was taken from the view of the building: instead of having the width changed gradually, the terraces are located at specific, uneven from different sides and levels, skipping some of them. Such construction is closely associated with the proposed method, where a dendrogram is modified to a tree structure on solid levels.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bk_clustering-0.3.2.tar.gz (28.7 kB view hashes)

Uploaded Source

Built Distribution

bk_clustering-0.3.2-py3-none-any.whl (32.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page