The package is the implementation of the Burj Khalifa clustering algorithm

Burj Khalifa Clustering

The bk_clustering package is a Python implementation of the Burj Khalifa clustering method. The Burj Khalifa method can be thought of as Agglomerative clustering on steroids: great quality with no parameters required!

The idea is to detect solid clusters automatically from the dendrogram. Read more in the Publication section.
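
To make "detecting clusters from the dendrogram" more concrete, here is a minimal sketch of a generic, parameter-free heuristic: build a single-linkage hierarchy and cut it at the largest jump between consecutive merge heights. This is not the Burj Khalifa method itself, only a common stand-in for illustration. The helper names `agglomerate` and `cut_at_largest_gap` are hypothetical and are not part of bk_clustering.

```python
import math


def agglomerate(points, stop_at=1):
    """Naive single-linkage merging until `stop_at` clusters remain.

    Returns (clusters, merge_heights), where each cluster is a list of
    point indices and merge_heights holds the distance of every merge.
    """
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > stop_at:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        heights.append(d)
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters, heights


def cut_at_largest_gap(points):
    """Pick the cluster count from the largest jump in merge heights."""
    _, heights = agglomerate(points, stop_at=1)
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    k = gaps.index(max(gaps))            # merges 0..k lie below the gap
    n_clusters = len(points) - (k + 1)   # each merge removes one cluster
    clusters, _ = agglomerate(points, stop_at=n_clusters)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Two tight, well-separated groups of toy 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = cut_at_largest_gap(points)
print(labels)
```

On this toy data the two tight groups receive different labels without any cluster count being supplied. The sketch needs at least three points and recomputes distances on every pass, so it is only suitable for tiny inputs.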

Installation

To install the package, run:

pip install bk_clustering

Usage

Here's an example of using the Burj Khalifa clustering algorithm:

from bk_clustering import BurjKhalifaClustering

# Initialize BurjKhalifaClustering object
bk_model = BurjKhalifaClustering()

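# Fit the model to your data; X is not defined in this snippet
# (assumed to be an array-like of shape (n_samples, n_features))
bk_model.fit(X)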

# Get labels
labels = bk_model.labels_

Examples

Time limitations

The time complexity of distance-based algorithms typically depends on the number of data points, the number of features, and the number of clusters. Hierarchical clustering requires a distance matrix of size N x N, where N is the number of data points in the dataset. Building it takes O(N^2) time, so the time needed to compute all pairwise distances grows quadratically with the size of the dataset and these algorithms become computationally expensive on large data.

Below are time performance matrices for 10 and 100 clusters.

[Figure: Time performance]

As one can see, even for a distance matrix of 10,000^2 entries, the algorithm runs in acceptable time (just over 10 minutes; see the Performance notebook).

Further work will be dedicated to optimizing the Python code and parallelizing some steps.
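
The quadratic growth of the distance matrix can be seen in a small standard-library sketch. It is independent of bk_clustering's internals, and the helper names `pairwise_distances` and `n_pairs` are hypothetical. It builds a symmetric N x N matrix and counts how many distinct pairs must be evaluated as N grows.

```python
import math
import random


def pairwise_distances(points):
    """Build the full symmetric N x N Euclidean distance matrix."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = d
            dist[j][i] = d  # the matrix is symmetric
    return dist


def n_pairs(n):
    """Number of distinct point pairs, N * (N - 1) / 2."""
    return n * (n - 1) // 2


random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
matrix = pairwise_distances(points)

# A 10x increase in N gives roughly a 100x increase in pairs.
for n in (100, 1_000, 10_000):
    print(n, n_pairs(n))
```

For N = 10,000 this is already close to 50 million distance evaluations, which is why the distance matrix dominates the running time on larger datasets.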

Publication

The publication is currently in draft; plots and tables for the paper are available.

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue on the GitHub repository.

License

This package is licensed under the MIT License.

Inspiration

The Burj Khalifa's ladder-like design consists of a succession of terraces, or setbacks, that gradually get smaller as the structure rises and, viewed from the outside, resemble the branches of a tree or a dendrogram. The method takes its inspiration from this view of the building: instead of the width changing gradually, the terraces sit at specific levels that differ from side to side, and some levels are skipped altogether. This construction closely mirrors the proposed method, in which a dendrogram is transformed into a tree structure with solid levels.
