The package is an implementation of the Burj Khalifa clustering algorithm.
Burj Khalifa Clustering
The bk_clustering package is a Python implementation of the Burj Khalifa clustering method.
The Burj Khalifa method can be thought of as agglomerative clustering on steroids: high-quality clusters with no parameters required.
The idea is to detect solid clusters automatically from the dendrogram. Read more in the Publication section.
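This README does not describe how the solid levels are detected, so the following is only a generic illustration of the underlying idea of cutting a dendrogram automatically. It is not the Burj Khalifa algorithm. It builds a single-linkage dendrogram (single linkage is equivalent to Kruskal's minimum-spanning-tree algorithm on the complete distance graph) and cuts it at the largest jump in merge height, a common heuristic. The functions `single_linkage_merges` and `cut_at_largest_gap` are hypothetical and are not part of bk_clustering.

```python
import math
from itertools import combinations


def _find(parent, a):
    # Union-find root lookup with path halving.
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def single_linkage_merges(points):
    """Single-linkage dendrogram as (height, i, j) merges, lowest first.

    Kruskal's algorithm on all pairwise edges: every edge that joins two
    different components is one merge of the dendrogram.
    """
    n = len(points)
    parent = list(range(n))
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    merges = []
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            merges.append((d, i, j))
            if len(merges) == n - 1:
                break
    return merges


def cut_at_largest_gap(points):
    """Hypothetical helper: cut the dendrogram at its largest height jump."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points")
    merges = single_linkage_merges(points)
    heights = [d for d, _, _ in merges]
    # k = position of the largest gap; keeping merges[0..k] leaves n - k - 1 clusters.
    k = max(range(n - 2), key=lambda t: heights[t + 1] - heights[t])
    parent = list(range(n))
    for _, i, j in merges[: k + 1]:
        parent[_find(parent, i)] = _find(parent, j)
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots, labels = {}, []
    for p in range(n):
        labels.append(roots.setdefault(_find(parent, p), len(roots)))
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(cut_at_largest_gap(points))  # [0, 0, 0, 1, 1, 1]
```

Here the merge heights are 1, 1, 1, 1 and about 13.45, so the largest gap separates the two groups and no cluster count has to be supplied. Unlike this single fixed cut, the Burj Khalifa method is described as working on several solid levels of the tree.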
Installation
To install the package, run:
```
pip install bk_clustering
```
Usage
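Here is an example of using the Burj Khalifa clustering algorithm:
```python
from sklearn.datasets import load_iris  # only used to get example data

from bk_clustering import BurjKhalifaClustering

# Example data: the Iris features, an array of shape (n_samples, n_features)
X = load_iris().data

# Initialize the BurjKhalifaClustering object (no parameters required)
bk_model = BurjKhalifaClustering()
# Fit the algorithm to the data
bk_model.fit(X)
# Get the cluster label assigned to each sample
labels = bk_model.labels_
```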
Examples
- Iris Dataset
- Comparison with other clustering methods
- Mall Customer Segmentation
- Bank Customer Segmentation
Time limitations
The running time of distance-based algorithms typically depends on the number of data points, the number of features, and the number of clusters. Building a distance matrix is an essential step in hierarchical clustering. The matrix has size N x N, where N is the number of data points in the dataset, so computing all pairwise distances takes O(N^2) time in the number of points (O(N^2 · d) with d features): the time required grows quadratically with the size of the dataset. As the data grows, the cost of distance computation quickly dominates, making these algorithms computationally expensive.

Time performance matrices are provided for 10 and 100 clusters. As they show, even for the largest size tested (10,000^2), the algorithm runs in acceptable time: just over 10 minutes (see the Performance notebook).
Future work will focus on optimizing the Python code and parallelizing some of the steps.
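To make the quadratic growth of the distance computation concrete, the sketch below counts the pairwise distances and estimates the memory of a dense distance matrix. It assumes plain Euclidean distance and 8-byte float64 storage; the metric and storage format used inside bk_clustering are not stated here, and the helper names `pair_count`, `condensed_distances` and `dense_matrix_bytes` are hypothetical.

```python
import math
from itertools import combinations


def pair_count(n_points):
    """Number of distinct pairs: the upper triangle of an N x N distance matrix."""
    return n_points * (n_points - 1) // 2


def condensed_distances(points):
    """Euclidean distance for every pair (i < j), in row-major order.

    Produces pair_count(N) values, each costing O(d) for d features,
    so the total work is O(N^2 * d).
    """
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def dense_matrix_bytes(n_points, bytes_per_entry=8):
    """Memory for a dense N x N matrix, assuming 8-byte float64 entries."""
    return n_points * n_points * bytes_per_entry


# Doubling N roughly quadruples the number of distances to compute.
for n in (1_000, 2_000, 4_000, 10_000):
    print(f"N={n:>6}: {pair_count(n):>11,} pairs, "
          f"dense matrix ~{dense_matrix_bytes(n) / 1e6:,.0f} MB")
```

For 10,000 points this gives 49,995,000 distinct pairs and roughly 800 MB for a dense float64 matrix, which is why both time and memory become the limiting factors at this scale.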
Publication
The publication is currently in draft. Plots and tables for the paper are available.
Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue on the GitHub repository.
License
This package is licensed under the MIT License.
Inspiration
The Burj Khalifa's stepped design consists of a succession of terraces, or setbacks, that get gradually smaller as the structure rises and, viewed from the outside, resemble the branches of a tree or a dendrogram. The method takes its inspiration from this view of the building: instead of the width narrowing gradually, the setbacks occur at specific, uneven levels that differ from side to side, and some levels are skipped. This construction closely parallels the proposed method, in which a dendrogram is transformed into a tree structure built on solid levels.
File details
Details for the file bk_clustering-0.3.2.tar.gz.
File metadata
- Download URL: bk_clustering-0.3.2.tar.gz
- Upload date:
- Size: 28.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.0 CPython/3.10.6 Linux/5.19.0-41-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | c7a449d4c948830a6430b15e9bf1db396af37bcee7d4d448c9694dc3b95b13d6
MD5 | bcc72cecf2580c49919af2469a8557cf
BLAKE2b-256 | b92db2300ce3dc3ff7666c9075a2229a41215312faa7118c032cb6058ad2c9a9
File details
Details for the file bk_clustering-0.3.2-py3-none-any.whl.
File metadata
- Download URL: bk_clustering-0.3.2-py3-none-any.whl
- Upload date:
- Size: 32.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.0 CPython/3.10.6 Linux/5.19.0-41-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7092649e0595be483c3577867ad70f1e81a17382831fc22317ca7618cbeef128
MD5 | e3817f35f50238fd10bd2f8959bdca56
BLAKE2b-256 | c406fde772bf030283edd90f7d2445ca68c05b5c6682c2fd9291665cbc4db72f