RBHC
Recursive Binary Hierarchical Clustering
This package performs recursive binary hierarchical clustering of data.
The K-Means algorithm is applied to the initial dataset to create a binary partition, and a chi-square score is then used to identify the feature (event) most responsible for that partition. The resulting clusters are divided recursively in the same way until the cluster size reaches 1 or the silhouette score reaches the threshold value.
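The procedure described above can be sketched in plain Python. This is a minimal illustration, not the package's implementation: the helper names (two_means, silhouette, chi_scores, split), the farthest-point initialisation, the "feature" key, and the reading of the stopping rule (a split is kept only when its silhouette score is at least the threshold) are all assumptions made for this example. The chi-square score shown here also assumes non-negative feature values.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def two_means(points, iters=20):
    """Plain 2-means with a deterministic start; returns a list of 0/1 labels."""
    # Assumption: start from the first point and the point farthest from it.
    centers = [points[0], max(points, key=lambda p: dist(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; both clusters must be non-empty."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        other = [dist(p, q) for j, q in enumerate(points)
                 if labels[j] != labels[i]]
        if not same:                      # singleton cluster scores 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)         # mean distance inside own cluster
        b = sum(other) / len(other)       # mean distance to the other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def chi_scores(points, labels):
    """Chi-square score of each non-negative feature against the 0/1 labels."""
    n = len(points)
    scores = []
    for f, total in enumerate(sum(col) for col in zip(*points)):
        score = 0.0
        for k in (0, 1):
            observed = sum(p[f] for p, l in zip(points, labels) if l == k)
            expected = total * labels.count(k) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores

def split(points, threshold=0.65, name="root"):
    """Recursively bisect points; returns a nested dict (hypothetical layout)."""
    node = {"name": name, "size": len(points), "feature": None, "children": []}
    if len(points) < 2:
        return node
    labels = two_means(points)
    # Assumed stopping rule: keep the split only if it is well separated.
    if len(set(labels)) < 2 or silhouette(points, labels) < threshold:
        return node
    scores = chi_scores(points, labels)
    node["feature"] = scores.index(max(scores))   # index of top-scoring feature
    for k in (0, 1):
        sub = [p for p, l in zip(points, labels) if l == k]
        node["children"].append(split(sub, threshold, f"{name}.{k}"))
    return node

# Toy data: feature 0 separates two groups, feature 1 is constant.
points = [[0.0, 1.0], [1.0, 1.0], [10.0, 1.0], [11.0, 1.0]]
tree = split(points)
```

On this toy input the root is split once on feature 0, and neither half is split further because a two-point split produces only singletons, which score 0 under this silhouette definition.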
Installation
Prerequisites: python3
pip install RBHC
Usage
from RBHC import clustering
clustering(dataFilePath,thresholdValue)
- dataFilePath = Path to the data file (see Data File Structure)
- thresholdValue = Silhouette score threshold (optional; the default in the program is 0.65)
The function returns JSON with a tree structure. Its most important fields are:
- name = Name of the cluster node (string)
- parent = Name of its parent node (string)
- size = Size of the cluster (integer)
- children = Subtrees of this node (list)
- clusterCreated = Whether clustering was successful (boolean)
To see a sample of this return value, run clustering on the provided sample dataset and print the output, or check visualisation/sampleData.json.
To run the program interactively in a Jupyter notebook, run jupyter notebook from the root directory; the notebook then opens on localhost.
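A tree with the fields listed above can be walked with a short recursive generator. The sketch below uses a hand-written stand-in for the return value, so the node names and sizes are illustrative only. It also assumes the tree is available as a Python dict; if the function returns a JSON string instead, parse it with json.loads first. The helper name iter_nodes is hypothetical.

```python
import json

def iter_nodes(node, depth=0):
    """Yield (depth, node) for every node in the tree, parents first."""
    yield depth, node
    for child in node.get("children", []):
        yield from iter_nodes(child, depth + 1)

# Hand-written stand-in for the returned structure; values are illustrative.
sample = json.loads("""
{"name": "root", "parent": null, "size": 5, "clusterCreated": true,
 "children": [
   {"name": "A", "parent": "root", "size": 3, "clusterCreated": false, "children": []},
   {"name": "B", "parent": "root", "size": 2, "clusterCreated": false, "children": []}
 ]}
""")

# Print an indented outline of the tree.
for depth, node in iter_nodes(sample):
    print("  " * depth + f'{node["name"]} (size={node["size"]})')

# Collect the names of the leaf clusters.
leaves = [n["name"] for _, n in iter_nodes(sample) if not n.get("children")]
```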
Statistics
After the program runs, clustering statistics are stored in statistics/hierarchical/nameOfDataFile/. For each sub-cluster created, stats are stored in a .json file with the following attributes:
- ClusterId = Identifier of the sub-cluster (L = level, G = position of the cluster within that level, counted left to right)
- Size = Size of the cluster
- Primary feature cluster created by = Name of the feature primarily responsible for forming this cluster
- Features chi score = Chi score of every feature in that cluster
- Stats on cluster by each feature = Stats of each feature in this cluster
- Ids = All instances that are part of the cluster; names are taken from column[0] of the data file
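The per-cluster files can be summarised with a few lines of standard-library Python. This sketch assumes each .json file holds a single object whose keys are spelled exactly like the attribute names above; that key spelling, the helper name summarise_stats, and the demo values (including the "L1G1" identifier format) are assumptions for illustration, not confirmed details of the package.

```python
import json
import tempfile
from pathlib import Path

def summarise_stats(stats_dir):
    """Return (ClusterId, Size, primary feature) for each .json file in stats_dir."""
    rows = []
    for path in sorted(Path(stats_dir).glob("*.json")):
        with path.open() as fh:
            stats = json.load(fh)
        # .get() returns None if a key is missing or spelled differently.
        rows.append((
            stats.get("ClusterId"),
            stats.get("Size"),
            stats.get("Primary feature cluster created by"),
        ))
    return rows

# Demo with one hand-written file; the values are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    example = {"ClusterId": "L1G1", "Size": 3,
               "Primary feature cluster created by": "feature1"}
    (Path(tmp) / "example.json").write_text(json.dumps(example))
    rows = summarise_stats(tmp)
```

Against real output, the call would be summarise_stats("statistics/hierarchical/nameOfDataFile"), with nameOfDataFile replaced by the actual data file name.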
Visualisation
Copy the visualisation folder to the directory where clustering is being used.
A nameOfDataFile.json file is created in the visualisation folder for visualising the clustering.
In the visualisation folder, run python -m http.server 8888, then open http://localhost:8888/ in a web browser.
Data File Structure
IDS | feature1 | ... | featureN
---|---|---|---
ID1 | value1 | ... | valueN
... | ... | ... | ...
All data files should be stored in the data folder; see that folder for a sample .csv data file.
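As a rough illustration of the layout above, the following sketch reads such a file with the standard csv module. It assumes a comma-separated file with a header row, instance IDs in the first column, and numeric feature values; the function name read_data_file and the sample values are hypothetical.

```python
import csv
import io

def read_data_file(handle):
    """Split a CSV in the layout above into ids, feature names, and numeric rows."""
    reader = csv.reader(handle)
    header = next(reader)
    feature_names = header[1:]            # column 0 holds the instance IDs
    ids, rows = [], []
    for record in reader:
        if not record:                    # skip blank lines
            continue
        ids.append(record[0])
        rows.append([float(v) for v in record[1:]])
    return ids, feature_names, rows

# In-memory stand-in for a file in the data folder; values are illustrative.
sample_csv = io.StringIO(
    "IDS,feature1,feature2\n"
    "ID1,1,0\n"
    "ID2,0,3\n"
)
ids, feature_names, rows = read_data_file(sample_csv)
```

For a file on disk, pass an open handle instead, for example open(path, newline="").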
Contribution and license