Multi-dimensional clustering visualization tool.
Project description
MD_clustering is a package that allows for exploratory analysis of multi-dimensional data through KMeans clustering provided by scikit-learn.
DEPENDENCIES:
- python = 3.8.3
- numpy = 1.18.5
- matplotlib = 3.4.2
- plotly = 5.1.0
- scikit-learn = 0.24.2
- seaborn = 0.11.1
The package operates as a class. It can handle both preprocessed and unpreprocessed data. It has the built-in ability to display the elbow method for determining the optimal number of clusters, as well as the loading scores for n PCA components. The class can display 1-D, 2-D, and 3-D visualizations based on the output of the KMeans algorithm and the top 3 PCA components. Finally, if desired, it can also display a pairwise plot of all features.
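To illustrate the idea behind the elbow method, here is a minimal sketch written directly against scikit-learn rather than this package. The helper `elbow_inertias` and the synthetic data are assumptions made for illustration and may not match how MD_clustering implements it internally. The sketch fits KMeans for a range of cluster counts and records the inertia (within-cluster sum of squared distances) of each fit; the "elbow" is the cluster count after which the inertia stops dropping sharply.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def elbow_inertias(X, max_clusters=8, random_state=0):
    """Fit KMeans for k = 1..max_clusters and return the inertia of each fit.

    Hypothetical helper for illustration; not part of MD_clustering.
    """
    inertias = []
    for k in range(1, max_clusters + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        inertias.append(model.inertia_)
    return inertias


# Synthetic data generated around four centers (illustrative only).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

inertias = elbow_inertias(X)
for k, inertia in enumerate(inertias, start=1):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against k, for example with matplotlib, gives the elbow graph that is read by eye.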
The list below shows the available functionality; a more detailed explanation is available by typing help(MD_clustering()):
- from multi_dimensional_clustering import MD_clustering
- MDC = MD_clustering() # Creating the object
- MDC.load_data(PATH, label_column_name=COL_NAME, preprocessed=BOOL) # Load the data. Specify label_column_name if there is a label column that should be saved and appended later, and set preprocessed to indicate whether the data has already been preprocessed.
- MDC.drop_rows([int(s)]) # drop specific rows
- MDC.drop_cols(cols=[COL_NAMES], save_cols=[COL_NAMES]) # Drop unwanted columns (cols); columns in save_cols are also dropped but kept so they can be appended when saving the data.
- MDC.scale_data(scaler=SCALER) # Scale data by SCALER if not preprocessed yet.
- MDC.get_loading_scores() # Display loading scores for n pca components.
- MDC.get_n_clusters() # Display the elbow-method graph by analyzing the inertia from KMeans
- MDC.cluster(clusters_n=int) # Cluster data through KMeans on n clusters
- MDC.visualize('3D') # Visualize results in 3D plot
- MDC.inverse_scale() # Inverse scale the data to original format if so desired
- MDC.concat_saved_cols() # Concatenate the dropped columns that were saved back onto the data
- MDC.save_data() # Save data, possible file path can be given
- MDC.pairwise_plot() # Plot a pairwise plot for further analysis.
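The scaling, PCA, clustering, and inverse-scaling steps in the list above correspond roughly to the following scikit-learn calls. This is a sketch under assumptions, not the package's own code: the iris dataset, `StandardScaler`, and the choice to cluster on the scaled features are illustrative, and "loading scores" are taken here to be the rows of `pca.components_`, which may differ from how MD_clustering defines them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example data: 150 samples with 4 numeric features (illustrative only).
X = load_iris().data

# Scale each feature to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Project onto the top 3 principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(X_scaled)

# One row per component, one column per original feature.
# Assumption: these weights are what the README calls "loading scores".
loadings = pca.components_

# Cluster the scaled data with KMeans (assumed input; the package may
# cluster on the PCA components instead).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# Undo the scaling to get the data back in its original units.
X_restored = scaler.inverse_transform(X_scaled)

print("loadings shape:", loadings.shape)
print("cluster sizes:", np.bincount(labels))
```

The `components` array holds the coordinates that a 1-D, 2-D, or 3-D scatter plot would use, with `labels` supplying the color of each point.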
File details
Details for the file multi_dimensional_clustering-0.2.1.tar.gz.
File metadata
- Download URL: multi_dimensional_clustering-0.2.1.tar.gz
- Upload date:
- Size: 8.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | b650951f806b6fa3ef2c0fc8ae08bb04a469b1c943e25e47994d179aecf3b334
MD5 | aa631d0b6d7e6ef1691e853ac2710d1d
BLAKE2b-256 | a3a2d0a0c59ecca687341320ac886f096c7fc2d660c8848c08313564d711a947
File details
Details for the file multi_dimensional_clustering-0.2.1-py3-none-any.whl.
File metadata
- Download URL: multi_dimensional_clustering-0.2.1-py3-none-any.whl
- Upload date:
- Size: 12.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9843fbea6415f5923c62d68828fe00c4a250b8186f0fff0505d244e4b904f213
MD5 | 70b185f56b40c9e5326945e4439d9893
BLAKE2b-256 | 7dbaf2862e258a7cea2557778a4708f316d146d0e7abfe6e19a028e191da782f