Multi-dimensional clustering visualization tool.
MD_clustering is a package for exploratory analysis of multi-dimensional data using the KMeans clustering algorithm provided by scikit-learn.
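As a rough illustration of the underlying technique (not this package's own code), scikit-learn's KMeans can be applied directly to an n-dimensional array. The synthetic data and the helper name `cluster_labels` are hypothetical, chosen only for this sketch; `n_init` is set explicitly so the call behaves the same on scikit-learn 0.24 and newer releases.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def cluster_labels(X, n_clusters, seed=0):
    """Hypothetical helper: fit KMeans on an (n_samples, n_features) array and return one label per row."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(X)


# Hypothetical synthetic data: 300 samples, 6 features, 4 underlying groups.
X, _ = make_blobs(n_samples=300, n_features=6, centers=4, random_state=42)
labels = cluster_labels(X, n_clusters=4)
print(np.bincount(labels))  # number of samples assigned to each cluster
```

Each row of `X` receives an integer label in `0..n_clusters-1`; the package's visualizations are built on labels of this kind.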
DEPENDENCIES:
- python = 3.8.3
- numpy = 1.18.5
- matplotlib = 3.4.2
- plotly = 5.1.0
- scikit-learn = 0.24.2
- seaborn = 0.11.1
The package is organized around a single class. It can handle both preprocessed and unpreprocessed data, and it has built-in support for displaying the elbow method to help determine the optimal number of clusters, as well as the loading scores for n PCA components. The class can display 1-D, 2-D, and 3-D visualizations based on the output of the KMeans algorithm and the top 3 PCA components. Finally, it can optionally display a pairwise plot of all features.
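The elbow method and PCA loading scores mentioned above can be sketched with plain scikit-learn as follows. This is an assumption about the general technique, not the package's implementation: inertia is KMeans' within-cluster sum of squared distances, and "loading scores" are taken here as the entries of `pca.components_`, which is one common convention. The helper names `inertia_curve` and `loading_scores` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def inertia_curve(X, max_k=10, seed=0):
    """Hypothetical helper: KMeans inertia (within-cluster sum of squares) for k = 1..max_k."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in range(1, max_k + 1)
    ]


def loading_scores(X, n_components=3):
    """Hypothetical helper: PCA loadings per feature, the projected data, and explained variance."""
    pca = PCA(n_components=n_components)
    projected = pca.fit_transform(X)
    # components_ has shape (n_components, n_features); transpose so each row is one feature.
    return pca.components_.T, projected, pca.explained_variance_ratio_


# Hypothetical synthetic data, standardized before clustering and PCA.
X, _ = make_blobs(n_samples=300, n_features=6, centers=4, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

inertias = inertia_curve(X_scaled, max_k=8)
for k, value in enumerate(inertias, start=1):
    print(f"k={k}: inertia={value:.1f}")  # look for the "elbow" where the drop flattens

loadings, top3, ratio = loading_scores(X_scaled, n_components=3)
print(loadings.shape, top3.shape)  # (6, 3) (300, 3)
print(ratio)  # share of variance explained by each of the top 3 components
```

The `top3` array holds each sample's coordinates on the top 3 principal components, which is the kind of input a 3-D scatter plot colored by cluster label would use.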
Below is a list of the available functionality; a more detailed explanation is given by typing help(MD_clustering()):
- from multi_dimensional_clustering import MD_clustering
- MDC = MD_clustering() # Creating the object
- MDC.load_data(PATH, label_column_name=COL_NAME, preprocessed=BOOL) # Load the data. Specify a label column if it should be kept and appended when saving, and whether the data has already been preprocessed.
- MDC.drop_rows([int(s)]) # Drop specific rows (given as a list of ints).
- MDC.drop_cols(cols=[COL_NAMES], save_cols=[COL_NAMES]) # Drop unwanted columns (cols) and set aside columns (save_cols) that should be re-appended when saving the data.
- MDC.scale_data(scaler=SCALER) # Scale the data with SCALER if it has not been preprocessed yet.
- MDC.get_loading_scores() # Display the loading scores for n PCA components.
- MDC.get_n_clusters() # Display the elbow-method graph by analyzing the inertia from KMeans.
- MDC.cluster(clusters_n=int) # Cluster the data with KMeans into n clusters.
- MDC.visualize('3D') # Visualize the results in a 3D plot.
- MDC.inverse_scale() # Optionally inverse-scale the data back to its original format.
- MDC.concat_saved_cols() # Re-append the columns set aside with save_cols.
- MDC.save_data() # Save the data; a file path can optionally be given.
- MDC.pairwise_plot() # Plot a pairwise plot for further analysis.
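The steps listed above follow a common pipeline: drop rows, set columns aside, scale, cluster, inverse-scale, re-append the saved columns, and save. The sketch below reproduces that flow with NumPy and scikit-learn only; it is an assumption about what the steps amount to, not the package's implementation, and the function `run_workflow`, the column names, and the output path `clustered.csv` are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def run_workflow(data, columns, drop_rows, drop_cols, save_cols, n_clusters, seed=0):
    """Hypothetical plain-scikit-learn version of the drop/scale/cluster/inverse-scale/concat steps."""
    data = np.delete(data, drop_rows, axis=0)  # analogous to drop_rows([...])

    # Columns in save_cols are removed from the features but kept aside (analogous to drop_cols).
    removed = set(drop_cols) | set(save_cols)
    keep_idx = [i for i, c in enumerate(columns) if c not in removed]
    saved_idx = [columns.index(c) for c in save_cols]
    features = data[:, keep_idx]
    saved = data[:, saved_idx]

    scaler = StandardScaler()
    scaled = scaler.fit_transform(features)  # analogous to scale_data(...)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(scaled)
    restored = scaler.inverse_transform(scaled)  # analogous to inverse_scale()

    # Re-append the saved columns and the cluster label (analogous to concat_saved_cols()).
    out_cols = [columns[i] for i in keep_idx] + list(save_cols) + ["cluster"]
    out = np.column_stack([restored, saved, labels])
    return out_cols, out


# Hypothetical data: an "id" column plus four numeric features.
rng = np.random.default_rng(0)
columns = ["id", "a", "b", "c", "noise"]
data = np.column_stack([np.arange(20), rng.normal(size=(20, 4))])

cols, result = run_workflow(
    data, columns, drop_rows=[0, 1], drop_cols=["noise"], save_cols=["id"], n_clusters=3
)
# Analogous to save_data(): write a CSV with a header row (hypothetical path).
np.savetxt("clustered.csv", result, delimiter=",", header=",".join(cols), comments="", fmt="%.6g")
```

Saved columns such as an identifier are excluded from scaling and clustering, so they cannot influence the cluster assignment but still appear in the output next to each row's label.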
Source Distribution
Hashes for multi_dimensional_clustering-0.1.4.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 1d37458c1efb760a7fdf83a20a1473cba6407e1d7286b64834b570f11871ce7f
MD5 | 435342b051715cf82ce9c8788cab67a8
BLAKE2b-256 | 887e690ba46aa3e67d312503adf662f2ba5d7a9d8d535d8ac1ddf068d94b4596

Built Distribution
Hashes for multi_dimensional_clustering-0.1.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d6d5e4fa67c58b7ee9adf5fb8822b43f772fc82cd033381eee8c9d3b9beeb754
MD5 | ba68442dbf48eb8acb649b4b1c4da0ee
BLAKE2b-256 | 1b7867e55932cb3eda2faeaa7601fb985ab4c6ffcf96ad46d30da61e0b90a15f