Multi-dimensional clustering visualization tool.

Project description

MD_clustering is a package that allows for exploratory analysis of multi-dimensional data through KMeans clustering provided by scikit-learn.

DEPENDENCIES:

  • python = 3.8.3
  • numpy = 1.18.5
  • matplotlib = 3.4.2
  • plotly = 5.1.0
  • scikit-learn = 0.24.2
  • seaborn = 0.11.1
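
Because the versions above are exact pins, it can help to check an environment against them before use. The following is a minimal sketch rather than part of the package: the helper name check_pins is hypothetical, and it relies only on the standard-library importlib.metadata module (available from Python 3.8).

```python
from importlib import metadata

# Pins copied from the dependency list above.
PINNED = {
    "numpy": "1.18.5",
    "matplotlib": "3.4.2",
    "plotly": "5.1.0",
    "scikit-learn": "0.24.2",
    "seaborn": "0.11.1",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package is installed at the pinned version.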

The package is used through a single class, MD_clustering, which handles both preprocessed and unpreprocessed data. It has a built-in elbow-method plot for choosing the number of clusters, and it can display the loading scores for n PCA components. The class can produce 1-D, 2-D, and 3-D visualizations based on the output of the KMeans algorithm and the top 3 PCA components. If desired, it can also display a pairwise plot of all features.
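
To make the elbow method and the loading scores concrete, here is a minimal sketch that uses scikit-learn directly. It is not the package's own implementation: the helper names inertia_curve and loading_scores and the synthetic data are assumptions for illustration, and "loading scores" is taken here to mean the entries of PCA's components_ attribute, which may differ from what the package reports.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def inertia_curve(X, k_values, random_state=0):
    """Fit KMeans once per k and return the inertia of each fit."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        inertias.append(model.inertia_)
    return inertias


def loading_scores(X, n_components=3):
    """Return the PCA component weights, shape (n_components, n_features)."""
    pca = PCA(n_components=n_components).fit(X)
    return pca.components_


# Synthetic stand-in data: 300 samples, 4 features, 3 generated clusters.
X_raw, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
X = StandardScaler().fit_transform(X_raw)

k_values = list(range(1, 8))
inertias = inertia_curve(X, k_values)
scores = loading_scores(X, n_components=3)

# The "elbow" is the k after which inertia stops dropping sharply.
for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
print("loading scores shape:", scores.shape)
```

Plotting inertias against k_values gives the usual elbow graph. Each row of scores shows how strongly every original feature contributes to one principal component.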

The available functionality is listed below; a more detailed explanation is shown when typing help(MD_clustering()):

  • from multi_dimensional_clustering import MD_clustering
  • MDC = MD_clustering() # Create the object
  • MDC.load_data(PATH, label_column_name=COL_NAME, preprocessed=BOOL) # Load the data. Specify label_column_name if there is a label column that should be saved and appended later, and set preprocessed if the data has already been preprocessed.
  • MDC.drop_rows([int(s)]) # Drop specific rows
  • MDC.drop_cols(cols=[COL_NAMES], save_cols=[COL_NAMES]) # Drop unwanted columns (cols); set aside wanted columns (save_cols) to be appended again when saving the data.
  • MDC.scale_data(scaler=SCALER) # Scale the data with SCALER if it has not been preprocessed yet
  • MDC.get_loading_scores() # Display the loading scores for n PCA components
  • MDC.get_n_clusters() # Display the elbow-method graph based on the inertia from KMeans
  • MDC.cluster(clusters_n=int) # Cluster the data with KMeans into n clusters
  • MDC.visualize('3D') # Visualize the results in a 3D plot
  • MDC.inverse_scale() # Inverse-scale the data back to its original format, if desired
  • MDC.concat_saved_cols() # Concatenate the columns that were set aside when dropping
  • MDC.save_data() # Save the data; a file path can optionally be given
  • MDC.pairwise_plot() # Plot a pairwise plot for further analysis
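
The order of the steps above (scale, cluster, project onto 3 PCA components, inverse-scale, append labels) can be sketched with scikit-learn directly. This is an illustration under assumptions, not the package's internals: the synthetic data, the choice of StandardScaler, and the cluster count of 4 are all hypothetical stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a loaded data file: 200 rows, 5 features.
X, _ = make_blobs(n_samples=200, centers=4, n_features=5, random_state=42)

# Scale the features (the role played by scale_data in the list above).
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Cluster the scaled data with KMeans (the role played by cluster).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Project onto the top 3 PCA components to get coordinates for a 3-D plot.
coords_3d = PCA(n_components=3).fit_transform(X_scaled)

# Undo the scaling so the saved values are in the original units.
X_restored = scaler.inverse_transform(X_scaled)

# Append the cluster label as a final column before saving.
result = np.column_stack([X_restored, labels])
print(result.shape, coords_3d.shape)
```

Clustering is done on the scaled data so that features with large numeric ranges do not dominate the distance computation, while the inverse transform restores the original units for the saved output.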

Download files

Source Distribution

multi_dimensional_clustering-0.2.1.tar.gz (8.7 kB)

Built Distribution

multi_dimensional_clustering-0.2.1-py3-none-any.whl (12.3 kB)

File details

Details for the file multi_dimensional_clustering-0.2.1.tar.gz.

File metadata

  • Download URL: multi_dimensional_clustering-0.2.1.tar.gz
  • Upload date:
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.8.3

File hashes

Hashes for multi_dimensional_clustering-0.2.1.tar.gz:

  • SHA256: b650951f806b6fa3ef2c0fc8ae08bb04a469b1c943e25e47994d179aecf3b334
  • MD5: aa631d0b6d7e6ef1691e853ac2710d1d
  • BLAKE2b-256: a3a2d0a0c59ecca687341320ac886f096c7fc2d660c8848c08313564d711a947

File details

Details for the file multi_dimensional_clustering-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: multi_dimensional_clustering-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 12.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.8.3

File hashes

Hashes for multi_dimensional_clustering-0.2.1-py3-none-any.whl:

  • SHA256: 9843fbea6415f5923c62d68828fe00c4a250b8186f0fff0505d244e4b904f213
  • MD5: 70b185f56b40c9e5326945e4439d9893
  • BLAKE2b-256: 7dbaf2862e258a7cea2557778a4708f316d146d0e7abfe6e19a028e191da782f