
A package to analyze and interpret categorical clustered data


Clustering Explorer

The Clustering Explorer allows users to interactively analyze which factors in a dataset are most associated with clusters. Users can lasso points of interest in a 2D plot of the data, which is created using Principal Component Analysis (PCA) for dimensionality reduction. The tool provides three modes of analysis: 'table', 'histogram', and 'explainer'.
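As a rough illustration of the PCA step described above, the sketch below projects a small categorical dataset to two dimensions with scikit-learn. The sample columns and the one-hot encoding via `pd.get_dummies` are assumptions made for this example; the package may prepare its input differently.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Small illustrative dataset (hypothetical columns, not taken from the package).
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size": ["S", "M", "L", "S", "M", "L"],
    "price": [1.0, 2.5, 3.0, 1.2, 2.4, 3.1],
})

# One-hot encode the categorical columns so PCA receives a numeric matrix.
# Assumption: the package may encode categories in some other way.
encoded = pd.get_dummies(df, dtype=float)

# Reduce to two principal components, one (x, y) pair per row for a 2D plot.
coords = PCA(n_components=2).fit_transform(encoded.to_numpy())
projection = pd.DataFrame(coords, columns=["pc1", "pc2"], index=df.index)
print(projection.shape)
```

Each row of `projection` corresponds to one row of `df`, so points selected in the 2D plot can be mapped back to the original records by index.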

Usage

Create Lasso Tool

create_lasso(df, mode='table', label_col=None, exclude_cols=[], num_factors=10, dtreeviz_plot=True)

The create_lasso function creates a lasso tool for data analysis. The parameters for this function are:

  • df: A pandas DataFrame of the data to be analyzed
  • mode: The mode of analysis. Can be 'table', 'histogram', or 'explainer'
  • label_col: The column name to be used for color coding of the plot
  • exclude_cols: A list of columns to exclude from the analysis
  • num_factors: Number of factors to consider when mode is 'explainer'
  • dtreeviz_plot: A boolean value that decides whether to plot the decision tree using the dtreeviz library
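A call using these parameters might look like the sketch below. The import path `from clustersight import create_lasso` is an assumption (the page does not state it), as are the sample columns. The call is wrapped in a helper that is not invoked here, because the ipywidgets dependency suggests the tool is meant to run inside a Jupyter notebook.

```python
import pandas as pd

# Hypothetical sample data; column names are made up for illustration.
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "segment": ["a", "b", "a", "c"],
    "channel": ["web", "store", "web", "web"],
})

# Keyword arguments mirroring the documented signature.
lasso_kwargs = dict(
    mode="histogram",              # 'table', 'histogram', or 'explainer'
    label_col="segment",           # column used to color the plot
    exclude_cols=["customer_id"],  # identifiers carry no cluster signal
    num_factors=10,                # documented for mode='explainer'
    dtreeviz_plot=False,           # documented for mode='explainer'
)

def launch(frame, **kwargs):
    """Open the lasso tool for `frame`; intended to be called from a notebook cell."""
    # Assumed import path; adjust it if the installed package differs.
    from clustersight import create_lasso
    return create_lasso(frame, **kwargs)

# In a notebook cell: launch(df, **lasso_kwargs)
```

Excluding an identifier column such as `customer_id` is a choice made for this example, since a unique value per row says nothing about cluster membership.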

The mode parameter determines the type of analysis that will be performed:

  • 'table': shows a table of the selected points
  • 'histogram': shows an interactive histogram of each column's values among the selected points compared with all points
  • 'explainer': predicts which factors lead to the clustered selection with a decision tree
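The comparison behind the 'histogram' mode can be approximated with plain pandas. The sketch below is not the package's implementation; it only shows the idea of comparing a column's category shares among selected rows against the whole dataset. The data and the `selected_idx` list (standing in for a lasso selection) are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "size": ["S", "M", "S", "L", "M", "S"],
})

# Hypothetical lasso selection: row positions of the selected points.
selected_idx = [0, 2, 5]
selected = df.iloc[selected_idx]

def compare_distribution(column):
    """Return the share of each category among selected rows and among all rows."""
    overall = df[column].value_counts(normalize=True)
    chosen = selected[column].value_counts(normalize=True)
    # Categories absent from the selection get a share of 0.
    table = pd.DataFrame({"selected": chosen, "all": overall}).fillna(0.0)
    return table.sort_index()

color_table = compare_distribution("color")
print(color_table)
```

A category whose share in the `selected` column is much higher than in the `all` column is a candidate explanation for why those points cluster together.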

The dtreeviz_plot parameter only applies when mode is 'explainer'. If dtreeviz_plot is True, the decision tree is plotted using the dtreeviz library. If it is False, the tree is plotted using sklearn, which is faster.
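To make the 'explainer' idea concrete, the sketch below fits a scikit-learn decision tree that predicts whether a row belongs to the selection, then ranks features by importance. This is an illustration under assumptions, not the package's code: the one-hot encoding, the `max_depth` value, and the reading of `num_factors` as a top-N cutoff are all guesses made for the example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red", "green", "blue"],
    "size": ["S", "M", "S", "L", "M", "L", "S", "L"],
})

# Hypothetical lasso selection; here it happens to cover every red row.
selected_idx = [0, 2, 5]

# Features: one-hot encoded categories. Target: 1 if the row was selected.
features = pd.get_dummies(df, dtype=float)
target = pd.Series(0, index=df.index)
target.iloc[selected_idx] = 1

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, target)

# Rank features by how much they contribute to separating the selection.
num_factors = 10  # assumed here to act as a top-N cutoff
importance = pd.Series(tree.feature_importances_, index=features.columns)
top_factors = importance.sort_values(ascending=False).head(num_factors)
print(top_factors)

# Plain-text view of the fitted tree.
print(export_text(tree, feature_names=list(features.columns)))
```

Because the selection coincides with the red rows, a single split on `color_red` separates it completely, which is the kind of rule the explainer mode is meant to surface.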

Dependencies

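  • Python 3.6+
  • numpy
  • pandas
  • sklearn (distributed on PyPI as scikit-learn)
  • dtreeviz
  • plotly
  • ipywidgets
  • itertools (part of the Python standard library; no separate install needed)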

Notes

The tool is designed for datasets that can fit in memory. For larger datasets, consider using a sampling method or dimensionality reduction techniques before using this tool.
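One simple way to follow the sampling suggestion is to draw a fixed-size random sample with pandas before passing the data to the tool. The row cap `MAX_ROWS` and the generated data below are arbitrary choices for illustration; a suitable cap depends on available memory and on how responsive the plot needs to be.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a large categorical dataset.
rng = np.random.default_rng(0)
big_df = pd.DataFrame({
    "segment": rng.choice(["a", "b", "c"], size=100_000),
    "channel": rng.choice(["web", "store"], size=100_000),
})

MAX_ROWS = 5_000  # hypothetical cap, not a limit documented by the package

# Sample without replacement; a fixed random_state keeps the sample reproducible.
sample = big_df.sample(n=min(MAX_ROWS, len(big_df)), random_state=0)
print(len(sample))
```

Uniform sampling can under-represent rare categories; if those matter, a stratified sample per category may be a better fit.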

