
A package to analyze and interpret categorical clustered data

Project description

Clustering Explorer

The Clustering Explorer allows users to interactively analyze which factors in a dataset are most associated with clusters. Users can lasso points of interest in a 2D plot of the data, which is created using Principal Component Analysis (PCA) for dimensionality reduction. The tool provides three modes of analysis: 'table', 'histogram', and 'explainer'.
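The README does not say how categorical columns are encoded before PCA. The sketch below assumes one-hot encoding and computes the 2D projection with NumPy's SVD. The helper names one_hot and pca_2d are hypothetical and not part of clustersight. Records are plain dicts, which is the shape that pandas' df.to_dict("records") produces.

```python
import numpy as np


def one_hot(rows):
    """One-hot encode a list of records (dicts of column -> categorical value)."""
    # Every (column, value) pair becomes one 0/1 feature, in a stable sorted order
    columns = sorted(
        {(col, val) for row in rows for col, val in row.items()},
        key=lambda cv: (cv[0], str(cv[1])),
    )
    index = {cv: j for j, cv in enumerate(columns)}
    X = np.zeros((len(rows), len(columns)))
    for i, row in enumerate(rows):
        for col, val in row.items():
            X[i, index[(col, val)]] = 1.0
    return X, columns


def pca_2d(X):
    """Project X onto its first two principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T  # shape (n_samples, 2)


rows = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "M"},
    {"color": "blue", "size": "L"},
    {"color": "blue", "size": "L"},
]
X, columns = one_hot(rows)
coords = pca_2d(X)
print(coords.shape)  # (4, 2)
```

Identical records map to the same 2D point, so the last two rows above overlap in the plot.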

Usage

Create Lasso Tool

create_lasso(df, mode='table', label_col=None, exclude_cols=[], num_factors=10, dtreeviz_plot=True)

The create_lasso function creates a lasso tool for data analysis. The parameters for this function are:

  • df: A pandas DataFrame containing the data to be analyzed.
  • mode: The mode of analysis: 'table', 'histogram', or 'explainer'.
  • label_col: The name of the column used to color-code the points in the plot.
  • exclude_cols: A list of columns to exclude from the analysis.
  • num_factors: The number of factors to consider when mode is 'explainer'.
  • dtreeviz_plot: A boolean that controls whether the decision tree is plotted with the dtreeviz library.

The mode parameter determines the type of analysis that will be performed:

  • 'table': shows a table of the selected points.
  • 'histogram': shows an interactive histogram of each column's values among the selected points, compared with their values among all points.
  • 'explainer': uses a decision tree to predict which factors lead to the clustered selection.
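A minimal sketch of the comparison behind 'histogram' mode, assuming it contrasts each value's share among the lassoed points with its share across all points. value_shares is a hypothetical helper, selected_idx stands in for the row positions captured by the lasso, and records are dicts as produced by df.to_dict("records").

```python
from collections import Counter


def value_shares(rows, column, selected_idx):
    """Return {value: (share among selected rows, share among all rows)} for one column."""
    selected = [rows[i][column] for i in selected_idx]
    everyone = [row[column] for row in rows]
    sel_counts, all_counts = Counter(selected), Counter(everyone)
    return {
        value: (
            # Counter returns 0 for values absent from the selection
            sel_counts[value] / len(selected) if selected else 0.0,
            all_counts[value] / len(everyone),
        )
        for value in all_counts
    }


rows = [{"color": "red"}, {"color": "red"}, {"color": "blue"}, {"color": "blue"}]
shares = value_shares(rows, "color", selected_idx=[0, 1])
print(shares)  # {'red': (1.0, 0.5), 'blue': (0.0, 0.5)}
```

A value whose selected share is far above its overall share, such as 'red' here, is the kind of difference the side-by-side histograms make visible.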

The dtreeviz_plot parameter is used when mode is 'explainer'. If dtreeviz_plot is True, the decision tree is plotted using the dtreeviz library. Otherwise, the decision tree is plotted using sklearn, which is faster.
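The README does not describe how the explainer's decision tree is fit. As a simplified stand-in, the sketch below scores every single test of the form column == value by how much it reduces Gini impurity when predicting selected versus unselected points. Gini impurity is the default split criterion of scikit-learn's DecisionTreeClassifier, and this is how a depth-1 tree (a decision stump) would pick its root split. The names gini and rank_factors are hypothetical, and treating num_factors as the number of returned tests is an assumption.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2p(1 - p) for two classes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def rank_factors(rows, selected_idx, num_factors=10):
    """Rank 'column == value' tests by Gini impurity reduction for selected vs. not.

    rows: list of dicts (column -> categorical value).
    selected_idx: row positions captured by the lasso (hypothetical input).
    """
    selected = set(selected_idx)
    y = [1 if i in selected else 0 for i in range(len(rows))]
    n = len(rows)
    base = gini(y)
    candidates = {(col, val) for row in rows for col, val in row.items()}
    scores = []
    for col, val in candidates:
        # Split the rows on one binary test, as a decision stump would
        left = [y[i] for i, row in enumerate(rows) if row.get(col) == val]
        right = [y[i] for i, row in enumerate(rows) if row.get(col) != val]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        scores.append(((col, val), base - weighted))
    # Largest impurity reduction first; ties sorted by the test's text for determinism
    scores.sort(key=lambda s: (-s[1], str(s[0])))
    return scores[:num_factors]


rows = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "S"},
    {"color": "blue", "size": "L"},
]
for (col, val), gain in rank_factors(rows, selected_idx=[0, 1], num_factors=3):
    print(col, "==", val, round(gain, 3))
```

A full decision tree repeats this split search recursively inside each branch. The single level shown here is only enough to surface the strongest individual factors.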

Dependencies

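  • Python 3.6+
  • numpy
  • pandas
  • scikit-learn (imported as sklearn)
  • dtreeviz
  • plotly
  • ipywidgets
  • itertools (part of the Python standard library; no separate installation needed)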
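Before calling create_lasso, it can help to confirm that the third-party packages listed above are importable. This sketch uses importlib.util.find_spec from the standard library. check_dependencies and REQUIRED_MODULES are hypothetical names, and the list uses import names (sklearn rather than scikit-learn).

```python
import importlib.util
import sys

# Import names of the third-party packages listed above (itertools is stdlib)
REQUIRED_MODULES = ("numpy", "pandas", "sklearn", "dtreeviz", "plotly", "ipywidgets")


def check_dependencies(modules=REQUIRED_MODULES):
    """Map each top-level import name to True if Python can locate it, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("clustersight lists Python 3.6+ as a requirement")
    for name, found in check_dependencies().items():
        print("{}: {}".format(name, "ok" if found else "missing"))
```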

Notes

The tool is designed for datasets that can fit in memory. For larger datasets, consider using a sampling method or dimensionality reduction techniques before using this tool.
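One sampling method that works without loading the full dataset is reservoir sampling (Algorithm R). It draws a uniform random sample of fixed size k from a stream of unknown length in a single pass. The sketch below is a generic implementation, not part of clustersight, and reservoir_sample is a hypothetical name.

```python
import random


def reservoir_sample(iterable, k, seed=None):
    """Algorithm R: uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1_000_000), k=5, seed=42)
print(sample)
```

Passing csv.DictReader over a large file yields a sampled list of dicts, which pandas.DataFrame can turn back into a DataFrame before calling create_lasso.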

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

clustersight-0.0.1.tar.gz (5.9 kB)

Uploaded Source

Built Distribution

clustersight-0.0.1-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file clustersight-0.0.1.tar.gz.

File metadata

  • Download URL: clustersight-0.0.1.tar.gz
  • Upload date:
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.4

File hashes

Hashes for clustersight-0.0.1.tar.gz
Algorithm Hash digest
SHA256 24416f9418d335e2c3de85fcd3404ccce51c0f3ac22609181a46b3b5a4e02497
MD5 53acfe295b5ca669299c00e36b465e33
BLAKE2b-256 61d9649b8fb531f34cec3a380eb0d81e7ca733973046c8e5099b1ba7f170e321

See more details on using hashes here.

File details

Details for the file clustersight-0.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for clustersight-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 82402bfd9ac08be4e4dcc2025507515cd35455a297caca977c85244d04b5422f
MD5 1f6989624b07274b57243b881746dc8f
BLAKE2b-256 35475c46a36eb2fcb4611f89191d22312d3d3ef12b277fbb037b4535972e26ad

