A package to analyze and interpret categorical clustered data
Project description
Clustering Explorer
The Clustering Explorer allows users to interactively analyze which factors in a dataset are most associated with clusters. Users can lasso points of interest in a 2D plot of the data, which is created using Principal Component Analysis (PCA) for dimensionality reduction. The tool provides three modes of analysis: 'table', 'histogram', and 'explainer'.
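The 2D view described above can be approximated outside the tool. The following is a minimal sketch, not the package's own code: it assumes categorical columns are one-hot encoded before PCA (the package's actual preprocessing is not documented here), and the column names are made up for illustration.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Toy categorical dataset (hypothetical columns, for illustration only)
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "size": ["S", "M", "L", "S", "M", "L"],
    "shape": ["round", "square", "round", "round", "square", "square"],
})

# Assumed preprocessing step: one-hot encode so PCA receives numeric input
encoded = pd.get_dummies(df, dtype=float)

# Project the encoded rows onto the first two principal components
pca = PCA(n_components=2)
coords = pca.fit_transform(encoded)

# One (pc1, pc2) point per original row, ready for a scatter plot
projection = pd.DataFrame(coords, columns=["pc1", "pc2"], index=df.index)
print(projection.round(3))
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

Each row of `projection` corresponds to one point that could be lassoed in the plot.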
Usage
Create Lasso Tool
create_lasso(df, mode='table', label_col=None, exclude_cols=[], num_factors=10, dtreeviz_plot=True)
The create_lasso function creates a lasso tool for data analysis. The parameters for this function are:
df: A pandas DataFrame of the data to be analyzed
mode: The mode of analysis. Can be 'table', 'histogram', or 'explainer'
label_col: The column name to be used for color coding of the plot
exclude_cols: A list of columns to exclude from the analysis
num_factors: Number of factors to consider when mode is 'explainer'
dtreeviz_plot: A boolean value to decide whether to plot the decision tree using the dtreeviz library
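A call using the parameters above might look like the sketch below. The import path `from clustersight import create_lasso` is an assumption based on the distribution name, the data and column names are hypothetical, and since ipywidgets is a dependency the tool is presumably meant to run inside a Jupyter notebook.

```python
import pandas as pd

# Hypothetical survey data; column names are illustrative only
df = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4],
    "region": ["north", "south", "north", "east"],
    "plan": ["basic", "pro", "pro", "basic"],
    "churned": ["no", "yes", "no", "yes"],
})

# Keyword arguments matching the signature documented above
lasso_kwargs = {
    "mode": "explainer",               # 'table', 'histogram', or 'explainer'
    "label_col": "churned",            # column used to color the points
    "exclude_cols": ["respondent_id"], # identifiers carry no cluster signal
    "num_factors": 5,
    "dtreeviz_plot": False,            # use the sklearn plot instead
}


def launch_lasso(data, **kwargs):
    """Open the lasso tool; intended to be called from a notebook cell."""
    # Assumption: create_lasso is importable from the top-level package
    from clustersight import create_lasso
    return create_lasso(data, **kwargs)

# In a notebook cell: launch_lasso(df, **lasso_kwargs)
```

Passing `exclude_cols` explicitly, rather than relying on the `[]` default, keeps the call independent of how the function treats its default list.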
The mode parameter determines the type of analysis that will be performed:
'table': shows a table of the selected points
'histogram': shows an interactive histogram of each column's values among selected points compared with among all points
'explainer': predicts which factors lead to the clustered selection with a decision tree
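To illustrate the comparison behind the 'histogram' mode, here is a rough sketch that tabulates category shares among a selection versus the whole dataset. It is not the package's implementation: `compare_distribution`, the data, and the hard-coded selection are all hypothetical stand-ins for what the lasso would supply.

```python
import pandas as pd

# Hypothetical data; column names are illustrative only
df = pd.DataFrame({
    "plan": ["basic", "pro", "pro", "basic", "pro", "basic"],
    "region": ["north", "south", "north", "east", "south", "north"],
})

# Stand-in for a lasso selection: row labels of the selected points
selected_idx = [1, 2, 4]
selected = df.loc[selected_idx]


def compare_distribution(all_rows, selected_rows, column):
    """Share of each category among selected points vs. all points."""
    overall = all_rows[column].value_counts(normalize=True)
    chosen = selected_rows[column].value_counts(normalize=True)
    # Categories absent from the selection get a share of 0
    table = pd.DataFrame({"all": overall, "selected": chosen}).fillna(0.0)
    return table.sort_index()


plan_table = compare_distribution(df, selected, "plan")
print(plan_table)
```

A category whose share is much higher in the `selected` column than in the `all` column is over-represented in the lassoed cluster.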
The dtreeviz_plot parameter is used when mode is 'explainer'. If dtreeviz_plot is True, the decision tree is plotted using the dtreeviz library. Otherwise, the decision tree is plotted using sklearn, which is faster.
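The idea behind the 'explainer' mode can be sketched with scikit-learn alone: label each row as selected or not, fit a decision tree, and rank features by importance. This is an illustration under assumptions, not the package's code. The one-hot encoding, the tree depth, and reading `num_factors` as "keep the top N features by importance" are all guesses, and the tree is printed as text rather than plotted.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data; column names are illustrative only
df = pd.DataFrame({
    "plan": ["basic", "pro", "pro", "basic", "pro", "basic", "pro", "basic"],
    "region": ["north", "south", "north", "east", "south", "north", "east", "south"],
})

# Stand-in for a lasso selection; here it coincides with plan == "pro"
selected_idx = [1, 2, 4, 6]
target = df.index.isin(selected_idx).astype(int)  # 1 = selected, 0 = not

# Assumed preprocessing: one-hot encode the categorical columns
features = pd.get_dummies(df, dtype=float)

# Fit a shallow tree that separates selected from unselected rows
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(features, target)

# Rank encoded features by how much they contribute to the splits
importances = pd.Series(tree.feature_importances_, index=features.columns)
num_factors = 3  # assumed meaning: number of top features to report
top_factors = importances.sort_values(ascending=False).head(num_factors)
print(top_factors)

# Plain-text rendering of the fitted tree
print(export_text(tree, feature_names=list(features.columns)))
```

In this toy case the selection is fully explained by the `plan` column, so one of the `plan_*` indicator features receives all of the importance.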
Dependencies
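Python 3.6+, numpy, pandas, scikit-learn (imported as sklearn), dtreeviz, plotly, ipywidgets. itertools is part of the Python standard library and needs no separate installation.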
Notes
The tool is designed for datasets that can fit in memory. For larger datasets, consider using a sampling method or dimensionality reduction techniques before using this tool.
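One way to apply the sampling suggestion is with pandas before handing the data to the tool. The sketch below uses synthetic data and hypothetical column names, and the sample sizes are arbitrary choices rather than recommendations from the package.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a large categorical dataset
rng = np.random.default_rng(0)
n_rows = 100_000
big_df = pd.DataFrame({
    "segment": rng.choice(["a", "b", "c"], size=n_rows, p=[0.7, 0.2, 0.1]),
    "channel": rng.choice(["web", "store"], size=n_rows),
})

# Simple random sample of a fixed number of rows
sample = big_df.sample(n=5_000, random_state=0)

# Stratified sample: keep the same fraction of every segment so that
# rare categories are not lost by chance
stratified = big_df.groupby("segment", group_keys=False).sample(
    frac=0.05, random_state=0
)

print(len(sample), len(stratified))
print(stratified["segment"].value_counts(normalize=True).round(3))
```

Stratifying on a column of interest preserves its category proportions in the reduced dataset, which matters when small clusters are the ones being explored.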
Built Distribution
Hashes for clustersight-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6db8cb581d6b2ac86d44c3d585794c54f45dd945128cebccbaec7b36488864ee
MD5 | 0b27c0147ef2726718527ac33ca5b9a1
BLAKE2b-256 | 7cee6b092a342dab509c0114a6fa4851ceb2a6dad4d6cebc7e98033bdb23bb9b