A package to analyze and interpret categorical clustered data
Project description
Clustering Explorer
The Clustering Explorer allows users to interactively analyze which factors in a dataset are most associated with clusters. Users can lasso points of interest in a 2D plot of the data, which is created using Principal Component Analysis (PCA) for dimensionality reduction. The tool provides three modes of analysis: 'table', 'histogram', and 'explainer'.
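The 2D projection described above can be sketched as follows. This is a minimal illustration rather than the package's internal code: it assumes the categorical columns are one-hot encoded with `pandas.get_dummies` before PCA is applied, and the data and column names are made up for the example.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical categorical data; column names are illustrative only.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size": ["S", "M", "L", "S", "M", "L"],
    "shape": ["round", "square", "round", "round", "square", "square"],
})

# Assumption: one-hot encode the categories so PCA can work on numbers.
encoded = pd.get_dummies(df).astype(float)

# Project onto the first two principal components for a 2D scatter plot.
coords = PCA(n_components=2).fit_transform(encoded)
projection = pd.DataFrame(coords, columns=["pc1", "pc2"], index=df.index)
print(projection)
```

Each row of `projection` gives the plot coordinates of one row of `df`, which is what a lasso selection would then operate on.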
Usage
Create Lasso Tool
create_lasso(df, mode='table', label_col=None, exclude_cols=[], num_factors=10, dtreeviz_plot=True)
The create_lasso function creates a lasso tool for data analysis. The parameters for this function are:
df: A pandas DataFrame containing the data to be analyzed
mode: The mode of analysis. Can be 'table', 'histogram', or 'explainer'
label_col: The name of the column used to color-code the plot
exclude_cols: A list of columns to exclude from the analysis
num_factors: The number of factors to consider when mode is 'explainer'
dtreeviz_plot: A boolean that decides whether the decision tree is plotted with the dtreeviz library
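A call using these parameters might look like the sketch below. The import path `from clustersight import create_lasso` is an assumption (the description does not state where the function lives), as is the expectation that the tool runs inside a Jupyter notebook; the DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical survey-style data; column names are illustrative only.
df = pd.DataFrame({
    "region": ["north", "south", "south", "east", "north", "east"],
    "plan": ["basic", "pro", "pro", "basic", "basic", "pro"],
    "churned": ["no", "yes", "yes", "no", "no", "yes"],
    "row_id": [1, 2, 3, 4, 5, 6],
})

# Keyword arguments mirror the documented signature.
lasso_kwargs = dict(
    mode="histogram",         # 'table', 'histogram', or 'explainer'
    label_col="churned",      # column used to color-code the plot
    exclude_cols=["row_id"],  # leave identifiers out of the analysis
    num_factors=10,           # used when mode is 'explainer'
    dtreeviz_plot=False,      # used when mode is 'explainer'
)


def launch(data, **kwargs):
    """Open the lasso tool (assumed to be called from a notebook cell)."""
    # Assumption: create_lasso is importable from the package top level.
    from clustersight import create_lasso
    return create_lasso(data, **kwargs)

# In a notebook cell: launch(df, **lasso_kwargs)
```

Deferring the import to `launch` keeps the data preparation runnable even where the package is not installed.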
The mode parameter determines the type of analysis that will be performed:
'table': shows a table of the selected points
'histogram': shows an interactive histogram of each column's values among the selected points, compared with the values among all points
'explainer': uses a decision tree to predict which factors lead to the clustered selection
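The comparison behind the 'histogram' mode can be illustrated with plain pandas. This is a sketch of the idea only, not the package's implementation: it assumes a lasso selection can be represented as a list of row positions, and `compare_shares` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical data; column names are illustrative only.
df = pd.DataFrame({
    "plan": ["basic", "pro", "pro", "basic", "basic", "pro", "pro", "basic"],
    "region": ["north", "south", "south", "east", "north", "east", "south", "north"],
})

# Assumption: a lasso selection is represented as a list of row positions.
selected_rows = [1, 2, 6]
selected = df.iloc[selected_rows]


def compare_shares(column):
    """Share of each value among the selected points versus all points."""
    shares = pd.DataFrame({
        "selected": selected[column].value_counts(normalize=True),
        "all": df[column].value_counts(normalize=True),
    })
    # Values absent from the selection get a share of zero.
    return shares.fillna(0.0)


for column in df.columns:
    print(compare_shares(column), end="\n\n")
```

A value whose share among the selected points differs sharply from its share among all points is a candidate explanation for the cluster.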
The dtreeviz_plot parameter is used when mode is 'explainer'. If dtreeviz_plot is True, the decision tree is plotted using the dtreeviz library. Otherwise, the decision tree is plotted using sklearn, which is faster.
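The idea behind the 'explainer' mode can be sketched with scikit-learn alone: fit a decision tree that separates the selected points from the rest, then rank the features the tree relied on. The one-hot encoding, the tree depth, and the reading of `num_factors` as "keep the top N features by importance" are all assumptions made for this illustration, not confirmed details of the package.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data; column names are illustrative only.
df = pd.DataFrame({
    "plan": ["basic", "pro", "pro", "basic", "basic", "pro", "pro", "basic"],
    "region": ["north", "south", "south", "east", "north", "east", "south", "north"],
})

# Assumption: selected points are given as row positions.
selected_rows = [1, 2, 6]
target = pd.Series(0, index=df.index)
target.iloc[selected_rows] = 1  # 1 = inside the lasso, 0 = outside

# Assumption: categorical columns are one-hot encoded before fitting.
features = pd.get_dummies(df).astype(float)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, target)

# Rank features by how much the tree relied on them.
num_factors = 10
importances = pd.Series(tree.feature_importances_, index=features.columns)
top_factors = importances.sort_values(ascending=False).head(num_factors)
print(top_factors)

# Text rendering of the fitted tree (no plotting backend required).
print(export_text(tree, feature_names=list(features.columns)))
```

In this toy data every selected row has region 'south', so the tree splits on that single feature.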
Dependencies
Python 3.6+, numpy, pandas, sklearn (installed as scikit-learn), dtreeviz, plotly, ipywidgets, itertools (part of the Python standard library)
Notes
The tool is designed for datasets that can fit in memory. For larger datasets, consider using a sampling method or dimensionality reduction techniques before using this tool.
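One way to apply the sampling suggestion is with pandas before handing the data to the tool. The frame, column names, and sample sizes below are hypothetical; the stratified variant is an optional extra, not something the package requires.

```python
import pandas as pd

# Hypothetical large frame standing in for the full dataset.
big_df = pd.DataFrame({
    "segment": ["a", "b", "c", "d"] * 25_000,
    "channel": ["web", "store"] * 50_000,
})

# Draw a reproducible random sample small enough to explore interactively.
sample_df = big_df.sample(n=5_000, random_state=42)

# Optional: sample within each segment so every segment keeps its share.
stratified_df = big_df.groupby("segment").sample(frac=0.05, random_state=42)

print(len(sample_df), len(stratified_df))
```

Fixing `random_state` keeps the sample stable across runs, so a selection made in one session can be reproduced in the next.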
Download files
Source Distribution: clustersight-0.0.1.tar.gz
Built Distribution: clustersight-0.0.1-py3-none-any.whl
File details
Details for the file clustersight-0.0.1.tar.gz.
File metadata
- Download URL: clustersight-0.0.1.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 24416f9418d335e2c3de85fcd3404ccce51c0f3ac22609181a46b3b5a4e02497
MD5 | 53acfe295b5ca669299c00e36b465e33
BLAKE2b-256 | 61d9649b8fb531f34cec3a380eb0d81e7ca733973046c8e5099b1ba7f170e321
File details
Details for the file clustersight-0.0.1-py3-none-any.whl.
File metadata
- Download URL: clustersight-0.0.1-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 82402bfd9ac08be4e4dcc2025507515cd35455a297caca977c85244d04b5422f
MD5 | 1f6989624b07274b57243b881746dc8f
BLAKE2b-256 | 35475c46a36eb2fcb4611f89191d22312d3d3ef12b277fbb037b4535972e26ad