classgraphic
Interactive classification diagnostic plots for scikit-learn.
We classify things for the purpose of doing something to them. Any classification which does not assist manipulation is worse than useless. - Randolph S. Bourne, "Education and Living", The Century Co (April 1917)
Major features:
Plotly-based tables:
- class_imbalance_table
- classification_table
- confusion_matrix_table
- describe (dataframe stats)
- prediction_table
- table
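To show the kind of numbers behind a class imbalance table, here is a minimal, standard-library sketch. It counts each class, gives each class's share of the total, and reports the ratio of the most frequent class to the least frequent one. The `class_counts` helper is hypothetical. It is not how classgraphic computes its table and only illustrates the statistics involved.

```python
from collections import Counter


def class_counts(labels):
    """Return per-class (class, count, proportion) rows and the max/min imbalance ratio.

    Hypothetical helper for illustration; not part of classgraphic.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = sum(counts.values())
    # sort classes by name so the row order is deterministic
    table = [(cls, n, n / total) for cls, n in sorted(counts.items())]
    # ratio of the most frequent to the least frequent class (1.0 means perfectly balanced)
    ratio = max(counts.values()) / min(counts.values())
    return table, ratio


labels = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]
table, ratio = class_counts(labels)
for cls, n, share in table:
    print(f"{cls:<12} {n:>3} {share:6.1%}")
print(f"imbalance ratio: {ratio:.1f}")
```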
Charts:
- class_imbalance
- class_error
- det
- feature_importance
- missing
- precision_recall
- roc
- prediction_histogram
- threshold
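The roc, det and precision_recall charts all rest on the same idea. A decision threshold is swept over the predicted scores, and error rates are recorded at each step. The sketch below applies that idea to the ROC curve of a binary problem and computes the area under the curve with the trapezoidal rule. The `roc_points` and `auc` helpers are hypothetical, plain-Python illustrations and not classgraphic's implementation. scikit-learn's `roc_curve` computes the same curve, although by default it may drop intermediate points.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs for a binary problem.

    Hypothetical helper: sweeps the threshold over every distinct score, highest
    first, and predicts positive when score >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y != 1)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a curve given as (x, y) points sorted by x, using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores)
print(pts)
print(auc(pts))  # 0.75
```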
For clustering:
- Delaunay triangulations
- Voronoi tessellations
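A Voronoi tessellation partitions the plane so that every location belongs to the cell of its nearest seed point, such as a cluster centroid. The Delaunay triangulation is its dual and connects seeds whose cells share an edge. The sketch below shows only the defining nearest-seed rule, in plain Python. The seed coordinates and the `voronoi_cell` helper are hypothetical, and the sketch does not reproduce classgraphic's clustering plots.

```python
import math


def voronoi_cell(point, seeds):
    """Return the index of the seed whose Voronoi cell contains `point`.

    Hypothetical helper: a point lies in seed i's cell when it is at least as close
    to seed i as to any other seed (ties go to the lowest index here).
    """
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))


# hypothetical 2D cluster centroids
seeds = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
points = [(0.5, 0.5), (3.5, -0.2), (2.1, 2.5), (2.0, 0.0)]
for p in points:
    print(p, "-> cell", voronoi_cell(p, seeds))
```

To compute the actual cell polygons and triangles, `scipy.spatial.Voronoi` and `scipy.spatial.Delaunay` are the usual tools. SciPy is already a scikit-learn dependency.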
Try it
Try it on Binder to see all the details and interactivity. The quickstart below shows static images, but if you run these commands in a Jupyter notebook, IPython, or an IDE, you will be able to interact with the output.
Quickstart
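from classgraphic.essential import *
# explicit imports, in case the star import does not re-export these names
import plotly.express as px
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# assumed values: a fixed seed for reproducibility and enough solver iterations to converge
random_state = 42
max_iter = 1000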
# loading the data
df = px.data.iris()
# let's see what kind of data we have
describe(df, transpose=True).show()
# any missing values?
missing(df)
# features
X = df.drop(columns=["species", "species_id"])
# target
y = df["species"]
# check the classes we will train on and predict
class_imbalance_table(y, condition="all")
# train / test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=random_state
)
# bars are stacked by default, which shows the total count for each class;
# pass barmode="overlay" to class_imbalance to overlay train and test instead
class_imbalance(y_train, y_test, condition="train,test")
# model
model = LogisticRegression(max_iter=max_iter, random_state=random_state)
model.fit(X_train, y_train)
# predictions
y_score = model.predict_proba(X_test)
y_pred = model.predict(X_test)
confusion_matrix_table(model, y_test, y_pred).show()
classification_table(model, y_test, y_pred)
feature_importance(model, y, transpose=True)
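The confusion matrix and classification tables report standard quantities. As a dependency-free sketch of those numbers (not classgraphic's code), the hypothetical helpers below build a confusion matrix with true classes as rows and predicted classes as columns. They then derive per-class precision, recall and F1 from it. scikit-learn's `confusion_matrix` uses this orientation; that classgraphic's table is laid out the same way is an assumption.

```python
def confusion_counts(y_true, y_pred, labels):
    """Hypothetical helper: counts[i][j] = samples with true class labels[i] predicted as labels[j]."""
    index = {label: k for k, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return counts


def per_class_metrics(counts, labels):
    """Hypothetical helper: {label: (precision, recall, f1)}, using 0.0 when a ratio is undefined."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = counts[k][k]
        predicted = sum(row[k] for row in counts)  # column sum: everything predicted as this class
        actual = sum(counts[k])  # row sum: everything that truly is this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


labels = ["setosa", "versicolor", "virginica"]
y_true = ["setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "versicolor", "virginica", "virginica", "virginica"]
cm = confusion_counts(y_true, y_pred, labels)
print(cm)
for label, (p, r, f) in per_class_metrics(cm, labels).items():
    print(f"{label:<12} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```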
This concludes the quickstart. There are many more visualizations and tables to explore.
See the notebooks and docs folders on GitHub and the documentation website for more information.
Requirements
- Python 3.8 or later
- numpy
- pandas
- plotly>=5.0
- scikit-learn
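To check an existing environment against these requirements before installing, you could use a small standard-library sketch like the one below. It relies on `importlib.metadata`, which only exists from Python 3.8 on, so the import itself doubles as the Python version check. Its version parsing is deliberately simplistic and only compares leading numeric components. For full PEP 440 comparisons the `packaging` library is the usual tool. The `check_requirements` and `numeric_prefix` helpers are hypothetical, based on the list above, and do not ship with classgraphic.

```python
import re
from importlib.metadata import PackageNotFoundError, version  # requires Python 3.8+

# minimum versions taken from the requirements list; None means "any version"
REQUIREMENTS = {"numpy": None, "pandas": None, "plotly": (5, 0), "scikit-learn": None}


def numeric_prefix(ver):
    """Hypothetical helper: turn '5.14.1' or '1.3.0rc1' into a tuple of leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def check_requirements(requirements=REQUIREMENTS):
    """Hypothetical helper: return a list of problems (empty when everything is satisfied)."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if minimum is not None and numeric_prefix(installed) < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{name}>={wanted} required, found {installed}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements() or ["all requirements satisfied"]:
        print(problem)
```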
Install
If you use conda, create an environment named classgraphic, then activate it:
- In Linux:
source activate classgraphic
- In Windows:
conda activate classgraphic
If you use another environment manager, create and activate your environment using its usual steps.
Then execute:
python setup.py install
or, to install in development mode:
python -m pip install -e . --no-build-isolation
or alternatively
python setup.py develop
To install from GitHub instead:
pip install git+https://github.com/dionresearch/classgraphic
See also
- stemgraphic: Python package for visualization of data and text
- Hotelling: one- and two-sample Hotelling T² tests, T² and F statistics, univariate and multivariate control charts, and anomaly detection
History
0.3.0 (2023-05-01)
- added 2D clustering visualization
- defaults to Voronoi tessellation
- optional Delaunay triangulation
0.2.1 (2022-09-20)
- fixed image not showing on PyPI
- fixed feature importance error
- fixed warning=False not suppressing the warning
0.2.1 (2022-09-19)
- added binary classification notebook example
- fixed issue with non dataframe binary classification
0.2.0 (2022-09-18)
The previous version was a first step toward a public release. This release added:
- documentation
- code updates for compatibility with plotly 5.x
It was released to GitHub and PyPI.
0.1.0 (2019-10-27)
- First private release
Origins
Inspired by Dion Research LLC's internal EDA/anomaly detection and end-to-end data science platform. A dozen charts and tables were initially designed to provide better diagnostic reporting; some can also be used for exploratory or explanatory purposes.
See: https://blog.dionresearch.com/2019/10/visualizations-explanatory-exploratory.html