
Pandas confusion matrix with plot features (matplotlib, seaborn...)


pandas_confusion

A Python Pandas implementation of confusion matrix.

WORK IN PROGRESS - Use it at your own risk

Usage

Confusion matrix

Import ConfusionMatrix

from pandas_confusion import ConfusionMatrix

Define actual values (y_actu) and predicted values (y_pred)

y_actu = ['rabbit', 'cat', 'rabbit', 'rabbit', 'cat', 'dog', 'dog', 'rabbit', 'rabbit', 'cat', 'dog', 'rabbit']
y_pred = ['cat', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'cat', 'rabbit', 'rabbit']

Let's define a (non-binary) confusion matrix:

confusion_matrix = ConfusionMatrix(y_actu, y_pred)
print("Confusion matrix:\n%s" % confusion_matrix)

You can display it:

Predicted  cat  dog  rabbit  __all__
Actual
cat          3    0       0        3
dog          0    1       2        3
rabbit       2    1       3        6
__all__      5    2       5       12
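As a cross-check (plain pandas, not part of this library), the same counts can be reproduced with pandas.crosstab; margins=True adds the totals row and column, which pandas names "All" rather than "__all__":

```python
import pandas as pd

y_actu = ['rabbit', 'cat', 'rabbit', 'rabbit', 'cat', 'dog', 'dog', 'rabbit', 'rabbit', 'cat', 'dog', 'rabbit']
y_pred = ['cat', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'cat', 'rabbit', 'rabbit']

# Named Series give the DataFrame its "Actual" / "Predicted" axis labels
ct = pd.crosstab(pd.Series(y_actu, name='Actual'),
                 pd.Series(y_pred, name='Predicted'),
                 margins=True)
print(ct)
```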

Matplotlib plot of a confusion matrix

import matplotlib.pyplot as plt

confusion_matrix.plot()
plt.show()

Matplotlib plot of a normalized confusion matrix

confusion_matrix.plot(normalized=True)
plt.show()

Binary confusion matrix

Import BinaryConfusionMatrix and Backend

from pandas_confusion import BinaryConfusionMatrix, Backend

Define actual values (y_actu) and predicted values (y_pred)

y_actu = [ True,  True, False, False, False,  True, False,  True,  True,
           False,  True, False, False, False, False, False,  True, False,
            True,  True,  True,  True, False, False, False,  True, False,
            True, False, False, False, False,  True,  True, False, False,
           False,  True,  True,  True,  True, False, False, False, False,
            True, False, False, False, False, False, False, False, False,
           False,  True,  True, False,  True, False,  True,  True,  True,
           False, False,  True, False,  True, False, False,  True, False,
           False, False, False, False, False, False, False,  True, False,
            True,  True,  True,  True, False, False,  True, False,  True,
            True, False,  True, False,  True, False, False,  True,  True,
           False, False,  True,  True, False, False, False, False, False,
           False,  True,  True, False]

y_pred = [False, False, False, False, False,  True, False, False,  True,
       False,  True, False, False, False, False, False, False, False,
        True,  True,  True,  True, False, False, False, False, False,
       False, False, False, False, False,  True, False, False, False,
       False,  True, False, False, False, False, False, False, False,
        True, False, False, False, False, False, False, False, False,
       False,  True, False, False, False, False, False, False, False,
       False, False,  True, False, False, False, False,  True, False,
       False, False, False, False, False, False, False,  True, False,
       False,  True, False, False, False, False,  True, False,  True,
        True, False, False, False,  True, False, False,  True,  True,
       False, False,  True,  True, False, False, False, False, False,
       False,  True, False, False]

Let’s define a binary confusion matrix

binary_confusion_matrix = BinaryConfusionMatrix(y_actu, y_pred)
print("Binary confusion matrix:\n%s" % binary_confusion_matrix)

It displays as a nicely labeled Pandas DataFrame:

Binary confusion matrix:
Predicted  False  True  __all__
Actual
False         67     0       67
True          21    24       45
__all__       88    24      112

You can get useful attributes such as True Positive (TP), True Negative (TN) …

print(binary_confusion_matrix.TP)
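For reference (a plain-Python sketch, not the library's implementation), the four counts follow directly from the actual/predicted pairs; shown here on short toy lists:

```python
y_a = [True, True, False, False, True]
y_p = [True, False, True, False, True]

TP = sum(a and p for a, p in zip(y_a, y_p))          # actual True, predicted True
TN = sum(not a and not p for a, p in zip(y_a, y_p))  # actual False, predicted False
FP = sum(not a and p for a, p in zip(y_a, y_p))      # actual False, predicted True
FN = sum(a and not p for a, p in zip(y_a, y_p))      # actual True, predicted False
print(TP, TN, FP, FN)  # 2 1 1 1
```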

Matplotlib plot of a binary confusion matrix

binary_confusion_matrix.plot()
plt.show()

Matplotlib plot of a normalized binary confusion matrix

binary_confusion_matrix.plot(normalized=True)
plt.show()

Seaborn plot of a binary confusion matrix (ToDo)

from pandas_confusion import Backend
binary_confusion_matrix.plot(backend=Backend.Seaborn)

Confusion matrix and class statistics

Overall statistics and class statistics of the confusion matrix can easily be displayed.

y_true = [600, 200, 200, 200, 200, 200, 200, 200, 500, 500, 500, 200, 200, 200, 200, 200, 200, 200, 200, 200]
y_pred = [100, 200, 200, 100, 100, 200, 200, 200, 100, 200, 500, 100, 100, 100, 100, 100, 100, 100, 500, 200]
cm = ConfusionMatrix(y_true, y_pred)
cm.print_stats()

You should get:

Confusion Matrix:

Classes  100  200  500  600  __all__
Actual
100        0    0    0    0        0
200        9    6    1    0       16
500        1    1    1    0        3
600        1    0    0    0        1
__all__   11    7    2    0       20


Overall Statistics:

Accuracy: 0.35
95% CI: (0.1539092047845412, 0.59218853453282805)
No Information Rate: ToDo
P-Value [Acc > NIR]: 0.978585644357
Kappa: 0.0780141843972
Mcnemar's Test P-Value: ToDo


Class Statistics:

Classes                                 100         200         500   600
Population                               20          20          20    20
Condition positive                        0          16           3     1
Condition negative                       20           4          17    19
Test outcome positive                    11           7           2     0
Test outcome negative                     9          13          18    20
TP: True Positive                         0           6           1     0
TN: True Negative                         9           3          16    19
FP: False Positive                       11           1           1     0
FN: False Negative                        0          10           2     1
TPR: Sensitivity                        NaN       0.375   0.3333333     0
TNR=SPC: Specificity                   0.45        0.75   0.9411765     1
PPV: Pos Pred Value = Precision           0   0.8571429         0.5   NaN
NPV: Neg Pred Value                       1   0.2307692   0.8888889  0.95
FPR: False-out                         0.55        0.25  0.05882353     0
FDR: False Discovery Rate                 1   0.1428571         0.5   NaN
FNR: Miss Rate                          NaN       0.625   0.6666667     1
ACC: Accuracy                          0.45        0.45        0.85  0.95
F1 score                                  0   0.5217391         0.4     0
MCC: Matthews correlation coefficient   NaN   0.1048285    0.326732   NaN
Informedness                            NaN       0.125   0.2745098     0
Markedness                                0  0.08791209   0.3888889   NaN
Prevalence                                0         0.8        0.15  0.05
LR+: Positive likelihood ratio          NaN         1.5    5.666667   NaN
LR-: Negative likelihood ratio          NaN   0.8333333   0.7083333     1
DOR: Diagnostic odds ratio              NaN         1.8           8   NaN
FOR: False omission rate                  0   0.7692308   0.1111111  0.05
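As a sanity check (hand-rolled Python, not library code), a few of the figures above can be recomputed from the raw matrix counts: overall accuracy is the diagonal over the total, Cohen's kappa compares observed agreement with chance agreement from the marginals, and class 200's F1 score comes from its precision and recall:

```python
# Counts from the confusion matrix above (rows = actual, cols = predicted)
cm = {
    100: {100: 0, 200: 0, 500: 0, 600: 0},
    200: {100: 9, 200: 6, 500: 1, 600: 0},
    500: {100: 1, 200: 1, 500: 1, 600: 0},
    600: {100: 1, 200: 0, 500: 0, 600: 0},
}
classes = sorted(cm)
N = sum(v for row in cm.values() for v in row.values())        # population: 20

# Overall accuracy: trace / total
acc = sum(cm[c][c] for c in classes) / N                       # 7/20 = 0.35

# Cohen's kappa: (observed - chance) / (1 - chance)
actual_tot = {c: sum(cm[c].values()) for c in classes}
pred_tot = {c: sum(cm[r][c] for r in classes) for c in classes}
p_chance = sum(actual_tot[c] * pred_tot[c] for c in classes) / N ** 2
kappa = (acc - p_chance) / (1 - p_chance)                      # ~0.0780

# Class 200: precision (PPV), recall (TPR) and F1 score
TP = cm[200][200]                                              # 6
FP = pred_tot[200] - TP                                        # 1
FN = actual_tot[200] - TP                                      # 10
precision = TP / (TP + FP)                                     # ~0.8571429
recall = TP / (TP + FN)                                        # 0.375
f1 = 2 * precision * recall / (precision + recall)             # ~0.5217391
```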

Statistics are also available as an OrderedDict using:

cm.stats()

ToDo list

  • Better documentation
  • Doctest
  • Matplotlib discrete colorbar (not for normalized plot)

see ColorbarBase

http://stackoverflow.com/questions/14777066/matplotlib-discrete-colorbar

Example:

from sklearn.metrics import f1_score, classification_report
f1_score(y_actu, y_pred)
print(classification_report(y_actu, y_pred))
  • Compare with R “caret” package

http://stackoverflow.com/questions/26631814/create-a-confusion-matrix-from-a-dataframe

R

Actual <- c(600, 200, 200, 200, 200, 200, 200, 200, 500, 500, 500, 200, 200, 200, 200, 200, 200, 200, 200, 200)
Predicted <- c(100, 200, 200, 100, 100, 200, 200, 200, 100, 200, 500, 100, 100, 100, 100, 100, 100, 100, 500, 200)
df <- data.frame(Actual, Predicted)
#table(df)
col <- sort(union(df$Actual, df$Predicted))
df_conf <- table(lapply(df, factor, levels=col))
#table(lapply(df, factor, levels=seq(100, 600, 100)))
#table(lapply(df, factor, levels=c(100, 200, 500, 600)))

Python

>>> from pandas_confusion import ConfusionMatrix
>>> y_true = [600, 200, 200, 200, 200, 200, 200, 200, 500, 500, 500, 200, 200, 200, 200, 200, 200, 200, 200, 200]
>>> y_pred = [100, 200, 200, 100, 100, 200, 200, 200, 100, 200, 500, 100, 100, 100, 100, 100, 100, 100, 500, 200]
>>> cm = ConfusionMatrix(y_true, y_pred)
>>> cm
Predicted  100  200  500  600  __all__
Actual
100          0    0    0    0        0
200          9    6    1    0       16
500          1    1    1    0        3
600          1    0    0    0        1
__all__     11    7    2    0       20

cm(i, j) in Python corresponds to conf_mat(j, i) in R

You can use cm.to_dataframe().transpose() to get the R orientation

  • Overall statistics: No Information Rate, Mcnemar’s Test P-Value

    see confusionMatrix.R and print.confusionMatrix.R (caret) and e1071 package

  • Class statistics

    • see Caret code for Detection Rate, Detection Prevalence, Balanced Accuracy
  • Code metrics (landscape.io)

  • Create fake truth, prediction from confusion matrix (can be useful for unit test)

https://www.researchgate.net/post/Can_someone_help_me_to_calculate_accuracy_sensitivity_of_a_66_confusion_matrix

see code (ToDo)
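One way this ToDo could be sketched (a hypothetical helper, not part of the library): expand each cell count of a matrix back into matched label lists, which by construction reproduce the matrix.

```python
def fake_labels(matrix):
    """Expand a {actual: {predicted: count}} dict back into
    (y_true, y_pred) lists that reproduce the matrix."""
    y_true, y_pred = [], []
    for actual, preds in matrix.items():
        for predicted, count in preds.items():
            y_true.extend([actual] * count)
            y_pred.extend([predicted] * count)
    return y_true, y_pred

# The cat/dog/rabbit matrix from the first example above
y_true, y_pred = fake_labels({'cat': {'cat': 3},
                              'dog': {'dog': 1, 'rabbit': 2},
                              'rabbit': {'cat': 2, 'dog': 1, 'rabbit': 3}})
print(len(y_true))  # 12 samples
```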

  • Order confusion matrix easily

  • Create empty class easily

    cm = ConfusionMatrix(y_true, y_pred, labels=range(100, 600+1, 100))

Classes 300 and 400 should be created.

An R-like method? conf_mat_tab <- table(lapply(df, factor, levels = seq(100, 600, 100)))

http://pandas.pydata.org/pandas-docs/stable/comparison_with_r.html

# df is the confusion matrix as a DataFrame (see cm.to_dataframe())
idx_new_cls = pd.Index([300, 400])
new_idx = df.index | idx_new_cls
new_idx.name = 'Actual'
new_col = df.columns | idx_new_cls
new_col.name = 'Predicted'
df = df.loc[new_idx, new_col].fillna(0)

see cm.enlarge(...)

  • Calculate McNemar's Test P-Value with the binary confusion matrix

R code

Actual <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE,
        FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
        TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE,
        TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE,
        FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE,
        TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
        FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE,
        FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
        TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE,
        TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
        FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE,
        FALSE, TRUE, TRUE, FALSE)

Predicted <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE,
      FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
      TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE,
      FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
      FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
      TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
      FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
      FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
      FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
      FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE,
      TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
      FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE,
      FALSE, TRUE, FALSE, FALSE)
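For the McNemar ToDo above, the exact (binomial) two-sided p-value can be sketched in pure Python from the two discordant counts of the binary matrix, b (FN) and c (FP); this assumes the usual exact formulation and is not library code:

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar test from the discordant counts b and c."""
    n = b + c
    k = min(b, c)
    # doubled lower binomial tail at p = 0.5, capped at 1
    p = 2 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(p, 1.0)

# Discordant counts from the binary confusion matrix above: FN = 21, FP = 0
print(mcnemar_exact_p(21, 0))
```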

Install

$ conda install pandas scikit-learn scipy

$ pip install pandas_confusion

Development

You can help develop this library.

Issues

You can submit issues at https://github.com/scls19fr/pandas_confusion/issues

Clone

You can clone the repository to try to fix issues yourself using:

$ git clone https://github.com/scls19fr/pandas_confusion.git

Run unit tests

Run all unit tests

$ nosetests -s -v

Run a given test

$ nosetests -s -v tests/test_pandas_confusion.py:test_pandas_confusion_normalized

Install development version

$ python setup.py install

or

$ sudo pip install git+git://github.com/scls19fr/pandas_confusion.git

Collaborating

  • Fork repository
  • Create a branch which fixes a given issue
  • Submit pull requests

https://help.github.com/categories/collaborating/

Done

  • Continuous integration (Travis)
  • Convert a confusion matrix to a binary confusion matrix
  • Python package
  • Unit tests (nose)
  • Fix missing column and missing row
  • Overall statistics: Accuracy, 95% CI, P-Value [Acc > NIR], Kappa
