Pandas matrix confusion with plot features (matplotlib, seaborn...)

|Latest Version| |Supported Python versions| |Download format| |License|
|Development Status| |Downloads| |Code Health| |Build Status|

pandas\_confusion
=================

A `Python <https://www.python.org/>`__
`Pandas <http://pandas.pydata.org/>`__ implementation of `confusion
matrix <https://en.wikipedia.org/wiki/Confusion_matrix>`__.

WORK IN PROGRESS - Use it at your own risk

Usage
-----

Confusion matrix
----------------

Import ``ConfusionMatrix``

::

from pandas_confusion import ConfusionMatrix

Define actual values (``y_actu``) and predicted values (``y_pred``)

::

y_actu = ['rabbit', 'cat', 'rabbit', 'rabbit', 'cat', 'dog', 'dog', 'rabbit', 'rabbit', 'cat', 'dog', 'rabbit']
y_pred = ['cat', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'cat', 'rabbit', 'rabbit']

Let's define a (non-binary) confusion matrix

::

confusion_matrix = ConfusionMatrix(y_actu, y_pred)
print("Confusion matrix:\n%s" % confusion_matrix)

Printing it displays the matrix, with actual classes as rows and predicted classes as columns

::

    Predicted  cat  dog  rabbit  __all__
    Actual
    cat          3    0       0        3
    dog          0    1       2        3
    rabbit       2    1       3        6
    __all__      5    2       5       12
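For intuition, a table with the same layout can be built directly with pandas. The sketch below uses ``pandas.crosstab`` and is only an illustration of the counting; it is not necessarily how ``ConfusionMatrix`` is implemented, and the name ``df_confusion`` is chosen here for the example.

```python
import pandas as pd

y_actu = ['rabbit', 'cat', 'rabbit', 'rabbit', 'cat', 'dog', 'dog', 'rabbit', 'rabbit', 'cat', 'dog', 'rabbit']
y_pred = ['cat', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'cat', 'rabbit', 'rabbit']

# Rows are actual classes, columns are predicted classes.
# margins=True appends row and column totals, named '__all__' here.
df_confusion = pd.crosstab(
    pd.Series(y_actu, name='Actual'),
    pd.Series(y_pred, name='Predicted'),
    margins=True,
    margins_name='__all__',
)
print(df_confusion)
```

Each cell counts the samples whose actual class is the row label and whose predicted class is the column label, so the diagonal holds the correct predictions.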

Matplotlib plot of a confusion matrix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    import matplotlib.pyplot as plt

    confusion_matrix.plot()
    plt.show()

.. figure:: screenshots/cm.png
   :alt: confusion\_matrix

   confusion\_matrix

Matplotlib plot of a normalized confusion matrix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

confusion_matrix.plot(normalized=True)
plt.show()
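To make concrete what a normalized matrix might contain, here is a minimal sketch in plain Python. It assumes that normalization divides each row by its total, so that every actual class sums to 1; this convention is an assumption of the example, not something the text above confirms, and ``normalize_rows`` is a hypothetical helper.

```python
# Counts from the rabbit/cat/dog example: rows are actual, columns are predicted.
labels = ['cat', 'dog', 'rabbit']
counts = {
    'cat':    {'cat': 3, 'dog': 0, 'rabbit': 0},
    'dog':    {'cat': 0, 'dog': 1, 'rabbit': 2},
    'rabbit': {'cat': 2, 'dog': 1, 'rabbit': 3},
}


def normalize_rows(counts):
    """Divide each row by its total (assumed convention, see the text)."""
    normalized = {}
    for actual, row in counts.items():
        total = sum(row.values())
        # A class with no actual samples has no defined rates.
        normalized[actual] = {
            pred: (n / total if total else float('nan'))
            for pred, n in row.items()
        }
    return normalized


normalized = normalize_rows(counts)
for actual in labels:
    print(actual, [round(normalized[actual][pred], 3) for pred in labels])
```

With this convention the diagonal of the normalized matrix is the per-class recall, which makes classes of different sizes comparable in the plot.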

.. figure:: screenshots/cm_norm.png
   :alt: confusion\_matrix\_norm

   confusion\_matrix\_norm

Binary confusion matrix
~~~~~~~~~~~~~~~~~~~~~~~

Import ``BinaryConfusionMatrix`` and ``Backend``

::

from pandas_confusion import BinaryConfusionMatrix, Backend

Define actual values (``y_actu``) and predicted values (``y_pred``)

::

y_actu = [ True, True, False, False, False, True, False, True, True,
False, True, False, False, False, False, False, True, False,
True, True, True, True, False, False, False, True, False,
True, False, False, False, False, True, True, False, False,
False, True, True, True, True, False, False, False, False,
True, False, False, False, False, False, False, False, False,
False, True, True, False, True, False, True, True, True,
False, False, True, False, True, False, False, True, False,
False, False, False, False, False, False, False, True, False,
True, True, True, True, False, False, True, False, True,
True, False, True, False, True, False, False, True, True,
False, False, True, True, False, False, False, False, False,
False, True, True, False]

y_pred = [False, False, False, False, False, True, False, False, True,
False, True, False, False, False, False, False, False, False,
True, True, True, True, False, False, False, False, False,
False, False, False, False, False, True, False, False, False,
False, True, False, False, False, False, False, False, False,
True, False, False, False, False, False, False, False, False,
False, True, False, False, False, False, False, False, False,
False, False, True, False, False, False, False, True, False,
False, False, False, False, False, False, False, True, False,
False, True, False, False, False, False, True, False, True,
True, False, False, False, True, False, False, True, True,
False, False, True, True, False, False, False, False, False,
False, True, False, False]

Let's define a binary confusion matrix

::

binary_confusion_matrix = BinaryConfusionMatrix(y_actu, y_pred)
print("Binary confusion matrix:\n%s" % binary_confusion_matrix)

It displays as a nicely labeled Pandas DataFrame

::

    Binary confusion matrix:
    Predicted  False  True  __all__
    Actual
    False         67     0       67
    True          21    24       45
    __all__       88    24      112

You can get useful attributes such as True Positive (TP), True Negative
(TN) ...

::

    print(binary_confusion_matrix.TP)
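As a cross-check, the four cells of a binary confusion matrix can be counted by hand. The sketch below is plain Python and independent of the library; the two lists are constructed only to reproduce the counts of the matrix above, and ``TP``, ``TN``, ``FP`` and ``FN`` are local variables of the example.

```python
# Lists built to reproduce the counts of the matrix above
# (TN=67, FP=0, FN=21, TP=24); the order of samples does not matter.
y_actu = [False] * 67 + [True] * 45
y_pred = [False] * 67 + [False] * 21 + [True] * 24

pairs = list(zip(y_actu, y_pred))
TP = sum(1 for a, p in pairs if a and p)          # actual True, predicted True
TN = sum(1 for a, p in pairs if not a and not p)  # actual False, predicted False
FP = sum(1 for a, p in pairs if not a and p)      # actual False, predicted True
FN = sum(1 for a, p in pairs if a and not p)      # actual True, predicted False
print(TP, TN, FP, FN)
```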

Matplotlib plot of a binary confusion matrix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

binary_confusion_matrix.plot()
plt.show()

.. figure:: screenshots/binary_cm.png
   :alt: binary\_confusion\_matrix

   binary\_confusion\_matrix

Matplotlib plot of a normalized binary confusion matrix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

binary_confusion_matrix.plot(normalized=True)
plt.show()

.. figure:: screenshots/binary_cm_norm.png
   :alt: binary\_confusion\_matrix\_norm

   binary\_confusion\_matrix\_norm

Seaborn plot of a binary confusion matrix (ToDo)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

from pandas_confusion import Backend
binary_confusion_matrix.plot(backend=Backend.Seaborn)

Confusion matrix and class statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Overall statistics and per-class statistics of a confusion matrix can be
displayed with ``print_stats()``.

::

y_true = [600, 200, 200, 200, 200, 200, 200, 200, 500, 500, 500, 200, 200, 200, 200, 200, 200, 200, 200, 200]
y_pred = [100, 200, 200, 100, 100, 200, 200, 200, 100, 200, 500, 100, 100, 100, 100, 100, 100, 100, 500, 200]
cm = ConfusionMatrix(y_true, y_pred)
cm.print_stats()

You should get:

::

    Confusion Matrix:

    Classes  100  200  500  600  __all__
    Actual
    100        0    0    0    0        0
    200        9    6    1    0       16
    500        1    1    1    0        3
    600        1    0    0    0        1
    __all__   11    7    2    0       20


    Overall Statistics:

    Accuracy: 0.35
    95% CI: (0.1539092047845412, 0.59218853453282805)
    No Information Rate: ToDo
    P-Value [Acc > NIR]: 0.978585644357
    Kappa: 0.0780141843972
    Mcnemar's Test P-Value: ToDo


    Class Statistics:

    Classes                                 100         200         500   600
    Population                               20          20          20    20
    Condition positive                        0          16           3     1
    Condition negative                       20           4          17    19
    Test outcome positive                    11           7           2     0
    Test outcome negative                     9          13          18    20
    TP: True Positive                         0           6           1     0
    TN: True Negative                         9           3          16    19
    FP: False Positive                       11           1           1     0
    FN: False Negative                        0          10           2     1
    TPR: Sensivity                          NaN       0.375   0.3333333     0
    TNR=SPC: Specificity                   0.45        0.75   0.9411765     1
    PPV: Pos Pred Value = Precision           0   0.8571429         0.5   NaN
    NPV: Neg Pred Value                       1   0.2307692   0.8888889  0.95
    FPR: False-out                         0.55        0.25  0.05882353     0
    FDR: False Discovery Rate                 1   0.1428571         0.5   NaN
    FNR: Miss Rate                          NaN       0.625   0.6666667     1
    ACC: Accuracy                          0.45        0.45        0.85  0.95
    F1 score                                  0   0.5217391         0.4     0
    MCC: Matthews correlation coefficient   NaN   0.1048285    0.326732   NaN
    Informedness                            NaN       0.125   0.2745098     0
    Markedness                                0  0.08791209   0.3888889   NaN
    Prevalence                                0         0.8        0.15  0.05
    LR+: Positive likelihood ratio          NaN         1.5    5.666667   NaN
    LR-: Negative likelihood ratio          NaN   0.8333333   0.7083333     1
    DOR: Diagnostic odds ratio              NaN         1.8           8   NaN
    FOR: False omission rate                  0   0.7692308   0.1111111  0.05
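Several of the class statistics follow directly from the four one-vs-rest counts of a class. The sketch below recomputes a few of them for class 200 using the standard definitions; ``class_stats`` is a hypothetical helper written for this example, not a function of the package.

```python
def class_stats(TP, TN, FP, FN):
    """Return a few per-class rates from one-vs-rest counts."""
    def ratio(num, den):
        # An empty denominator gives NaN, as in the table above.
        return num / den if den else float('nan')

    return {
        'TPR': ratio(TP, TP + FN),                 # sensitivity (recall)
        'TNR': ratio(TN, TN + FP),                 # specificity
        'PPV': ratio(TP, TP + FP),                 # precision
        'ACC': ratio(TP + TN, TP + TN + FP + FN),  # accuracy
        'F1': ratio(2 * TP, 2 * TP + FP + FN),     # F1 score
    }


# Counts for class 200, taken from the table above.
stats_200 = class_stats(TP=6, TN=3, FP=1, FN=10)
for name, value in stats_200.items():
    print(name, round(value, 7))
```

Class 100 never occurs among the actual values, so ``TP + FN`` is zero for it and its sensitivity is undefined, which is why the table shows ``NaN`` there.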

Statistics are also available as an OrderedDict using:

::

cm.stats()
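Two of the overall statistics reported by ``print_stats()``, accuracy and Kappa, can be reproduced from the matrix with a few lines of plain Python. This is a sketch using the usual definition of Cohen's kappa, not the package's own code; the variable names are chosen for the example.

```python
labels = [100, 200, 500, 600]
# Rows are actual classes, columns are predicted classes (counts from above).
matrix = [
    [0, 0, 0, 0],
    [9, 6, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]

n = sum(sum(row) for row in matrix)
row_totals = [sum(row) for row in matrix]
col_totals = [sum(col) for col in zip(*matrix)]

# Observed agreement: share of samples on the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / n
# Agreement expected by chance, from the row and column totals.
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
kappa = (accuracy - expected) / (1 - expected)
print(accuracy, kappa)
```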

ToDo list
---------

- Better documentation

- Doctest

- Matplotlib discrete colorbar (not for normalized plot)

see ColorbarBase

http://stackoverflow.com/questions/14777066/matplotlib-discrete-colorbar

- Display numbers inside cells like
http://stackoverflow.com/questions/5821125/how-to-plot-confusion-matrix-with-string-axis-rather-than-integer-in-python

- Compare with results from Sklearn

Example:

::

    from sklearn.metrics import f1_score, classification_report
    f1_score(y_actu, y_pred)
    print(classification_report(y_actu, y_pred))

- Compare with R "caret" package

http://stackoverflow.com/questions/26631814/create-a-confusion-matrix-from-a-dataframe

R

::

Actual <- c(600, 200, 200, 200, 200, 200, 200, 200, 500, 500, 500, 200, 200, 200, 200, 200, 200, 200, 200, 200)
Predicted <- c(100, 200, 200, 100, 100, 200, 200, 200, 100, 200, 500, 100, 100, 100, 100, 100, 100, 100, 500, 200)
df <- data.frame(Actual, Predicted)
#table(df)
col <- sort(union(df$Actual, df$Predicted))
df_conf <- table(lapply(df, factor, levels=col))
#table(lapply(df, factor, levels=seq(100, 600, 100)))
#table(lapply(df, factor, levels=c(100, 200, 500, 600)))

Python

::

    >>> from pandas_confusion import ConfusionMatrix
    >>> y_true = [600, 200, 200, 200, 200, 200, 200, 200, 500, 500, 500, 200, 200, 200, 200, 200, 200, 200, 200, 200]
    >>> y_pred = [100, 200, 200, 100, 100, 200, 200, 200, 100, 200, 500, 100, 100, 100, 100, 100, 100, 100, 500, 200]
    >>> cm = ConfusionMatrix(y_true, y_pred)
    >>> cm
    Predicted  100  200  500  600  __all__
    Actual
    100          0    0    0    0        0
    200          9    6    1    0       16
    500          1    1    1    0        3
    600          1    0    0    0        1
    __all__     11    7    2    0       20

``cm(i, j)`` in Python is ``conf_mat(j, i)`` in R

You can use ``cm.to_dataframe().transpose()``

- Overall statistics: No Information Rate, Mcnemar's Test P-Value

see confusionMatrix.R and print.confusionMatrix.R (caret) and e1071
package

- Class statistics

- see Caret code for Detection Rate, Detection Prevalence, Balanced
Accuracy

- Code metrics (landscape.io)

- Create fake truth, prediction from confusion matrix (can be useful
for unit test)

https://www.researchgate.net/post/Can\_someone\_help\_me\_to\_calculate\_accuracy\_sensitivity\_of\_a\_66\_confusion\_matrix

`see code (ToDo) <samples/fake_convol_mat.py>`__

- Order confusion matrix easily

- Create empty class easily

``cm = ConfusionMatrix(y_true, y_pred, labels=range(100, 600+1, 100))``

Class 300 and class 400 should be created.

R-like method?
``conf_mat_tab <- table(lapply(df, factor, levels = seq(100, 600, 100)))``

http://pandas.pydata.org/pandas-docs/stable/comparison\_with\_r.html

::

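    idx_new_cls = pd.Index([300, 400])
    # Index.union gives the sorted set union of existing and new labels
    new_idx = df.index.union(idx_new_cls)
    new_idx.name = 'Actual'
    new_col = df.columns.union(idx_new_cls)
    new_col.name = 'Predicted'
    # reindex adds the missing rows and columns, filled with zero counts
    df = df.reindex(index=new_idx, columns=new_col, fill_value=0)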

see ``cm.enlarge(...)``

- Calculate Mcnemar's Test P-Value with binary confusion matrix

R code

::

Actual <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE,
FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE,
FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE,
TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, TRUE, TRUE, FALSE)

Predicted <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE,
FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE,
TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, TRUE, FALSE, FALSE)
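For the McNemar item above, a possible starting point in Python is sketched below. It uses the chi-squared form of the test with a continuity correction and only the standard library; ``mcnemar_test`` is a hypothetical helper, not part of the package, and its result has not been compared here against the R output.

```python
import math


def mcnemar_test(b, c, correction=True):
    """McNemar chi-squared statistic and p-value for discordant counts b and c."""
    if b + c == 0:
        return float('nan'), float('nan')
    diff = abs(b - c)
    if correction:
        # Continuity correction, never pushing the difference below zero.
        diff = max(diff - 1, 0)
    statistic = diff ** 2 / (b + c)
    # Survival function of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Off-diagonal cells of the binary confusion matrix shown earlier: FP = 0, FN = 21.
statistic, p_value = mcnemar_test(b=0, c=21)
print(statistic, p_value)
```

Only the two discordant cells enter the test; the diagonal counts (TP and TN) do not affect the statistic.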

Install
-------

::

$ conda install pandas scikit-learn scipy

$ pip install pandas_confusion

Done
----

- Continuous integration (Travis)

- Convert a confusion matrix to a binary confusion matrix

- Python package

- Unit tests (nose)

- Fix missing column and missing row

- Overall statistics: Accuracy, 95% CI, P-Value [Acc > NIR], Kappa
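
The conversion of a multi-class problem to a binary one, listed above, can be pictured as a one-vs-rest relabeling. The sketch below shows the idea in plain Python on the rabbit/cat/dog data; ``binarize`` is a hypothetical helper and not the package's own conversion method.

```python
def binarize(y, positive_class):
    """Map multi-class labels to True/False relative to one class."""
    return [label == positive_class for label in y]


y_actu = ['rabbit', 'cat', 'rabbit', 'rabbit', 'cat', 'dog', 'dog', 'rabbit', 'rabbit', 'cat', 'dog', 'rabbit']
y_pred = ['cat', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'cat', 'rabbit', 'rabbit']

# Treat 'rabbit' as the positive class and everything else as negative.
bin_actu = binarize(y_actu, 'rabbit')
bin_pred = binarize(y_pred, 'rabbit')

pairs = list(zip(bin_actu, bin_pred))
TP = sum(1 for a, p in pairs if a and p)
FN = sum(1 for a, p in pairs if a and not p)
FP = sum(1 for a, p in pairs if not a and p)
TN = sum(1 for a, p in pairs if not a and not p)
print(TP, FN, FP, TN)
```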

.. |Latest Version| image:: https://pypip.in/version/pandas_confusion/badge.svg
   :target: https://pypi.python.org/pypi/pandas_confusion/
.. |Supported Python versions| image:: https://pypip.in/py_versions/pandas_confusion/badge.svg
   :target: https://pypi.python.org/pypi/pandas_confusion/
.. |Download format| image:: https://pypip.in/format/pandas_confusion/badge.svg
   :target: https://pypi.python.org/pypi/pandas_confusion/
.. |License| image:: https://pypip.in/license/pandas_confusion/badge.svg
   :target: https://pypi.python.org/pypi/pandas_confusion/
.. |Development Status| image:: https://pypip.in/status/pandas_confusion/badge.svg
   :target: https://pypi.python.org/pypi/pandas_confusion/
.. |Downloads| image:: https://pypip.in/download/pandas_confusion/badge.svg
   :target: https://pypi.python.org/pypi/pandas_confusion/
.. |Code Health| image:: https://landscape.io/github/scls19fr/pandas_confusion/master/landscape.svg?style=flat
   :target: https://landscape.io/github/scls19fr/pandas_confusion/master
.. |Build Status| image:: https://travis-ci.org/scls19fr/pandas_confusion.svg
   :target: https://travis-ci.org/scls19fr/pandas_confusion
