Yet another ROC curve drawer

Author:

Tamás Nepusz

This is yet another Python package for drawing ROC curves. It also lets you draw precision-recall, accumulation and concentrated ROC (CROC) curves, sensitivity-specificity plots, F-score curves and calculate the AUC (area under curve) statistics. The significance of differences between AUC scores can also be tested using paired permutation tests.

Where to get yard

yard has two homes at the moment:

  • The Python package index. This page hosts the most recent stable version of yard. Because yard is under heavy development, the stable version may lack the latest features, but it is the version most likely to work reliably.

  • A page on GitHub. On this page you can follow the development of yard as closely as possible; you can get the most recent development version, file bug reports, or even fork the project to start adding your own features.

Requirements

You will need the following tools to run yard:

  • Python 3.7 or later.

  • Matplotlib, which is responsible for plotting the curves. If you don’t have Matplotlib, you can export the points of the curves and then use an external plotting tool such as GNUPlot to plot them later.

  • NumPy is an optional dependency; some functions will be slightly faster if you have NumPy, but yard should work fine without it as well.

Installation

The simplest way to install yard is by using pip:

$ pip install yard

This fetches the most recent stable version from the Python package index and installs it, creating three scripts in your path: yard-auc for AUC score calculation, yard-plot for plotting and yard-significance for significance testing.

If you want the bleeding edge version, you should go to the GitHub page, download a ZIP or .tar.gz file, extract it to some directory and then run the following command:

$ python setup.py install

Running yard

yard works with simple tabular flat files and assumes that the first row of each file is a header. Each subsequent row describes one test example. By default, the first column contains the expected outcome of the binary classification for that example (i.e. whether it is positive or negative), and the remaining columns contain the outputs of the probabilistic classifiers being tested. The expected outcome must be positive for positive examples and zero or negative for negative examples, so you may label negative and positive examples with either 0 and 1 or -1 and 1. Classifier outputs may be in any range, but they most frequently fall in the interval [0, 1]. The following snippet shows an example input file:

output  Method1 Method2 Method3
-1      0.2     0.3     0.6
-1      0.4     0.15    0.1
+1      0.7     0.2     0.9
+1      0.3     0.85    1.0
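If your classifier outputs live in Python, a file in this layout can be produced with the standard csv module. The following is a minimal sketch; the helper name write_input_table is hypothetical and not part of yard, and the data simply repeats the example above.

```python
import csv
import io

# Header row first, then one row per test example:
# expected outcome followed by one score per classifier.
HEADER = ["output", "Method1", "Method2", "Method3"]
ROWS = [
    (-1, 0.2, 0.3, 0.6),
    (-1, 0.4, 0.15, 0.1),
    (+1, 0.7, 0.2, 0.9),
    (+1, 0.3, 0.85, 1.0),
]


def write_input_table(stream, header, rows, delimiter="\t"):
    """Write a header and data rows as delimiter-separated text (hypothetical helper)."""
    writer = csv.writer(stream, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Write to an in-memory buffer; pass an open file object instead to create a real file.
buffer = io.StringIO()
write_input_table(buffer, HEADER, ROWS)
text = buffer.getvalue()
```

To create a file on disk, open it with open("input_data.txt", "w", newline="") and pass the file object as the stream.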

Columns must be separated by tabs by default, but this can be overridden with the -f option on the command line. The columns being used can also be overridden using -c; for instance, if you have the expected outcome in column 4 and the actual outcomes in columns 1-3, you may use -c 4,1-3 to specify that.
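To make the column specification concrete, here is a rough sketch of how a string such as 4,1-3 could be expanded into individual column numbers. This is not yard's own parser: the functions parse_column_spec and select_columns are hypothetical, and the sketch assumes that columns are numbered from 1 and that ranges are inclusive.

```python
def parse_column_spec(spec):
    """Expand a spec like '4,1-3' into a list of 1-based column numbers."""
    columns = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as '1-3' covers both endpoints.
            start, end = (int(piece) for piece in part.split("-", 1))
            columns.extend(range(start, end + 1))
        else:
            columns.append(int(part))
    return columns


def select_columns(row, spec):
    """Reorder the fields of one row according to the spec (1-based indices assumed)."""
    return [row[index - 1] for index in parse_column_spec(spec)]
```

Under these assumptions, 4,1-3 expands to columns 4, 1, 2 and 3, so the expected outcome comes first and the classifier outputs follow.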

Some usage examples are presented here; for more details, type yard-plot --help or yard-significance --help.

To show a ROC curve for an arbitrary number of classifiers where the expected and actual outcomes are defined in input_data.txt:

$ yard-plot input_data.txt

If the actual outcomes are in columns 3-5, the expected outcome is in column 6 and the columns are separated by semicolons:

$ yard-plot -f ';' -c 6,3-5 input_data.txt

To plot precision-recall curves instead of ROC curves and also show the AUC statistics:

$ yard-plot -t pr --show-auc input_data.txt

Supported curve types are: roc for ROC curves (default), pr for precision-recall curves, croc for CROC curves, ac for accumulation curves, sespe for sensitivity-specificity plots, fscore for F-score curves.
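For readers unfamiliar with CROC curves: they rescale the false positive rate axis of a ROC curve with a magnifier function so that the early part of the ranking takes up more of the plot. The sketch below uses the exponential magnifier described in the publication listed under the acknowledgments. It is an illustration only: the function names are hypothetical, alpha=7.0 is an assumed example value, and yard's own transformation and defaults may differ.

```python
import math


def croc_transform(x, alpha=7.0):
    """Exponential magnifier on [0, 1]; alpha=7.0 is an assumed illustrative value."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return (1.0 - math.exp(-alpha * x)) / (1.0 - math.exp(-alpha))


def to_croc(roc_points, alpha=7.0):
    """Map (false positive rate, true positive rate) pairs onto the CROC x axis."""
    return [(croc_transform(fpr, alpha), tpr) for fpr, tpr in roc_points]
```

The transform keeps 0 and 1 fixed and stretches small false positive rates, so with alpha=7.0 a false positive rate of 0.1 lands slightly past the middle of the x axis.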

To use a logarithmic X axis for the ROC curve and use the standard input instead of a file:

$ yard-plot -l x

Omitting the input filename instructs yard-plot to read from the standard input. You can also pass - in place of the filename to request this explicitly.

To save a ROC curve into a PDF file:

$ yard-plot -o roc_curve.pdf input_data.txt

You may specify other formats as long as they are supported by Matplotlib:

$ yard-plot -o roc_curve.ps input_data.txt
$ yard-plot -o roc_curve.png input_data.txt

The PDF backend also supports multiple plots in separate pages:

$ yard-plot -t pr -t roc -t croc -o curves.pdf input_data.txt

The figure size, the DPI ratio and the font size can also be adjusted:

$ yard-plot -o roc_curve.pdf --font-size 8 -s '8cm x 6cm' input_data.txt

To calculate the AUC statistics for multiple curves without plotting them:

$ yard-auc -t pr -t roc input_data.txt
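As a point of reference for what the ROC AUC measures, the sketch below computes it directly from its ranking interpretation: the fraction of positive-negative pairs in which the positive example receives the higher score, with ties counted as one half. This is not yard's implementation, and the function name roc_auc is hypothetical; it only covers the roc curve type.

```python
def roc_auc(expected, scores):
    """ROC AUC as the fraction of correctly ordered positive-negative pairs."""
    # Positive expected outcomes mark positive examples; zero or negative mark negatives.
    positives = [s for e, s in zip(expected, scores) if e > 0]
    negatives = [s for e, s in zip(expected, scores) if e <= 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ordered pair
    return wins / (len(positives) * len(negatives))


# Data from the example input file shown earlier.
expected = [-1, -1, +1, +1]
method1 = [0.2, 0.4, 0.7, 0.3]
method3 = [0.6, 0.1, 0.9, 1.0]
auc1 = roc_auc(expected, method1)  # 3 of 4 pairs are ordered correctly
auc3 = roc_auc(expected, method3)  # all 4 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration but slow for large datasets.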

To test whether the ROC curves of multiple classifiers are significantly different:

$ yard-significance input_data.txt
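The idea behind a paired permutation test can be sketched as follows: for each test example, the scores of the two classifiers are swapped with probability one half, the AUC difference is recomputed, and the p-value is the share of permutations whose difference is at least as large as the observed one. The code below is a simplified illustration under those assumptions, not yard's actual procedure; the function names, the number of permutations and the seed are all hypothetical choices.

```python
import random


def roc_auc(expected, scores):
    """ROC AUC as the fraction of correctly ordered positive-negative pairs."""
    positives = [s for e, s in zip(expected, scores) if e > 0]
    negatives = [s for e, s in zip(expected, scores) if e <= 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def paired_permutation_test(expected, scores_a, scores_b,
                            n_permutations=1000, seed=0):
    """Estimate a two-sided p-value for the AUC difference of two classifiers."""
    rng = random.Random(seed)
    observed = abs(roc_auc(expected, scores_a) - roc_auc(expected, scores_b))
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        perm_a, perm_b = [], []
        for a, b in zip(scores_a, scores_b):
            # Swap the paired scores of this example with probability 1/2.
            if rng.random() < 0.5:
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        diff = abs(roc_auc(expected, perm_a) - roc_auc(expected, perm_b))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


expected = [-1, -1, +1, +1]
method1 = [0.2, 0.4, 0.7, 0.3]
method3 = [0.6, 0.1, 0.9, 1.0]
p_value = paired_permutation_test(expected, method1, method3)
```

With only four test examples, as here, the test has very little power; the data is reused from the example input file purely to keep the sketch short.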

Questions, comments

If you have a question or comment about yard or you think you have found a bug, feel free to contact me.

Acknowledgments and references

The inclusion of CROC curves and statistical significance testing was inspired by the following publication, which also explains in more detail what CROC curves are and why they are more useful than ROC curves in many cases:

A CROC Stronger than ROC: Measuring, Visualizing and Optimizing Early Retrieval. S. Joshua Swamidass, Chloe-Agathe Azencott, Kenny Daily and Pierre Baldi. Bioinformatics, April 2010, doi:10.1093/bioinformatics/btq140
