TABULA RASA (EU-project) plotting/score checking tool
The Toolkit is conceived for these purposes:
- Plot the DET curve for a particular system
- Check the consistency between score files w.r.t. the filenames scores refer to
To install from the command line on a machine where you have write access to the Python installation tree (e.g., on a Windows machine):
$ easy_install trstk # or $ pip install trstk
If you don’t have administrative rights on the Python installation directory,
you can create an isolated virtual environment using virtualenv. Follow the
virtualenv instructions to download and create a virtual environment and then
either easy_install or pip install this package.
Our PyPI page also contains a link to a Windows graphical installer. Unfortunately, unlike the command line installer, it does not install the package dependencies. You have to install them yourself. Here is the list of dependencies:
Visit those webpages for more information.
Tools in this package accept score files in a single textual format. Each line in the file refers to one sample in the database being analyzed. Each line is composed of 5 fields separated by spaces, in this order:
- Claimed identity: a string that defines the claimed identity of the subject being analyzed
- Model label: contains a label/reference to the data used to make the model (filename <id>d<capture_number> used to make the model)
- Real identity: a string that defines the real identity of the subject being analyzed (i.e. the output of the classification)
- Test label: contains a label/reference to the data used to do the testing (filename <id>d<capture_number> of the test file)
- Score: a floating-point value representing the score
None of the above-mentioned fields may contain spaces. Failing to comply will make the tools emit syntax errors pointing to the location in the file where problems seem to occur.
Here is a valid example score file:
02463 02463d547 02463 02463d653 0.623265
02463 02463d547 02463 02463d655 0.920861
02463 02463d547 02463 02463d657 0.938942
02463 02463d547 02463 02463d659 0.743715
02463 02463d547 02463 02463d661 0.397660
02463 02463d547 02463 02463d663 0.615722
02463 02463d547 02463 02463d665 0.613291
02463 02463d547 02463 02463d667 0.543184
02463 02463d547 02463 02463d669 0.829777
02463 02463d547 02463 02463d671 0.869681
02463 02463d547 02463 02463d673 0.806394
02463 02463d547 02463 02463d675 1.007791
02463 02463d547 04200 04200d75 0.257423
Here is an invalid example score file:
Bob_Jones bob-file-001 Bob_Jones bob-file-004 -37.643410
Susan Smith susan-file-001 Susan Smith susan-file-001 -33.393433
Joe joe-file-030 Joe joe-file-001 -72.295616
In this case, line 2 above will fail because the real identity field and the claimed identity fields contain spaces. Lines 1 and 3 do conform to the proposed scheme and will be parsed without problems.
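The format rules above can be sketched as a small reader. This is only an illustration written for this document, not the parser shipped in the package: the names `ScoreLine` and `parse_scores` are hypothetical, and skipping blank lines is an assumption.

```python
from typing import Iterable, Iterator, NamedTuple


class ScoreLine(NamedTuple):
    """One record of a score file (hypothetical helper type)."""
    claimed_id: str
    model_label: str
    real_id: str
    test_label: str
    score: float


def parse_scores(lines: Iterable[str]) -> Iterator[ScoreLine]:
    """Yield one ScoreLine per non-empty line; raise ValueError on bad input."""
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 5:
            # A field containing a space shows up here as extra fields.
            raise ValueError(
                f"line {number}: expected 5 fields, found {len(fields)}"
            )
        try:
            score = float(fields[4])
        except ValueError:
            raise ValueError(
                f"line {number}: score {fields[4]!r} is not a number"
            ) from None
        yield ScoreLine(fields[0], fields[1], fields[2], fields[3], score)
```

With the invalid example above, the "Susan Smith" line splits into 7 fields, so this sketch would report an error for line 2.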
If you have multiple modalities, you should build one text file per modality, each following the format explained above. The order of the tags within each file should be respected. Example:
Hypothetical face verification experiment output:
02463 02463d547 02463 02463d675 1.007791
02463 02463d547 04200 04200d75 0.257423
02463 02463d547 04201 04201d435 0.315074
02463 02463d547 04201 04201d437 0.347413
02463 02463d547 04201 04201d439 0.296383
02463 02463d547 04201 04201d443 0.371881
02463 02463d547 04201 04201d445 0.260964
Hypothetical speech verification experiment output:
02463 02463d547 02463 02463d675 0.9932
02463 02463d547 04200 04200d75 0.0027
02463 02463d547 04201 04201d435 0.0144
02463 02463d547 04201 04201d437 0.0159
02463 02463d547 04201 04201d439 0.1250
02463 02463d547 04201 04201d443 0.0031
02463 02463d547 04201 04201d445 0.0002
A set of working examples is included in the example directory of this package.
To properly run the software in this package you must have the following packages installed:
We describe a few scenarios for using the Toolkit in specific cases. Read the full documentation in the doc directory for instructions on how to create your own scripts that can re-use the readout functionality available in the kit.
Example 1: Plotting a DET Curve
The following command will plot a single DET curve for a given input score file:
$ plotDET.py test.scores
This command should produce a single plot in a PDF file named det.pdf,
calculated using the contents of the input score file test.scores. The plot
title will be empty. You can change the output filename and its type (for
example, to .png or .jpg) or add a plot title like this:
$ plotDET.py --title="My Test DET" --output=test.png test.scores
You can plot a series of overlaid DET curves in the following manner:
$ plotDET.py --title="My Test DET" --output=overlayed.pdf \
    --label=devel development.scores --label=test test.scores
This command will produce a single plot in a PDF file, with the overlaid DET curves generated from each of the score files given as input. A legend will be drawn at a convenient location in the plot, using the label you supplied for each curve. By default the program generates black-and-white plots, but it can be instructed to produce coloured plots using the --colour option (see the plotDET.py --help message).
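A DET curve shows the false rejection rate against the false acceptance rate as the decision threshold varies, usually on normal-deviate axes. The sketch below computes the underlying error rates from score lines. It is not the code in plotDET.py: the function names are hypothetical, and it assumes that a trial is a client trial when the claimed identity equals the real identity, and that higher scores mean a better match.

```python
def split_scores(lines):
    """Return (client_scores, impostor_scores) from 5-field score lines."""
    clients, impostors = [], []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # malformed lines are skipped in this sketch
        claimed_id, _model, real_id, _test, score = fields
        # Assumption: matching identities mark a client (genuine) trial.
        target = clients if claimed_id == real_id else impostors
        target.append(float(score))
    return clients, impostors


def det_points(clients, impostors):
    """Return (threshold, FAR, FRR) tuples, one per distinct score.

    Both lists must be non-empty. A trial is accepted when its
    score is greater than or equal to the threshold.
    """
    points = []
    for threshold in sorted(set(clients) | set(impostors)):
        far = sum(s >= threshold for s in impostors) / len(impostors)
        frr = sum(s < threshold for s in clients) / len(clients)
        points.append((threshold, far, frr))
    return points
```

Plotting the FAR and FRR columns against each other gives the raw curve; the normal-deviate axis scaling is left out of this sketch.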
Example 2: Checking score set consistency
You can check the consistency between two (or more) score sets that are supposed to provide scores for multiple biometric modalities using the checkModalities.py script. This tool will compare two input files and will stop on the first error it finds:
$ checkModalities.py faceverif.scores speechverif.scores
If you sort all files before calling the program, huge score files can be checked much faster, because the program can skip its internal sorting step. You can use the sort and uniq Unix utilities to sort all score files before calling checkModalities.py, like this:
$ sort my-scores.txt | uniq > sorted-scores.txt
$ sort other-scores.txt | uniq > other-sorted-scores.txt
$ checkModalities.py --sorted sorted-scores.txt other-sorted-scores.txt
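One way to picture such a consistency check: two score files agree when they contain scores for the same (claimed identity, model label, real identity, test label) entries. The sketch below illustrates that idea with set differences. It is an assumption about what "consistent" means here, not the implementation of checkModalities.py (which stops on the first error), and the function names are hypothetical.

```python
def trial_keys(lines):
    """Return the set of (claimed, model, real, test) tuples in a score file."""
    keys = set()
    for line in lines:
        fields = line.split()
        if len(fields) == 5:  # malformed lines are ignored in this sketch
            keys.add(tuple(fields[:4]))
    return keys


def compare_modalities(lines_a, lines_b):
    """Return (only_in_a, only_in_b) as sorted lists of entry keys."""
    keys_a, keys_b = trial_keys(lines_a), trial_keys(lines_b)
    return sorted(keys_a - keys_b), sorted(keys_b - keys_a)
```

Two files are consistent in this sense when both returned lists are empty.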
Example 3: Plotting a scores distribution
You can plot joint score distributions including impostors, clients and attacks
using the plotScores.py script:
$ plotScores.py --title="My Score Distribution" --output=test.png legit.txt attack.txt
The input is expected to be divided between 2 files that contain the results of the baseline verification evaluation for the legit protocol and for the spoofing attack protocol. The routine will draw 3 histograms. The first 2 correspond to the client and impostor groups found in the first file. The third histogram corresponds to the attacks found in the second file.
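The grouping behind the three histograms might look like the sketch below. It is illustrative only and not taken from plotScores.py: the function names are hypothetical, clients are assumed to be the lines whose claimed and real identities match, and every line of the second file is assumed to be an attack.

```python
def score_groups(legit_lines, attack_lines):
    """Split scores into the three groups that get one histogram each."""
    groups = {"clients": [], "impostors": [], "attacks": []}
    for line in legit_lines:
        claimed_id, _model, real_id, _test, score = line.split()
        name = "clients" if claimed_id == real_id else "impostors"
        groups[name].append(float(score))
    for line in attack_lines:
        # Assumption: the second file contains attack trials only.
        groups["attacks"].append(float(line.split()[4]))
    return groups


def histogram(scores, low, high, bins=10):
    """Count scores in equal-width bins spanning [low, high], with high > low."""
    counts = [0] * bins
    width = (high - low) / bins
    for s in scores:
        # Clamp the index so that s == high lands in the last bin.
        index = min(int((s - low) / width), bins - 1)
        counts[index] += 1
    return counts
```

Using the same `low`, `high` and `bins` for all three groups keeps the histograms comparable on one set of axes.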