The evaluation component of the sci-annot framework
Sci-Annot Evaluation Component
This package was developed as part of my master's thesis and used in the evaluation stage.
Its main purpose is to produce per-page confusion matrices with multiple classes for predictions in the field of Page Object Detection, with inter-object dependencies also supported. To be more precise, it was used to compare predictions in the task of figure, table and caption extraction, but the project can somewhat easily be extended to other object types.
Features
This tool currently supports the following commands:
- rasterize - Rasterize all pdfs in input folder and additionally produce a summary parquet file called render_summary.parquet in the output folder.
- split-pdffigures2 - Take original pdffigures2 output and split it into validator-friendly per-page files.
- benchmark - Evaluate predictions against a ground truth and produce TP, FP, and FN metrics for each page.
- deepfigures-predict - Use deepfigures to detect elements from each pdf in the input folder.
- transpile - Take a folder of predictions in one format and output them in another.
Currently, the following prediction formats are supported:
- Sci-Annot - The corresponding annotation front-end.
- PDFFigures 2.0
- DeepFigures
Consider contributing a parser/exporter for your system of choice!
How the Validation Works
The comparison of two sets of bounding boxes is modelled as an optimal assignment problem, with the cost function being the distance between the centres of the bounding boxes. The matching algorithm runs inside each class (Figures, Tables, Captions) individually, and uses the Intersection over Union (IoU) to decide whether two assigned bounding boxes match. This means that if two bounding boxes look the same but have different classes, no True Positives will be counted towards either of those classes. This is in contrast to some other validation schemes, which award partial points in such cases.
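The per-class matching described above can be sketched roughly as follows. This is an illustration only, not the package's actual code: the function names, the (x_min, y_min, x_max, y_max) box layout, and the IoU threshold of 0.8 are all assumptions made for the example, and the brute-force search over pairings stands in for a proper assignment solver, so it is only practical for pages with a handful of boxes.

```python
from itertools import permutations
from math import hypot

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = tuple[float, float, float, float]


def centre(box: Box) -> tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def centre_distance(a: Box, b: Box) -> float:
    (ax, ay), (bx, by) = centre(a), centre(b)
    return hypot(ax - bx, ay - by)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def best_assignment(truth: list[Box], pred: list[Box]) -> list[tuple[int, int]]:
    """Return (truth_index, pred_index) pairs minimising total centre distance.

    Brute force over all pairings; fine for a sketch, too slow for large pages.
    """
    swap = len(truth) > len(pred)
    small, large = (pred, truth) if swap else (truth, pred)
    best, best_cost = [], float("inf")
    for perm in permutations(range(len(large)), len(small)):
        cost = sum(centre_distance(small[i], large[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best = [(j, i) if swap else (i, j) for i, j in enumerate(perm)]
    return best


def count_page(truth: dict[str, list[Box]], pred: dict[str, list[Box]],
               iou_threshold: float = 0.8) -> dict[str, dict[str, int]]:
    """Count TP/FP/FN per class for one page. The threshold is an assumed value."""
    counts = {}
    for cls in sorted(set(truth) | set(pred)):
        t, p = truth.get(cls, []), pred.get(cls, [])
        pairs = best_assignment(t, p)  # matching never crosses class boundaries
        tp = sum(1 for i, j in pairs if iou(t[i], p[j]) >= iou_threshold)
        counts[cls] = {"TP": tp, "FP": len(p) - tp, "FN": len(t) - tp}
    return counts


# A table that was predicted as a figure earns no True Positive in either class.
truth = {"Figure": [(0, 0, 10, 10)], "Table": [(20, 20, 30, 30)]}
pred = {"Figure": [(0, 0, 10, 10), (20, 20, 30, 30)]}
result = count_page(truth, pred)
for cls in sorted(result):
    print(cls, result[cls])
```

In this example the misclassified table shows up as one False Positive for Figure and one False Negative for Table, which mirrors the no-partial-credit behaviour described above.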
The reference validation runs for all referenced classes at the same time (Figures and Tables in our case). It does not take the bounding boxes' shape or class into account, only whether each reference matches the closest bounding box in the corresponding prediction set. For more information on how this works, refer to the thesis which spawned this project.
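One possible reading of the reference check is sketched below. The details are in the thesis, so treat this as a guess at the idea rather than the package's implementation: the function names, the pooling of figures and tables into one candidate list, and the use of centre distance to find the closest box are assumptions made for the example.

```python
from math import hypot
from typing import Optional

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = tuple[float, float, float, float]


def centre(box: Box) -> tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def closest_index(target: Box, candidates: list[Box]) -> int:
    """Index of the candidate whose centre is nearest to the target's centre."""
    tx, ty = centre(target)
    return min(
        range(len(candidates)),
        key=lambda i: hypot(centre(candidates[i])[0] - tx,
                            centre(candidates[i])[1] - ty),
    )


def reference_correct(truth_parent: Box,
                      pred_parent_index: Optional[int],
                      pred_candidates: list[Box]) -> bool:
    """Check one reference (hypothetical helper).

    truth_parent: the ground-truth box the caption refers to.
    pred_parent_index: index into pred_candidates that the predicted caption
        refers to, or None if it refers to nothing.
    pred_candidates: predicted figures and tables pooled together, so class
        and shape play no role in the decision.
    """
    if pred_parent_index is None or not pred_candidates:
        return False
    return closest_index(truth_parent, pred_candidates) == pred_parent_index


# Two predicted boxes; the ground-truth parent sits closest to the first one.
candidates = [(0, 0, 10, 10), (50, 50, 60, 60)]
print(reference_correct((1, 1, 11, 11), 0, candidates))
print(reference_correct((1, 1, 11, 11), 1, candidates))
```

Under these assumptions, a reference counts as correct only when the predicted caption points at the predicted box nearest to the ground-truth parent, regardless of whether that box is a figure or a table.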
Installation & Usage
This tool is packaged under the name sci-annot-eval.
You can install it with pip install sci-annot-eval, or with conda install sci-annot-eval.
Once installed, call the package from your CLI with sci-annot-eval COMMAND, or use it as a library in your Python project.
Development Setup
If you wish to work on this project locally, you'll need:
- python3.9+
- pipenv
To set up the dependencies, just run pipenv install in the project root.
From that point on, you can do pipenv shell, which will launch your custom python environment with all of the dependencies installed.
When developing, you can call python3 cli.py in the project root to execute the local version of sci-annot-eval, instead of the installed one.
TODO
- Fix logging
- Add more tests