Pie Chart Optical Character Recognition
pie-chart-ocr
A tool to extract tabular data from pie charts, developed as a component of the CryptoSearchTools toolkit.
Note: The original repository was moved to https://git.ehtec.co/research/pie-chart-ocr. https://github.com/ehtec/pie-chart-ocr is a mirror.
Installation
Install via PyPI
You can install all tagged versions of piechartocr from PyPI:
python3 -m pip install --upgrade piechartocr
Note: You cannot run the tests and examples from the PyPI installation. The files they require must be downloaded from GitLab.
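To confirm which release pip actually installed, one option is to query the package metadata from Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `piechartocr` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("piechartocr")
    print(found if found else "piechartocr is not installed")
```

This reads only the metadata recorded by pip, so it does not check that the compiled libraries load correctly.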
Install from source
Install Boost and Tesseract:
sudo apt install libboost-system-dev tesseract-ocr build-essential git
Clone this repository including submodules:
git clone --recursive https://github.com/ehtec/pie-chart-ocr.git
cd pie-chart-ocr
Install Python requirements:
python3 -m pip install -r requirements.txt
Compile libraries:
python3 setup.py build_ext
Create temporary directories:
mkdir temp
mkdir temp1
mkdir temp2
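Plain `mkdir` fails if a directory already exists, which can get in the way when the setup is repeated. As a hedged alternative, the same three directories can be created idempotently from Python; `ensure_dirs` is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def ensure_dirs(base, names=("temp", "temp1", "temp2")):
    """Create each named directory under base, skipping any that already exist."""
    created = []
    for name in names:
        path = Path(base) / name
        # exist_ok=True makes repeated calls safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_dirs(".")` from the repository root would mirror the three `mkdir` commands above.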
Unpack test charts:
unzip data/charts_steph.zip -d data
unzip data/charts_steph_upsampled.zip -d data
unzip data/generated_pie_charts_legend.zip -d data
unzip data/generated_pie_charts_without_legend.zip -d data
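On systems without the `unzip` command, the archives could be extracted with Python's standard `zipfile` module instead. This is a sketch under the assumption that the four archives sit in the `data` directory exactly as named in the commands above; `unpack_archives` and `ARCHIVES` are hypothetical names introduced for illustration.

```python
import zipfile
from pathlib import Path

# Archive names as listed in the unzip commands above.
ARCHIVES = (
    "charts_steph.zip",
    "charts_steph_upsampled.zip",
    "generated_pie_charts_legend.zip",
    "generated_pie_charts_without_legend.zip",
)


def unpack_archives(data_dir, archives=ARCHIVES):
    """Extract each archive found in data_dir into data_dir.

    Returns the list of archive names that were not found.
    """
    data_dir = Path(data_dir)
    missing = []
    for name in archives:
        archive = data_dir / name
        if not archive.is_file():
            missing.append(name)
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
    return missing
```

Running `unpack_archives("data")` from the repository root should have the same effect as the `unzip` commands; a non-empty return value indicates archives that were not downloaded.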
Usage
Run unit tests:
python3 -m nose2 --start-dir tests/ --with-coverage
Run legacy tests / examples:
python3 run_examples.py
Generate test data (mock pie charts):
python3 run_generate_test_data
Metrics
These metrics are generated automatically by the CI pipeline.
Metrics for mock pie charts with legend:
Metrics for mock pie charts without legend:
Metrics for real-world pie charts (many of them of very poor quality, some unreadable even for humans):