Pie Chart Optical Character Recognition

pie-chart-ocr

A tool to extract tabular data from pie charts, developed as a component of the CryptoSearchTools toolkit.

Note: The original repository was moved to https://git.ehtec.co/research/pie-chart-ocr. https://github.com/ehtec/pie-chart-ocr is a mirror.

Installation

Install via PyPI

All tagged versions of piechartocr are available from PyPI:

python3 -m pip install --upgrade piechartocr

Note: Tests and examples cannot be run from the PyPI installation. The files they require must be downloaded from GitLab.
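
After installing from PyPI, the installed version can be checked from Python. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name `installed_version` is hypothetical and not part of piechartocr.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "piechartocr" is the distribution name used in the pip command above.
version = installed_version("piechartocr")
print(version if version else "piechartocr is not installed")
```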

Build from source

Install Boost, Tesseract, and the build tools:

sudo apt install libboost-system-dev tesseract-ocr build-essential git

Clone this repository including submodules:

git clone --recursive https://github.com/ehtec/pie-chart-ocr.git
cd pie-chart-ocr

Install Python requirements:

python3 -m pip install -r requirements.txt

Compile libraries:

python3 setup.py build_ext

Create temporary directories:

mkdir temp
mkdir temp1
mkdir temp2

Unpack test charts:

unzip data/charts_steph.zip -d data
unzip data/charts_steph_upsampled.zip -d data
unzip data/generated_pie_charts_legend.zip -d data
unzip data/generated_pie_charts_without_legend.zip -d data
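
The two preparation steps above (creating the temporary directories and unpacking the test charts) can also be done from Python, which avoids an error when a directory already exists. This is a sketch using only the standard library, not a script shipped with the project. The function name `prepare_workspace` is hypothetical; the directory and archive names are taken from the commands above.

```python
import zipfile
from pathlib import Path

TEMP_DIRS = ("temp", "temp1", "temp2")
ARCHIVES = (
    "charts_steph.zip",
    "charts_steph_upsampled.zip",
    "generated_pie_charts_legend.zip",
    "generated_pie_charts_without_legend.zip",
)


def prepare_workspace(root):
    """Create the temp directories and unpack the test archives into root/data.

    Returns the names of archives that were not found, so the caller can react.
    """
    root = Path(root)
    for name in TEMP_DIRS:
        (root / name).mkdir(exist_ok=True)  # no error if it already exists

    data_dir = root / "data"
    missing = []
    for name in ARCHIVES:
        archive = data_dir / name
        if not archive.is_file():
            missing.append(name)
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)  # same target as `unzip ... -d data`
    return missing
```

Calling `prepare_workspace(".")` from the repository root mirrors the shell commands; a non-empty return value lists archives that were not present under `data/`.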

Usage

Run unit tests:

python3 -m nose2 --start-dir tests/ --with-coverage

Run legacy tests / examples:

python3 run_examples.py

Generate test data (mock pie charts):

python3 run_generate_test_data.py

To extract data from any pie chart:

from piechartocr import pie_chart_ocr

# Path to pie chart
path = "/path/to/my/chart.png"

# Extract data
data = pie_chart_ocr.main(path, interactive=False)

# Print the extracted list of tuples of the form [(percentage / 100, label)]
print(data["res"])
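
Since the result is a list of `(percentage / 100, label)` tuples, it converts easily into a table. The sketch below turns such a list into `(label, percent)` rows and CSV text. The helper names `res_to_rows` and `rows_to_csv` are hypothetical, and the sample values are made up for illustration; in practice the input would be `data["res"]`.

```python
import csv
import io


def res_to_rows(res):
    """Convert [(fraction, label), ...] into [(label, percent), ...]."""
    return [(label, round(fraction * 100, 2)) for fraction, label in res]


def rows_to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label", "percent"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample in the documented shape; real values come from data["res"].
sample_res = [(0.5, "Bitcoin"), (0.3, "Ethereum"), (0.2, "Other")]
rows = res_to_rows(sample_res)
csv_text = rows_to_csv(rows)
print(csv_text)
```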

Metrics

These metrics are generated automatically by the CI pipeline.

Metrics for mock pie charts with legend:

[Chart image]

Metrics for mock pie charts without legend:

[Chart image]

Metrics for real-world pie charts (many of them of very poor quality, some unreadable even for humans):

[Chart image]

Download files

Source distribution: piechartocr-0.6.6.tar.gz (95.1 MB)
