Pie Chart Optical Character Recognition

pie-chart-ocr

A tool to extract tabular data from pie charts, developed as a component of the CryptoSearchTools toolkit.

Note: The original repository has moved to https://git.ehtec.co/research/pie-chart-ocr. The GitHub repository at https://github.com/ehtec/pie-chart-ocr is a mirror.

Installation

Install via PyPI

All tagged versions of piechartocr are available from PyPI:

python3 -m pip install --upgrade piechartocr

Note: The PyPI installation does not include the tests and examples. The files they require must be downloaded from the GitLab repository.
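To confirm which version pip installed, the standard library can be queried directly. This is a minimal sketch; the helper name `installed_version` is an illustration and not part of piechartocr.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "piechartocr" is the distribution name used in the pip command above.
print(installed_version("piechartocr"))
```

A result of `None` means the package is not installed in the interpreter that ran the check.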

Build from source

Install Boost, Tesseract, the compiler toolchain, and git (Debian/Ubuntu):

sudo apt install libboost-system-dev tesseract-ocr build-essential git
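Before building, it can help to check that the command-line tools from the step above are on the PATH. The sketch below assumes the usual binary names (`tesseract`, `git`, `g++`); the helper `missing_tools` is hypothetical, and the Boost headers are not checked because they are a library rather than an executable.

```python
import shutil


def missing_tools(names):
    """Return the subset of executable names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# An empty list means all three executables were found.
print(missing_tools(["tesseract", "git", "g++"]))
```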

Clone this repository including submodules:

git clone --recursive https://github.com/ehtec/pie-chart-ocr.git
cd pie-chart-ocr

Install Python requirements:

python3 -m pip install -r requirements.txt

Compile libraries:

python3 setup.py build_ext

Create temporary directories:

mkdir temp
mkdir temp1
mkdir temp2
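On systems without a POSIX shell, the same directories can be created from Python. This is a sketch only: the function `ensure_temp_dirs` is an illustration, not something shipped with the project. Unlike plain `mkdir`, it does not fail when a directory already exists.

```python
from pathlib import Path

# Directory names taken from the mkdir commands above.
TEMP_DIRS = ("temp", "temp1", "temp2")


def ensure_temp_dirs(base="."):
    """Create the temporary directories under `base` and return their paths."""
    created = []
    for name in TEMP_DIRS:
        path = Path(base) / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling `ensure_temp_dirs()` from the repository root mirrors the three `mkdir` commands.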

Unpack test charts:

unzip data/charts_steph.zip -d data
unzip data/charts_steph_upsampled.zip -d data
unzip data/generated_pie_charts_legend.zip -d data
unzip data/generated_pie_charts_without_legend.zip -d data
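If `unzip` is not available, the archives can be extracted with the standard `zipfile` module instead. The sketch below assumes the archive names listed above; `unpack_archives` is a hypothetical helper that skips any archive not present on disk.

```python
import zipfile
from pathlib import Path

# Archive names taken from the unzip commands above.
ARCHIVES = (
    "charts_steph.zip",
    "charts_steph_upsampled.zip",
    "generated_pie_charts_legend.zip",
    "generated_pie_charts_without_legend.zip",
)


def unpack_archives(data_dir="data", archives=ARCHIVES):
    """Extract each archive found in `data_dir` into `data_dir`.

    Returns the names of the archives that were actually extracted.
    """
    data_dir = Path(data_dir)
    unpacked = []
    for name in archives:
        archive = data_dir / name
        if not archive.is_file():
            continue  # skip archives that are not present
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
        unpacked.append(name)
    return unpacked
```

Calling `unpack_archives()` from the repository root corresponds to the four `unzip` commands.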

Usage

Run unit tests:

python3 -m nose2 --start-dir tests/ --with-coverage

Run legacy tests / examples:

python3 run_examples.py

Generate test data (mock pie charts):

python3 run_generate_test_data.py

To extract data from any pie chart:

from piechartocr import pie_chart_ocr

# Path to pie chart
path = "/path/to/my/chart.png"

# Extract data
data = pie_chart_ocr.main(path, interactive=False)

# Print the extracted list of (fraction, label) tuples, where fraction = percentage / 100
print(data["res"])
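The result can then be turned into a small table. The sketch below assumes only what the comment above states, namely that `data["res"]` is a list of `(fraction, label)` tuples. The helpers `to_rows` and `to_csv` and the sample values are made up for illustration and are not part of piechartocr.

```python
import csv
import io


def to_rows(res):
    """Convert [(fraction, label), ...] into (label, percentage) rows, largest first."""
    rows = [(label, round(fraction * 100, 2)) for fraction, label in res]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def to_csv(res):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label", "percentage"])
    writer.writerows(to_rows(res))
    return buffer.getvalue()


# Hypothetical sample standing in for data["res"].
sample = [(0.25, "Team"), (0.6, "Public sale"), (0.15, "Reserve")]
print(to_csv(sample))
```

In real use, `sample` would be replaced by `data["res"]` from the call above.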

Metrics

These metrics are generated automatically by the CI pipeline.

Metrics for mock pie charts with legend:

[Chart image: metrics for mock pie charts with legend]

Metrics for mock pie charts without legend:

[Chart image: metrics for mock pie charts without legend]

Metrics for real-world pie charts (many of them of very poor quality, some unreadable even to humans):

[Chart image: metrics for real-world pie charts]

