Pie Chart Optical Character Recognition
pie-chart-ocr
A tool to extract tabular data from pie charts, developed as a component of the CryptoSearchTools toolkit.
Note: The original repository has moved to https://git.ehtec.co/research/pie-chart-ocr; https://github.com/ehtec/pie-chart-ocr is a mirror.
Installation
Install via PyPI
You can install any tagged version of piechartocr from PyPI:
python3 -m pip install --upgrade piechartocr
Note: You cannot run the tests and examples from the PyPI installation. The files they require must be downloaded from the GitLab repository.
Build from source
Install Boost and Tesseract:
sudo apt install libboost-system-dev tesseract-ocr build-essential git
Clone this repository including submodules:
git clone --recursive https://github.com/ehtec/pie-chart-ocr.git
cd pie-chart-ocr
Install Python requirements:
python3 -m pip install -r requirements.txt
Compile libraries:
python3 setup.py build_ext
Create temporary directories:
mkdir temp
mkdir temp1
mkdir temp2
Unpack test charts:
unzip data/charts_steph.zip -d data
unzip data/charts_steph_upsampled.zip -d data
unzip data/generated_pie_charts_legend.zip -d data
unzip data/generated_pie_charts_without_legend.zip -d data
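For setups where `mkdir` and `unzip` are inconvenient (for example, a scripted build), the last two steps can be approximated in Python. This is a minimal sketch using only the standard library; `prepare_workspace` is a hypothetical helper and not part of piechartocr, and the directory and archive names are copied from the commands above.

```python
import zipfile
from pathlib import Path

# Names taken from the build steps above.
TEMP_DIRS = ("temp", "temp1", "temp2")
CHART_ARCHIVES = (
    "charts_steph.zip",
    "charts_steph_upsampled.zip",
    "generated_pie_charts_legend.zip",
    "generated_pie_charts_without_legend.zip",
)


def prepare_workspace(repo_root="."):
    """Create the temporary directories and unpack the test charts into data/.

    Hypothetical helper (not part of piechartocr). Returns the names of the
    archives that were not found, so the caller can decide how to react.
    """
    root = Path(repo_root)
    for name in TEMP_DIRS:
        # exist_ok=True makes the step safe to repeat, unlike a plain mkdir.
        (root / name).mkdir(exist_ok=True)

    data_dir = root / "data"
    missing = []
    for archive_name in CHART_ARCHIVES:
        archive = data_dir / archive_name
        if not archive.is_file():
            missing.append(archive_name)
            continue
        with zipfile.ZipFile(archive) as zf:
            # Same target directory as `unzip <archive> -d data`.
            zf.extractall(data_dir)
    return missing
```

Calling `prepare_workspace()` from the root of the cloned repository should leave it in the same state as the shell commands, assuming the archives are present under `data/`.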
Usage
Run unit tests:
python3 -m nose2 --start-dir tests/ --with-coverage
Run legacy tests / examples:
python3 run_examples.py
Generate test data (mock pie charts):
python3 run_generate_test_data.py
To extract data from any pie chart:
from piechartocr import pie_chart_ocr
# Path to pie chart
path = "/path/to/my/chart.png"
# Extract data
data = pie_chart_ocr.main(path, interactive=False)
# Print the extracted data: a list of (percentage / 100, label) tuples
print(data["res"])
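Since the result is a list of `(percentage / 100, label)` tuples, it can be turned into a small table with the standard library. The sketch below is illustrative only: `result_to_csv` is a hypothetical helper, `sample_res` is made-up data standing in for `data["res"]`, and it assumes the first tuple element is a plain number.

```python
import csv
import io


def result_to_csv(res):
    """Convert [(fraction, label), ...] into CSV text with a percentage column.

    Hypothetical helper (not part of piechartocr).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label", "percentage"])
    for fraction, label in res:
        # The tuples hold percentage / 100, so scale back up to percent.
        writer.writerow([label, round(fraction * 100, 2)])
    return buffer.getvalue()


# Made-up sample standing in for data["res"].
sample_res = [(0.5, "Bitcoin"), (0.3, "Ethereum"), (0.2, "Other")]
print(result_to_csv(sample_res))
```

With real output, `result_to_csv(data["res"])` would be used in place of the sample.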
Metrics
These metrics are autogenerated by the CI pipeline.
Metrics for mock pie charts with legend:
Metrics for mock pie charts without legend:
Metrics for real-world pie charts (many of them of very poor quality, some unreadable even for humans):
Download files
Source Distribution
File details
Details for the file piechartocr-0.6.6.tar.gz.
File metadata
- Download URL: piechartocr-0.6.6.tar.gz
- Upload date:
- Size: 95.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest
---|---
SHA256 | dceda30533b8db3c614f44fc9202d571dac305d9c62d683784c1add7f8d4b498
MD5 | 9b1e24e0c370f1d94fa085229c4999e9
BLAKE2b-256 | 40b1e68d1b3f89b82f2e1bee5af43c2a8f762ace4b9ef64143f92e0437ded264
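A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's `hashlib`; the helper names are hypothetical, and the path passed to them is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest published for piechartocr-0.6.6.tar.gz (copied from the table above).
EXPECTED_SHA256 = "dceda30533b8db3c614f44fc9202d571dac305d9c62d683784c1add7f8d4b498"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use low for a large archive.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_published_hash("piechartocr-0.6.6.tar.gz")` should return `True` for an intact download of that release.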