Papermerge worker - extracts OCR text from documents
Papermerge Worker
pmworker's main job is OCR processing. It extracts text from PDF, TIFF, JPEG, and PNG files.
Requirements
Python >= 3.6
pmworker.wrapper uses the subprocess.run method, which was added in Python 3.5. It also passes the encoding argument, as in subprocess.run(encoding='utf-8'), which was added in Python 3.6.
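To illustrate why Python 3.6 is the minimum, here is a minimal sketch of calling subprocess.run with the encoding argument. The helper name run_command is hypothetical and is not taken from pmworker.wrapper; it only shows the standard library call the requirement refers to.

```python
import subprocess
import sys


def run_command(cmd):
    """Run cmd (a list of arguments) and return (returncode, stdout, stderr).

    Hypothetical helper for illustration; not pmworker's actual wrapper.
    """
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",  # decode output as text; this argument needs Python 3.6+
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Use the current interpreter as a stand-in for an external tool.
    code, out, err = run_command([sys.executable, "-c", "print('hello')"])
    print(code, out.strip())
```

With encoding set, stdout and stderr come back as str rather than bytes, so the caller does not need to decode them manually.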
Dependencies
pmworker depends on Celery, Tesseract, and ImageMagick.
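Because Tesseract and ImageMagick are external programs rather than Python packages, it can help to check that they are on PATH before starting the worker. The sketch below is one way to do that. The function name missing_tools and the REQUIRED mapping are hypothetical, and the executable names are assumptions: tesseract for Tesseract, and magick or convert for ImageMagick depending on its version. Which ImageMagick command pmworker actually invokes is not stated here.

```python
import shutil

# Assumed executable names; adjust to match the local installation.
REQUIRED = {
    "tesseract": ["tesseract"],
    "imagemagick": ["magick", "convert"],
}


def missing_tools(candidates):
    """Return the tool names for which no candidate executable is on PATH."""
    return [
        name
        for name, executables in candidates.items()
        if not any(shutil.which(exe) for exe in executables)
    ]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    if missing:
        print("Missing external tools:", ", ".join(missing))
    else:
        print("All external tools found.")
```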
Usage:
export CELERY_CONFIG_MODULE='pmworker.config'
celery -A pmworker.celery worker -l info
Run Tests
Run all tests:
python3 run.py
Run a specific test file:
python3 run.py -p test_endpoint
This is the same as:
python3 run.py -p test_endpoint.py
Download files
Source Distribution
pmworker-1.0.0.tar.gz (12.5 kB)
Built Distribution
pmworker-1.0.0-py3-none-any.whl (17.7 kB)