
Papermerge worker - extracts OCR text from documents


Papermerge Worker

pmworker's main job is OCR processing. It extracts text from PDF, TIFF, JPEG, and PNG files.

Requirements

python >= 3.6

pmworker.wrapper uses the subprocess.run function, which was added in Python 3.5. It also passes the encoding argument, as in subprocess.run(encoding='utf-8'), which was added in Python 3.6.
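As a minimal sketch of the call pattern described above (not the actual code in pmworker.wrapper), the helper below runs a command with subprocess.run and encoding='utf-8', so the captured output arrives as str instead of bytes. The name run_command and the sample child command are hypothetical and used only for illustration.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and return (exit code, stdout, stderr) as decoded text."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",  # requires Python 3.6 or newer
    )
    return completed.returncode, completed.stdout, completed.stderr


# Hypothetical stand-in for an OCR command: a child Python process that prints text.
code, out, err = run_command([sys.executable, "-c", "print('ocr text')"])
```

On Python 3.5 the same call would fail with a TypeError because the encoding argument is not accepted there, which is why 3.6 is the minimum version.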

Dependencies

Depends on Celery, Tesseract, and ImageMagick.
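Tesseract and ImageMagick are external programs rather than Python packages, so it can help to confirm they are on PATH before starting the worker. The following is a small sketch of such a check; the function name missing_tools and the binary names in REQUIRED are assumptions (ImageMagick installs as magick or convert depending on its version) and are not part of pmworker.

```python
import shutil


def missing_tools(candidates):
    """Return tool names for which none of the candidate binaries is on PATH."""
    missing = []
    for tool, binaries in candidates.items():
        if not any(shutil.which(binary) for binary in binaries):
            missing.append(tool)
    return missing


# Assumed binary names; adjust them for your installation.
REQUIRED = {
    "tesseract": ["tesseract"],
    "imagemagick": ["magick", "convert"],
}

if __name__ == "__main__":
    print("Missing tools:", missing_tools(REQUIRED))
```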

Usage:

export CELERY_CONFIG_MODULE='pmworker.config'
celery -A pmworker.celery worker -l info
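CELERY_CONFIG_MODULE names a Python module from which Celery settings are read. Assuming pmworker loads its settings this way, a custom config module could look like the sketch below. The module name, broker URL, and result backend are hypothetical placeholders and do not reflect the contents of pmworker.config.

```python
# myworker_config.py (hypothetical module name)
# Celery reads lowercase setting names such as these from a config module.

# Assumed broker: a local Redis instance. Replace with your own broker URL.
broker_url = "redis://localhost:6379/0"

# Where task results are stored (assumption: same Redis server, another database).
result_backend = "redis://localhost:6379/1"

# Exchange task messages as JSON only.
task_serializer = "json"
accept_content = ["json"]
```

With such a module on the import path, exporting CELERY_CONFIG_MODULE='myworker_config' before starting the worker would select it.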

Run Tests

Run all tests:

python3 run.py

Run specific test file:

python3 run.py -p test_endpoint

This is the same as:

python3 run.py -p test_endpoint.py
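One way the two forms above could be made equivalent is to append the .py suffix when it is missing and then pass the pattern to unittest discovery. This is a sketch under that assumption, not the actual contents of run.py; normalize_pattern and load_tests are hypothetical names.

```python
import unittest


def normalize_pattern(pattern):
    """Append '.py' when the pattern lacks it, so both spellings match the same file."""
    return pattern if pattern.endswith(".py") else pattern + ".py"


def load_tests(start_dir, pattern):
    """Discover test modules under start_dir whose file name matches the pattern."""
    loader = unittest.TestLoader()
    return loader.discover(start_dir, pattern=normalize_pattern(pattern))
```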
