
Papermerge worker - extracts OCR text from documents

Project description

Papermerge Worker

pmworker's main job is OCR processing. It extracts text from PDF, TIFF, JPEG and PNG files. For a full project description, please see the Papermerge Project.

Requirements

Python >= 3.6

pmworker.wrapper uses the subprocess.run method, which was added in Python 3.5. It also passes the encoding argument, as in subprocess.run(encoding='utf-8'), which was added in Python 3.6.
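To illustrate why the 3.6 floor matters, here is a minimal sketch of a subprocess.run call that relies on the encoding argument. The helper name run_command is hypothetical and is not taken from pmworker.wrapper; it only shows the kind of call described above.

```python
import subprocess
import sys


def run_command(cmd):
    """Run cmd (a list of strings) and return (exit code, stdout, stderr) as text.

    Hypothetical helper for illustration; not pmworker's actual wrapper.
    """
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",  # decode both pipes as UTF-8; requires Python 3.6+
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Use the current interpreter as a stand-in for an external OCR tool.
    code, out, err = run_command([sys.executable, "-c", "print('ocr ok')"])
    print(code, out.strip())
```

With encoding set, stdout and stderr come back as str rather than bytes, so the caller does not need to decode them manually.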

Dependencies

Depends on Celery, Tesseract and ImageMagick.
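Tesseract and ImageMagick are external programs rather than Python packages, so it can help to confirm they are on PATH before starting the worker. The sketch below is one possible check; the executable names (tesseract, and magick or convert for ImageMagick) are assumptions that may differ on your system, and find_tools is a hypothetical helper, not part of pmworker.

```python
import shutil

# Assumed executable names; adjust for your platform and ImageMagick version.
REQUIRED_TOOLS = {
    "tesseract": ["tesseract"],
    "imagemagick": ["magick", "convert"],
}


def find_tools(required=REQUIRED_TOOLS):
    """Map each tool name to the first candidate path found on PATH, or None."""
    found = {}
    for name, candidates in required.items():
        found[name] = next(
            (path for path in map(shutil.which, candidates) if path), None
        )
    return found


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```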

Usage:

export CELERY_CONFIG_MODULE='pmworker.config'
celery -A pmworker.celery worker -l info
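CELERY_CONFIG_MODULE names a Python module from which Celery reads its settings. The contents of pmworker's own config module are not shown here, so the following is only a sketch of what such a module could look like. The module name myconfig, the Redis URLs, and the serializer choices are all assumptions for illustration.

```python
# myconfig.py (hypothetical; select it with CELERY_CONFIG_MODULE='myconfig')

# Message broker that queues the OCR tasks; a local Redis instance is assumed.
broker_url = "redis://localhost:6379/0"

# Where task results are stored; also an assumption.
result_backend = "redis://localhost:6379/1"

# Exchange task payloads and results as JSON.
task_serializer = "json"
result_serializer = "json"
accept_content = ["json"]
```

Pointing the environment variable at your own module is one way to change the broker without editing the package itself.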

Run Tests

Run all tests:

python3 run.py

Run specific test file:

python3 run.py -p test_endpoint

Which is the same as:

python3 run.py -p test_endpoint.py
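The equivalence of the two -p forms can be pictured with a small runner built on unittest discovery. This is a sketch under assumptions, not the project's actual run.py: the function names, the default pattern and the start directory are all hypothetical.

```python
import argparse
import unittest


def normalize_pattern(pattern):
    """Append '.py' when it is missing, so both spellings select the same file."""
    return pattern if pattern.endswith(".py") else pattern + ".py"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run the test suite")
    parser.add_argument(
        "-p", "--pattern",
        default="test*.py",  # assumed default: every test file
        help="test file name or glob, with or without the .py suffix",
    )
    return parser.parse_args(argv)


def main(argv=None, start_dir="."):
    """Discover tests matching the pattern and run them; return an exit code."""
    args = parse_args(argv)
    suite = unittest.TestLoader().discover(
        start_dir, pattern=normalize_pattern(args.pattern)
    )
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return 0 if result.wasSuccessful() else 1
```

In a script, main() would typically be called under an if __name__ == "__main__" guard, with its return value passed to sys.exit.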

Download files

Source Distribution

pmworker-1.2.0.tar.gz (18.6 kB view details)

Uploaded Source

File details

Details for the file pmworker-1.2.0.tar.gz.

File metadata

  • Download URL: pmworker-1.2.0.tar.gz
  • Upload date:
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for pmworker-1.2.0.tar.gz:

  • SHA256: 2e5321c771f7d7fff407327e4a31cc82812a0d054b3d5d01b0f460b8b63e635b
  • MD5: 919c60f5ad81052af60bb8a3d1706646
  • BLAKE2b-256: d2227918dbbc00b7d5f6c44a88f5fddb75649eeae232d76e0abff9bb07be3e99
