Python Optical Character Recognition (OCR) library for extracting text from scanned PDFs

Project description

NoelOCR is a Python library for extracting text from scanned PDFs and full-text PDFs, developed by Noel Moses Mwadende.

How does it work?

The processPDF function from NoelOCR takes a scanned PDF, processes it, and outputs searchable plain text.
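The project description does not say how processPDF works internally. A common way to get text out of a scanned PDF is to render each page to an image and run an OCR engine over it. The sketch below illustrates that general pattern only; it is not NoelOCR's actual implementation. The names ocr_pdf, render_pages and recognize are hypothetical, and the default helpers assume the third-party pdf2image and pytesseract packages (with the Poppler and Tesseract binaries) are installed.

```python
from typing import Callable, Iterable, Optional


def _default_render(pdf_path: str) -> Iterable[object]:
    # Assumption: pdf2image (and Poppler) is installed; imported lazily.
    from pdf2image import convert_from_path
    return convert_from_path(pdf_path, dpi=300)


def _default_recognize(image: object) -> str:
    # Assumption: pytesseract (and the Tesseract binary) is installed.
    import pytesseract
    return pytesseract.image_to_string(image)


def ocr_pdf(
    pdf_path: str,
    render_pages: Optional[Callable[[str], Iterable[object]]] = None,
    recognize: Optional[Callable[[object], str]] = None,
) -> str:
    """Render every page of a PDF to an image, OCR it, and join the text."""
    render_pages = render_pages or _default_render
    recognize = recognize or _default_recognize
    # One OCR pass per page; surrounding whitespace is trimmed per page.
    page_texts = [recognize(image).strip() for image in render_pages(pdf_path)]
    # Pages are separated by a blank line in the combined output.
    return "\n\n".join(page_texts)
```

Passing the renderer and recognizer as arguments keeps the pipeline testable without an OCR engine and makes it easy to swap in a different backend.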

Who is it for?

It was developed for machine learning engineers who work with PDFs. Typical tasks include classifying scanned PDFs, extracting text from them, or any other task that requires feature extraction from scanned PDFs. NoelOCR can also extract text from full-text PDFs, so it works for both kinds of document, although it was purposely created for scanned PDFs.
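As an illustration of the feature-extraction use case, the sketch below turns extracted text into a simple term-count vector that a classifier could consume. It is a minimal example, not part of NoelOCR; the function name text_to_features and the vocabulary are hypothetical.

```python
import re
from collections import Counter
from typing import List


def text_to_features(text: str, vocabulary: List[str]) -> List[int]:
    """Count how often each vocabulary word occurs in the text."""
    # Lowercase the text and split it into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never appear.
    return [counts[word] for word in vocabulary]


vocab = ["invoice", "total", "contract"]
print(text_to_features("Invoice #12: total due. TOTAL: 40", vocab))  # [1, 2, 0]
```

In practice the input string would be the text returned for each PDF, and the resulting vectors would be fed to whichever classifier is in use.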

How do I use it?

import NoelOCR as nm

text = nm.processPDF('moses.pdf')

print(text)
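A slightly fuller usage sketch follows: it checks that the PDF exists and writes the extracted text next to it as a .txt file. The helper name extract_to_txt and its extractor parameter are hypothetical, and the code assumes processPDF returns the text as a string (or something convertible with str), which the example above suggests but the description does not state.

```python
from pathlib import Path


def extract_to_txt(pdf_path, extractor=None):
    """Extract text from a PDF and save it alongside as a .txt file."""
    pdf = Path(pdf_path)
    if not pdf.is_file():
        raise FileNotFoundError(f"No such PDF: {pdf}")
    if extractor is None:
        # Imported lazily so the helper can be exercised without NoelOCR.
        import NoelOCR as nm
        extractor = nm.processPDF
    # Assumption: the extractor returns the document text.
    text = extractor(str(pdf))
    out_path = pdf.with_suffix(".txt")
    out_path.write_text(str(text), encoding="utf-8")
    return out_path
```

Calling extract_to_txt('moses.pdf') would then write moses.txt in the same directory, provided NoelOCR is installed.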

It currently works on Linux only. Windows support will be added in the beta version.
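Because only Linux is supported, a script that may run on other systems can check the platform before calling the library. This is a minimal sketch using the standard library; the function name is_supported_platform is hypothetical and not part of NoelOCR.

```python
import platform


def is_supported_platform(system_name=None) -> bool:
    """Return True when running on Linux, the only platform listed as supported."""
    # platform.system() returns names such as 'Linux', 'Windows' or 'Darwin'.
    name = system_name if system_name is not None else platform.system()
    return name == "Linux"


if not is_supported_platform():
    print("NoelOCR is documented as Linux-only; skipping OCR on this system.")
```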

Download files

Source Distribution

NoelOCR-0.0.8.tar.gz (2.3 kB)
