DocTR: Document Text Recognition

Extract valuable information from your documents.

Getting started

Prerequisites

  • Python 3.6 (or higher)
  • pip

Installation

You can install the latest release of the package from PyPI using pip:

pip install python-doctr

Or you can install it from source:

git clone https://github.com/mindee/doctr.git
pip install -e doctr/.
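
To confirm that the prerequisites are met and the installation succeeded, a small check like the following may help. It is only a sketch: the helper names `meets_python_requirement` and `is_importable` are hypothetical and not part of the library, and the import name `doctr` is taken from the usage example in this README.

```python
import importlib.util
import sys


def meets_python_requirement(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


def is_importable(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report the state of the current environment.
print("Python 3.6 or higher:", meets_python_requirement())
print("doctr importable:", is_importable("doctr"))
```

If the second line prints False, the package is not visible to the interpreter you are running, which usually means pip installed it into a different environment.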

Usage

Python package

You can use the library like any other Python package to analyze your documents:

from doctr.documents import read_pdf, read_img
from doctr.models import ocr_db_crnn_vgg

model = ocr_db_crnn_vgg(pretrained=True)
# PDF
doc = read_pdf("path/to/your/doc.pdf")
result = model([doc])
# Image
page = read_img("path/to/your/img.jpg")
result = model([[page]])
# Export
json_output = result[0].export()

For an exhaustive list of the available pretrained models, please refer to the documentation.
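
Once you have an exported result, you will usually want to walk it or save it. The sketch below assumes the export is a nested dictionary of pages, blocks, lines and words, where each word carries a "value" key. That layout is an assumption made for illustration, so check the documentation for the actual schema. The `iter_words` helper and the `sample_export` data are hypothetical and stand in for the output of `result[0].export()`.

```python
import json
from typing import Any, Dict, Iterator


def iter_words(export: Dict[str, Any]) -> Iterator[str]:
    """Yield word strings from a pages -> blocks -> lines -> words dict.

    The nesting and the "value" key are assumptions, not a documented schema.
    """
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    yield word["value"]


# Hypothetical stand-in for `result[0].export()`; the real output may differ.
sample_export = {
    "pages": [
        {"blocks": [{"lines": [{"words": [{"value": "Hello"}, {"value": "world"}]}]}]}
    ]
}

# Flatten the recognized words into a single string.
text = " ".join(iter_words(sample_export))

# Serialize the export so it can be written to disk or sent elsewhere.
serialized = json.dumps(sample_export, indent=2)
```

Using `.get` with an empty-list default keeps the traversal from failing on pages or blocks that contain no text.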

Docker container

If you want to deploy in a containerized environment, you can use the provided Dockerfile to build a Docker image:

docker build . -t <YOUR_IMAGE_TAG>

Example script

An example script is provided for a simple document analysis of a PDF file:

python scripts/analyze.py path/to/your/doc.pdf

To list all script arguments, run python scripts/analyze.py --help

Demo app

A minimal demo app is provided for you to play with the text detection model!

You will need an extra dependency (Streamlit) for the app to run:

pip install -r demo/requirements.txt

You can then launch the app in your default browser with:

streamlit run demo/app.py

Documentation

Detailed specifications are available in the full package documentation, which is built with Sphinx using a theme provided by Read the Docs.

Contributing

Please refer to CONTRIBUTING if you wish to contribute to this project.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.
