Extract valuable text information from your documents

DocTR: Document Text Recognition

Extract valuable information from your documents.

Getting started

Prerequisites

  • Python 3.6 (or more recent)
  • pip

Installation

Clone the project and install it:

git clone https://github.com/mindee/doctr.git
pip install -e doctr/.

Usage

Python package

You can use the library like any other Python package to analyze your documents:

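from doctr.documents import read_pdf
from doctr.models import ocr_db_crnn

# Load the OCR model with pretrained weights
model = ocr_db_crnn(pretrained=True)
# Read the PDF file
doc = read_pdf("path/to/your/doc.pdf")
# Run the model on a list of documents
result = model([doc])
# Export the result for the first document
json_output = result[0].export()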

For an exhaustive list of pretrained models available, please refer to the documentation.
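
The exported result from the snippet above can be saved to disk for later inspection. The sketch below assumes that export() returns a JSON-serializable structure (nested dicts and lists), which the variable name json_output suggests but this README does not confirm. save_export is a hypothetical helper and not part of DocTR.

```python
import json
from pathlib import Path


def save_export(export, path):
    """Write an exported result to a UTF-8 JSON file and return its path.

    Hypothetical helper (not part of DocTR); assumes `export` is
    JSON-serializable.
    """
    out_path = Path(path)
    with out_path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented characters readable in the file
        json.dump(export, f, ensure_ascii=False, indent=2)
    return out_path
```

With the usage example above, this would be called as save_export(json_output, "doc.json").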

Docker container

If you plan to deploy in a containerized environment, you can use the provided Dockerfile to build a Docker image:

docker build . -t <YOUR_IMAGE_TAG>

Example script

An example script is provided for a simple document analysis of a PDF file:

python scripts/analyze.py path/to/your/doc.pdf

To list all script arguments, run python scripts/analyze.py --help
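
To analyze several PDF files, the script can be invoked once per file. The sketch below only builds the command lines. build_analyze_commands and the folder layout are illustrative assumptions, and the sketch relies solely on the invocation shown above (the script path followed by a PDF path).

```python
import sys
from pathlib import Path


def build_analyze_commands(folder, script="scripts/analyze.py"):
    """Return one argv list per PDF found directly inside `folder`, sorted by name."""
    pdfs = sorted(Path(folder).glob("*.pdf"))
    # sys.executable runs the script with the current interpreter
    return [[sys.executable, script, str(pdf)] for pdf in pdfs]
```

Each returned command can then be passed to subprocess.run(cmd, check=True) from the repository root.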

Documentation

Detailed specifications are given in the full package documentation, which was built with Sphinx using a theme provided by Read the Docs.

Contributing

Please refer to CONTRIBUTING if you wish to contribute to this project.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

