DocTR: Document Text Recognition

Extract valuable information from your documents.

Getting started

Prerequisites

  • Python 3.6 (or more recent)
  • pip

Installation

Clone the project and install it:

git clone https://github.com/mindee/doctr.git
pip install -e doctr/.

Usage

Python package

You can use the library like any other Python package to analyze your documents:

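from doctr.documents import read_pdf
from doctr.models import ocr_db_crnn

# Load the OCR model with pretrained weights
model = ocr_db_crnn(pretrained=True)
# Read the PDF document to analyze
doc = read_pdf("path/to/your/doc.pdf")
# Run the model on a batch containing this single document
result = model([doc])
# Export the result for the first (and only) document
json_output = result[0].export()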

For the full list of available pretrained models, please refer to the documentation.
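To keep the exported result for later processing, one option is to write it to disk. The sketch below assumes that the value returned by export() (json_output in the snippet above) is made of JSON-serializable Python objects; that is an assumption, not something the package guarantees here. The helper name save_export and the output file name are hypothetical and not part of the library.

```python
import json
from pathlib import Path


def save_export(export, path):
    """Write an exported result to `path` as indented UTF-8 JSON.

    Hypothetical helper, not part of doctr. It assumes `export` contains
    only JSON-serializable objects (dicts, lists, strings, numbers).
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII recognized text readable
        json.dump(export, f, indent=2, ensure_ascii=False)
    return path


# Possible usage, once json_output has been computed as shown earlier:
# save_export(json_output, "doc_export.json")
```

If json.dump raises a TypeError, the export contains objects that would first need converting to plain Python types.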

Docker container

If you plan to deploy in a containerized environment, you can use the provided Dockerfile to build a Docker image:

docker build . -t <YOUR_IMAGE_TAG>

Example script

An example script is provided for a simple document analysis of a PDF file:

python scripts/analyze.py path/to/your/doc.pdf

To list all script arguments, run: python scripts/analyze.py --help

Documentation

See the full package documentation for detailed specifications. The documentation was built with Sphinx using a theme provided by Read the Docs.

Contributing

Please refer to CONTRIBUTING if you wish to contribute to this project.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

Download files

Download the file for your platform.

Source Distribution

python-doctr-0.1.0.tar.gz (38.8 kB)

Uploaded Source

Built Distribution

python_doctr-0.1.0-py3-none-any.whl (45.2 kB)

Uploaded Python 3

File details

Details for the file python-doctr-0.1.0.tar.gz.

File metadata

  • Download URL: python-doctr-0.1.0.tar.gz
  • Upload date:
  • Size: 38.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/53.1.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for python-doctr-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       4c63a7b38e0a3379357a4b614e267fe7c3c995ac6c5e1f10ba91f1f8d5125a3d
MD5          4c6cbc61ebd6db076f1e6c7fe880142a
BLAKE2b-256  41a2728bc8c84c76f070a32233a0af5e9ab43526b994a5ad6dbd22b095d38e6b

See more details on using hashes here.
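A downloaded file can be checked against the published SHA256 digest before it is installed. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and it assumes the source distribution has already been downloaded to a local path.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for python-doctr-0.1.0.tar.gz
EXPECTED_SDIST_SHA256 = (
    "4c63a7b38e0a3379357a4b614e267fe7c3c995ac6c5e1f10ba91f1f8d5125a3d"
)

# Possible usage, assuming the archive sits in the current directory:
# matches_published_hash("python-doctr-0.1.0.tar.gz", EXPECTED_SDIST_SHA256)
```

A False result means the local file differs from the published one and should not be installed.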

File details

Details for the file python_doctr-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: python_doctr-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 45.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/53.1.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for python_doctr-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       278a3da1ab8752eabe3a11056bd2a589bbe5ecd46d586fa83e6bf6d69259a39f
MD5          3a8db0d93d0193f01b6a7ded869874f7
BLAKE2b-256  36e106c7ad1b07c123156a764a6a2360b4ebef2784a55d7dfc4feb9799bff2b9
