DocTR: Document Text Recognition

Extract valuable information from your documents.

Getting started
- Prerequisites
- Installation
Usage
Documentation
Contributing
License

Getting started

Prerequisites

Python 3.6 (or more recent)
pip
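
As a quick sanity check before installing, the version requirement above can be verified from Python itself. This is a minimal sketch: the helper name `python_is_supported` is hypothetical and not part of DocTR, and the only value taken from this README is the 3.6 minimum.

```python
import sys

# Minimum interpreter version stated in the prerequisites above.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) interpreter meets the minimum.

    Hypothetical helper for illustration; not part of the DocTR package.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not python_is_supported():
        raise SystemExit("Python 3.6 or more recent is required")
    print("Python version OK")
```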

Installation

Clone the project and install it:

git clone https://github.com/mindee/doctr.git
pip install -e doctr/.

Usage

Python package

You can use the library like any other Python package to analyze your documents as follows:

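from doctr.documents import read_pdf
from doctr.models import ocr_db_crnn

# Load the OCR model with pretrained weights
model = ocr_db_crnn(pretrained=True)
# Read the PDF file
doc = read_pdf("path/to/your/doc.pdf")
# Run the model on a list of documents
result = model([doc])
# Export the result for the first document
json_output = result[0].export()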
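
Once you have the exported result, you may want to persist it. The sketch below assumes that `export()` returns a JSON-serializable structure (the variable name `json_output` suggests this, but this README does not confirm it). The helper `save_export` is hypothetical, and the dictionary used here is only a stand-in so the snippet runs without DocTR installed.

```python
import json
import tempfile
from pathlib import Path


def save_export(export, path):
    """Write an exported result to `path` as UTF-8 JSON and return the path.

    Hypothetical helper; assumes `export` is JSON-serializable.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps recognized non-ASCII text readable.
        json.dump(export, f, ensure_ascii=False, indent=2)
    return path


# Stand-in for `result[0].export()`; the real structure may differ.
json_output = {"pages": []}

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_export(json_output, Path(tmp_dir) / "doc_export.json")
    reloaded = json.loads(saved_path.read_text(encoding="utf-8"))
```

In real use, pass the value returned by `result[0].export()` instead of the stand-in dictionary.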

For an exhaustive list of pretrained models available, please refer to the documentation.

Docker container

If you want to deploy in a containerized environment, you can use the provided Dockerfile to build a Docker image:

docker build . -t <YOUR_IMAGE_TAG>

Example script

An example script is provided for a simple document analysis of a PDF file:

python scripts/analyze.py path/to/your/doc.pdf

All script arguments can be checked using python scripts/analyze.py --help
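
To run the example script over several files, one option is to build one command per PDF and hand each to `subprocess.run`. This is a sketch under stated assumptions: the helper `build_analyze_commands` is hypothetical, it uses only the single-path invocation shown above, and it assumes you run it from the repository root so that `scripts/analyze.py` resolves.

```python
import sys
from pathlib import Path


def build_analyze_commands(pdf_paths, script="scripts/analyze.py"):
    """Return one command (as an argument list) per PDF path.

    Hypothetical helper; paths without a .pdf suffix are skipped.
    """
    commands = []
    for pdf in pdf_paths:
        pdf = Path(pdf)
        if pdf.suffix.lower() != ".pdf":
            continue
        # sys.executable keeps the same interpreter (and environment)
        # that DocTR was installed into.
        commands.append([sys.executable, script, str(pdf)])
    return commands


commands = build_analyze_commands(["invoice.pdf", "notes.txt", "scan.PDF"])
```

Each entry can then be executed with `subprocess.run(cmd, check=True)`, which raises an error if the script exits with a non-zero status.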

Documentation

The full package documentation is available here for detailed specifications. The documentation was built with Sphinx using a theme provided by Read the Docs.

Contributing

Please refer to CONTRIBUTING if you wish to contribute to this project.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.
