Extract valuable text information from your documents
DocTR: Document Text Recognition
Extract valuable information from your documents.
Getting started
Prerequisites
- Python 3.6 (or more recent)
- pip
Installation
Clone the project and install it:
git clone https://github.com/mindee/doctr.git
pip install -e doctr/.
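To confirm that the installation worked, a quick environment check can help. The following is a minimal sketch and not part of the project: the helper name `check_environment` is hypothetical, and it only assumes that the package is importable as `doctr`, as in the usage example of this README.

```python
import importlib.util
import sys


def check_environment(package="doctr", min_python=(3, 6)):
    """Return (python_ok, package_found) for the current interpreter.

    `check_environment` is a hypothetical helper written for this README.
    """
    # Compare the running interpreter against the minimum supported version
    python_ok = sys.version_info >= min_python
    # find_spec returns None when a top-level package cannot be located
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


python_ok, package_found = check_environment()
print(f"Python >= 3.6: {python_ok}, doctr importable: {package_found}")
```

If the second value is False, the editable install most likely targeted a different interpreter or virtual environment than the one running the check.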
Usage
Python package
You can use the library like any other Python package to analyze your documents:
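from doctr.documents import read_pdf
from doctr.models import ocr_db_crnn
# Load the pretrained OCR model (text detection followed by text recognition)
model = ocr_db_crnn(pretrained=True)
# Read the PDF file into memory
doc = read_pdf("path/to/your/doc.pdf")
# The model takes a list of documents and returns a list of results
result = model([doc])
# Export the result for the first document
json_output = result[0].export()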
For an exhaustive list of pretrained models available, please refer to the documentation.
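To keep the exported output from the usage snippet above, it can be written to disk. The sketch below assumes that `export()` returns a structure that the standard `json` module can serialize, which this README does not confirm. The helper `save_export` and the placeholder data are hypothetical and do not reflect the real export schema.

```python
import json
import tempfile
from pathlib import Path


def save_export(export, path):
    """Write an exported result to a UTF-8 JSON file and return its path.

    `save_export` is a hypothetical helper, not part of the library.
    """
    path = Path(path)
    # ensure_ascii=False keeps non-ASCII characters readable in the file
    text = json.dumps(export, ensure_ascii=False, indent=2)
    path.write_text(text, encoding="utf-8")
    return path


# Placeholder standing in for `json_output`; NOT the real export schema
sample_export = {"note": "placeholder for result[0].export()"}
target = Path(tempfile.gettempdir()) / "ocr_result.json"
saved = save_export(sample_export, target)
```

In real use, pass `json_output` instead of `sample_export` and choose a destination path that suits your project.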
Docker container
If you plan to deploy in a containerized environment, you can use the provided Dockerfile to build a Docker image:
docker build . -t <YOUR_IMAGE_TAG>
Example script
An example script is provided for a simple document analysis of a PDF file:
python scripts/analyze.py path/to/your/doc.pdf
All script arguments can be checked using python scripts/analyze.py --help
Documentation
The full package documentation is available here for detailed specifications. The documentation was built with Sphinx using a theme provided by Read the Docs.
Contributing
Please refer to CONTRIBUTING if you wish to contribute to this project.
License
Distributed under the Apache 2.0 License. See LICENSE for more information.