Repository for Document AI - server/inference core package

deepdoctection

deepdoctection is the main package for running and training models. It provides the pipeline framework, model wrappers, built-in pipelines, training scripts and evaluation methods.

The base package installs only the dependencies needed to run inference with a selection of models. For training, evaluation, and running all available models, install the full package.

Overview

  • analyzer: Configuration and factory functions for creating document analysis pipelines and the built-in analyzer.
  • configs: YAML configuration for pipelines and model profiles for the model catalogue.
  • extern: External model wrappers (Detectron2, DocTr, HuggingFace Transformers, Tesseract, PdfPlumber, etc.)
  • pipe: Pipeline components and services.
  • eval: Evaluation metrics and Evaluator.
  • train: Training utilities and training scripts for Detectron2 and selected Transformer models.

Installation

Basic Installation

For inference use cases, install the base package:

uv pip install deepdoctection

Important: Various dependencies must be installed separately:

  • PyTorch: Follow the instructions at https://pytorch.org/get-started/locally/ for your OS and hardware.
  • Transformers: pip install "transformers>=4.48.0" (if using HF models)
  • Timm: pip install "timm>=0.9.16" (required by some dedicated HF models)
  • DocTr: pip install "python-doctr>=1.0.0" (if using DocTr models)
  • Detectron2: Follow the instructions at https://detectron2.readthedocs.io/en/latest/tutorials/install.html
  • PDFPlumber: pip install "pdfplumber>=0.11.0"
  • JDeskew: pip install "jdeskew>=0.2.2"
  • Boto3: pip install boto3==1.34.102
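
To see which of these separately installed dependencies are already present in an environment, a check along the following lines may help. This is a minimal sketch using only the standard library. The names `OPTIONAL_DISTRIBUTIONS`, `installed_version` and `dependency_report` are illustrative and not part of deepdoctection, and `torch` and `detectron2` are assumed to be the distribution names under which PyTorch and Detectron2 are installed.

```python
from importlib import metadata
from typing import Optional

# Distribution names taken from the pip commands above; "torch" and
# "detectron2" are assumptions about how PyTorch and Detectron2 are installed.
OPTIONAL_DISTRIBUTIONS = (
    "torch", "transformers", "timm", "python-doctr",
    "detectron2", "pdfplumber", "jdeskew", "boto3",
)


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def dependency_report(distributions=OPTIONAL_DISTRIBUTIONS) -> dict:
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in distributions}


if __name__ == "__main__":
    for name, version in dependency_report().items():
        print(f"{name:15} {version or 'not installed'}")
```

The report only says what is installed. It does not decide which of these packages a given pipeline actually needs.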

For running evaluation with various metrics, you can also install and use:

  • APTED: pip install apted==1.0.3
  • Distance: pip install distance==0.1.3
  • Pycocotools: pip install "pycocotools>=2.0.2"
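
The pins above mix exact (`==`) and minimum (`>=`) requirements. The following sketch shows one way to compare an installed version string against such a pin. It is a deliberate simplification that looks only at the leading numeric part of a version and does not implement the full PEP 440 rules; the `packaging` library is the usual choice when exact semantics matter. `EVAL_REQUIREMENTS`, `release_tuple` and `satisfies` are hypothetical names introduced for illustration.

```python
import re

# Evaluation requirements as listed above (illustrative table, not a
# deepdoctection API).
EVAL_REQUIREMENTS = {
    "apted": "==1.0.3",
    "distance": "==0.1.3",
    "pycocotools": ">=2.0.2",
}


def release_tuple(version: str, width: int = 4) -> tuple:
    """Turn the leading numeric part of a version into a zero-padded tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release in {version!r}")
    parts = [int(p) for p in match.group().split(".")]
    # Pad so that "2.1" and "2.1.0" compare as equal.
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(installed: str, requirement: str) -> bool:
    """Check an installed version against a '==X' or '>=X' requirement."""
    operator, required = requirement[:2], requirement[2:]
    if operator == "==":
        return release_tuple(installed) == release_tuple(required)
    if operator == ">=":
        return release_tuple(installed) >= release_tuple(required)
    raise ValueError(f"unsupported operator in {requirement!r}")
```

For example, `satisfies("2.0.10", EVAL_REQUIREMENTS["pycocotools"])` compares components numerically, so `2.0.10` counts as newer than `2.0.2`.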

Image processing is supported by PIL or OpenCV. PIL is used by default and will always be installed. If you prefer to use OpenCV, you can install it:

  • OpenCV: pip install opencv-python==4.8.0.76
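
To confirm which image libraries can be imported in the current environment, a small check like the one below can be used. It is a sketch under the assumption that Pillow is imported as `PIL` and opencv-python as `cv2`; the function name `available_image_backends` is illustrative. It only tests importability and says nothing about which backend deepdoctection selects at runtime.

```python
from importlib.util import find_spec


def available_image_backends() -> list:
    """List the image libraries that are importable in this environment."""
    # Label -> top-level import name (PIL for Pillow, cv2 for opencv-python).
    candidates = {"PIL": "PIL", "OpenCV": "cv2"}
    return [
        label
        for label, module in candidates.items()
        if find_spec(module) is not None
    ]


if __name__ == "__main__":
    print("Importable image backends:", available_image_backends())
```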

Full Installation (Training & Evaluation)

For a single install with all dependencies (except PyTorch), run:

uv pip install "deepdoctection[full]"

Development Installation

For development purposes, clone the repository and install it in editable mode.

License

Apache License 2.0

Author

Dr. Janis Meyer


Source Distribution

deepdoctection-1.2.9.tar.gz (150.2 kB)

Uploaded Source

Built Distribution

deepdoctection-1.2.9-py3-none-any.whl (191.8 kB)

Uploaded Python 3

File details

Details for the file deepdoctection-1.2.9.tar.gz.

File metadata

  • Download URL: deepdoctection-1.2.9.tar.gz
  • Upload date:
  • Size: 150.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for deepdoctection-1.2.9.tar.gz
Algorithm Hash digest
SHA256 5ddab7f876bda3f4cf2b4b467cf4f6d5e6e6493151803b288aef25ed99d4d13d
MD5 5c537a7ba8f6a3cf0f2a81ff8851e62b
BLAKE2b-256 75859e2307042e78b8433d72be2fc3a25f098f1e8e33eee5a1b46520dcfe049f

File details

Details for the file deepdoctection-1.2.9-py3-none-any.whl.

File metadata

  • Download URL: deepdoctection-1.2.9-py3-none-any.whl
  • Upload date:
  • Size: 191.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for deepdoctection-1.2.9-py3-none-any.whl
Algorithm Hash digest
SHA256 0b10e6a964053c424607d534b4d1927526744548afb5727c406e157ef4db60a8
MD5 5349ae5b053ba101608cf36df8ad3c49
BLAKE2b-256 4c9bf5b72a8d0d40ce275f5ab2ea7784eb163ab8cd8df55aabdc8ff6eeb1cb54
