Dataset building and processing tools for deepdoctection

Project description

deepdoctection-datasets

Categories, datasets and pre-defined dataset instances for training the models supported by deepdoctection.

Overview

dd-datasets is a package that provides dataset management for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML-parsing library lxml, install the full extra:

uv pip install "dd-datasets[full]"
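Whether the full extra is actually needed can be checked before loading an XML-based dataset. The following is a minimal sketch, assuming only that those datasets import the `lxml` module; the helper name `optional_dependency_available` is hypothetical and not part of dd-datasets. It uses the standard library's `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util


def optional_dependency_available(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be found.

    Hypothetical helper (not part of dd-datasets): it only asks the import
    system whether the module exists; it does not import it or check its version.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if optional_dependency_available("lxml"):
        print("lxml found: XML-based datasets should be usable.")
    else:
        print('lxml missing: run  uv pip install "dd-datasets[full]"')
```

Quoting the requirement in the install command keeps shells such as zsh from treating the square brackets as a glob pattern.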

License

Apache License 2.0

Author

Dr. Janis Meyer

Download files

Source Distribution

dd_datasets-1.2.6.tar.gz (35.3 kB)

Uploaded Source

Built Distribution

dd_datasets-1.2.6-py3-none-any.whl (56.3 kB)

Uploaded Python 3
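The wheel's file name encodes its compatibility tags in the layout {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. Here py3-none-any marks a pure-Python wheel that installs on any Python 3 interpreter and any platform. A minimal sketch of splitting such a name, with the hypothetical helper `parse_wheel_filename` written only for illustration:

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Five parts without a build tag, six with one. Wheel file names replace
    # "-" in the distribution name with "_", so splitting on "-" is safe.
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelTags(dist, version, build, py, abi, plat)


tags = parse_wheel_filename("dd_datasets-1.2.6-py3-none-any.whl")
print(tags)
# WheelTags(distribution='dd_datasets', version='1.2.6', build=None, python='py3', abi='none', platform='any')
```

For real tooling, `packaging.utils.parse_wheel_filename` from the `packaging` library is the more robust choice, since it also normalizes names and validates versions.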

File details

Details for the file dd_datasets-1.2.6.tar.gz.

File metadata

  • Download URL: dd_datasets-1.2.6.tar.gz
  • Upload date:
  • Size: 35.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for dd_datasets-1.2.6.tar.gz
Algorithm Hash digest
SHA256 5b2446374670a07a3d3437e1a770d40c14fcf06607ef99dd518e12e6c8433344
MD5 54428d061b212e2339d576333abb150e
BLAKE2b-256 1837926ced0282a224ecfdbc106754952573966cd36bbfa88600e623cc90a7a5
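A downloaded file can be checked against the published SHA256 digest before installing it. A minimal sketch using only the standard library's `hashlib`; the helper names `sha256_of_file` and `verify` are hypothetical, and the file is assumed to have been downloaded locally:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Usage (assumes the source distribution sits in the current directory):
# verify("dd_datasets-1.2.6.tar.gz",
#        "5b2446374670a07a3d3437e1a770d40c14fcf06607ef99dd518e12e6c8433344")
```

Reading in chunks keeps memory use constant regardless of file size; for the small archives listed here, a single read would also work.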

File details

Details for the file dd_datasets-1.2.6-py3-none-any.whl.

File metadata

  • Download URL: dd_datasets-1.2.6-py3-none-any.whl
  • Upload date:
  • Size: 56.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for dd_datasets-1.2.6-py3-none-any.whl
Algorithm Hash digest
SHA256 1b4db983ecab74ee9c5e2e8fdab7aa769dc8b49efff0fd44ea684aa07b4cca02
MD5 97616761531d0bd410aa6467a4a66c41
BLAKE2b-256 7d21c0851aaa7998c62a22b7a71892a01637bf78776db5f28fff8e1162c92f24
