Dataset building and processing tools for deepdoctection

deepdoctection-datasets

This package provides categories and datasets, as well as some dataset instances, for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification, and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml (quote the requirement so the shell does not interpret the square brackets):

uv pip install "dd-datasets[full]"
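
Since lxml is only installed with the full extra, code that relies on the XML-based datasets may want to check for it before proceeding. The following is a minimal sketch using only the Python standard library; the helper name has_optional_dependency is hypothetical and is not part of dd-datasets.

```python
import importlib.util


def has_optional_dependency(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment.

    Hypothetical helper for illustration; not provided by dd-datasets.
    """
    # find_spec returns None when no installed module matches the name.
    return importlib.util.find_spec(module_name) is not None


def describe_lxml_support() -> str:
    """Build a short message about whether lxml is available."""
    if has_optional_dependency("lxml"):
        return "lxml found: datasets that need XML parsing should be usable."
    return 'lxml missing: install it with: uv pip install "dd-datasets[full]"'
```

Calling print(describe_lxml_support()) at the start of a script gives an early hint instead of an import error later on.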

License

Apache License 2.0

Author

Dr. Janis Meyer

File details

Details for the file dd_datasets-1.2.1.tar.gz.

File metadata

  • Download URL: dd_datasets-1.2.1.tar.gz
  • Upload date:
  • Size: 35.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for dd_datasets-1.2.1.tar.gz
Algorithm Hash digest
SHA256 7840c01327d92e709001a14f2d8af4003e3e583251cc7d209fdadd24f2ecd0cf
MD5 ccc5473f8ae5e18c3c9db37527b333e4
BLAKE2b-256 75e6d6a8820e6fa6cc34281ca3d0348daa52fd81a107c8c1a906f91bb0639ff6

File details

Details for the file dd_datasets-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: dd_datasets-1.2.1-py3-none-any.whl
  • Upload date:
  • Size: 56.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for dd_datasets-1.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e199a66732787467e6ea56da969acaf38419a1e4fd2a74fc2c9063b10b0c221c
MD5 3cefbc5430e6bb25b817c47eee73cc8e
BLAKE2b-256 6264caa3f6e7912ffeee53beac25ea8e756c542a4e14de01913691d878960aa6
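
The published SHA256 digests can be compared against a locally downloaded file. The following is a minimal sketch using Python's standard hashlib module; the function names sha256_of_file and matches_published_hash are hypothetical, and the file paths passed in are assumed to point at files you downloaded yourself.

```python
import hashlib

# SHA256 digests as listed for the 1.2.1 release files.
PUBLISHED_SHA256 = {
    "dd_datasets-1.2.1.tar.gz": (
        "7840c01327d92e709001a14f2d8af4003e3e583251cc7d209fdadd24f2ecd0cf"
    ),
    "dd_datasets-1.2.1-py3-none-any.whl": (
        "e199a66732787467e6ea56da969acaf38419a1e4fd2a74fc2c9063b10b0c221c"
    ),
}


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("dd_datasets-1.2.1.tar.gz", PUBLISHED_SHA256["dd_datasets-1.2.1.tar.gz"]) should return True for an unmodified copy of the source distribution.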