Dataset building and processing tools for deepdoctection

deepdoctection-datasets

Categories, dataset definitions, and ready-made dataset instances for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification, and named entity recognition.
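
To make the split between dataset definitions, dataflow builders, and pre-defined instances concrete, here is a minimal, self-contained sketch of the general pattern. Every name in it (ToyCategories, ToyDataset, register, get_dataset, toy_layout) is hypothetical and invented for illustration; it is not the dd-datasets API, and the samples are made-up placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, Tuple

# NOTE: all names here are hypothetical; this is NOT the dd-datasets API.


@dataclass(frozen=True)
class ToyCategories:
    """The label set of a dataset."""
    names: Tuple[str, ...]

    def as_id_map(self) -> Dict[str, int]:
        # Category ids start at 1 in this sketch.
        return {name: idx for idx, name in enumerate(self.names, start=1)}


@dataclass(frozen=True)
class ToyDataset:
    """A dataset definition: metadata, categories, and a dataflow builder."""
    name: str
    task: str
    categories: ToyCategories
    build_dataflow: Callable[[], Iterator[dict]]


def _toy_layout_samples() -> Iterator[dict]:
    # A dataflow builder yields one sample at a time instead of loading
    # everything into memory. The samples below are placeholders.
    yield {"file_name": "page_0.png", "annotations": [{"category": "title"}]}
    yield {"file_name": "page_1.png", "annotations": [{"category": "text"}]}


# Registry of pre-defined instances, looked up by name.
REGISTRY: Dict[str, ToyDataset] = {}


def register(dataset: ToyDataset) -> None:
    REGISTRY[dataset.name] = dataset


def get_dataset(name: str) -> ToyDataset:
    return REGISTRY[name]


register(
    ToyDataset(
        name="toy_layout",
        task="object detection",
        categories=ToyCategories(("title", "text")),
        build_dataflow=_toy_layout_samples,
    )
)

dataset = get_dataset("toy_layout")
for sample in dataset.build_dataflow():
    print(sample["file_name"], sample["annotations"])
```

The point of the design is that training code only needs a dataset name: the registry returns the definition, and the builder streams samples on demand.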

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml:

uv pip install "dd-datasets[full]"
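
After installing, it can be useful to check whether the optional lxml dependency is actually present in the active environment. The sketch below uses only the standard library; has_module and describe_install are hypothetical helper names, and it assumes that the full extra pulls in lxml as described above.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return find_spec(name) is not None


def describe_install(lxml_available: bool) -> str:
    """Summarize which installation variant the environment appears to match."""
    if lxml_available:
        return "lxml found: datasets that need XML parsing should be usable"
    return "lxml missing: install the full extra to use XML-based datasets"


print(describe_install(has_module("lxml")))
```

The check looks for lxml rather than the package itself because the importable module name of dd-datasets is not stated here.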

License

Apache License 2.0

Author

Dr. Janis Meyer
