Dataset building and processing tools for deepdoctection

deepdoctection-datasets

This package provides categories, datasets, and a number of dataset instances for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml:

uv pip install "dd-datasets[full]"
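
After installing, it can be useful to confirm which variant ended up in the environment. The following is a minimal sketch using only the Python standard library. The helper name `check_install` is hypothetical and not part of dd-datasets. It assumes the distribution is registered under the name `dd-datasets` and that the `full` extra makes the `lxml` module importable, as the installation notes above suggest.

```python
from importlib import metadata, util


def check_install(dist_name: str = "dd-datasets", optional_module: str = "lxml") -> dict:
    """Report the installed version of a distribution and whether an
    optional module can be found. Hypothetical helper, not part of dd-datasets."""
    try:
        # Looks up the installed distribution's metadata; no package code is imported.
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None

    # find_spec returns None when a top-level module cannot be located.
    has_optional = util.find_spec(optional_module) is not None
    return {"version": version, "optional_available": has_optional}


if __name__ == "__main__":
    status = check_install()
    if status["version"] is None:
        print("dd-datasets is not installed in this environment")
    else:
        print(f"dd-datasets {status['version']} is installed")
    if not status["optional_available"]:
        print("lxml not found; datasets that need XML parsing may be unavailable")
```

If the second message appears, reinstalling with the `full` extra should add lxml.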

License

Apache License 2.0

Author

Dr. Janis Meyer
