Dataset building and processing tools for deepdoctection

deepdoctection-datasets

This package provides categories and datasets, as well as some dataset instances, for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification, and named entity recognition.
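
To make the split between a dataset definition, its categories, and a dataflow builder concrete, here is a minimal conceptual sketch. It does not use or reproduce the dd-datasets API: `ToyDataset`, `toy_dataflow`, `REGISTRY`, and the sample data are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass(frozen=True)
class ToyDataset:
    """Hypothetical stand-in for a dataset definition; not the dd-datasets API."""

    name: str
    categories: List[str]
    build: Callable[[str], Iterator[Dict[str, object]]]


def toy_dataflow(split: str) -> Iterator[Dict[str, object]]:
    """Lazily yield invented samples for the requested split."""
    samples = {
        "train": [("page_0.png", "table"), ("page_1.png", "text")],
        "val": [("page_2.png", "title")],
    }
    # Unknown splits yield nothing instead of raising.
    for file_name, label in samples.get(split, []):
        yield {"file_name": file_name, "label": label}


# A pre-defined "instance": a dataset definition registered under a name.
REGISTRY: Dict[str, ToyDataset] = {
    "toy_layout": ToyDataset(
        name="toy_layout",
        categories=["table", "text", "title"],
        build=toy_dataflow,
    )
}

dataset = REGISTRY["toy_layout"]
train_samples = list(dataset.build("train"))
```

In this sketch the categories live with the definition while samples are produced on demand, so a training loop can stream data without loading a whole split into memory. For the actual classes and function names, consult the deepdoctection documentation.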

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml:

uv pip install "dd-datasets[full]"
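
After installing, it can be useful to confirm which variant ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `has_module` are illustrative and not part of dd-datasets. It assumes the distribution name is `dd-datasets`, as in the install command, and that the optional dependency is importable as `lxml`.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def has_module(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("dd-datasets:", installed_version("dd-datasets") or "not installed")
    print("lxml available:", has_module("lxml"))
```

If `lxml` is reported as unavailable, datasets that depend on XML parsing will likely need the `full` extra.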

License

Apache License 2.0

Author

Dr. Janis Meyer
