Dataset building and processing tools for deepdoctection

deepdoctection-datasets

Categories, dataset definitions, and ready-made dataset instances for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification, and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml:

uv pip install "dd-datasets[full]"
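
After installing, it can be useful to confirm which variant ended up in the environment. The following is a minimal sketch that uses only the Python standard library; the helper names `optional_dependency_available` and `installed_version` are hypothetical and not part of dd-datasets. It assumes the distribution is registered under the name `dd-datasets`, as in the install commands above, and that the full variant makes the `lxml` module importable.

```python
from importlib import metadata, util
from typing import Optional


def optional_dependency_available(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when no importable top-level module has this name.
    return util.find_spec(module_name) is not None


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "dd-datasets" is the distribution name used in the install commands.
    print("dd-datasets version:", installed_version("dd-datasets"))
    # lxml is only needed for the datasets that rely on XML parsing.
    print("lxml available:", optional_dependency_available("lxml"))
```

If the second line reports that lxml is not available, the datasets that depend on XML parsing will probably need the full installation.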

License

Apache License 2.0

Author

Dr. Janis Meyer
