Dataset building and processing tools for deepdoctection

deepdoctection-datasets

Categories, datasets, and a set of dataset instances for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification, and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml:

uv pip install dd-datasets[full]
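
As a quick check after either install command, the sketch below reports the installed version of the dd-datasets distribution and whether lxml can be imported. It uses only the Python standard library. The helper name `check_install` is hypothetical and not part of dd-datasets.

```python
from importlib import metadata
from importlib.util import find_spec


def check_install(dist_name: str = "dd-datasets") -> dict:
    """Report the installed version of a distribution and whether lxml is importable.

    Hypothetical helper for illustration; it is not shipped with dd-datasets.
    """
    try:
        # Looks up the installed distribution by its PyPI name.
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "distribution": dist_name,
        "version": version,
        # find_spec returns None when the top-level module cannot be found.
        "lxml_available": find_spec("lxml") is not None,
    }


if __name__ == "__main__":
    print(check_install())
```

If `version` is `None`, the distribution is not installed in the active environment. If `lxml_available` is `False`, the datasets that depend on lxml will presumably not be usable until the `full` extra is installed.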

License

Apache License 2.0

Author

Dr. Janis Meyer

Download files

Source Distribution

dd_datasets-1.0.0.tar.gz (35.2 kB view details)

Uploaded Source

Built Distribution

dd_datasets-1.0.0-py3-none-any.whl (56.2 kB view details)

Uploaded Python 3

File details

Details for the file dd_datasets-1.0.0.tar.gz.

File metadata

  • Download URL: dd_datasets-1.0.0.tar.gz
  • Upload date:
  • Size: 35.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for dd_datasets-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       12e368e0710a4c4adfed63953ae0bf9a94bc104e517b6b8cb7fc5ccce0ac19d9
MD5          a7393cbd64a861bc28010a8d1e3cc86c
BLAKE2b-256  6176334a840dbf251bf970ab3860c6577af9cc249f755a39b8a5a28a9b97004d

File details

Details for the file dd_datasets-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: dd_datasets-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 56.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for dd_datasets-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       a584689c765da5ab32b3774e519fd1f6921562a9b136ba6188aed2a930ffc4a1
MD5          ada881ab4fa5c1fbf5a66090e154c439
BLAKE2b-256  18ef05abfe4e90492a9ae3aadbc8052e1fd9b158f49d196356e0b87c1ca8b9b6
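
The SHA256 digests listed for the two files can be compared against a downloaded copy. The following is a minimal sketch using Python's `hashlib`. The helper names `sha256_of` and `verify` are illustrative, and the sketch assumes the file has kept its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed for the 1.0.0 release files.
EXPECTED_SHA256 = {
    "dd_datasets-1.0.0.tar.gz": (
        "12e368e0710a4c4adfed63953ae0bf9a94bc104e517b6b8cb7fc5ccce0ac19d9"
    ),
    "dd_datasets-1.0.0-py3-none-any.whl": (
        "a584689c765da5ab32b3774e519fd1f6921562a9b136ba6188aed2a930ffc4a1"
    ),
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a file's digest with the listed one, looked up by file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no known digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify(Path("dd_datasets-1.0.0.tar.gz"))` returns `True` only if the local file matches the listed digest byte for byte.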
