Dataset building and processing tools for deepdoctection

deepdoctection-datasets

Categories and datasets, along with some dataset instances, for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml (the quotes keep shells such as zsh from interpreting the square brackets):

uv pip install "dd-datasets[full]"
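
After installing, it can be useful to confirm which variant ended up in the environment. The following is a minimal sketch that relies only on the Python standard library; the helper names `installed_version` and `is_module_available` are hypothetical and not part of dd-datasets, and the assumption that the `full` extra pulls in lxml is taken from the sentence above.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def is_module_available(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def describe_environment() -> str:
    """Summarise whether dd-datasets and the optional lxml dependency are present."""
    version = installed_version("dd-datasets")
    if version is None:
        return "dd-datasets is not installed"
    # Assumption: datasets that need XML parsing rely on lxml being importable.
    extra = "available" if is_module_available("lxml") else "missing"
    return f"dd-datasets {version} installed, lxml {extra}"
```

Calling `print(describe_environment())` in the target environment reports the installed version and whether lxml is importable, which indicates whether the XML-based datasets are likely to be usable.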

License

Apache License 2.0

Author

Dr. Janis Meyer

Download files

Source Distribution

dd_datasets-1.2.7.tar.gz (35.3 kB view details)

Uploaded Source

Built Distribution

dd_datasets-1.2.7-py3-none-any.whl (56.3 kB view details)

Uploaded Python 3

File details

Details for the file dd_datasets-1.2.7.tar.gz.

File metadata

  • Download URL: dd_datasets-1.2.7.tar.gz
  • Upload date:
  • Size: 35.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for dd_datasets-1.2.7.tar.gz
Algorithm Hash digest
SHA256 2fbc4c0810b19dcd5bd5535e2ceede1dc48fa04aa2a24971ea7f9980afb8fe36
MD5 e2c3d6cd5652885926a9eb061d928739
BLAKE2b-256 c560ce69dbb9715045c35f6ec24e8b3d3fa6834404c1429f8141e9984d5a578e

File details

Details for the file dd_datasets-1.2.7-py3-none-any.whl.

File metadata

  • Download URL: dd_datasets-1.2.7-py3-none-any.whl
  • Upload date:
  • Size: 56.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for dd_datasets-1.2.7-py3-none-any.whl
Algorithm Hash digest
SHA256 318044dab2ec17da85bc2f6de32a2c18b60e98f67f061a01893045c8cb4e33f2
MD5 a833f0412e0eb1607cad169185edc5ac
BLAKE2b-256 09335fb5ffda91e159f6ad33a5898695688b927c1eba316acf21803fdf6675f0
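
The SHA256 digests listed above can be used to check a manually downloaded file before installing it. The following is a minimal sketch using only the Python standard library; the function names `sha256_of_file` and `matches_published_hash` are hypothetical helpers, and the expected digests are copied from the listings above.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digests copied from the file listings above.
EXPECTED_SHA256 = {
    "dd_datasets-1.2.7.tar.gz":
        "2fbc4c0810b19dcd5bd5535e2ceede1dc48fa04aa2a24971ea7f9980afb8fe36",
    "dd_datasets-1.2.7-py3-none-any.whl":
        "318044dab2ec17da85bc2f6de32a2c18b60e98f67f061a01893045c8cb4e33f2",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path):
    """Return True if the file's SHA256 equals the digest listed for its name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    if expected is None:
        raise KeyError(f"no published digest known for {Path(path).name}")
    return hmac.compare_digest(sha256_of_file(path), expected)
```

For example, `matches_published_hash("dd_datasets-1.2.7.tar.gz")` returns `True` only if the local file is byte-for-byte identical to the one whose digest is listed. Installers can perform an equivalent check themselves through pip's hash-checking mode (`--require-hashes` with a requirements file).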
