Dataset building and processing tools for deepdoctection

deepdoctection-datasets

Category definitions, dataset definitions, and a set of ready-made dataset instances for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification, and named entity recognition.
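
The split between dataset definitions and dataset instances can be pictured with a small, self-contained sketch. Everything below (`DatasetDefinition`, `register_dataset`, `get_dataset`, the `toy_layout` dataset) is hypothetical and written only for illustration; it is not the dd-datasets API, which should be looked up in the package documentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, Tuple


@dataclass(frozen=True)
class DatasetDefinition:
    """Hypothetical stand-in for a dataset definition (not the dd-datasets class)."""

    name: str
    task: str
    categories: Tuple[str, ...]
    build: Callable[[], Iterator[dict]]  # stand-in for a dataflow builder


# Hypothetical registry mapping dataset names to their definitions.
_REGISTRY: Dict[str, DatasetDefinition] = {}


def register_dataset(definition: DatasetDefinition) -> DatasetDefinition:
    """Store a definition under its name, refusing duplicates."""
    if definition.name in _REGISTRY:
        raise ValueError(f"dataset {definition.name!r} is already registered")
    _REGISTRY[definition.name] = definition
    return definition


def get_dataset(name: str) -> DatasetDefinition:
    """Look up a registered definition by name."""
    return _REGISTRY[name]


def _toy_layout_samples() -> Iterator[dict]:
    """Yield one made-up sample in place of a real annotation file."""
    yield {
        "file_name": "page_0.png",
        "annotations": [{"category": "table", "bbox": [10, 20, 200, 120]}],
    }


# A pre-defined "instance": a concrete dataset registered for an object detection task.
register_dataset(
    DatasetDefinition(
        name="toy_layout",
        task="object_detection",
        categories=("text", "title", "table"),
        build=_toy_layout_samples,
    )
)

if __name__ == "__main__":
    dataset = get_dataset("toy_layout")
    for sample in dataset.build():
        print(dataset.name, sample["file_name"], sample["annotations"])
```

The point of the sketch is only the shape: a definition bundles categories with a sample builder, and an instance is a definition registered under a name for a particular task.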

Installation

(uv) pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml:

(uv) pip install dd-datasets[full]
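
After installing, it can be useful to confirm which variant ended up in the environment. The following is a minimal sketch using only the Python standard library; it assumes the distribution name `dd-datasets` and the optional module `lxml` named above, and the helper `describe_install` is a hypothetical name introduced here.

```python
from importlib import metadata, util


def describe_install(dist_name: str = "dd-datasets", optional_module: str = "lxml") -> dict:
    """Report the installed version of a distribution and whether an optional module is importable."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        version = None
    return {
        "distribution": dist_name,
        "version": version,
        # find_spec locates a top-level module without importing it.
        "optional_available": util.find_spec(optional_module) is not None,
    }


if __name__ == "__main__":
    print(describe_install())
```

If `optional_available` is false, datasets that depend on lxml will presumably not be usable until the `full` extra is installed.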

License

Apache License 2.0

Author

Dr. Janis Meyer

Download files

Source Distribution

dd_datasets-1.0.0a0.tar.gz (35.2 kB, source)

Built Distribution

dd_datasets-1.0.0a0-py3-none-any.whl (56.2 kB, Python 3)

File details

Details for the file dd_datasets-1.0.0a0.tar.gz.

File metadata

  • Download URL: dd_datasets-1.0.0a0.tar.gz
  • Size: 35.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for dd_datasets-1.0.0a0.tar.gz
Algorithm Hash digest
SHA256 14f05d52a084eaefb16c9bf22de84b8fd296694581560f2faa9c90461198cc3b
MD5 470201fc8ef62d6baeb0b7319c2787d7
BLAKE2b-256 6f10478444160c99ccede1223eba2da43b7e737aee6348875f3cc4bb285b07be
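
A downloaded file can be checked against the SHA256 digest listed above. The sketch below uses only the standard library; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for dd_datasets-1.0.0a0.tar.gz.
EXPECTED_SHA256 = "14f05d52a084eaefb16c9bf22de84b8fd296694581560f2faa9c90461198cc3b"


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed location of the downloaded source distribution.
    archive = Path("dd_datasets-1.0.0a0.tar.gz")
    if archive.exists():
        print("digest matches:", matches_expected(archive, EXPECTED_SHA256))
    else:
        print(f"{archive} not found; download it first")
```

pip can also enforce digests at install time through its hash-checking mode, which avoids a separate manual step.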

File details

Details for the file dd_datasets-1.0.0a0-py3-none-any.whl.

File metadata

  • Download URL: dd_datasets-1.0.0a0-py3-none-any.whl
  • Size: 56.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for dd_datasets-1.0.0a0-py3-none-any.whl
Algorithm Hash digest
SHA256 99c511ce11660911dcf8239fcaa3606edbfdab3cc88465887325a7d7d2240ee1
MD5 34aea92bef251610387b475c0fd2f130
BLAKE2b-256 d6235a5bc155deb38186f19836756705df759eb0d060454b3cbb116323a3a722
