
Dataset building and processing tools for deepdoctection



deepdoctection-datasets

Categories, dataset definitions and ready-made dataset instances for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml:

uv pip install "dd-datasets[full]"
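After installing, it can be useful to confirm which variant ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `has_module` are hypothetical and not part of dd-datasets. It assumes the distribution is registered under the name `dd-datasets` and that the full extra pulls in `lxml`, as the install commands above suggest.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_module(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "dd-datasets" is the distribution name used in the install command.
    print("dd-datasets:", installed_version("dd-datasets") or "not installed")
    # lxml is only needed for the datasets that rely on XML parsing.
    print("lxml available:", has_module("lxml"))
```

If `lxml` is reported as unavailable, the datasets that depend on XML parsing will probably not work until the full extra is installed.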

License

Apache License 2.0

Author

Dr. Janis Meyer

