Dataset building and processing tools for deepdoctection

deepdoctection-datasets

This package provides categories, datasets, and a number of dataset instances for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification, and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml, install the full extra:

uv pip install "dd-datasets[full]"
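After installing, it can be useful to confirm which variant ended up in the environment. The following is a minimal sketch that uses only the Python standard library and does not call any dd-datasets API. The helper name describe_install is hypothetical, and it assumes that the distribution is registered as dd-datasets and that the optional parser is importable as lxml.

```python
from importlib import metadata
from importlib.util import find_spec


def describe_install(dist_name="dd-datasets", optional_module="lxml"):
    """Report whether a distribution and an optional module are available.

    Hypothetical helper for illustration only; it is not part of dd-datasets.
    """
    try:
        # Looks up the installed distribution's version from its metadata.
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None

    return {
        "installed": version is not None,
        "version": version,
        # find_spec returns None when a top-level module cannot be located.
        "optional_module_available": find_spec(optional_module) is not None,
    }


if __name__ == "__main__":
    print(describe_install())
```

If optional_module_available is False, datasets that depend on lxml will likely need the full extra shown above.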

License

Apache License 2.0

Author

Dr. Janis Meyer
