Dataset building and processing tools for deepdoctection

deepdoctection-datasets

This package provides categories and datasets, as well as some dataset instances, for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Predefined dataset instances for common document understanding tasks such as object detection, text classification and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml:

uv pip install "dd-datasets[full]"
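
After installing, it can be useful to confirm which of the two variants ended up in the environment. The following is a minimal sketch that uses only the Python standard library; it does not call any dd-datasets API. The helper names `installed_version` and `has_module` are hypothetical and introduced here for illustration only. It assumes that the distribution name is `dd-datasets`, as in the install commands above, and that the optional XML dependency is importable as `lxml`.

```python
from importlib import metadata
from importlib.util import find_spec
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_module(module_name: str) -> bool:
    """Report whether a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # "dd-datasets" is the distribution name used in the install commands.
    print("dd-datasets:", installed_version("dd-datasets") or "not installed")
    # lxml is only needed for datasets that rely on XML parsing.
    print("lxml available:", has_module("lxml"))
```

If the second line reports that lxml is missing, the datasets that depend on XML parsing will likely need the `full` extra shown above.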

License

Apache License 2.0

Author

Dr. Janis Meyer
