
Dataset building and processing tools for deepdoctection



deepdoctection-datasets

Categories and datasets, together with some dataset instances, for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification and named entity recognition.

Installation

uv pip install dd-datasets

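To use all datasets, including those that require the XML parsing library lxml, install the full extra (the quotes keep shells such as zsh from interpreting the square brackets):

uv pip install "dd-datasets[full]"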
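
To confirm that the installation succeeded, the environment can be inspected from Python. The following is a minimal sketch that relies only on the standard library; the helper names `installed_version` and `has_module` are hypothetical and not part of dd-datasets. It looks up the distribution by the name used in the install command and checks whether the optional lxml dependency is importable.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec


def installed_version(dist_name: str = "dd-datasets"):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it queries package metadata only
    and does not import the package itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def has_module(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("dd-datasets is not installed in this environment")
    else:
        print(f"dd-datasets {found} is installed")

    # lxml is only needed for the datasets covered by the [full] extra.
    if has_module("lxml"):
        print("lxml is available")
    else:
        print("lxml is missing; install the [full] extra if you need it")
```

Because the check reads distribution metadata rather than importing the package, it does not depend on any assumption about the package's import name.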

License

Apache License 2.0

Author

Dr. Janis Meyer

