Dataset building and processing tools for deepdoctection

deepdoctection-datasets

Category and dataset definitions, plus a set of ready-made dataset instances, for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml, install the full extra:

uv pip install dd-datasets[full]
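
After installing, it can be useful to confirm which variant ended up in the environment. The following is a minimal sketch using only the Python standard library. It assumes the distribution is registered under the name dd-datasets (taken from the install command above) and that the full extra makes lxml importable. The helper names installed_version and report are illustrative and not part of dd-datasets.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report() -> dict:
    """Summarize whether dd-datasets and the optional lxml dependency are present."""
    return {
        # Distribution name assumed from the install command above.
        "dd-datasets": installed_version("dd-datasets"),
        # lxml is the XML parsing library the full extra is said to require.
        "lxml available": find_spec("lxml") is not None,
    }


if __name__ == "__main__":
    for name, value in report().items():
        print(f"{name}: {value}")
```

If "dd-datasets" shows a version but "lxml available" is False, the base package is probably installed without the full extra.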

License

Apache License 2.0

Author

Dr. Janis Meyer
