
Dataset building and processing tools for deepdoctection

deepdoctection-datasets

This package provides categories and datasets, as well as some dataset instances, for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml:

uv pip install "dd-datasets[full]"
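
After installing, it can be useful to confirm which variant ended up in the environment. The following is a minimal sketch using only the Python standard library; it does not call any dd-datasets API. It assumes the distribution name `dd-datasets` from the install command above and that lxml is importable as `lxml`. The helper names `installed_version` and `has_module` are hypothetical and exist only for this illustration.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_module(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "dd-datasets" is the distribution name used in the install command.
    version = installed_version("dd-datasets")
    if version is None:
        print("dd-datasets is not installed")
    else:
        print(f"dd-datasets {version} is installed")

    # lxml is only needed for the datasets covered by the [full] extra.
    if has_module("lxml"):
        print("lxml found: datasets that need XML parsing should be usable")
    else:
        print('lxml missing: install "dd-datasets[full]" to get it')
```

The check looks up the distribution by its install name rather than importing the package, so it makes no assumption about the import name.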

License

Apache License 2.0

Author

Dr. Janis Meyer
