Dataset building and processing tools for deepdoctection

deepdoctection-datasets

Category and dataset definitions, together with ready-made dataset instances, for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification, and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml:

uv pip install "dd-datasets[full]"
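
To confirm from Python that the installation succeeded, a minimal sketch using the standard library's importlib.metadata could look like the following. It assumes the distribution name is dd-datasets, as in the install commands above; the helper name installed_version is illustrative and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="dd-datasets"):
    """Return the installed version string, or None if it is not installed.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("dd-datasets is not installed in this environment")
else:
    print(f"dd-datasets {found} is installed")
```

The lookup reads package metadata only, so it does not import the package or any of its optional dependencies.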

License

Apache License 2.0

Author

Dr. Janis Meyer

File details

Details for the file dd_datasets-1.0.3.tar.gz.

File metadata

  • Download URL: dd_datasets-1.0.3.tar.gz
  • Size: 35.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for dd_datasets-1.0.3.tar.gz
Algorithm Hash digest
SHA256 7e9e2a6c792664991f481b921c24d4cd5ce925028ffbb9ddb8e095a28f34e184
MD5 31db26c9ba0caa54422e9d96a8ef8fa8
BLAKE2b-256 1d730678c1c7835d98c725517e983a12a2161b4019704758d5e28b78e44281d7

File details

Details for the file dd_datasets-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: dd_datasets-1.0.3-py3-none-any.whl
  • Size: 56.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for dd_datasets-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 1f1a952030efbced89206c5f710515078ff6ab6206cce299852be304bd02af88
MD5 3e849c2e381e4c7c72fe978450446938
BLAKE2b-256 b3cae67502e232cfc3236119e29a8410bb3cc6e94a7fbd7d976876cd6614a734
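
A downloaded file can be checked against the SHA256 digests listed above with Python's hashlib. This is a minimal sketch, not an official verification tool: the helper names sha256_of and matches_published_hash are illustrative, and the file is assumed to keep its published file name.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the hash tables above.
EXPECTED_SHA256 = {
    "dd_datasets-1.0.3.tar.gz":
        "7e9e2a6c792664991f481b921c24d4cd5ce925028ffbb9ddb8e095a28f34e184",
    "dd_datasets-1.0.3-py3-none-any.whl":
        "1f1a952030efbced89206c5f710515078ff6ab6206cce299852be304bd02af88",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path):
    """Return True if the file's digest equals the published one.

    Raises KeyError when the file name is not one of the two listed files.
    """
    path = Path(path)
    expected = EXPECTED_SHA256[path.name]
    return sha256_of(path) == expected
```

For example, calling matches_published_hash on the downloaded dd_datasets-1.0.3.tar.gz returns True only if its digest equals the value in the table; any mismatch suggests a corrupted or altered download.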
