Dataset building and processing tools for deepdoctection

deepdoctection-datasets

Categories and datasets, as well as some ready-made dataset instances, for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification, and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml:

uv pip install "dd-datasets[full]"
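
To confirm which of the two installs is active in an environment, a short check like the following may help. It is only a sketch that uses the Python standard library: it assumes the distribution is registered under the name dd-datasets and that the full extra makes the lxml module importable, as described above. The helper names installed_version and has_module are hypothetical and are not part of dd-datasets.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_module(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the distribution name matches the name used in the install command.
    print("dd-datasets:", installed_version("dd-datasets") or "not installed")
    # Assumption: datasets that need XML parsing rely on lxml being importable.
    print("lxml available:", has_module("lxml"))
```

If lxml is reported as unavailable, the datasets that depend on XML parsing are probably not usable until the full extra is installed.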

License

Apache License 2.0

Author

Dr. Janis Meyer
