Dataset building and processing tools for deepdoctection

deepdoctection-datasets

Categories and datasets, as well as some dataset instances, for training models supported by deepdoctection.

Overview

dd-datasets is a package for managing the datasets used in Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Predefined dataset instances for common document understanding tasks such as object detection, text classification and named entity recognition.
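The following toy sketch in plain Python shows how a dataset definition differs from a dataflow builder. Everything in it is hypothetical and only illustrates the idea: DatasetDefinition, DataflowBuilder, register, builder_for, the max_datapoints parameter and the sample records. It is not the dd-datasets API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional

# Hypothetical illustration only: these names are NOT part of dd-datasets.


@dataclass
class DatasetDefinition:
    """Describes a dataset: its name, the categories it annotates and its splits."""
    name: str
    categories: List[str]
    splits: Dict[str, List[dict]] = field(default_factory=dict)


class DataflowBuilder:
    """Streams datapoints lazily from one split of a dataset definition."""

    def __init__(self, definition: DatasetDefinition) -> None:
        self.definition = definition

    def build(self, split: str = "train", max_datapoints: Optional[int] = None) -> Iterator[dict]:
        records = self.definition.splits[split]
        if max_datapoints is not None:
            records = records[:max_datapoints]
        for record in records:
            # Keep only annotations whose category belongs to the definition.
            yield {
                "file_name": record["file_name"],
                "annotations": [
                    ann for ann in record["annotations"]
                    if ann["category"] in self.definition.categories
                ],
            }


# Registry mapping dataset names to their definitions (hypothetical).
_REGISTRY: Dict[str, DatasetDefinition] = {}


def register(definition: DatasetDefinition) -> None:
    _REGISTRY[definition.name] = definition


def builder_for(name: str) -> DataflowBuilder:
    return DataflowBuilder(_REGISTRY[name])


# Usage with made-up sample records.
register(DatasetDefinition(
    name="toy_layout",
    categories=["text", "table"],
    splits={"train": [
        {"file_name": "page_1.png", "annotations": [{"category": "text"}, {"category": "figure"}]},
        {"file_name": "page_2.png", "annotations": [{"category": "table"}]},
    ]},
))
for dp in builder_for("toy_layout").build(split="train", max_datapoints=1):
    print(dp)  # prints {'file_name': 'page_1.png', 'annotations': [{'category': 'text'}]}
```

In this sketch, the definition describes what the data is and the builder controls how it is streamed. Because the two are separate, one definition can serve several splits or sample sizes without copying any data.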

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parser lxml, install the full extra:

uv pip install "dd-datasets[full]"
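Some datasets depend on lxml, which is why the full extra exists. Before working with those datasets, you can check from Python whether lxml is importable in the current environment. The following is a minimal sketch that uses only the standard library. The helper name has_lxml is introduced here for illustration and is not part of dd-datasets.

```python
import importlib.util


def has_lxml() -> bool:
    """Return True if the optional lxml package can be imported in this environment."""
    return importlib.util.find_spec("lxml") is not None


if __name__ == "__main__":
    if has_lxml():
        print("lxml found: XML-based datasets should be usable")
    else:
        print('lxml missing: install it with  uv pip install "dd-datasets[full]"')
```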

License

Apache License 2.0

Author

Dr. Janis Meyer

Download files

Source Distribution

dd_datasets-1.2.4.tar.gz (35.3 kB)

Built Distribution

dd_datasets-1.2.4-py3-none-any.whl (56.3 kB)

File details

Details for the file dd_datasets-1.2.4.tar.gz.

File metadata

  • Download URL: dd_datasets-1.2.4.tar.gz
  • Size: 35.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for dd_datasets-1.2.4.tar.gz
  • SHA256: 726e7d2ff9f0b3d18b627da185cd7e0cb218444ea6b9b5ae35896b7aaba4b1e0
  • MD5: f377cc5a41906327959286ba1c419dc0
  • BLAKE2b-256: 2d6cb25a633f26589769ec35d8b12d24372d934cf2f65d5d49e16565134f64ca

File details

Details for the file dd_datasets-1.2.4-py3-none-any.whl.

File metadata

  • Download URL: dd_datasets-1.2.4-py3-none-any.whl
  • Size: 56.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for dd_datasets-1.2.4-py3-none-any.whl
  • SHA256: 736b4d3256ba37776a6dfc511cf0edf379c657192c6e82822026697b0f49c3d2
  • MD5: a7841aa2a8f0cf3308eb24c9db62cdfa
  • BLAKE2b-256: cde99853ae57c466356fa536fbb42b0295c1ac8751be07d162a5031874b06448
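The published SHA256 digests let you check a downloaded file before installing it. The following is a minimal sketch that uses only the standard library's hashlib. Two things in it are illustrative and not part of dd-datasets: the helper names sha256_of and verify, and the assumption that the archive sits in the working directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # SHA256 digest published for the source distribution.
    expected = "726e7d2ff9f0b3d18b627da185cd7e0cb218444ea6b9b5ae35896b7aaba4b1e0"
    # Assumption: the archive was downloaded into the current working directory.
    archive = Path("dd_datasets-1.2.4.tar.gz")
    if archive.exists():
        print("OK" if verify(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

To enforce digests at install time, use pip's hash-checking mode instead of a script: add --hash entries to a requirements file and install with --require-hashes.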
