Dataset building and processing tools for deepdoctection

deepdoctection-datasets

Provides categories and datasets, as well as some dataset instances, for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides comprehensive dataset management capabilities for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification, and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml, install the full extra:

uv pip install "dd-datasets[full]"
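
After installing, it can be useful to confirm which variant ended up in the environment. The following is a minimal sketch using only the Python standard library; it does not call any dd-datasets API. The helper names installed_version and module_available are hypothetical and chosen for illustration. It assumes the distribution is registered under the name dd-datasets and that lxml is importable as lxml.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the package is installed under the distribution name "dd-datasets".
    print("dd-datasets:", installed_version("dd-datasets") or "not installed")
    # lxml is only needed for the datasets covered by the "full" extra.
    print("lxml available:", module_available("lxml"))
```

If lxml is reported as unavailable, the datasets that depend on XML parsing will likely not work until the full extra is installed.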

License

Apache License 2.0

Author

Dr. Janis Meyer
