Dataset building and processing tools for deepdoctection

deepdoctection-datasets

Categories, dataset definitions, and a set of dataset instances for training models supported by deepdoctection.

Overview

dd-datasets is a package that provides dataset management for Document AI tasks.

It includes:

  • datasets: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
  • instances: Pre-defined dataset instances for common document understanding tasks such as object detection, text classification, and named entity recognition.

Installation

uv pip install dd-datasets

To use all datasets, including those that require the XML parsing library lxml, install the full extra:

uv pip install "dd-datasets[full]"
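
After installing, it can help to confirm which variant ended up in the environment. The following is a minimal sketch that uses only the Python standard library; the helper names installed_version and has_module are hypothetical and not part of dd-datasets. It assumes the distribution is registered under the name dd-datasets, as in the install commands, and that the full extra pulls in lxml.

```python
from importlib import metadata
from importlib.util import find_spec
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_module(module_name: str) -> bool:
    """Report whether a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # "dd-datasets" is the distribution name used in the install commands.
    print("dd-datasets:", installed_version("dd-datasets") or "not installed")
    # lxml is the optional XML dependency mentioned for the full extra.
    print("lxml available:", has_module("lxml"))
```

If lxml is reported as unavailable, datasets that depend on XML parsing will likely not load until the full extra is installed.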

License

Apache License 2.0

Author

Dr. Janis Meyer
