🔧 Tools to automate your document understanding tasks.

Document Tools

This package contains tools to automate your document understanding tasks by leveraging the power of 🤗 Datasets and 🤗 Transformers.

With this package, you can (or, for items marked 🚧, will be able to):

  • 🚧 Create a dataset from a collection of documents.
  • Transform a dataset to a format that is suitable for training a model.
  • 🚧 Train a model on a dataset.
  • 🚧 Evaluate the performance of a model on a dataset of documents.
  • 🚧 Export a model to a format that is suitable for inference.

Features

This project is in the alpha stage and under active development; it is not ready for production use. If you find a bug or have a suggestion, please let us know by opening an issue or a pull request.

Featured models

Usage

A minimal example to get started:

from datasets import load_dataset
from document_tools import tokenize_dataset

# Load a dataset from 🤗 Hub
dataset = load_dataset("deeptools-ai/test-document-invoice", split="train")

# Tokenize the dataset
tokenized_dataset = tokenize_dataset(dataset, target_model="layoutlmv3")
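
For context on what "suitable for training" can involve: LayoutLM-family models such as LayoutLMv3 expect word bounding boxes scaled to a 0-1000 range relative to the page size. The sketch below illustrates that normalization in plain Python. Note that `normalize_box` is a hypothetical helper written for this example and is not part of document_tools; the package's actual preprocessing may differ.

```python
from typing import List, Sequence


def normalize_box(box: Sequence[float], width: float, height: float) -> List[int]:
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 range.

    Hypothetical helper for illustration only; not part of document_tools.
    """
    if width <= 0 or height <= 0:
        raise ValueError("page width and height must be positive")
    x0, y0, x1, y1 = box
    scaled = [
        1000 * x0 / width,
        1000 * y0 / height,
        1000 * x1 / width,
        1000 * y1 / height,
    ]
    # Clamp so boxes that spill slightly off the page stay in the valid range.
    return [max(0, min(1000, int(v))) for v in scaled]


if __name__ == "__main__":
    # A word box on an assumed 800x600 pixel page.
    print(normalize_box((80, 60, 400, 300), width=800, height=600))
```

Clamping is a design choice in this sketch: OCR output sometimes reports coordinates a few pixels outside the page, and values outside 0-1000 would be invalid for the model's position embeddings.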

For more information, please see the documentation.

Credits

This package was created with Cookiecutter and the waynerv/cookiecutter-pypackage project template.
