
Awesome document classification - Implementation of major techniques


Document Classification: All in one place

This package supports classifying documents with all the popular available methods. Along with document classification, it also provides a single interface for OCR that covers both open-source models, such as Tesseract and PaddleOCR, and commercial services, such as Google OCR.
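This README does not document the package's own OCR interface. The sketch below only shows one way a single entry point over several OCR backends can be structured. The names Word, OCREngine, register_engine, run_ocr, and FakeEngine are hypothetical and are not part of document-classification's API. The fake backend stands in for a real one such as Tesseract (a real backend might wrap pytesseract.image_to_data) or Google OCR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class Word:
    """A recognized word and its bounding box (x0, y0, x1, y1) in pixels."""
    text: str
    box: Tuple[int, int, int, int]


class OCREngine(Protocol):
    """Hypothetical contract: anything that turns an image file into words."""
    def extract(self, image_path: str) -> List[Word]: ...


# Hypothetical registry mapping a backend name to a zero-argument factory.
_ENGINES: Dict[str, Callable[[], OCREngine]] = {}


def register_engine(name: str):
    """Decorator that registers a backend class (or factory) under `name`."""
    def decorator(factory):
        _ENGINES[name] = factory
        return factory
    return decorator


def run_ocr(image_path: str, engine: str) -> List[Word]:
    """Single entry point: look up the backend by name and run it."""
    try:
        factory = _ENGINES[engine]
    except KeyError:
        raise ValueError(
            f"unknown OCR engine {engine!r}; available: {sorted(_ENGINES)}"
        ) from None
    return factory().extract(image_path)


@register_engine("fake")
class FakeEngine:
    """Stand-in backend so the sketch runs without Tesseract or Google credentials."""
    def extract(self, image_path: str) -> List[Word]:
        return [Word("Invoice", (10, 10, 80, 30)), Word("#123", (90, 10, 140, 30))]
```

Because callers depend only on run_ocr, switching from an open-source engine to a commercial service changes one argument rather than the calling code. For example, " ".join(w.text for w in run_ocr("scan.png", engine="fake")) yields the page text for a downstream classifier.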

PyPI: document-classification

Features

  • OCR
    • Tesseract
    • Google OCR
  • Classification
    • fastText (train, evaluate, predict)
    • Language Models like BERT (train, evaluate, predict)
    • Language + Layout Models like LayoutLM (train, evaluate, predict)
    • LLM (evaluate, predict)
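In fastText's supervised mode, training data is plain text with one example per line. Each label is written as a token prefixed by __label__. The standard-library sketch below prepares such a file. The helper names to_fasttext_line and write_fasttext_file are hypothetical, and this package may prepare its training data differently.

```python
import re
from typing import Iterable, Tuple


def to_fasttext_line(label: str, text: str) -> str:
    """Format one (label, text) pair in fastText's supervised input format."""
    # Labels carry the __label__ prefix; a space inside a label would split it
    # into two tokens, so replace spaces with underscores.
    label_token = "__label__" + label.strip().replace(" ", "_")
    # fastText reads one example per line, so collapse newlines and
    # runs of whitespace into single spaces.
    clean_text = re.sub(r"\s+", " ", text).strip()
    return f"{label_token} {clean_text}"


def write_fasttext_file(path: str, examples: Iterable[Tuple[str, str]]) -> int:
    """Write (label, text) examples to `path`, one per line; return the line count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for label, text in examples:
            f.write(to_fasttext_line(label, text) + "\n")
            count += 1
    return count
```

With the official fasttext Python bindings, a file in this format maps onto the train, evaluate, and predict steps as fasttext.train_supervised(input="train.txt"), model.test("valid.txt") (which returns the sample count, precision@1, and recall@1), and model.predict(text).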

Installation

Install with a single command:

pip install -U document-classification

Or, if you use Poetry (like me):

poetry add document-classification

Usage

See the examples directory in the repository for examples of how to use the package.

Contributing

Your contributions are welcome! If you have great examples or find neat patterns, clone the repo and add another example. The goal is to find great patterns and cool examples to highlight.

If you encounter any issues or want to provide feedback, you can create an issue in this repository. You can also reach out to me on Twitter at @amittimalsina14.

Check the todo.md file for the list of features that are coming next with their due dates.

What's coming next?

I will first add tests and refactor the code to make it more readable, usable, and maintainable. Then I will release documentation and more examples.



Download files


Source Distribution

document_classification-0.0.1a0.tar.gz (34.1 kB)

Uploaded Source

Built Distribution


document_classification-0.0.1a0-py3-none-any.whl (65.7 kB)

Uploaded Python 3

File details

Details for the file document_classification-0.0.1a0.tar.gz.

File metadata

  • Download URL: document_classification-0.0.1a0.tar.gz
  • Upload date:
  • Size: 34.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.10.15 Darwin/24.0.0

File hashes

Hashes for document_classification-0.0.1a0.tar.gz
Algorithm Hash digest
SHA256 c27a1ec3113a5cf81f43943e23a6ddd24339958fccb0112edc83bc51f42efa2c
MD5 95d8b6fa60a35378d1da1482d75a03aa
BLAKE2b-256 37b9c5149327e2dda7ef711924a807db820d6cc24fffdacd56ce6d082c3c476e


File details

Details for the file document_classification-0.0.1a0-py3-none-any.whl.

File metadata

File hashes

Hashes for document_classification-0.0.1a0-py3-none-any.whl
Algorithm Hash digest
SHA256 1e449c2f3bdc826619b339199ab5b014886eb4a620193cc38c35018a420df4f1
MD5 24892371a1f5ad2daca57b6e547ccfa9
BLAKE2b-256 7f4584a2a120d60ef589a674c97420695c80da25eac0ae593c1cfcdb00116c5b
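The published digests let you confirm that a downloaded file is intact and unmodified. The sketch below uses only Python's hashlib module: it computes a file's SHA256 in chunks and compares the result with the value published for that file name. The helper names sha256_of and verify_download are hypothetical.

```python
import hashlib
import os

# SHA256 digests copied from the hash tables on this page.
EXPECTED_SHA256 = {
    "document_classification-0.0.1a0.tar.gz":
        "c27a1ec3113a5cf81f43943e23a6ddd24339958fccb0112edc83bc51f42efa2c",
    "document_classification-0.0.1a0-py3-none-any.whl":
        "1e449c2f3bdc826619b339199ab5b014886eb4a620193cc38c35018a420df4f1",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """Compare a downloaded file against the digest published for its file name."""
    name = os.path.basename(path)
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected
```

pip can also enforce this at install time. A requirements line such as document-classification==0.0.1a0 --hash=sha256:<digest>, where <digest> is the SHA256 value from the table, switches pip into hash-checking mode. In that mode, every requirement, including dependencies, must be pinned with == and carry a hash.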

