A tool to classify images

Project description

DocumentClassifier is a Python library that provides functionality for classifying documents based on images and text content.

This library is designed to help you process and organize large sets of documents, making it useful for various applications such as image-based document classification and clustering.

Usage

Prepare a folder of image documents

import DocumentsClassifier as DC

# Path to the folder that contains your image documents
images_path = 'path/to/your/documents/folder'

# Classify the images in that folder
DC.classify(images_path)
# Output: Clusterd successfully

After running the code, your images will be sorted into subfolders inside the folder you specified.
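To check the result, you can count how many images ended up in each subfolder. The snippet below is a minimal sketch, not part of the library: `summarize_clusters` and `IMAGE_EXTENSIONS` are hypothetical names introduced here, and it assumes the output subfolders sit directly under the folder you passed to `DC.classify`.

```python
from pathlib import Path

# Extensions treated as images here; this set is an assumption, adjust it to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def summarize_clusters(root):
    """Return {subfolder name: number of image files} for each immediate subfolder of root."""
    summary = {}
    for subfolder in sorted(Path(root).iterdir()):
        if not subfolder.is_dir():
            continue  # skip files that sit directly in the root folder
        summary[subfolder.name] = sum(
            1
            for item in subfolder.iterdir()
            if item.is_file() and item.suffix.lower() in IMAGE_EXTENSIONS
        )
    return summary
```

Calling `summarize_clusters(images_path)` after `DC.classify(images_path)` returns a dictionary that maps each subfolder name to its image count, which makes empty or unexpectedly large groups easy to spot.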

Some limitations

This package needs to load several pretrained models from Hugging Face:

  1. SentenceTransformer: Model for text embeddings.
  2. Classify Model: Our pretrained BEiT model for classifying images.
  3. Image Extractor: Our pretrained BEiT model for extracting image features.

We also use PaddleOCR to extract text, so the first run may be slow because the pretrained models have to be downloaded.

The project is still under development, so thank you for your patience.

Updates

Version 0.0.2: Changed the text embedding method to SentenceTransformer.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact & Contributing

If you have any questions or suggestions, please contact us at hungdtse171849@fpt.edu.vn or phuongtnse161960@fpt.edu.vn.


