Documents-Classifier

A tool to classify images

DocumentsClassifier is a Python library for classifying documents based on both their images and their text content.

It is designed to help you process and organize large sets of documents, for example by classifying document images or grouping them into clusters.

Usage

Prepare a folder of document images, then pass its path to `classify`.
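Before calling `classify`, it can help to confirm that the folder really contains images. The sketch below is a minimal, hypothetical helper (`list_document_images` is not part of DocumentsClassifier), and the set of extensions it accepts is an assumption, since the supported formats are not documented here.

```python
from __future__ import annotations

from pathlib import Path

# Assumed set of image formats; the formats DocumentsClassifier accepts are not documented.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def list_document_images(folder: str) -> list[Path]:
    """Hypothetical helper: return the image files directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # Only the top level is scanned and subfolders are ignored (an assumption).
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, `len(list_document_images(images_path))` gives a quick count of the images that would be handed to the classifier.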

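```python
import DocumentsClassifier as DC

# Path to the folder that holds the document images
images_path = 'path/to/your/documents/folder'

# Classify the images; results are written into subfolders of images_path
DC.classify(images_path)
# Console output: Clusterd successfully
```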

After the call finishes, your images are sorted into subfolders inside the folder you passed in.
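To see how the images were grouped, you can count the files in each subfolder. This is a sketch that assumes, as described above, that `classify` writes its results as subfolders directly inside the folder you passed in; the helper name `summarize_clusters` is hypothetical, and the subfolder names are whatever the package chooses.

```python
from __future__ import annotations

from pathlib import Path


def summarize_clusters(folder: str) -> dict[str, int]:
    """Hypothetical helper: map each subfolder of `folder` to its number of files.

    Assumes the classifier stores each class/cluster as a direct subfolder;
    loose files left in `folder` itself are not counted.
    """
    root = Path(folder)
    return {
        sub.name: sum(1 for f in sub.iterdir() if f.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }
```

Printing `summarize_clusters(images_path)` after `DC.classify(images_path)` gives a quick overview of how many documents landed in each group.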

Some limitations

This package needs to download several pretrained models from Hugging Face:

Defsent: a model for word embeddings.
Classify Model: our pretrained BEiT model for classifying images.
Image Extractor: our pretrained BEiT model for extracting image features.

The package also uses PaddleOCR to extract text, so the first run may be slow while the pretrained models are downloaded.
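Because the models are downloaded on first use, it can be worth pointing the Hugging Face cache at a persistent directory (for example a mounted volume in a container) so later runs reuse the downloads. The sketch below assumes the package fetches its models through the standard `huggingface_hub` cache, which honours the `HF_HOME` environment variable; `configure_hf_cache` is a hypothetical helper, and PaddleOCR's own model downloads are not covered by it.

```python
import os
from pathlib import Path


def configure_hf_cache(cache_dir: str) -> Path:
    """Hypothetical helper: use `cache_dir` as HF_HOME unless HF_HOME is already set.

    Call this before importing DocumentsClassifier: huggingface_hub reads
    HF_HOME when it is imported (assuming the package relies on it).
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    # Respect an HF_HOME the user has already configured.
    os.environ.setdefault("HF_HOME", str(path))
    return Path(os.environ["HF_HOME"])
```

Call it at the very top of your script, e.g. `configure_hf_cache('~/models/hf')`, and only then run `import DocumentsClassifier as DC`.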

The project is still under active development; thank you for your patience.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact & Contributing

If you have any questions or suggestions, please contact us at hungdtse171849@fpt.edu.vn or phuongtnse161960@fpt.edu.vn.

Release history

0.0.3 (Nov 10, 2023)
0.0.2 (Oct 20, 2023)
0.0.1 (Oct 19, 2023), this version

Download files

Source distribution: Documents-Classifier-0.0.1.tar.gz (9.0 kB), uploaded Oct 19, 2023

Built distribution: Documents_Classifier-0.0.1-py3-none-any.whl (8.8 kB), Python 3, uploaded Oct 19, 2023

Hashes for Documents-Classifier-0.0.1.tar.gz

Algorithm	Hash digest
SHA256	`2661ccc3b73b05a04acb45e61834bd52e4d60e5461bafebb80a6f7e76fbda78e`
MD5	`95e03d35c8dc80a2d7873890e7a2e715`
BLAKE2b-256	`ef7b5a4ac81517953d3e14a88d3bbaa5a5f6f6b55dfe73299453d95989d79e70`

Hashes for Documents_Classifier-0.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`817d322d0848b9f54449da8679b3b0ecf41ea6d98b7e46995d690fb8bcf986f4`
MD5	`a3cabd4d24f6cda57ac3d411f309115a`
BLAKE2b-256	`827f3b158365ddf9316e4e3f5c5e11eb5803809489804bdf32e0b1a6fbea1482`
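To check a downloaded file against the SHA256 digests listed above, Python's standard `hashlib` module is enough. This is a minimal sketch; `sha256_of` and `verify_download` are hypothetical helper names, and only the two 0.0.1 files are included in the lookup table.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digests published for release 0.0.1 (copied from the hash tables above).
EXPECTED_SHA256 = {
    "Documents-Classifier-0.0.1.tar.gz":
        "2661ccc3b73b05a04acb45e61834bd52e4d60e5461bafebb80a6f7e76fbda78e",
    "Documents_Classifier-0.0.1-py3-none-any.whl":
        "817d322d0848b9f54449da8679b3b0ecf41ea6d98b7e46995d690fb8bcf986f4",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so it is never read into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str | None = None) -> bool:
    """Compare a file's SHA256 digest with `expected`, or with the published
    value looked up by file name (raises KeyError for unknown files)."""
    p = Path(path)
    if expected is None:
        expected = EXPECTED_SHA256[p.name]
    return sha256_of(p) == expected.lower()
```

For example, `verify_download('Documents_Classifier-0.0.1-py3-none-any.whl')` returns `True` only if the local wheel matches the published digest.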