A tool to classify images
Project description
DocumentClassifier is a Python library for classifying documents based on their image and text content.
It is designed to help you process and organize large sets of documents, which makes it useful for applications such as image-based document classification and clustering.
Usage
Prepare a folder of document images, then run:
import DocumentsClassifier as DC
# Path to the folder that holds your document images
images_path = 'path/to/your/documents/folder'
# Classify the images in that folder
DC.classify(images_path)
>> Clusterd successfully
After running the code, your images will be sorted into subfolders inside the folder you passed in.
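To check the result, you can count how many images ended up in each subfolder. The snippet below is a minimal sketch and not part of this package: `summarize_subfolders` and `IMAGE_EXTENSIONS` are hypothetical names, and it assumes the subfolders are created directly inside the folder you passed to `DC.classify`. It uses only the Python standard library.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to match your documents.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def summarize_subfolders(root):
    """Return {subfolder name: number of image files} for each immediate subfolder of root."""
    summary = {}
    for sub in sorted(Path(root).iterdir()):
        if not sub.is_dir():
            continue  # skip files left directly in the root folder
        summary[sub.name] = sum(
            1
            for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return summary


if __name__ == "__main__":
    images_path = 'path/to/your/documents/folder'
    # Only print a summary if the folder actually exists.
    if Path(images_path).is_dir():
        for name, count in summarize_subfolders(images_path).items():
            print(f"{name}: {count} image(s)")
```

Files that are not in a subfolder are ignored, so any image left in the root folder after classification will not appear in the summary.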
Some limitations
This package needs to load several pretrained models from Hugging Face:
- SentenceTransformer: Model for text embeddings.
- Classify Model: Our pretrained BEiT model for classifying images.
- Image Extractor: Our pretrained BEiT model for extracting features.
We also use PaddleOCR to extract text, so the first run may be slow because it has to download the pretrained models.
The package is still under development, so thank you for your patience.
Updates
Version 0.0.2: Changed the text embedding method to SentenceTransformer.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact & Contributing
If you have any questions or suggestions, please contact us at hungdtse171849@fpt.edu.vn or phuongtnse161960@fpt.edu.vn.