A tool to classify images
Project description
DocumentsClassifier is a Python library for classifying documents based on their image and text content.
It is designed to help you process and organize large sets of documents, which makes it useful for applications such as image-based document classification and clustering.
Usage
Prepare a folder of image documents, then pass its path to classify:
import DocumentsClassifier as DC
# Path to the folder that contains the document images
images_path = 'path/to/your/documents/folder'
# Classify the images in that folder
DC.classify(images_path)
>> Clusterd successfully
After running the code, your images are sorted into subfolders under the root folder you specified.
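To check the result of a run, the subfolders can be inspected with the standard library. The following is a minimal sketch, not part of the package: the helper name `count_images_per_subfolder` and the list of image extensions are assumptions made for illustration, and the subfolder names depend on whatever the classifier produced.

```python
from pathlib import Path
from typing import Dict

# Assumed set of image extensions; adjust it to match your documents.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def count_images_per_subfolder(root: str) -> Dict[str, int]:
    """Return {subfolder name: number of image files directly inside it}."""
    counts: Dict[str, int] = {}
    for sub in sorted(Path(root).iterdir()):
        if not sub.is_dir():
            continue  # files left in the root itself are ignored
        counts[sub.name] = sum(
            1
            for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


# Example usage (hypothetical path):
# for name, n in count_images_per_subfolder("path/to/your/documents/folder").items():
#     print(f"{name}: {n} image(s)")
```

Only files directly inside each subfolder are counted; nested folders are not searched.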
Some limitations
This package needs to load several pretrained models from Hugging Face:
- SentenceTransformer: Model for text embeddings.
- Classify Model: Our pretrained BEiT model for classifying images.
- Image Extractor: Our pretrained BEiT model for extracting image features.
We also use PaddleOCR to extract text, so the first run may be slow because the pretrained models have to be downloaded.
The project is still under development, so thank you for your patience.
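To see how much of the first run is spent on one-time model downloads, one option is to time two consecutive calls and compare them. This is a minimal sketch using only the standard library; the helper `timed` is hypothetical and not part of the package, and it works with any callable.

```python
import time
from typing import Any, Callable, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Example usage (assumes the package is installed and the path exists):
# import DocumentsClassifier as DC
# _, first = timed(DC.classify, "path/to/your/documents/folder")
# print(f"first run took {first:.1f} s")
```

If a second run on a comparable folder is much faster, the difference is most likely download and cache setup time.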
Updates
Version 0.0.2: Changed the text embedding method to SentenceTransformer.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact & Contributing
If you have any questions or suggestions, please contact us at hungdtse171849@fpt.edu.vn or phuongtnse161960@fpt.edu.vn.
Download files
File details
Details for the file Documents-Classifier-0.0.3.tar.gz.
File metadata
- Download URL: Documents-Classifier-0.0.3.tar.gz
- Upload date:
- Size: 9.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3016f0cabfc881f37b366ddaa970107a6d607b36a0ab14dbc89dc4efe05fba3f |
| MD5 | ba06b84a505ed9e1bbd0967204102cc6 |
| BLAKE2b-256 | a7fdf39c8a22116c17ba2a1c4bd674cc44e8fd480fbb59df7d47671471587469 |
File details
Details for the file Documents_Classifier-0.0.3-py3-none-any.whl.
File metadata
- Download URL: Documents_Classifier-0.0.3-py3-none-any.whl
- Upload date:
- Size: 8.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1d257aca6d27898763600c68f124e5372f20de489da36cee8e51ae8fe9381bab |
| MD5 | 9193b4ba429e6da1d4a56e050b5bbe8c |
| BLAKE2b-256 | 8ac118cfab00aadea595db3e3dead8493b4018fef856528b72ab864c839b37cd |
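The published digests can be used to check a downloaded file before installing it. Below is a minimal sketch with Python's `hashlib`; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the local file path in the usage comment is an assumption about where the file was saved.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes the wheel was downloaded to the current directory):
# ok = matches_published_digest(
#     "Documents_Classifier-0.0.3-py3-none-any.whl",
#     "1d257aca6d27898763600c68f124e5372f20de489da36cee8e51ae8fe9381bab",
# )
# print("digest matches" if ok else "digest mismatch, do not install")
```

SHA256 is used here because it is the strongest of the listed algorithms that `hashlib` exposes under a fixed name; MD5 should not be relied on for integrity against tampering.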