Fast and fully local NLP file organizer that organizes files based on their content.
Project description
Connor is a file organizer written in Python. It uses the sentence-transformers framework and natural language processing to organize the files on your computer based on their textual content, and it runs quickly and entirely locally.
Installation
Before installing Connor, check that Python and pip are installed on your computer:
python --version
If a message displaying Python's version appears, Python is correctly installed. If an error message appears, download Python from the Python website.
Once Python is installed, use pip to install Connor with the following command:
pip install connor-nlp
If something doesn't work or you run into problems, head to the official GitHub repository for detailed instructions or open an issue.
Features
Connor works locally on your computer, using a pre-trained NLP model, sentence-transformers/paraphrase-MiniLM-L6-v2, to understand the meaning of the data and calculate the cosine similarity between files. The files are organized into groups, each group's folder is named using topic modeling with Latent Dirichlet Allocation (LDA), and the files are then moved to their respective folders.
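The grouping step described above can be sketched in a few lines. This is a minimal illustration, not Connor's actual implementation: the greedy grouping rule, the function names, and the toy 3-dimensional vectors are all assumptions. In practice the vectors would be sentence embeddings produced by the paraphrase-MiniLM-L6-v2 model, and the LDA folder-naming step is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def group_by_similarity(embeddings, threshold=0.5):
    """Greedy grouping (an assumed rule): a file joins the first group whose
    first member is at least `threshold` similar, else it starts a new group."""
    groups = []
    for name, vector in embeddings.items():
        for group in groups:
            representative = embeddings[group[0]]
            if cosine_similarity(vector, representative) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Toy vectors standing in for real sentence embeddings (hypothetical data).
toy_embeddings = {
    "invoice_march.pdf": [0.9, 0.1, 0.0],
    "invoice_april.pdf": [0.8, 0.2, 0.1],
    "holiday_notes.txt": [0.0, 0.1, 0.9],
}
groups = group_by_similarity(toy_embeddings, threshold=0.8)
print(groups)
```

With these toy vectors the two invoices land in one group and the notes file in another. A higher threshold produces more, smaller groups.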
File Organization Summary
- Organize files within a selected folder or manually uploaded files (uploading files is supported only in the GUI).
- Organize text-based files (.docx, .txt, .pdf, etc.) using NLP.
- Create a separate folder named "Miscellaneous" for dissimilar files and for files that cannot be processed based on their extension.
- Provide a summary (tree structure) of the organization process upon completion.
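As a rough sketch of two of the behaviours listed above, the following separates files by extension and renders a tree-style summary. The extension set, the function names, and the "Reports" folder name are hypothetical. The list of extensions Connor actually supports may be longer, and its summary format may differ.

```python
from pathlib import PurePath

# Assumed set of text-based extensions; Connor's real list may differ.
TEXT_EXTENSIONS = {".docx", ".txt", ".pdf"}


def split_by_extension(filenames):
    """Separate files that can be read as text from those that cannot."""
    processable, miscellaneous = [], []
    for name in filenames:
        if PurePath(name).suffix.lower() in TEXT_EXTENSIONS:
            processable.append(name)
        else:
            miscellaneous.append(name)
    return processable, miscellaneous


def render_tree(folders):
    """Return a text tree for a {folder_name: [file, ...]} mapping."""
    lines = []
    for folder in sorted(folders):
        lines.append(f"{folder}/")
        for filename in sorted(folders[folder]):
            lines.append(f"    {filename}")
    return "\n".join(lines)


processable, miscellaneous = split_by_extension(
    ["report.pdf", "notes.TXT", "photo.png", "archive.zip"]
)
# "Reports" is a made-up folder name; Connor derives names via topic modeling.
summary = render_tree({"Miscellaneous": miscellaneous, "Reports": processable})
print(summary)
```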
Customization Options
- Similarity Threshold: Allows you to choose a similarity percentage threshold for grouping similar files.
- Reading Word Limit: You can set a limit on the number of words to read from the file content.
- Folder Name Word Limit: You can specify the maximum number of words allowed in the created folder names.
- Default Parameters: You can modify these three parameters and save them for future sessions.
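One way to picture these options is as a small settings object that is saved to disk and reloaded in a later session. The sketch below is illustrative only: the class name, field names, default values, and JSON file are assumptions, not Connor's actual configuration format.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class OrganizerSettings:
    # Hypothetical names and default values, for illustration only.
    similarity_threshold: float = 0.5  # fraction in [0, 1]; 0.5 means 50 %
    reading_word_limit: int = 200      # words read from each file
    folder_name_word_limit: int = 3    # words allowed in a folder name


def save_settings(settings, path):
    """Persist the settings as JSON so they survive between sessions."""
    Path(path).write_text(json.dumps(asdict(settings), indent=2), encoding="utf-8")


def load_settings(path):
    """Load saved settings, falling back to the defaults if none exist."""
    settings_file = Path(path)
    if not settings_file.exists():
        return OrganizerSettings()
    return OrganizerSettings(**json.loads(settings_file.read_text(encoding="utf-8")))


def limit_words(text, max_words):
    """Keep only the first `max_words` whitespace-separated words."""
    return " ".join(text.split()[:max_words])


with tempfile.TemporaryDirectory() as tmp:
    settings_path = Path(tmp) / "settings.json"
    save_settings(OrganizerSettings(similarity_threshold=0.7), settings_path)
    restored = load_settings(settings_path)

preview = limit_words(
    "quarterly sales report for the northern region",
    restored.folder_name_word_limit,
)
print(restored, preview)
```

The same `limit_words` helper could serve both word limits: one applied to file content before embedding, the other to candidate folder names.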
Building From Source
Building from source is useful if you want to use features that are currently in development. To build Connor locally from source, read the instructions here.
Dependencies
Package | Version |
---|---|
docx | >=0.2.4 |
nltk | >=3.9.1 |
numpy | >=2.1.1 |
odfpy | >=1.4.1 |
openpyxl | >=3.1.5 |
PyPDF2 | >=3.0.1 |
scikit_learn | >=1.5.2 |
sentence_transformers | >=3.1.1 |
Hashes for connor_nlp-0.1.6-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 88fc38d99b104ff45a741d23118af51b148587f3020db1df9bad2d7cfd7673f5 |
MD5 | cc3b8c6354b5095abee3411439867429 |
BLAKE2b-256 | 9c080709a45cdbf1002d276d3ecfe3a3a063f3808b657def6f3f70b63d17ad53 |
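If you download the wheel manually, you can compare its digest against the SHA256 value above. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the demo at the end hashes a small temporary file rather than the real wheel.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Demo on a throwaway file (not the real wheel).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    sample_digest = sha256_of_file(sample)
    sample_ok = matches_published_digest(sample, hashlib.sha256(b"abc").hexdigest())
```

For the real file, you would pass the path of the downloaded `connor_nlp-0.1.6-py3-none-any.whl` and the SHA256 digest from the table to `matches_published_digest`.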