
Fast and fully local NLP file organizer that organizes files based on their content.


Connor

Connor is a fast and fully local file organizer written in Python that uses natural language processing to organize computer files based on their textual content. It uses the sentence-transformers framework for the core organization process and the PyQt6 GUI toolkit for the graphical user interface. It is by no means meant to replace organizing files by hand; it is just a concept.


Features

Connor works locally on your computer, using the pre-trained NLP model sentence-transformers/paraphrase-MiniLM-L6-v2 to capture the meaning of each file's content and calculate the cosine similarity between files. Folders are named using topic modeling with Latent Dirichlet Allocation (LDA).
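Cosine similarity compares two embedding vectors by the angle between them: dot(a, b) / (|a| |b|). The sketch below computes it with the standard library only, on toy vectors; in Connor the vectors would come from paraphrase-MiniLM-L6-v2 (which produces 384-dimensional embeddings), and sentence-transformers also provides `util.cos_sim` for the same calculation. The function names here are illustrative, not Connor's code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: list[list[float]]) -> list[list[float]]:
    """Score every file embedding against every other file embedding."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]
```

For example, `similarity_matrix([[1, 0], [1, 0], [0, 1]])` scores the first two files as identical (1.0) and both as unrelated to the third (0.0).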

Connor reads the file names and contents, then uses cosine similarity to score the content of every file against every other file. Files whose similarity scores are above the configured threshold are grouped together in a dictionary whose keys are categories and whose values are the files in each category; each category corresponds to a folder.
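A minimal sketch of threshold-based grouping, assuming a precomputed pairwise similarity matrix and a threshold expressed as a fraction (50% becomes 0.5). The greedy, seed-based strategy and the placeholder keys `category_0`, `category_1`, ... are assumptions for illustration; the README does not specify how Connor handles a file that is similar to several groups.

```python
def group_by_threshold(
    names: list[str], sim: list[list[float]], threshold: float
) -> dict[str, list[str]]:
    """Greedily group files whose similarity to a seed file is above the threshold.

    Keys are placeholder category names (later replaced by topic names);
    values are the files that belong in that category's folder.
    """
    groups: dict[str, list[str]] = {}
    assigned: set[int] = set()
    for i, name in enumerate(names):
        if i in assigned:
            continue
        # The first unassigned file seeds a new category.
        members = [name]
        assigned.add(i)
        for j in range(i + 1, len(names)):
            if j not in assigned and sim[i][j] > threshold:
                members.append(names[j])
                assigned.add(j)
        groups[f"category_{len(groups)}"] = members
    return groups
```

With the matrix `[[1, 0.8, 0.1], [0.8, 1, 0.2], [0.1, 0.2, 1]]` and a threshold of 0.5, `a.txt` and `b.txt` share `category_0` while `c.txt` ends up alone in `category_1`; per the feature list, Connor routes such dissimilar files to a miscellaneous folder.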

Latent Dirichlet Allocation is then used to generate topic names for the contents in each folder, i.e., the categories in the dictionary. Folders are created using the most relevant topic names, and the corresponding files are then moved into their appropriate folders.
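The README does not say which LDA implementation Connor uses. To illustrate the technique itself, this is a minimal collapsed Gibbs sampler for LDA using only the standard library. It assumes each document is already a list of lowercase tokens with stopwords removed, and `lda_topic_words` is a hypothetical name; the top words of a topic are the candidates for a folder name.

```python
import random
from collections import Counter


def lda_topic_words(docs, num_topics=1, top_n=3, iterations=100,
                    alpha=0.1, beta=0.01, seed=0):
    """Return the top_n words of each topic via collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                        # n(k)
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return [[w for w, c in topic_word[k].most_common(top_n) if c > 0]
            for k in topics]
```

With `num_topics=1` the sampler simply returns a folder's most frequent words: for the token lists `["invoice", "tax", "payment"]`, `["tax", "invoice", "refund"]` and `["tax", "report"]` with `top_n=2`, it returns `[["tax", "invoice"]]`. More topics let LDA separate distinct themes within a folder.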

Files that cannot be read, such as images (image support will be added later), executables, and binaries, are organized into a _misc folder based on their file extensions.
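A sketch of the extension-based fallback, assuming a hypothetical set of readable extensions drawn from the ones the README names (.docx, .txt, .pdf). Extensions are lower-cased so `photo.JPG` and `photo.jpg` land together, and files without an extension go into a `no_extension` bucket; both choices are assumptions, not documented Connor behavior.

```python
from pathlib import PurePath

# Hypothetical set of extensions treated as readable text; the README only
# names .docx, .txt and .pdf ("etc."), so the real list may be longer.
READABLE_EXTENSIONS = {".txt", ".docx", ".pdf"}


def split_unreadable(filenames: list[str]) -> tuple[list[str], dict[str, list[str]]]:
    """Separate readable files from unreadable ones, grouping the latter by extension."""
    readable: list[str] = []
    misc_by_ext: dict[str, list[str]] = {}
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        if ext in READABLE_EXTENSIONS:
            readable.append(name)
        else:
            # Unreadable files are bucketed by extension; extensionless
            # files share a hypothetical "no_extension" bucket.
            misc_by_ext.setdefault(ext or "no_extension", []).append(name)
    return readable, misc_by_ext
```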


File Organization Summary

  1. Organize files within a selected folder or manually uploaded files (file uploading is only supported in the GUI).
  2. Organize text-based files (.docx, .txt, .pdf, etc.) using NLP.
  3. Create a separate folder named "Miscellaneous" for dissimilar or unprocessable files, based on their extensions.
  4. Provide a summary (tree structure) of the organization process upon completion.
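The tree summary could be produced along these lines. This sketch assumes the result of organizing is available as a mapping from folder name to file names; `render_tree` and the box-drawing layout (modeled on the Unix `tree` command) are illustrative, not Connor's actual output format.

```python
def render_tree(root: str, folders: dict[str, list[str]]) -> str:
    """Render a folder -> files mapping as an ASCII tree."""
    lines = [root]
    folder_names = sorted(folders)
    for i, folder in enumerate(folder_names):
        last_folder = i == len(folder_names) - 1
        lines.append(("└── " if last_folder else "├── ") + folder)
        # Children of the last folder need no vertical connector.
        prefix = "    " if last_folder else "│   "
        files = sorted(folders[folder])
        for j, name in enumerate(files):
            last_file = j == len(files) - 1
            lines.append(prefix + ("└── " if last_file else "├── ") + name)
    return "\n".join(lines)
```

Sorting folders and files keeps the summary stable between runs, regardless of the order in which files were grouped.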

Customization Options

  1. Similarity Threshold: Allows you to choose a similarity percentage threshold for grouping similar files.
  2. Reading Word Limit: You can set a limit on the number of words to read from the file content.
  3. Folder Name Word Limit: You can specify the maximum number of words allowed in the created folder names.
  4. Default Parameters: You can modify these three parameters and save them for future sessions.
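The three parameters map naturally onto small helpers. This sketch uses the default values listed under `connor settings` (3, 200 and 50%); the `Settings` class, the space-joined folder names and the JSON round trip for saving defaults are assumptions, since the README does not say where or how Connor stores its settings.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Settings:
    # Defaults taken from the `connor settings` option descriptions.
    folder_word_limit: int = 3
    reading_limit: int = 200
    similarity_threshold: int = 50  # percent

    def truncate(self, text: str) -> str:
        """Keep only the first `reading_limit` words of a file's content."""
        return " ".join(text.split()[: self.reading_limit])

    def threshold_fraction(self) -> float:
        """Convert the percentage into a fraction comparable with cosine scores."""
        return self.similarity_threshold / 100

    def folder_name(self, topic_words: list[str]) -> str:
        """Build a folder name from at most `folder_word_limit` topic words."""
        return " ".join(topic_words[: self.folder_word_limit])

    def to_json(self) -> str:
        """Serialize the settings so they can be saved for future sessions."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Settings":
        """Restore settings previously saved with `to_json`."""
        return cls(**json.loads(raw))
```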

User Preferences

Command Line Interface: A simple and concise command line interface for quickly organizing folders.
Graphical User Interface: A simple, straightforward GUI that also supports uploading files.



Installation

Installation instructions are provided for both the GUI and the CLI; choose whichever you prefer. If you build the application from source, adding the run file to your PATH is recommended.

Install Connor via pip:

  1. Make sure Python and pip are installed and added to your PATH.
  2. Run pip install connor-nlp

Install the GUI version of Connor (executable)

  1. Go to the latest release.
  2. Follow the steps there.
  3. Run the executable (.exe).


Usage

Command Structure

connor [command] [options]

Commands

run: Run the folder organization process.

Usage:

connor run <folder_path>

Arguments:

  • folder_path: Required. Absolute path to the folder that you want to organize.

Example:

connor run /path/to/your/folder

settings: Update the default settings for the tool.

Usage:

connor settings [options]

Options:

  • -f, --folder-word-limit: Set the maximum length for folder names. (default: 3)
  • -r, --reading-limit: Specify the word limit for reading files. (default: 200)
  • -t, --similarity-threshold: Define the similarity threshold percentage. (default: 50)
  • --show: Show current settings

Example:

connor settings -f 2 -r 150 -t 60
$ connor settings --show
To see how to update: Connor settings [-h]

Current settings:
  folder words limit     3
  reading limit          200
  similarity threshold   50%

--gui: Launch Connor's full-fledged graphical interface from the terminal.

Usage:

connor --gui

Help

To view help information for commands and options use the -h or --help flag.

Example:

$ connor -h
usage: Connor [-h] [--gui] {settings,run} ...

Connor: Fast and local NLP file organizer

positional arguments:
  {settings,run}
    settings      Update the settings for the organizer
    run           Run the folder organization process

options:
  -h, --help      show this help message and exit
  --gui           Run the application in GUI mode.
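The help text above suggests a command-line structure along the following lines. This is a reconstruction with Python's `argparse` based only on the documented commands and flags, not Connor's source; `build_parser` is a hypothetical name. The settings flags have no defaults here so that an omitted flag can leave the stored value untouched.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="Connor", description="Connor: Fast and local NLP file organizer"
    )
    parser.add_argument("--gui", action="store_true",
                        help="Run the application in GUI mode.")
    sub = parser.add_subparsers(dest="command")

    settings = sub.add_parser("settings", help="Update the settings for the organizer")
    # No defaults: an omitted flag parses as None, i.e. "keep the stored value".
    settings.add_argument("-f", "--folder-word-limit", type=int,
                          help="Set the maximum length for folder names.")
    settings.add_argument("-r", "--reading-limit", type=int,
                          help="Specify the word limit for reading files.")
    settings.add_argument("-t", "--similarity-threshold", type=int,
                          help="Define the similarity threshold percentage.")
    settings.add_argument("--show", action="store_true",
                          help="Show current settings")

    run = sub.add_parser("run", help="Run the folder organization process")
    run.add_argument("folder_path",
                     help="Absolute path to the folder that you want to organize.")
    return parser
```

For example, `build_parser().parse_args(["settings", "-f", "2", "-r", "150", "-t", "60"])` yields `folder_word_limit=2`, `reading_limit=150`, `similarity_threshold=60` and `show=False`.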


Source

1. Clone repository:

git clone https://github.com/ycatsh/connor.git
cd connor

2. Create and activate virtual environment:

python3 -m venv venv
source venv/bin/activate

3. Install dependencies:

pip3 install -r requirements.txt

4. Run program:

For GUI:

python3 run.py --gui

For CLI:

python3 run.py -h

5. Install locally (optional):

pip3 install .

Example:

connor --gui
connor -h


License

This project is distributed under the MIT License, which can be found in the LICENSE file in the project's root directory. I reserve the right to place future versions of this project under a different license.



Download files


Source Distribution

connor_nlp-1.0.0.tar.gz (28.8 kB)

Uploaded Source

Built Distribution

connor_nlp-1.0.0-py3-none-any.whl (29.2 kB)

Uploaded Python 3

File details

Details for the file connor_nlp-1.0.0.tar.gz.

File metadata

  • Download URL: connor_nlp-1.0.0.tar.gz
  • Upload date:
  • Size: 28.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.0

File hashes

Hashes for connor_nlp-1.0.0.tar.gz
Algorithm Hash digest
SHA256 7f7c5836ddd592329da94675ab90fb691538fe6e32c84db4e18c77142ff867fe
MD5 93914e90aa91656820e6263a828001aa
BLAKE2b-256 1b83bf7f55ee3941a0f3455ee7268bb61d048a14f5c6fe452b146921c020ea04


File details

Details for the file connor_nlp-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: connor_nlp-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 29.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.0

File hashes

Hashes for connor_nlp-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 11cbdfd5dd0896984bd9df612441a847d616e7213b78682e75bcf3aa9cd0ef13
MD5 acb879421ffa2ccb8829a5511b49f5a2
BLAKE2b-256 9d8d4d67a3587cc6f4024eb75ce6160fcec3ee61c1d3e69841544279cd5fd4b1
