
CLI to automatically normalize and organize your files based on customizable rules.



neatfile

CLI to normalize and organize your files based on customizable rules.

Why build this?

I have filesystem OCD. Maybe you share my annoyance at having files with non-normalized filenames sent from coworkers, friends, and family. On any given day, I receive dozens of files via Slack, email, and other messaging apps sent by people who have their own way of naming files. For example:

  • department 2023 financials and budget 08232002.xlsx
  • some contract Jan7 reviewed NOT FINAL (NL comments) v13.docx
  • John&Jane-meeting-notes 4 3 25.txt
  • Project_mockups(WIP)___sep92022.pdf
  • FIRSTNAMElastname Resume (#1) [companyname].PDF

What's the problem here?

  • No self-evident way to organize them into folders
  • No common patterns to search for
  • Dates all over the place or nonexistent
  • Inconsistent casing and word separators
  • Special characters within text
  • I could go on and on...

neatfile solves these problems with an easy-to-use CLI that renames files and organizes them into directories based on your preferences.

Features

Filename cleaning and normalization

  • Remove special characters
  • Trim multiple separators (word----word becomes word-word)
  • Normalize filenames to lowercase, uppercase, Sentence case, or Title Case
  • Normalize all files to a common word separator (_, -, space, or .)
  • Enforce lowercase file extensions
  • Remove common English stopwords
  • Split camelCase words into separate words (camel Case)
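To make the cleaning rules above concrete, here is a minimal Python sketch of several of them: lowercasing the extension, splitting camelCase, dropping special characters and repeated separators, removing stopwords, and transforming case. It is not neatfile's implementation; the clean_filename function, the tiny STOPWORDS set, and the reuse of the sample configuration's separator and case option names are assumptions for illustration.

```python
import re
from pathlib import Path

# Hypothetical, deliberately tiny stopword list; neatfile ships its own English list.
STOPWORDS = {"a", "an", "and", "the", "of", "with"}

# Separator and case names mirror the option values in the sample configuration.
SEPARATORS = {"underscore": "_", "dash": "-", "space": " "}
CASES = {
    "ignore": lambda s: s,
    "lower": str.lower,
    "upper": str.upper,
    "title": str.title,
    "sentence": lambda s: s[:1].upper() + s[1:].lower(),
}


def clean_filename(name: str, separator: str = "dash",
                   transform_case: str = "ignore", strip_stopwords: bool = True) -> str:
    """Illustrative normalization only; not neatfile's actual implementation."""
    path = Path(name)
    stem, suffix = path.stem, path.suffix.lower()  # enforce a lowercase extension
    # Split camelCase by inserting a space between a lowercase and an uppercase letter.
    stem = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", stem)
    # Any run of non-alphanumeric characters is a word boundary. This both removes
    # special characters and collapses repeated separators (word----word -> word, word).
    words = [w for w in re.split(r"[^A-Za-z0-9]+", stem) if w]
    if strip_stopwords:
        words = [w for w in words if w.lower() not in STOPWORDS]
    return CASES[transform_case](SEPARATORS[separator].join(words)) + suffix
```

For example, clean_filename("CamelCase_with__underscore---Separators.TXT") returns "Camel-Case-underscore-Separators.txt". Treating every non-alphanumeric run as a boundary lets one regular expression split handle both special-character removal and separator collapsing.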

Date parsing

  • Identify dates in filenames in many different formats and normalize them into a preferred format
  • Add the date to the beginning or the end of the filename (or remove it entirely)
  • Fall back to file creation date if no date is found in the filename
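Ambiguous numeric dates are the tricky part of date parsing. The following Python sketch (hypothetical helpers, not neatfile's API) shows how a six-digit string such as 030425 resolves differently under the US, EU, and JP orderings described in the sample configuration's date_region comment, and how a parsed date might be placed before or after the name. Mapping two-digit years into the 2000s and joining the date and name with a single space are assumptions.

```python
from datetime import date

# Field order of an ambiguous six-digit date such as "030425", per the date_region
# options in the sample configuration (US, EU, JP).
REGION_ORDER = {
    "US": ("month", "day", "year"),
    "EU": ("day", "month", "year"),
    "JP": ("year", "month", "day"),
}


def parse_six_digit_date(digits: str, region: str = "US") -> date:
    """Interpret a six-digit string according to region. Illustrative only."""
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"expected six digits, got {digits!r}")
    parts = [int(digits[i : i + 2]) for i in (0, 2, 4)]
    fields = dict(zip(REGION_ORDER[region], parts))
    # Assumption: two-digit years map to 2000-2099.
    return date(2000 + fields["year"], fields["month"], fields["day"])


def insert_date(stem: str, when: date, fmt: str = "%Y-%m-%d", location: str = "before") -> str:
    """Place a formatted date before or after the stem (hypothetical helper)."""
    stamp = when.strftime(fmt)
    return f"{stamp} {stem}" if location == "before" else f"{stem} {stamp}"
```

Under this sketch, "030425" becomes March 4, 2025 for US, April 3, 2025 for EU, and April 25, 2003 for JP, matching the explanation in the sample configuration.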

File organization

  • Define projects with directory trees in the config file
  • Match terms in filenames to folder names and move files into the matching folder
  • Use vector matching to find similar terms
  • Respect the Johnny Decimal system, if you use it
  • Optionally, add a .neatfile file to any directory, listing words that should match files to that directory
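With Johnny Decimal, folder names carry numeric prefixes such as 10-19 for areas, 11 for categories, and 11.01 for IDs. One plausible way to match filename terms against such folders is to strip the prefix first. The Python sketch here assumes that approach; jd_label and JD_PREFIX are hypothetical names, not taken from neatfile's source.

```python
import re

# Johnny Decimal prefixes: areas ("10-19"), IDs ("11.01"), and categories ("11"),
# followed by at least one space, underscore, or dash.
JD_PREFIX = re.compile(r"^(\d{2}-\d{2}|\d{2}\.\d{2}|\d{2})[\s_-]+")


def jd_label(folder_name: str) -> str:
    """Strip a Johnny Decimal prefix so only the descriptive words are matched."""
    return JD_PREFIX.sub("", folder_name, count=1).strip()
```

Folders without a two-digit prefix, such as "marketing" or "2023 reports", pass through unchanged, so the same matching code can serve both "jd" and "folder" project types.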

Installation

# With uv
uv tool install neatfile

# With pip
python -m pip install --user neatfile

Note: neatfile relies on a ~35 MB language model for the vector matching it uses to find similarities between filenames and directory names. The model is downloaded only once, when you install neatfile.

Quickstart

neatfile has five subcommands:

  • clean - Clean and normalize filenames
  • config - View the user configuration file (or create one)
  • sort - Move files into a directory tree
  • process - Clean AND sort files
  • tree - Print a tree representation of a project's directory structure

To see the help text for a subcommand, run neatfile <subcommand> --help.

Example usage

Copy the default configuration file into place for you to edit:

$ neatfile config --create
✅ Success: User config file created: ~/.config/neatfile/config.toml

Clean all text files in a directory:

$ neatfile clean *.txt
✅ Success: CamelCase_with_underscore_separators.txt -> 2025-04-16 Camel Case Underscore Separators.txt
✅ Success: datestamped sept 04 2023 (signed).txt -> 2023-09-04 Datestamped Signed.txt
✅ Success: removing.special.characters.$#@.08-05-2024.txt -> 2024-08-05 Removing Special Characters.txt

Sort a file into a specific directory within a project (without cleaning the filename):

$ neatfile sort --project=work 20230904_datestamped_signed.txt
✅ Success: 20230904_datestamped_signed.txt -> ~/work/administrative/legal/20230904_datestamped_signed.txt

Process a file to clean and sort it:

$ neatfile process --project=work --date-format=%Y-%m-%d datestamped_20230904_signed.txt
✅ Success: datestamped_20230904_signed.txt -> ~/work/administrative/legal/2023-09-04 Datestamped Signed.txt

Configuration

Define personalized defaults in a configuration file and apply them consistently across all runs.

To create a configuration file, run neatfile config --create. This writes ~/.config/neatfile/config.toml, or $XDG_CONFIG_HOME/neatfile/config.toml if that variable is set.
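For reference, the location rule above fits in a few lines of Python. This is a hedged sketch of the stated rule (use $XDG_CONFIG_HOME when it is set and non-empty, otherwise ~/.config), not code from neatfile itself; user_config_path is a hypothetical name.

```python
import os
from pathlib import Path


def user_config_path() -> Path:
    """Resolve where the user config file lives, following the XDG convention."""
    base = os.environ.get("XDG_CONFIG_HOME")
    # An unset or empty XDG_CONFIG_HOME falls back to ~/.config.
    root = Path(base) if base else Path.home() / ".config"
    return root / "neatfile" / "config.toml"
```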

Preferences can also be set per project within the configuration file.

Values are set in the following order of precedence:

  1. CLI arguments
  2. Project-specific settings within the configuration file
  3. Global settings in the user configuration file
  4. Built-in defaults, as shown in the sample configuration file below
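One way to model this precedence order is Python's collections.ChainMap, which returns the value from the first mapping that contains a key. The sketch assumes unset CLI options arrive as None and uses a hypothetical subset of defaults; neatfile may resolve settings differently internally.

```python
from collections import ChainMap

# Hypothetical subset of the built-in defaults from the sample configuration.
DEFAULTS = {
    "date_region": "US",
    "insert_location": "before",
    "separator": "ignore",
    "transform_case": "ignore",
}


def resolve_settings(cli_args: dict, project: dict, user_globals: dict) -> ChainMap:
    """Look up each setting in order: CLI, project, user config globals, defaults."""
    # Options not passed on the command line are None; drop them so a lower
    # layer can supply the value instead of shadowing it with None.
    cli = {key: value for key, value in cli_args.items() if value is not None}
    return ChainMap(cli, project, user_globals, DEFAULTS)
```

For instance, with a project that sets separator = "dash" and a global config that sets separator = "space", the project value wins unless --separator is given on the command line.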

Sample configuration file

# Global settings
# Override on a per project basis in the [projects] section if needed.

# Ambiguous date formats can be specified by region
# Useful when searching a filename for a date and the date format is ambiguous such as 030425
# The US would see this as March 4, 2025, the EU would see this as April 3, 2025, and Japan would see this as 25th April, 2003
# options: "US", "EU", "JP"
date_region        = "US"

# date format
# If specified, the date will be added to the filename following this format.
# See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes for details on how to specify a format.
date_format        = ""

# Ignores dotfiles (files that start with a period) when cleaning a directory.
# true or false
ignore_dotfiles    = true

# File names matching this regex will be ignored
ignore_file_regex  = ''

# List of file names to ignore
# Useful if there are consistently recurring files that you don't want to clean.
ignored_files      = []

# Where to insert the date.
# "before" or "after"
insert_location    = "before"

# Force the casing of certain words.
# Useful for acronyms or proper nouns such as 'iMac', 'CEO', or 'John'
match_case_list    = []

# Overwrite existing files. true or false.
# If false, unique integers will be appended to the filename.
overwrite_existing = false

# Separator to use between words.
# Options: "ignore", "underscore", "space", "dash".
# "ignore" does its best to keep the original separator.
separator          = "ignore"

# Split CamelCase words into separate words.
# true or false
split_words        = false

# List of specific stopwords to be stripped from filenames in
# addition to the default English stopwords
stopwords          = []

# Strip stopwords from filenames.
# true or false
strip_stopwords    = true

# Transform the case of the filename.
# Options: "ignore", "lower", "upper", "title", "sentence"
transform_case     = "ignore"

# Override the global settings for specific projects and tell neatfile
# how to organize files into a directory tree.
[projects]
    [projects.project_name]
        # The name of the project is used as a command line option. (e.g. --project=project_name)
        name = ""

        # The path to the project's directory
        path = ""

        # The type of project.
        # Options: "jd" for Johnny Decimal, "folder" for a folder structure
        type = "folder"

        # The depth of folders to index beneath the project's root path
        depth = 2

        # Default configuration values specified above can be overridden here on a per project basis
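The overwrite_existing comment says unique integers are appended when overwriting is disabled. A minimal Python sketch of that idea follows. The exact suffix format neatfile uses is not documented here, so the "name 1.ext" style and the unique_destination helper are hypothetical.

```python
from pathlib import Path


def unique_destination(target: Path, overwrite_existing: bool = False) -> Path:
    """Return a path that will not clobber an existing file unless overwriting is allowed."""
    if overwrite_existing or not target.exists():
        return target
    counter = 1
    while True:
        # Hypothetical suffix style: "report.txt" -> "report 1.txt", "report 2.txt", ...
        candidate = target.with_name(f"{target.stem} {counter}{target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```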

Directory Matching

neatfile uses smart matching to determine which directory a file belongs in:

  1. Word Extraction: Words are extracted from your filename and compared with directory names in your project structure.
  2. Intelligent Matching: The system uses both exact matches and vector similarity to find the best directory. For example, a file containing "budget" might match with a "Finance" directory through vector similarity.
  3. Hierarchical Navigation: neatfile considers your project's directory tree (up to the configured depth) when finding directories to match files to.
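The exact-match half of this process can be sketched in Python: extract words from the filename, collect directories no deeper than the configured depth, and keep those whose names share a word. This is an illustrative approximation with hypothetical helper names; neatfile's real tokenization and scoring are not documented here, and the vector-similarity step is omitted.

```python
import re
from pathlib import Path


def words_in(name: str) -> set[str]:
    """Lowercased alphabetic words in a file or folder name."""
    return {w.lower() for w in re.findall(r"[A-Za-z]+", name)}


def directories(root: Path, depth: int) -> list[Path]:
    """All directories beneath root, at most `depth` levels deep."""
    return [
        path for path in root.rglob("*")
        if path.is_dir() and len(path.relative_to(root).parts) <= depth
    ]


def exact_matches(filename: str, root: Path, depth: int = 2) -> list[Path]:
    """Directories whose name shares at least one word with the filename stem."""
    terms = words_in(Path(filename).stem)
    return [d for d in directories(root, depth) if terms & words_in(d.name)]
```

A plain exact match like this would not connect budget with a folder named budgets; closing gaps of that kind is what the vector-similarity step is for.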

Customizing Match Behavior

You can influence how files match to directories in two ways:

  1. Using --term Flags: Specify additional matching terms when running a command:

    neatfile sort --project=work --term=legal contract.pdf
    

    This tells neatfile to consider directories that match "legal" when sorting the file, even though the filename itself doesn't contain the term "legal".

  2. Creating .neatfile Files: Add a .neatfile text file to any directory containing additional terms that should match to that location:

    # /path/to/work/admin/legal/.neatfile
    contract
    agreement
    nda
    

    Now any file containing these terms will preferentially match to the legal directory.
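Both mechanisms widen the set of terms on one side of the comparison: --term adds to the filename's words, and .neatfile adds to the directory's. The Python sketch assumes a .neatfile holds one term per line and treats lines starting with # as comments; that comment handling is an assumption, since the # line in the example may only be a path label. neatfile_terms and directory_matches are hypothetical names.

```python
from pathlib import Path


def neatfile_terms(directory: Path) -> set[str]:
    """Read extra match terms from a directory's .neatfile (one term per line)."""
    marker = directory / ".neatfile"
    if not marker.is_file():
        return set()
    lines = [line.strip() for line in marker.read_text(encoding="utf-8").splitlines()]
    # Assumption: blank lines and lines starting with '#' are ignored.
    return {line.lower() for line in lines if line and not line.startswith("#")}


def directory_matches(file_words: set[str], extra_terms: list[str], directory: Path) -> bool:
    """True if filename words or --term values hit the folder name or its .neatfile terms."""
    wanted = {w.lower() for w in file_words} | {t.lower() for t in extra_terms}
    candidates = {directory.name.lower()} | neatfile_terms(directory)
    return bool(wanted & candidates)
```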

Example Scenario

With a project structure like:

work/
├── admin/
│   ├── hr/
│   │   └── .neatfile  # Contains the term "handbook"
│   └── legal/
├── finance/
│   ├── budgets/
│   └── invoices/
└── marketing/
    ├── campaigns/
    └── social-media/

Configured in config.toml:

[projects.work]
name = "work"
path = "/path/to/work"
depth = 2

neatfile would automatically:

  • Move 2023_employee_handbook.pdf to the hr directory
  • Sort Q2_budget_forecast.xlsx into the budgets directory
  • Place new_campaign_assets.zip in the campaigns directory

Vector matching

Behind the scenes, neatfile uses a language model to find similar terms in a filename and a directory name by comparing their vector embeddings. This is useful for matching terms that are similar but not identical. For example, it can match the term mockups with the directory name mockup, or the term budget with the directory name finance.
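Embedding comparisons are commonly scored with cosine similarity. The Python sketch uses hand-made three-dimensional vectors so it runs without the language model; real embeddings have hundreds of dimensions. The VECTORS table, the 0.8 threshold, and the best_match helper are hypothetical rather than neatfile's actual values or API.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy "embeddings" for illustration only; a real model supplies these vectors.
VECTORS = {
    "budget": [0.9, 0.1, 0.0],
    "finance": [0.8, 0.2, 0.1],
    "campaigns": [0.0, 0.2, 0.9],
}


def best_match(term: str, directory_names: list[str], threshold: float = 0.8) -> Optional[str]:
    """Return the most similar directory name, or None if nothing clears the threshold."""
    scores = {name: cosine_similarity(VECTORS[term], VECTORS[name]) for name in directory_names}
    name, score = max(scores.items(), key=lambda item: item[1])
    return name if score >= threshold else None
```

With these toy vectors, budget scores about 0.98 against finance and about 0.02 against campaigns, so best_match("budget", ["finance", "campaigns"]) returns "finance".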

Caveats

neatfile is built for my own personal use. While this CLI is thoroughly tested, I make no warranties against data loss or other issues that may result from its use. I strongly recommend running in --dry-run mode before committing changes.

Contributing

See CONTRIBUTING.md for more information.



Download files


Source Distribution

neatfile-3.0.0.tar.gz (168.5 kB)

Uploaded Source

Built Distribution


neatfile-3.0.0-py3-none-any.whl (66.0 kB)

Uploaded Python 3

File details

Details for the file neatfile-3.0.0.tar.gz.

File metadata

  • Download URL: neatfile-3.0.0.tar.gz
  • Size: 168.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for neatfile-3.0.0.tar.gz
Algorithm Hash digest
SHA256 64905606700561ba5124e1dc181c554428c270e641cfe104d3bf30a8f3df02f9
MD5 1dd8c963ee26a82ef8ee0d052d509217
BLAKE2b-256 8022840553e1a2a8bf7f88f699b5db0b46dd503c0ff1c3c8dd4a8491e10ee98f


File details

Details for the file neatfile-3.0.0-py3-none-any.whl.

File metadata

  • Download URL: neatfile-3.0.0-py3-none-any.whl
  • Size: 66.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for neatfile-3.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f3616344dd264c8b8b7eb7d94460174501aed4c6ab77f9133cd8d46ec5cbe15e
MD5 95bb4d45ec4cba21b14f741e964aa9c6
BLAKE2b-256 a74de8acfc745a8bf261e601dd50a5f42f6b38e8423f21be53d6fe6235dad4a6

