CLI for fuzzy matching filenames and copying or moving the best matches.

fuzzycp: Fuzzy file operations (mv, cp)

fuzzycp performs file operations such as copy and move on files whose filenames match a list of names. The matching is fuzzy: a filename only needs to approximately match a name in the list, not equal it exactly.

[Figure: fuzzycp demo GIF]

The problem of fuzzy matching

In order to understand what that means, here is a concrete example.

Suppose you have a file names.txt containing a list of names you want to match against. Let's say the content of this file is:

1. The Legend of Zelda: Ocarina of Time
2. Super Mario 64
3. Mario Kart 64
4. GoldenEye 007
5. Super Smash Bros.

The list is arbitrary: the top five games released for the Nintendo 64 console. Now suppose you have a directory with thousands of files and want to copy to another directory only the files that best match the names in the list above. Here are some of the files in that directory:

'Spider-Man (U) [!].v64'
'StarCraft 64 (U) [!].v64'
'Starfox 64 1.1 (U).v64'  
'Starshot - Space Circus Fever (U) [!].z64'
'Star Wars - Rogue Squadron (U) [!].v64'
'Star Wars - Shadows of the Empire (U) (V1.2) [!].v64'
'Star Wars Episode I - Battle for Naboo (U) [!].v64'
'Star Wars Episode I - Racer (U) [!].v64'
'Stunt Racer 64 (U) [!].z64'
'Super Bowling 64 (U) [!].z64'
'Supercross 2000 (U) [!].z64'
'Superman (U) (M3) [!].z64'
'Super Mario 64 (U) [!].v64'

For this example, the content of the files is irrelevant (say they hold metadata for those games). Notice that no filename exactly matches a name in names.txt: filenames may differ in casing, omit text, contain extra characters, and so on. This is where fuzzy matching shines: you don't need an exact match.
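To make the difference between exact and approximate matching concrete, here is a minimal sketch using only Python's standard library. It uses `difflib.SequenceMatcher` as a stand-in similarity measure (an assumption made for illustration; it is not the scorer fuzzycp uses), and the `similarity` helper is hypothetical.

```python
from difflib import SequenceMatcher

names = ["Super Mario 64", "Mario Kart 64"]
filenames = [
    "Superman (U) (M3) [!].z64",
    "Super Mario 64 (U) [!].v64",
    "Super Bowling 64 (U) [!].z64",
]


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case (illustrative helper)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


# An exact lookup finds nothing, because no filename equals a listed name.
exact = [n for n in names if n in filenames]

# Ranking the filenames by similarity still surfaces the intended file.
best = max(filenames, key=lambda f: similarity("Super Mario 64", f))

print(exact)
print(best)
```

The exact lookup comes back empty, while the similarity ranking picks the Super Mario 64 file even though the filename carries extra region tags and an extension.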

The solution

This kind of task is usually done by hand, selecting and copying files one by one. With fuzzycp it takes a single command. First, cd to the directory containing the files.

Copy only the best-matching files to directory dest/directory:

fuzzycp names.txt -c dest/directory

Move only the best-matching files to directory dest/directory:

fuzzycp names.txt -m dest/directory

Print the best-matching files and the matching score:

fuzzycp names.txt 

Print the best-matching files, and the space they occupy:

fuzzycp names.txt -s
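For readers who want to see what a "copy the best matches" operation involves, the following is a rough standard-library sketch of the idea. It is not fuzzycp's code: `copy_best_matches` and `similarity` are hypothetical names, and `difflib` stands in for the real scorer.

```python
import shutil
import tempfile
from difflib import SequenceMatcher
from pathlib import Path


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case (illustrative helper)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def copy_best_matches(names, src_dir, dest_dir):
    """For each name, copy the single most similar file from src_dir to dest_dir."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    files = [p for p in src.iterdir() if p.is_file()]
    if not files:
        return []
    copied = []
    for name in names:
        best = max(files, key=lambda p: similarity(name, p.name))
        shutil.copy2(best, dest / best.name)  # copy2 also preserves timestamps
        copied.append(best.name)
    return copied


# Demo in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "roms"
    src.mkdir()
    for fname in ("Super Mario 64 (U) [!].v64", "Superman (U) (M3) [!].z64"):
        (src / fname).write_text("placeholder")
    copied = copy_best_matches(["Super Mario 64"], src, Path(tmp) / "dest")
    dest_listing = sorted(p.name for p in (Path(tmp) / "dest").iterdir())

print(copied)
```

A move would follow the same shape with `shutil.move` in place of `shutil.copy2`.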

Installation

Install from PyPI

pip install fuzzycp

This installs both the fuzzycp package and the fuzzycp command-line entry point. The published package targets Python 3.9+.

You can also run it as a module:

python -m fuzzycp names.txt

Install the standalone binary

curl -fsSL https://raw.githubusercontent.com/rsnemmen/fuzzy_cp/main/install.sh | sh

This downloads the pre-built binary for your platform (macOS arm64/x86_64, Linux x86_64) and installs it to ~/.local/bin/fuzzycp.

How it works

fuzzycp uses RapidFuzz, a fast, lightweight fuzzy string matching library implemented in C++, to measure how similar two strings (or other sequences) are and to find the best match in a collection.

fuzzycp compares names with RapidFuzz's WRatio scorer, which tries ratio, partial_ratio, token_sort_ratio, and token_set_ratio and keeps the highest score. This makes matching robust to partial or reordered names. To use any other RapidFuzz scorer, swap it in file_matching().
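The "try several scorers and keep the highest" idea can be sketched with the standard library alone. The code below is a simplified illustration, not RapidFuzz's implementation (the real WRatio also weights the individual results) and not fuzzycp's file_matching(). All function names here are hypothetical, and `difflib` replaces the RapidFuzz algorithms.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    """Plain similarity of the two full strings, 0-100."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def partial_ratio(a: str, b: str) -> float:
    """Best score of the shorter string against any equal-length window of the longer."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0.0
    windows = (long_[i:i + len(short)] for i in range(len(long_) - len(short) + 1))
    return max(ratio(short, w) for w in windows)


def token_sort_ratio(a: str, b: str) -> float:
    """Similarity after sorting whitespace-separated words, so word order is ignored."""
    return ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


def combined_score(a: str, b: str) -> float:
    """Run every scorer on lower-cased input and keep the highest result."""
    a, b = a.lower(), b.lower()
    return max(scorer(a, b) for scorer in (ratio, partial_ratio, token_sort_ratio))


def best_match(name, filenames, scorer=combined_score):
    """Return (filename, score) for the most similar filename; the scorer is swappable."""
    best = max(filenames, key=lambda f: scorer(name, f))
    return best, scorer(name, best)


# A partial name scores highly through the window-based scorer ...
print(best_match("Super Mario 64", ["Superman (U) (M3) [!].z64", "Super Mario 64 (U) [!].v64"]))
# ... and a reordered name through the token-sorting scorer.
print(best_match("Mario Super 64", ["Super Mario 64", "Mario Kart 64"]))
```

Because `best_match` takes the scorer as a parameter, any callable that accepts two strings and returns a number can be passed in. With RapidFuzz installed, `rapidfuzz.fuzz.WRatio` fits that shape.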

Build and release

Build the source distribution and wheel from a clean checkout:

python -m pip install --upgrade build twine
rm -rf build dist src/*.egg-info
python -m build
python -m twine check dist/*

This creates:

  • dist/fuzzycp-<version>.tar.gz
  • dist/fuzzycp-<version>-py3-none-any.whl

To publish to PyPI:

python -m twine upload dist/*

Recommended maintainer checklist:

  1. Bump the version in pyproject.toml.
  2. Build and check the release artifacts with the commands above.
  3. Upload dist/* to PyPI.
  4. Push a version tag (v*) if you also want the GitHub Actions workflow to publish standalone binaries.

TBD

  • Homebrew recipe
  • Move files functionality
  • Windows standalone EXE
  • Multi-disk mode
