CLI for fuzzy matching filenames and copying or moving the best matches.
fuzzycp: Fuzzy file operations (mv, cp)
fuzzycp performs file operations such as copy and move on files whose filenames match a list of names. The matching is fuzzy: a filename only has to approximately match a name in the list, not equal it.
The problem of fuzzy matching
In order to understand what that means, here is a concrete example.
Suppose you have a file names.txt containing a list of names you want to match against. Let's say the content of this file is:
1. The Legend of Zelda: Ocarina of Time
2. Super Mario 64
3. Mario Kart 64
4. GoldenEye 007
5. Super Smash Bros.
This is an arbitrary example: the top five games released for the Nintendo 64 console. Now suppose you have a directory with thousands of files, and you want to copy to another directory only the files that best match the names in the list above. Here are some of the files in that directory:
'Spider-Man (U) [!].v64'
'StarCraft 64 (U) [!].v64'
'Starfox 64 1.1 (U).v64'
'Starshot - Space Circus Fever (U) [!].z64'
'Star Wars - Rogue Squadron (U) [!].v64'
'Star Wars - Shadows of the Empire (U) (V1.2) [!].v64'
'Star Wars Episode I - Battle for Naboo (U) [!].v64'
'Star Wars Episode I - Racer (U) [!].v64'
'Stunt Racer 64 (U) [!].z64'
'Super Bowling 64 (U) [!].z64'
'Supercross 2000 (U) [!].z64'
'Superman (U) (M3) [!].z64'
'Super Mario 64 (U) [!].v64'
For the sake of this example, the content of these files does not matter (let's say they hold metadata for those games). Notice that no filename exactly matches a name in names.txt: the filenames can differ in casing, lack some text, carry extra characters, and so on. This is where fuzzy matching shines: you don't need an exact match.
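To make the problem concrete, here is a minimal sketch that uses only Python's standard library (difflib). It is not fuzzycp's code and does not use the same scoring as fuzzycp; the helper names similarity and best_match are hypothetical. It only shows that an exact lookup fails while a similarity score still finds the right file.

```python
import difflib

# Names to look for and a few candidate filenames (taken from the example above).
names = ["Super Mario 64"]
filenames = [
    "Superman (U) (M3) [!].z64",
    "Super Bowling 64 (U) [!].z64",
    "Super Mario 64 (U) [!].v64",
]

def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity score between 0.0 and 1.0."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(name: str, candidates: list[str]) -> str:
    """Return the candidate most similar to name (hypothetical helper)."""
    return max(candidates, key=lambda c: similarity(name, c))

for name in names:
    # An exact lookup finds nothing, but the similarity ranking does.
    print(name in filenames, "->", best_match(name, filenames))
```

With thousands of files, the same idea applies: score every filename against each name and keep the highest-scoring one.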
The solution
Normally people would do this sort of thing by manually selecting file by file and copying them. Not anymore. Here is how you solve this using fuzzycp. First cd to the directory containing the files.
Copy only the best-matching files to directory dest/directory:
fuzzycp names.txt -c dest/directory
Move only the best-matching files to directory dest/directory:
fuzzycp names.txt -m dest/directory
Print the best-matching files and the matching score:
fuzzycp names.txt
Print the best-matching files, and the space they occupy:
fuzzycp names.txt -s
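For readers who want to see what a copy step of this kind involves, the following is a rough standard-library sketch. It is an illustration under assumptions, not fuzzycp's implementation: the function copy_best_matches is hypothetical, and difflib.get_close_matches stands in for the scorer that fuzzycp actually uses.

```python
import difflib
import shutil
from pathlib import Path

def copy_best_matches(names, src_dir, dest_dir):
    """Copy, for each name, the most similar file in src_dir to dest_dir.

    Hypothetical helper: returns the list of filenames that were copied.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    files = [p.name for p in src.iterdir() if p.is_file()]
    copied = []
    for name in names:
        # n=1 keeps only the top candidate; cutoff=0.0 disables the score threshold.
        best = difflib.get_close_matches(name, files, n=1, cutoff=0.0)
        if best:
            shutil.copy2(src / best[0], dest / best[0])
            copied.append(best[0])
    return copied
```

Replacing shutil.copy2 with shutil.move would give the equivalent of a move operation in this sketch.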
Installation
Install from PyPI
pip install fuzzycp
This installs both the fuzzycp package and the fuzzycp command-line entry point.
The published package targets Python 3.9+.
You can also run it as a module:
python -m fuzzycp names.txt
Install the standalone binary
curl -fsSL https://raw.githubusercontent.com/rsnemmen/fuzzy_cp/main/install.sh | sh
This downloads the pre-built binary for your platform (macOS arm64/x86_64, Linux x86_64) and installs it to ~/.local/bin/fuzzycp.
How it works
fuzzycp uses the RapidFuzz library, a fast, lightweight library implemented in C++, for fuzzy matching, i.e. measuring how similar two strings (or other sequences) are and finding the best match in a collection.
fuzzycp compares the names using the WRatio scorer, which tries ratio, partial_ratio, token_sort_ratio, and token_set_ratio and keeps the highest (weighted) score. This makes it robust to partial or reordered names. To use any other RapidFuzz scorer, swap the scorer in file_matching().
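The idea of trying several scorers and keeping the highest can be sketched with the standard library alone. This is a simplified stand-in, not RapidFuzz's implementation: the functions simple_ratio, sorted_token_ratio, and highest_score are hypothetical, use difflib instead of RapidFuzz, and apply no weights.

```python
import difflib

def simple_ratio(a: str, b: str) -> float:
    """Plain character-level similarity on a 0-100 scale."""
    return 100 * difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def sorted_token_ratio(a: str, b: str) -> float:
    """Similarity after sorting the words, so word order no longer matters."""
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return simple_ratio(sorted_a, sorted_b)

def highest_score(a: str, b: str) -> float:
    """Try both scorers and keep the better result."""
    return max(simple_ratio(a, b), sorted_token_ratio(a, b))

# Reordered words score poorly character by character,
# but perfectly once the tokens are sorted.
print(simple_ratio("Mario Kart 64", "64 Kart Mario"))
print(highest_score("Mario Kart 64", "64 Kart Mario"))
```

Taking the maximum means a name only needs to do well under one of the comparisons, which is why a combined scorer tolerates reordered words.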
Build and release
Build the source distribution and wheel from a clean checkout:
python -m pip install --upgrade build twine
rm -rf build dist src/*.egg-info
python -m build
python -m twine check dist/*
This creates:
- dist/fuzzycp-<version>.tar.gz
- dist/fuzzycp-<version>-py3-none-any.whl
To publish to PyPI:
python -m twine upload dist/*
Recommended maintainer checklist:
- Bump the version in pyproject.toml.
- Build and check the release artifacts with the commands above.
- Upload dist/* to PyPI.
- Push a version tag (v*) if you also want the GitHub Actions workflow to publish standalone binaries.
TBD
- Homebrew recipe
- move files functionality
- Windows standalone EXE
- add multi-disk mode
Download files
File details
Details for the file fuzzycp-0.4.0.tar.gz.
File metadata
- Download URL: fuzzycp-0.4.0.tar.gz
- Upload date:
- Size: 6.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e461763cd4b863eaf1083cd08a39f5ef82ffd789fa38344bc18181c6b1d64f7f |
| MD5 | 26f13c07b4c15fe9e6ac0c4436bd3f4d |
| BLAKE2b-256 | 3350ba576e1ca0dc0abaa60b311611f85c06edb7849f2f5cd7ef2600a55fec43 |
File details
Details for the file fuzzycp-0.4.0-py3-none-any.whl.
File metadata
- Download URL: fuzzycp-0.4.0-py3-none-any.whl
- Upload date:
- Size: 7.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | efdde12da0d60bc95677bf0aa9fabd93e9f158ba528142b77380460630f43b58 |
| MD5 | c67bfa06100d04de0fa1b738e7170635 |
| BLAKE2b-256 | 445dd811c990a64c60e083c9e3d53c0e5687a0c472051bf31b57226f29d32768 |