Dir sync by file content with hard link support, plus fast fdupes, and more.
lnsync provides sync-by-content of local directories (including hard link syncing), plus other related features.
The main feature is (partial) one-way sync of local directories by only renaming, and linking/delinking on the target directory, without copying or deleting any file content data from source.
This alows quick replication onto the target of arbitrary source renaming/linking/delinking (but not deleting or creating files), and may be used as a preprocessing step for a full sync tool (such as rsync).
Other features include finding duplicate files, checking for file content changes, and listing all hard links to a file.
As with rsync, files may be included or excluded using glob patterns.
File Trees: Online and Offline
The file structure under a directory (or online tree) can be saved to a combined hash/metadata one-file database, comprising: the directory structure, file and dir names, file sizes, and mod dates. These offline trees may be used in most lnsync commands in place of a local directory, e.g. as the source to reorganize a target directory according to a certain pattern.
Via a config file, specfic options may be applied to online trees matching a glob pattern.
Files are identified by their content hash, using either 32-bit or 64-bit xxHash (a fast, non-cryptographic hash function), or other functions (see Hashing Functions below).
Hash values are stored in local databases, on a single file per tree, by default with a name matching
lnsync-[0-9]+.db. Only one such file should exist at each location. At the time of creation, the numeric part is chosen randomly, to avoid overwritting by accident when doing a full sync. Files matching
lnsync-[0-9]+.db as well as the hash database in use are ignored by lnsync operations.
File modification times are used to detect stale hash values. Modification times are not synced to the target.
lnsync64 selects the 32-bit and the 64-bit version of the xxHash, respectively, while keeping
lnsync-[0-9]+.db as the file hash database location. Otherwise the two commands work the same.
lnsync is equivalent to
--xhash=XXHASH64 select the hashing function, but also change the file hash database location to
Custom hashing functions are supported. To set a custom hasher:
--hasher=<EXECUTABLE>, where <EXECUTABLE> is a path to an executable that takes as single argument a file path and prints out (in decimal) a 64-bit unsigned integer hash value. The file hash location is set to
lnsync-nopreset requires the hashing function must be explicitly selected using one of the preceding switches.
Files, File Paths, and Hard Links
On most current file systems, the same file may be reached via multiple file paths, also called aliases, or hard links. If there is a single hard link to a file, removing that link deletes the file.
lnsync oprerates on files, not file paths. E.g., if a file has two hard links, it does not count as a duplicate.
Ignored File System Objects
Files which cannot be read are skipped. File ownership and permissions are otherwise ignored.
Symbolic links and any other file system objects are ignored.
When syncing, directories are created as needed on the target, and target directories left empty and not on the source are removed.
Install the latest version from the PyPI repository with
pip install --user -U lnsync or else clone the repo with
git clone https://github.com/mrsimoes/lnsync.git and then run
python setup.py install.
Alternatives for Linux
Some of the many tools for syncing with rename detection:
There are patches for rsync (see --detect-renamed) that provide renaming on the target, relying on file size and modification time, for matching files in nearby directories. rsync can preserve hard links and sync with remote rsync instances.
rclone provides sync-by-rename for local as well as an amazing array of remote clients. It allows caching file hashes, after some configuration. However, it does not preserve hard links.
git itself identifies and stores files by content, and has been adapted for syncing.
Support in modern file systems (e.g. btrfs) for snapshots may be adapted for syncing.
In addition to syncing, lnsync allows using the file hash database to search for files according to a variety of criteria.
If your photos are at
/home/you/Photos and its backup is at
lnsync sync /home/you/Photos /mnt/disk/Photos will sync the target. For a dry run, use the
After syncing, two database files are created, one at the source
/home/you/Photos and another at the target
/mnt/disk/Photos. File hashes are computed, as needed, and stored in those files. The database filenames include a random suffix, to help avoid accidental overwriting when syncing with a tool other than lsync.
To obtain an rsync command that will complete syncing, skipping
lnsync database files, run
lnsync rsync /home/you/Photos /mnt/disk/Photos. If the
rsync options provided are suitable, run the command again with the
-x switch to execute. Alternatively, run
lnsync syncr /home/you/Photos /mnt/disk/Photos to complete those two steps in one go.
Finally, to compare source and target, run
lnsync cmp /home/you/Photos /mnt/disk/Photos.
To find duplicate files, run
lnsync fdupes /home/you/Photos. Use
-z to compare by size only.
-H to treat hard links to the same file as distinct. If this option is not given, for each multiple-linked with other duplicates, a path is arbitrarily picked and printed.
To find all files in Photos which are not in the backup (under any name):
lnsync onfirstonly /home/you/Photos /mnt/disk/Photos. To find all files with jpg extension,
lnsync search /home/you/Photos "*.jpg".
To have any operation on a subdir of
/home/you/Photos use the hash database at
root=/home/you/Photos under section
/home/you/Photos/** of your config file. (See Configuration Files below.)
For example, to sync the subdir
/home/you/Photos/Best to another target, using the hash database at
lnsync sync /home/you/Photos/Best --root=/home/you/Photos /mnt/eframe/.
Custom Hash Functions
As an example of a custom hashing function, if
hashmp3.sh computes the hash of only the sound content of an mp3 (and not any included metadata), then the following
lnsync-mp3.cfg config file may be used to find duplicate mp3 content:
[DEFAULT] dbprefix = lnsync-mp3 hasher = ~/bin/hashmp3.sh [**] only_include = *.mp3
The second section applies the
--only-include="*.mp3" to ALL online tree locations.
Invoke this mode with
lnsync --config <PATH TO lnsync-mp3.cfg> <COMMAND> [<ARG> ...].
All lnsync commands are
lnsync [<global-options>] <command> [<cmd-options>] [<cmd-parameters>].
Specifying the Database Location
By default, the database file corresponding to an online file tree is the unique file located in that directory and with basename matching
To specify another prefix for all following online file trees,
To specify a different database directory possibly for all online file trees where to look for the database file,
--dbrootdir DBDIR. Each online file tree corresponding to a subdir of DBDIR will use the database file at DBDIR.
To specify a directory containing database root directories to be used for any contained onlien tree,
--dbrootmounr DBMOUNTSLOCATION. This is useful e.g. for removable media, which are all mounted at
--dbrootmounr /mnt/user will use
/mnt/user/somedrive for the online tree at
To specify the database file for the following online file tree,
sync [options] <source> <target>syncs a target dir from a source dir (or offline tree).
First, target files are matched to source files. Each matched target file is associated to a single source file. If either file system supports hard links, a file may have multiple pathnames. lnsync will not complain if the match is not unique or some files are not matched on either source and/or target.
For each matched target file, its pathnames are made to match those of the corresponding source file, by renaming, deleting, or creating hard links. New intermediate subdirectories are created as needed on the target and directories which become empty on the target are removed.
-nDry-run, just show which operations would be performed.
-zMatch files by size only. In this case, hash databases are not created or updated.
-M=<size>Excludes all files larger than <size>, which may be given in human form, e.g.
--exclude <glob_pattern> ... <glob_pattern>Exclude source files and directories by glob patterns. There is a corresponding
--includeand these are interpreted as in
rsync --exclude <pattern> source/ target(beware, compatability has not been fully tested).
A file or directory is excluded if it matches an
excludepattern before matching any
An initial slash anchors a pattern to the corresponding file tree root and a trailing slash means the pattern applies only to directories.
--includeoption applies to all file trees in the command.
--once-include=<pattern>apply only to the following file tree.
--only-include <PATTERN> ...is equivalent to
--include="*/" --include <PATTERNS> ... --exclude="*".
--root <DIR>For each online location that is a subdir of <DIR>, use the hash database at <DIR> to read and update. If several <DIR> in the command line are suitable for a location, use the last one given.
rsync [options] <tree> <dir> [rsync-options]Prints an rsync command that would sync target dir from source, skipping lnsync database files. Source may be a dir or an offline tree. Check the default rsync options provided are what you want. To also run the rsync command, use the
syncrThis convenience command is like
sync, but follows it by executing the command created by
Creating, Updating, and Accessing the Hash Database
update <dir>Update the hash database, creating a new database if none exists, and rehashing all new files and those with a changed modification time (mtime). Accepts
rehash <dir> [<relpath>]+Force rehashing specified files and subdirs.
subdir <dir> <relsubdir>Update the database at
relsubdirusing any hash value already present in the hash database for
mkoffline <dir> <outputfile>Update database at
dirand create corresponding offline database at
-fto force overwriting the output file.
cleandb <dir>Remove outdated entries and re-compact the database.
lookup <tree> [<relpath>+]Returns (either from db or by recomputing) the hash value for the files, where
treemay be a a directory or an offline tree.
Finding Files and Paths
These commands operate on files, as opposed to paths. To instead operate on paths, use the
--hard-links switch on these commands. To operate on files, but print all hard links, instead a of picking one, use
cmp <tree1> <tree2>Recursively compares two file trees. Accepts
fdupes [-h] [<tree>]+Find files duplicated anywhere on the given trees.
lnsync onfirstonly [<tree>]+,
lnsync onlastonly [<tree>]+Find files as advertised. Some options:
-Mprunes by maximum size;
-0prunes empty files;
-1prints each group of files in a single line, separated by spaces and with escaped backslashes and spaces, like
-ssorts output by size.
search <tree> [<globpat>]+Find files one of whose relative paths matches one of the given glob patterns (which are as in
check [<tree>] [<path>]*Recompute hashes for given files and compare to the hash stored in the database, to check for changes/bitrot.
Optional command-line arguments are read from an INI-style configuration file. (The format is not very suitable to store default options, at most one entry per key.)
Unless otherwise specified, the config file is searched at at
~/.lnsync.cfg location may nbe specified. By default, it
key = value, the
key can match the short or long option name (
For options taking taking multiple values (e.g.
exclude), separate them by line breaks.
Options in the
DEFAULT section apply as if given at the beginning of command line input.
For each location in the command line, options are read from all sections whose name glob-matches that location (directory or offline file). These options apply as if given just before that location.
For example, to have an option applied to all locations, include it in a section
To specify another configuration file altogether,
--config FILENAME. To not load any config file:
Origin, Status, and Future Development
This package started as a learning project. I've found it useful enough to polish for publication, but as with any work in progress, it should be used with adequate caution.
Feedback, suggestions, comments, and corrections are very welcome.
You can support this project with bitcoin at 17HS828pkQMiXZGy7UNbA49TYCz7LAQ2ze.
This program comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the GNU General Public Licence v3 for details.
Caveats and Limitations
Local directories only.
Depends on mtime to detect file content changes.
If source files A, B, C (with pairwise distinct contents) are renamed on target in a cycle to C, A, B, sync is currently not supported.
Only readable files and readable+accessible directories are read. Other files and dirs, as well as symlinks, pipes, special devices are ignored.
Minimal support for case-insensitive but case-preserving file systems like vfat: if a target file name differs from source match in case only, target is not updated.
- v0.7.5: New lnsync64 and lnsync32 entry points. Non-zero return value to indicate failed searches. Small improvements and bug fixes.
- v0.7.4: New
- v0.7.3: Ignore all files matching
lnsync-*.db. Bug fixes, notably on handling 64-bit xxhash values.
- v0.7.2: Support for 64-bit hashing functions. Option
--dbrootdir. Internal changes: use xxhash over pyhashxx, refactoring, bug fixes.
- v0.7.0: Custom hashing functions, better command line argument parsing, custom db location, bug fixes.
- v0.6.1: Thread improvements and bug fixes.
- v0.6.0: Threaded hashing and tree scanning for much better performance. Internal refactoring.
- v0.5.3: New
mkofflinesyntax. More output options (
--all-links). Hard link-aware
- v0.5.2: Search files by file path glob pattern. Multiple patterns on --exclude. More powerful configuration files.
--rootnow allowed in
rehash. Major rewrite of the command line and config file parsers. Optimize onfirstonly and sync to do less hashing. Fix bugs in
check, and more. Wildcards in config section names.
- v0.4.0: Drop Python 2 compatibility. Add config files. Bug fixes.
- v0.3.8: Less hashing on
onfirstonly. Sort file search output by size. Adjust user output levels.
- v0.3.7: Bug fix on reading offline trees. Change output levels and some messages.
- v0.3.6: New: --include and --include-once options. Bug fix: wrong exit code.
- v0.3.5: Bug fix: not excluding dirs in offline mode.
- v0.3.3: Python 3 support.
- v0.3.2: New --root option to allow reading and updating a root tree database when querying subtrees.
- v0.3.0: Exclude files by glob pattern in sync and other commands. Better terminal output. Major code overhaul.
- v0.1.9: Improved sync algorithm. Remove directories left empty after sync.
- v0.1.0: Initial version.
- Support the newer xxhash3 hashes, or other, including 128-bit hashes.
- Better configuration file format.
- More parallel hashing, multiprocessing instead of threads.
- More output options, e.g. sort by name or mtime.
--excludepatterns more compatible with
- Store Unicode file names in offline database to support other operating systems. Currently stored as-is.
- Detect renamed directories for a compact sync schedule.
- Partial hashes for quicker comparison of same-size files.
- Check for duplicates by actual content, not just content hash.
- Update target mtimes.
- Support argparsecomplete.
Release history Release notifications | RSS feed
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
|Filename, size||File type||Python version||Upload date||Hashes|
|Filename, size lnsync-0.7.5.post1.tar.gz (93.1 kB)||File type Source||Python version None||Upload date||Hashes View|