A command and utility functions for making listings of file content hashcodes and manipulating directory trees based on such a hash index.

Project description

Latest release 20240305:

  • HashIndexCommand.cmd_ls: support rhost:rpath paths, honour interrupts in remote mode.
  • HashIndexCommand.cmd_rearrange: new optional dstdir command line argument, passed to rearrange.
  • merge: symlink_mode: leave identical symlinks alone, just merge tags.
  • rearrange: new optional dstdirpath parameter, default srcdirpath.

Function dir_filepaths(dirpath: str, *, fstags: cs.fstags.FSTags)

Generator yielding the filesystem paths of the files in dirpath.
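
As a rough illustration of the filtering noted in release 20240216 (skip dot files, the fstags .fstags file and nonregular files), here is a minimal standard-library sketch. The name dir_filepaths_sketch is hypothetical, it does not consult FSTags, and it is not the module's actual implementation.

```python
import os
import stat
from typing import Iterator

def dir_filepaths_sketch(dirpath: str) -> Iterator[str]:
    ''' Hypothetical sketch: yield the paths of regular files under dirpath,
        skipping names starting with "." (which also covers .fstags) and
        anything that is not a regular file, including symlinks.
    '''
    for subdirpath, dirnames, filenames in os.walk(dirpath):
        dirnames.sort()  # visit subdirectories in a stable order
        for filename in sorted(filenames):
            if filename.startswith('.'):
                continue
            fspath = os.path.join(subdirpath, filename)
            try:
                S = os.lstat(fspath)  # lstat: do not follow symlinks
            except OSError:
                continue
            if stat.S_ISREG(S.st_mode):
                yield fspath
```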

Function dir_remap(srcdirpath: str, fspaths_by_hashcode: Mapping[cs.hashutils.BaseHashCode, List[str]], *, hashname: str)

Generator yielding (srcpath,[remapped_paths]) 2-tuples for the files in srcdirpath, based on the hashcodes keying fspaths_by_hashcode.
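
The fspaths_by_hashcode mapping associates each hashcode with a list of paths, since several files may share the same content. A minimal sketch of building such a mapping from (hashcode,fspath) pairs, such as those yielded by hashindex or read_hashindex, follows. Plain strings stand in for BaseHashCode instances and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

def paths_by_hashcode(
    pairs: Iterable[Tuple[Optional[str], Optional[str]]]
) -> Dict[str, List[str]]:
    ''' Invert (hashcode, fspath) pairs into a hashcode -> [fspaths] mapping.
        Pairs with a missing hashcode or path (e.g. unreadable files) are skipped.
    '''
    mapping: Dict[str, List[str]] = defaultdict(list)
    for hashcode, fspath in pairs:
        if hashcode is None or fspath is None:
            continue
        mapping[hashcode].append(fspath)
    return dict(mapping)
```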

Function file_checksum(fspath: str, hashname: str = 'sha256', *, fstags: cs.fstags.FSTags) -> Optional[cs.hashutils.BaseHashCode]

Return the hashcode for the contents of the file at fspath. Warn and return None on OSError.
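
A minimal sketch of the same contract using hashlib directly: the contents are hashed in chunks, and an OSError produces a warning and None. Unlike file_checksum it returns a hex digest string rather than a BaseHashCode and does no fstags caching; the name file_checksum_sketch is hypothetical.

```python
import hashlib
import warnings
from typing import Optional

def file_checksum_sketch(fspath: str, hashname: str = 'sha256',
                         bufsize: int = 65536) -> Optional[str]:
    ''' Return the hex digest of the contents of fspath, or None (with a
        warning) if the file cannot be read.
    '''
    h = hashlib.new(hashname)
    try:
        with open(fspath, 'rb') as f:
            # read in chunks so large files need not fit in memory
            for chunk in iter(lambda: f.read(bufsize), b''):
                h.update(chunk)
    except OSError as e:
        warnings.warn(f'{fspath!r}: {e}')
        return None
    return h.hexdigest()
```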

Function get_fstags_hashcode(fspath: str, hashname: str, fstags: cs.fstags.FSTags) -> Tuple[Optional[cs.hashutils.BaseHashCode], Optional[os.stat_result]]

Obtain the hashcode cached in the fstags if it is still valid. Return a 2-tuple of (hashcode,stat_result) where hashcode is a BaseHashCode subclass instance if the cached value is valid, or None if it is missing or no longer valid, and stat_result is the current os.stat result for fspath.
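
The general idea of a cached hashcode that can go stale can be sketched with a plain dictionary. This is an assumption-laden stand-in, not the fstags mechanism: the cache, the helper names and the choice of file size plus modification time as the validity check are all hypothetical.

```python
import os
from typing import Dict, Optional, Tuple

# hypothetical in-memory stand-in for the fstags cache:
# fspath -> (hexdigest, st_size, st_mtime_ns)
_cache: Dict[str, Tuple[str, int, int]] = {}

def get_cached_hashcode(fspath: str) -> Tuple[Optional[str], os.stat_result]:
    ''' Return (hexdigest, stat_result); hexdigest is None if there is no
        cache entry or the file's size or mtime changed since it was cached.
    '''
    S = os.stat(fspath)
    entry = _cache.get(fspath)
    if entry is not None:
        hexdigest, size, mtime_ns = entry
        if size == S.st_size and mtime_ns == S.st_mtime_ns:
            return hexdigest, S
    return None, S

def set_cached_hashcode(fspath: str, hexdigest: str, S: os.stat_result) -> None:
    ''' Record hexdigest against fspath with the stat values it was computed from. '''
    _cache[fspath] = (hexdigest, S.st_size, S.st_mtime_ns)
```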

Function hashindex(fspath, *, hashname: str, fstags: cs.fstags.FSTags)

Generator yielding (hashcode,filepath) 2-tuples for the files in fspath, which may be a file or directory path. Note that it yields (None,filepath) for files which cannot be accessed.
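
A minimal standalone sketch of that behaviour: a single file or every file under a directory is hashed, and (None,filepath) is yielded when a file cannot be read. Hex digest strings stand in for hashcodes, the name hashindex_sketch is hypothetical, and no dot-file filtering or fstags caching is done.

```python
import hashlib
import os
from typing import Iterator, Optional, Tuple

def hashindex_sketch(fspath: str, hashname: str = 'sha256'
                     ) -> Iterator[Tuple[Optional[str], str]]:
    ''' Yield (hexdigest, filepath) for fspath, or for each file under it if
        it is a directory; yield (None, filepath) for unreadable files.
    '''
    if os.path.isdir(fspath):
        filepaths = [
            os.path.join(dirp, name)
            for dirp, _, names in os.walk(fspath)
            for name in sorted(names)
        ]
    else:
        filepaths = [fspath]
    for filepath in filepaths:
        h = hashlib.new(hashname)
        try:
            with open(filepath, 'rb') as f:
                for chunk in iter(lambda: f.read(65536), b''):
                    h.update(chunk)
        except OSError:
            yield None, filepath
        else:
            yield h.hexdigest(), filepath
```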

Class HashIndexCommand(cs.cmdutils.BaseCommand)

Tool to generate indices of file content hashcodes and to link files to destinations based on their hashcode.

Command line usage:

Usage: hashindex subcommand...
    Generate or process file content hash listings.
  Subcommands:
    help [-l] [subcommand-names...]
      Print help for subcommands.
      This outputs the full help for the named subcommands,
      or the short help for all subcommands if no names are specified.
      -l  Long help even if no subcommand-names provided.
    linkto [-f] [-h hashname] [--mv] [-n] [-q] [-s] srcdir dstdir < hashindex
      Link files from srcdir to dstdir according to the input hash index.
      -f    Force: link even if the target already exists.
      -h hashname
            Specify the hash algorithm, default: sha256
      --mv  Move: unlink the original after a successful hard link.
      -n    No action; recite planned actions.
      -q    Quiet. Do not report actions.
      -s    Symlink the source file instead of hard linking.
    ls [-e ssh_exe] [-h hashname] [-H hashindex_exe] [-r] [host:]path...
      Walk filesystem paths and emit a listing.
      -e ssh_exe    Specify the ssh executable.
      -h hashname   Specify the file content hash algorithm name.
      -H hashindex_exe
                    Specify the remote hashindex executable.
      -r            Emit relative paths in the listing.
                    This requires each path to be a directory.
    rearrange [options...] {[[user@]host:]refdir|-} [[user@]rhost:]targetdir [dstdir]
      Rearrange files in targetdir based on their positions in refdir.
      Options:
        -e ssh_exe  Specify the ssh executable.
        -h hashname Specify the file content hash algorithm name.
        -H hashindex_exe
                    Specify the remote hashindex executable.
        --mv        Move mode.
        -n          No action, dry run.
        -s          Symlink mode.
      Other arguments:
        refdir      The reference directory, which may be local or remote
                    or "-" indicating that a hash index will be read from
                    standard input.
        targetdir   The directory containing the files to be rearranged,
                    which may be local or remote.
        dstdir      Optional destination directory for the rearranged files.
                    Default is the targetdir.
                    It is taken to be on the same host as targetdir.
    shell
      Run a command prompt via cmd.Cmd using this command's subcommands.

Function localpath(fspath: str) -> str

Return a filesystem path modified so that it cannot be misinterpreted as a remote path such as user@host:path.

If fspath contains no colon (:) or is an absolute path or starts with ./ then it is returned unchanged. Otherwise a leading ./ is prepended.
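
The rule is simple enough to restate directly. This standalone sketch follows the description above; the name localpath_sketch is hypothetical and the use of os.path.isabs for the absolute-path test is an assumption.

```python
import os

def localpath_sketch(fspath: str) -> str:
    ''' Return fspath in a form that cannot be misread as [user@]host:path. '''
    if ':' not in fspath or os.path.isabs(fspath) or fspath.startswith('./'):
        return fspath
    # a leading ./ means the part before the colon can no longer look like a host
    return './' + fspath
```

For example, localpath_sketch('user@host:dir') gives './user@host:dir', while 'notes.txt' and '/srv/a:b' are returned unchanged.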

Function main(argv=None)

Command line implementation.

Function merge(srcpath: str, dstpath: str, *, opname=None, hashname: str, move_mode: bool = False, symlink_mode=False, doit=False, quiet=False, fstags: cs.fstags.FSTags)

Merge srcpath to dstpath.

If dstpath does not exist, move/link/symlink srcpath to dstpath. Otherwise checksum their contents and raise FileExistsError if they differ.
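
A minimal sketch of that decision logic, assuming SHA-256 for the content comparison. The name merge_sketch is hypothetical; it omits fstags tag merging, the quiet option and the symlink special cases noted in release 20240305, and it treats move as "hard link then unlink the original", as described for the linkto --mv option.

```python
import hashlib
import os

def _sha256(fspath: str) -> str:
    h = hashlib.sha256()
    with open(fspath, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

def merge_sketch(srcpath: str, dstpath: str, *, move_mode: bool = False,
                 symlink_mode: bool = False, doit: bool = False) -> None:
    ''' Hypothetical sketch: place srcpath at dstpath, or verify that an
        existing dstpath has the same contents.
    '''
    if os.path.exists(dstpath):
        # dstpath already present: accept it only if the contents match
        if _sha256(srcpath) != _sha256(dstpath):
            raise FileExistsError(
                f'{dstpath!r}: exists with different contents from {srcpath!r}')
        return
    opname = 'ln -s' if symlink_mode else 'mv' if move_mode else 'ln'
    print(opname, srcpath, dstpath)
    if not doit:
        return  # dry run: report only
    os.makedirs(os.path.dirname(dstpath) or '.', exist_ok=True)
    if symlink_mode:
        os.symlink(os.path.abspath(srcpath), dstpath)
    else:
        os.link(srcpath, dstpath)
        if move_mode:
            os.unlink(srcpath)  # drop the original after a successful hard link
```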

Function paths_remap(srcpaths: Iterable[str], fspaths_by_hashcode: Mapping[cs.hashutils.BaseHashCode, List[str]], *, hashname: str)

Generator yielding (srcpath,fspaths) 2-tuples.
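
A minimal sketch of the likely remapping step, assuming fspaths are the entries of fspaths_by_hashcode whose hashcode matches the contents of each srcpath. Hex digest strings stand in for hashcodes, the name paths_remap_sketch is hypothetical, and yielding an empty list for unmatched files is an assumption.

```python
import hashlib
from typing import Iterable, Iterator, List, Mapping, Tuple

def paths_remap_sketch(srcpaths: Iterable[str],
                       fspaths_by_hashcode: Mapping[str, List[str]],
                       hashname: str = 'sha256') -> Iterator[Tuple[str, List[str]]]:
    ''' Yield (srcpath, fspaths) where fspaths are the mapped paths whose
        hashcode matches the contents of srcpath (an empty list if none).
    '''
    for srcpath in srcpaths:
        h = hashlib.new(hashname)
        with open(srcpath, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                h.update(chunk)
        yield srcpath, fspaths_by_hashcode.get(h.hexdigest(), [])
```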

Function read_hashindex(f, start=1, *, hashname: str)

A generator which reads lines from the file f and yields (hashcode,fspath) 2-tuples. If there are parse errors the hashcode or fspath may be None.
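
A minimal parsing sketch, assuming each line holds a hashcode (e.g. in hashname:hexdigest form), whitespace, then the path, and assuming start is the line number of the first line, used in error reports. The name read_hashindex_sketch is hypothetical and hashcodes are left as strings rather than parsed into BaseHashCode instances.

```python
import sys
from typing import Iterable, Iterator, Optional, Tuple

def read_hashindex_sketch(lines: Iterable[str], start: int = 1
                          ) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    ''' Yield (hashcode, fspath) from lines of the assumed form "hashcode fspath".
        Unparseable lines are reported to stderr and yield (None, None).
    '''
    for lineno, line in enumerate(lines, start):
        # split once so that paths containing spaces stay intact
        fields = line.rstrip('\n').split(None, 1)
        if len(fields) != 2:
            print(f'line {lineno}: cannot parse {line!r}', file=sys.stderr)
            yield None, None
            continue
        yield fields[0], fields[1]
```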

Function read_remote_hashindex(rhost: str, rdirpath: str, *, hashname: str, ssh_exe=None, hashindex_exe=None, check=True)

A generator which reads a hashindex of a remote directory. This runs hashindex ls -h hashname -r rdirpath on the remote host and yields (hashcode,fspath) 2-tuples.

Parameters:

  • rhost: the remote host, or user@host
  • rdirpath: the remote directory path
  • hashname: the file content hash algorithm name
  • ssh_exe: the ssh executable, default DEFAULT_SSH_EXE: 'ssh'
  • hashindex_exe: the remote hashindex executable, default DEFAULT_HASHINDEX_EXE: 'hashindex'
  • check: whether to check that the remote command has a 0 return code, default True
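
A minimal sketch of the remote invocation using only subprocess and shlex. Because ssh passes its trailing arguments to the remote shell as a single command string, each word is quoted. The helper names are hypothetical, the "hashcode path" line format is an assumption, and the real function goes through cs.psutils rather than subprocess directly.

```python
import shlex
import subprocess
from typing import Iterator, List, Optional, Tuple

def remote_ls_argv(rhost: str, rdirpath: str, *, hashname: str,
                   ssh_exe: str = 'ssh', hashindex_exe: str = 'hashindex') -> List[str]:
    ''' Build the local argv running "hashindex ls -h hashname -r rdirpath" on rhost. '''
    # quote each word so paths with spaces or shell metacharacters survive
    remote_cmd = shlex.join([hashindex_exe, 'ls', '-h', hashname, '-r', rdirpath])
    return [ssh_exe, rhost, remote_cmd]

def read_remote_sketch(rhost: str, rdirpath: str, *, hashname: str, check: bool = True,
                       **argv_options) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    ''' Yield (hashcode, fspath) pairs parsed from the remote listing. '''
    argv = remote_ls_argv(rhost, rdirpath, hashname=hashname, **argv_options)
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as P:
        for line in P.stdout:
            fields = line.rstrip('\n').split(None, 1)
            yield (fields[0], fields[1]) if len(fields) == 2 else (None, None)
    # leaving the with-block waits for the process, so returncode is set
    if check and P.returncode != 0:
        raise subprocess.CalledProcessError(P.returncode, argv)
```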

Function rearrange(srcdirpath: str, rfspaths_by_hashcode, dstdirpath=None, *, hashname: str, move_mode: bool = False, symlink_mode=False, doit: bool, quiet: bool = False, fstags: cs.fstags.FSTags, runstate: cs.resources.RunState)

Rearrange the files in srcdirpath according to the hashcode->[relpaths] mapping rfspaths_by_hashcode, placing them in dstdirpath.

Parameters:

  • srcdirpath: the directory whose files are to be rearranged
  • rfspaths_by_hashcode: a mapping of hashcode to the list of relative pathnames to which the original file is to be linked or moved
  • dstdirpath: optional target directory for the rearranged files; defaults to srcdirpath, rearranging the files in place
  • hashname: the file content hash algorithm name
  • move_mode: move files instead of linking them
  • symlink_mode: symlink files instead of linking them
  • doit: if true do the link/move/symlink, otherwise just print
  • quiet: default False; if true do not print
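
A minimal sketch of the rearrangement in hard link mode, with an optional move step. The name rearrange_sketch is hypothetical; it hashes files with hashlib instead of using fstags, skips targets that already exist rather than checking their contents as merge does, and omits symlink mode, quiet and runstate.

```python
import hashlib
import os
from typing import Dict, List, Optional

def _hexdigest(fspath: str, hashname: str) -> str:
    h = hashlib.new(hashname)
    with open(fspath, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

def rearrange_sketch(srcdirpath: str, rfspaths_by_hashcode: Dict[str, List[str]],
                     dstdirpath: Optional[str] = None, *, hashname: str = 'sha256',
                     move_mode: bool = False, doit: bool = False) -> None:
    ''' Hard link each file in srcdirpath to the relative paths listed for its hashcode. '''
    if dstdirpath is None:
        dstdirpath = srcdirpath  # default: rearrange in place
    # list the files up front, since new links may appear under srcdirpath
    srcpaths = sorted(
        os.path.join(dirp, name)
        for dirp, _, names in os.walk(srcdirpath)
        for name in names
    )
    for srcpath in srcpaths:
        relpaths = rfspaths_by_hashcode.get(_hexdigest(srcpath, hashname), [])
        dstpaths = [os.path.join(dstdirpath, relpath) for relpath in relpaths]
        linked = False
        for dstpath in dstpaths:
            if os.path.exists(dstpath):
                continue  # the real merge() would compare contents here
            print('ln' if doit else 'would ln', srcpath, dstpath)
            if doit:
                os.makedirs(os.path.dirname(dstpath), exist_ok=True)
                os.link(srcpath, dstpath)
                linked = True
        # move mode: drop the original unless it is itself one of the targets
        keep = os.path.abspath(srcpath) in {os.path.abspath(p) for p in dstpaths}
        if move_mode and linked and not keep:
            os.unlink(srcpath)
```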

Function run_remote_hashindex(rhost: str, argv, *, ssh_exe=None, hashindex_exe=None, check: bool = True, doit: bool = True, **subp_options)

Run a remote hashindex command. Return the CompletedProcess result or None if doit is false. Note that as with cs.psutils.run, the arguments are resolved via cs.psutils.prep_argv.

Parameters:

  • rhost: the remote host, or user@host
  • argv: the command line arguments to be passed to the remote hashindex command
  • ssh_exe: the ssh executable, default DEFAULT_SSH_EXE: 'ssh'
  • hashindex_exe: the remote hashindex executable, default DEFAULT_HASHINDEX_EXE: 'hashindex'
  • check: whether to check that the remote command has a 0 return code, default True
  • doit: whether to actually run the command, default True

Other keyword parameters are passed through to cs.psutils.run.
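
The doit and check semantics can be sketched with subprocess.run standing in for cs.psutils.run; prep_argv resolution is not reproduced. The name run_remote_sketch is hypothetical: with doit false the command is only printed and None is returned, and with check true a nonzero exit status raises CalledProcessError.

```python
import shlex
import subprocess
from typing import List, Optional

def run_remote_sketch(rhost: str, argv: List[str], *, ssh_exe: str = 'ssh',
                      hashindex_exe: str = 'hashindex', check: bool = True,
                      doit: bool = True, **subp_options
                      ) -> Optional[subprocess.CompletedProcess]:
    ''' Run "hashindex argv..." on rhost via ssh, or just report it if doit is false. '''
    # ssh hands the remote command to the remote shell as one string, so quote it
    sshargv = [ssh_exe, rhost, shlex.join([hashindex_exe, *argv])]
    if not doit:
        print(shlex.join(sshargv))  # dry run: show the command, run nothing
        return None
    # check=True makes subprocess.run raise CalledProcessError on a nonzero exit
    return subprocess.run(sshargv, check=check, **subp_options)
```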

Function set_fstags_hashcode(fspath: str, hashcode, S: os.stat_result, fstags: cs.fstags.FSTags)

Record hashcode against fspath.

Release Log

Release 20240305:

  • HashIndexCommand.cmd_ls: support rhost:rpath paths, honour interrupts in remote mode.
  • HashIndexCommand.cmd_rearrange: new optional dstdir command line argument, passed to rearrange.
  • merge: symlink_mode: leave identical symlinks alone, just merge tags.
  • rearrange: new optional dstdirpath parameter, default srcdirpath.

Release 20240216:

  • HashIndexCommand.cmd_linkto, cmd_rearrange: run the link/mv operations with sys.stdout in line-buffered mode.
  • Do not get hashcodes from symlinks.
  • HashIndexCommand.cmd_ls: ignore None hashcodes, do not set xit=1.
  • New run_remote_hashindex() and read_remote_hashindex() functions.
  • dir_filepaths: skip dot files, the fstags .fstags file and nonregular files.

Release 20240211.1: Better module docstring.

Release 20240211: Initial PyPI release: "hashindex" command and utility functions for listing file hashcodes and rearranging trees based on a hash index.

Download files

Source Distribution

cs.hashindex-20240305.tar.gz (12.9 kB)

Uploaded Source

Built Distribution

cs.hashindex-20240305-py3-none-any.whl (12.1 kB)

Uploaded Python 3

File details

Details for the file cs.hashindex-20240305.tar.gz.

File metadata

  • Download URL: cs.hashindex-20240305.tar.gz
  • Size: 12.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.6

File hashes

Hashes for cs.hashindex-20240305.tar.gz
Algorithm Hash digest
SHA256 6816d7e3a21c14f35080f76be3b7cfb272db66bcc71d90f5698e99962a3d815b
MD5 36d0198debe9e83b642a92ad9e007d4b
BLAKE2b-256 2f31b8f2674a06af7a34cde460bc0da2b127f80e1de78f68320ea39570ecba41

File details

Details for the file cs.hashindex-20240305-py3-none-any.whl.

File hashes

Hashes for cs.hashindex-20240305-py3-none-any.whl
Algorithm Hash digest
SHA256 023ef2e720dcd3d25fcc0fdf10f962a454d1947fdaa6e1673e98a924b24aea76
MD5 62ad2ea5b171203dc5dfcfe92923e93b
BLAKE2b-256 33a573e4d8db4cb536665693a3ed57e9f010eb48656d273b06cc0ae19fe8cb64
