Directory traversal utility with advanced filtering capabilities

Directory scanner

dirscan is a recursive directory traverser for efficiently scanning and comparing directories. It can store scan results in scan files which can be used later as input for comparing or verifying contents.

dirscan is written in Python 3. It provides dirscan.walkdirs(), which is similar to os.walk() but can also walk multiple directories simultaneously in lock step. The package includes both a command-line tool and a Python API for use in your own projects.

Project home:

      https://github.com/sveinse/dirscan

Copyright 2010-2025, Svein Seldal

Installation

Install dirscan from PyPI at https://pypi.org/project/dirscan

$ pip install dirscan

Alternatively, it can be installed from source

$ git clone https://github.com/sveinse/dirscan.git
$ cd dirscan
$ pip install .

Command-line usage

The basic usage for the command line utility is

    dirscan [options] LEFT_DIR [RIGHT_DIR]

dirscan can operate in scan mode or compare mode, which is selected by the number of directories given. dirscan provides a rich set of options and dirscan --help provides detailed information about them.

Scan mode

Scan mode is used when only LEFT_DIR is specified. It traverses the directory and prints the files it finds. dirscan can also print the sha256 hashsum of each scanned file.

dirscan can store the scan results in a scan file with the --output (-o) option. The scan file can later be read back into dirscan as input, to be printed or compared with other directories or scan files. When generating the scan file, the sha256 hashsum is computed for each file. This can be used later to compare or verify changes to file contents in a secure manner.
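
To illustrate what a per-file sha256 hashsum records, here is a minimal standard-library sketch. It is not dirscan's implementation, and the names sha256_of_file and hash_tree are hypothetical helpers that exist only in this example.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Return the hex sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each regular file's path, relative to root, to its sha256 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file
            if os.path.isfile(full) and not os.path.islink(full):
                result[os.path.relpath(full, root)] = sha256_of_file(full)
    return result
```

Comparing two such mappings taken at different times shows which files were added, removed or modified, which is the idea behind verifying a directory against a stored scan file.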

Compare mode

Compare mode is selected when both the LEFT_DIR and RIGHT_DIR arguments are specified. It compares left and right and prints the differences between them, including differences in file contents. Either argument can be a scan file instead of a directory.

dirscan supports customized filtering and printing. The options --compare, --file-types and --ignore filter which file types or comparison results are shown. The options --format and --summary provide extensive custom output formatting.

API

dirscan can be used from Python by importing the module

    import dirscan

dirscan.walkdirs()

    dirscan.walkdirs(dirs, reverse=False, excludes=None, onefs=False,
                     traverse_into_oneside=False, exception_fn=None,
                     close_during=True)

Generator function that recursively traverses the directories given in dirs. It scans a single directory, or multiple directories in tandem, and returns the found files and directories in lock step.

    # Scan a directory and print the path and file type of each entry
    for path, (obj,) in dirscan.walkdirs(('path/to/dir',)):
        print(f"{path} is a {obj.objname}")

As it traverses the directory, it yields a (path, objs) tuple for each file object it finds. path is the common relative file path. objs is a tuple of file objects, one from each of the input trees in dirs, so its length equals the length of dirs. Each object in objs is an instance of a dirscan.DirscanObj subclass that depends on the type of the found object, e.g. dirscan.FileObj or dirscan.DirObj.

When more than one directory is specified in dirs, it scans the directories in tandem and yields the tuples from each of the trees in lock step. If a file is not present in one of the directories, a dirscan.NonExistingObj instance is yielded in its place.

    # Compare two directories and report entries found on one side only
    for path, (left, right) in dirscan.walkdirs((left_path, right_path)):
        if isinstance(left, dirscan.NonExistingObj):
            print(f"{path}: Only in right")
        if isinstance(right, dirscan.NonExistingObj):
            print(f"{path}: Only in left")
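
The lock-step idea can be sketched with the standard library alone. This is an illustration of the concept under simplifying assumptions, not how dirscan.walkdirs() is implemented: the hypothetical function walk_lockstep reports presence as booleans instead of yielding file objects, and it only descends into directories that exist on both sides.

```python
import os


def _is_real_dir(path):
    """True for directories that are not symlinks, to avoid link loops."""
    return os.path.isdir(path) and not os.path.islink(path)


def walk_lockstep(left, right, rel=""):
    """Yield (relative_path, in_left, in_right) for two trees in tandem."""
    left_dir = os.path.join(left, rel)
    right_dir = os.path.join(right, rel)
    left_names = set(os.listdir(left_dir)) if os.path.isdir(left_dir) else set()
    right_names = set(os.listdir(right_dir)) if os.path.isdir(right_dir) else set()

    # Visit the sorted union of names so both sides advance together
    for name in sorted(left_names | right_names):
        path = os.path.join(rel, name)
        yield path, name in left_names, name in right_names
        # Descend only when the entry is a real directory on both sides
        if _is_real_dir(os.path.join(left, path)) and _is_real_dir(os.path.join(right, path)):
            yield from walk_lockstep(left, right, path)
```

Visiting the sorted union of names at each level is what keeps the two trees aligned: every relative path is reported exactly once, together with the sides on which it exists.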

Function arguments:

  • dirs - List of directories to walk. The elements must either be a path string to a physical directory or a previously loaded DirObj object tree.

  • reverse - Reverses the scanning order

  • excludes - List of excluded paths, relative to the common path

  • onefs - Scan file objects belonging to the same file system only

  • traverse_into_oneside - Walks and yields all file objects in a directory that exists on only one side. That is, when comparing two directories where a subdirectory exists on only one side, the scanner will not enter the one-sided directory unless this option is set. See the command-line option --recurse.

  • exception_fn - Exception handler callback. It has the form exception_fn(exception, path) and returns a bool. It is called if a scanning exception occurs during traversal. If exception_fn() returns False or is not set, an ordinary exception will be raised.

  • close_during - Calls obj.close() on objects that have been yielded to the caller. This allows freeing up parsed objects to conserve memory. Note that this tears down the in-memory directory tree, making it impossible to reuse the object tree after walkdirs() is complete.
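
As a sketch of the exception_fn callback described above, the handler below records errors instead of aborting the scan. The helper make_error_collector is hypothetical. The text only states that returning False (or no handler) raises the exception, so treating True as "handled, continue scanning" is an assumption.

```python
def make_error_collector(errors):
    """Return a handler that appends (path, exception) pairs to errors."""
    def handler(exception, path):
        errors.append((path, exception))
        # Assumption: True tells the scanner the error has been handled
        return True
    return handler


errors = []
handler = make_error_collector(errors)

# Intended use, which requires dirscan to be installed:
#   for path, objs in dirscan.walkdirs(("path/to/dir",), exception_fn=handler):
#       ...
#   print(f"{len(errors)} entries could not be scanned")
```

Collecting errors in a list keeps the traversal loop free of try/except blocks and lets the caller decide afterwards how to report unreadable entries.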

dirscan.is_scanfile()

    dirscan.is_scanfile(filename)

Function to check if the input filename is readable and contains a valid scanfile header.

dirscan.read_scanfile()

    dirscan.read_scanfile(filename, root=None)

Read the given scanfile filename and return the root dirscan.DirObj() object. The root argument specifies the sub-dir within the scanfile which shall be returned as the root object. It has to point to a directory object.
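
Since walkdirs() accepts either a directory path or a loaded DirObj tree, the two scanfile functions can be combined into a small input resolver. This is a sketch under assumptions: load_root is a hypothetical helper, and the directory check comes first because the behavior of is_scanfile() on a directory is not described here.

```python
import os


def load_root(source, root=None):
    """Return a value suitable as an element of the dirs argument of walkdirs()."""
    # Imported lazily so this sketch can be read without dirscan installed
    import dirscan

    if os.path.isdir(source):
        # Physical directories are passed to walkdirs() as plain path strings
        return source
    if dirscan.is_scanfile(source):
        # Scan files are loaded into an in-memory DirObj tree
        return dirscan.read_scanfile(source, root=root)
    raise ValueError(f"{source!r} is neither a directory nor a scan file")
```

A caller could then pass the results straight on, for example dirscan.walkdirs((load_root(left), load_root(right))).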

dirscan.create_from_fs()

    dirscan.create_from_fs(name, path='', stat=None)

Create a new file system object instance (inherited from dirscan.DirscanObj) by querying the filesystem at path/name. If stat is None, the function fetches the information from the filesystem itself.

When querying a directory, this function does not traverse it immediately. Traversal happens when obj.children() is called.

dirscan.create_from_dict()

    dirscan.create_from_dict(data)

Create a new file system object instance (inherited from dirscan.DirscanObj) from the dict data. This function is the opposite of obj.to_dict().

class dirscan.DirscanObj()

This is the base class for all file objects used by dirscan. This class is inherited into specific file-object types:

  • FileObj() - Files
  • LinkObj() - (Symbolic) links
  • DirObj() - Directories
  • BlockDevObj() - Block device files
  • CharDevObj() - Character device files
  • SocketObj() - Socket files
  • FifoObj() - Pipe files
  • NonExistingObj() - Missing file object. See walkdirs().

This base class should not be used directly.

Attributes

  • obj.name Bare filename of the file path to the object

  • obj.path Directory part of the file path to the object

  • obj.fullpath Full filename path of the object. Same as pathlib.Path(obj.path, obj.name)

  • obj.objtype (deprecated) One character object type

    • f - File
    • l - Link
    • d - Directory
    • b - Block device
    • c - Character device
    • p - Fifo
    • s - Socket
    • - - Non existing file

  • obj.mode File mode, stat.st_mode, without filetype fields

  • obj.uid File user ID, stat.st_uid

  • obj.gid File group ID, stat.st_gid

  • obj.size File size, stat.st_size

  • obj.dev File system device, stat.st_dev

  • obj.mtime File modify timestamp, datetime.fromtimestamp(stat.st_mtime)

  • obj.hashsum and obj.hashsum_hex (FileObj only) sha256 hashsum of the file contents

  • obj.link (LinkObj only) Link destination, os.readlink()

  • obj.excluded If set, the file has been excluded by the exclude filters. This attribute is not serialized by obj.to_dict().
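
The stat-derived attributes above can be reproduced with the standard library. This is a rough sketch of how such values relate to os.lstat(), not dirscan's own code: the function describe and its returned dict are hypothetical, and reading "without filetype fields" as stat.S_IMODE() is an assumption.

```python
import os
import stat
from datetime import datetime

# One-character type codes as listed for obj.objtype
_TYPE_CHECKS = (
    ("l", stat.S_ISLNK),
    ("d", stat.S_ISDIR),
    ("f", stat.S_ISREG),
    ("b", stat.S_ISBLK),
    ("c", stat.S_ISCHR),
    ("p", stat.S_ISFIFO),
    ("s", stat.S_ISSOCK),
)


def describe(path):
    """Collect stat-derived fields for path without following symlinks."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        # Mirrors the '-' code used for a non-existing file
        return {"objtype": "-"}
    objtype = next((code for code, check in _TYPE_CHECKS if check(st.st_mode)), "?")
    return {
        "objtype": objtype,
        "mode": stat.S_IMODE(st.st_mode),  # mode bits with the file type removed
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "dev": st.st_dev,
        "mtime": datetime.fromtimestamp(st.st_mtime),
    }
```

Using os.lstat() rather than os.stat() means a symbolic link is described as a link and not as its target, which matches the separate link type in the list above.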

Methods

  • obj.__init__(self, name, path='', stat=None) Create a new file object.

  • obj.compare(self, other) A generator that compares this object with other. It yields a list of differences between self and other. An empty list is returned if the objects are equal. For files, compare will try to compare the contents of the files using filecmp.cmp() or by comparing the sha256 sum.

  • obj.children(self) Return an iterator of this object's children names.

  • obj.set_children(self, children) (DirObj only) Set the children for a directory. Useful when building the tree.

  • obj.close(self) Close this object. It breaks any links to other objects to help garbage collection.

  • obj.to_dict(self) Serialize and return a dict representation of the object. In DirObj it will recurse and return the full file tree.
