Helps validate the integrity of data backups/exports.

spot_check_files

This is a tool to help validate the integrity of a set of files, e.g. data backups/exports.

  • Checks recognized file types for errors, e.g. invalid JSON.
  • Generates thumbnails of files when possible.
  • Displays statistics about file types and unrecognized files.

It produces a report like the following in the terminal (seeing images in the terminal requires iTerm2):

screenshot of sample output in iTerm2

Or as HTML:

screenshot of rendered sample HTML output

Usage

Install:

  1. Install python3 and pip
  2. pip3 install spot_check_files[imgcat]
    • imgcat is optional and enables support for displaying thumbnails in iTerm2 on OS X

Run:

spotcheck PATH

This will output basic stats and any errors the tool detects in the given files/directories. If you're using iTerm2 on Mac, it will also show thumbnails of files.

Alternatively, you can generate an HTML report:

spotcheck -H PATH > out.html

The full list of options is available in the project documentation or by running spotcheck --help.
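For scripted backup checks, the two invocations above can be wrapped in Python. The following is only a sketch: the helper names build_spotcheck_command and write_html_report are made up for this example, and it assumes spotcheck is installed and on your PATH. The meaning of the tool's exit code is not documented here, so the sketch returns it without interpreting it.

```python
import subprocess


def build_spotcheck_command(path, html=False):
    """Build the argument list for the spotcheck CLI (hypothetical helper)."""
    cmd = ["spotcheck"]
    if html:
        # -H selects the HTML report, as in "spotcheck -H PATH > out.html".
        cmd.append("-H")
    cmd.append(str(path))
    return cmd


def write_html_report(path, out_file):
    """Run "spotcheck -H path" and save its standard output to out_file.

    Returns the process exit code unchanged.
    """
    with open(out_file, "wb") as out:
        completed = subprocess.run(
            build_spotcheck_command(path, html=True), stdout=out
        )
    return completed.returncode
```

For example, write_html_report("backups", "out.html") is the equivalent of running spotcheck -H backups > out.html in a shell.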

This tool can also be used programmatically. The main entry point for the library is the CheckerRunner class in spot_check_files.checker. You can add support for new file types by subclassing the Checker class from that module.

Supported file types

The command-line tool currently relies entirely on file extension to determine file types.
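As an illustration of extension-based detection, here is a minimal sketch. It is not the tool's actual code: the function name guess_type, the type labels, and the case-insensitive matching are assumptions made for this example. The extensions themselves are the ones listed below.

```python
from pathlib import PurePath

# Extensions taken from the list of supported types in this README.
# The labels ("archive", "csv", ...) are invented for this sketch.
EXTENSION_TYPES = {
    ".tar": "archive", ".tar.bz2": "archive", ".tar.gz": "archive",
    ".tar.xz": "archive", ".tbz": "archive", ".tgz": "archive",
    ".txz": "archive", ".zip": "archive",
    ".csv": "csv", ".tsv": "csv",
    ".bmp": "image", ".gif": "image", ".icns": "image", ".ico": "image",
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".tiff": "image",
    ".webp": "image",
    ".json": "json",
    ".md": "text", ".txt": "text",
    ".xml": "xml",
}


def guess_type(path):
    """Return a type label based only on the file extension, or None."""
    name = PurePath(path).name.lower()  # assumption: ignore letter case
    # Try longer extensions first so that multi-part extensions such as
    # ".tar.gz" are matched as a whole.
    for ext in sorted(EXTENSION_TYPES, key=len, reverse=True):
        if name.endswith(ext):
            return EXTENSION_TYPES[ext]
    return None
```

One consequence of relying on extensions alone is that a file with a missing or misleading extension is classified by its name, not its contents.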

Supported types and the checks performed for each:

  • Archive files (.tar, .tar.bz2, .tar.gz, .tar.xz, .tbz, .tgz, .txz, .zip): Recursively checks all the files in the archive (including other archives)
  • CSV files (.csv, .tsv): Checks that the CSV dialect can be detected and read by Python, and builds a thumbnail
  • Image files (.bmp, .gif, .icns, .ico, .jpg, .jpeg, .png, .tiff, .webp): Checks that the file can be loaded by the Python imaging library Pillow, and builds a thumbnail
  • JSON files (.json): Checks that the json can be parsed, and builds a thumbnail of the pretty-printed json
  • Text files (.md, .txt): Treating the file as plaintext, builds a thumbnail
  • XML files (.xml): Checks that the xml can be parsed, and builds a thumbnail of the pretty-printed xml
  • Anything supported by OS X Quick Look (HTML, Office docs, ...): OS X ONLY: generates thumbnails using Quick Look. This greatly increases the number of supported file types. However, it's slow.
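To give a feel for the JSON and CSV checks described above, here is a rough sketch using only the Python standard library. It is an approximation, not the tool's implementation: the function names check_json and check_csv and their return conventions are assumptions, and thumbnail generation is left out.

```python
import csv
import io
import json


def check_json(text):
    """Return (error, pretty_printed); error is None if the text parses."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc), None
    # The pretty-printed form is what a thumbnail could be rendered from.
    return None, json.dumps(parsed, indent=2)


def check_csv(text):
    """Return an error message, or None if the CSV can be sniffed and read."""
    try:
        # Detect the dialect from a sample of the content.
        dialect = csv.Sniffer().sniff(text[:4096])
        # Read every row so that malformed content surfaces as csv.Error.
        for _row in csv.reader(io.StringIO(text, newline=""), dialect):
            pass
    except csv.Error as exc:
        return str(exc)
    return None
```

Both functions report problems as strings instead of raising, so a caller can collect the errors for many files into a single report.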

Development

Setup:

  1. Install python3 and pip
  2. Clone the repo
  3. I recommend creating a venv:
    cd spot_check_files
    python3 -m venv venv
    source venv/bin/activate
    
  4. Install dependencies:
    pip install .
    pip install -r requirements-dev.txt
    

To run tests:

PYTHONPATH=src pytest

(Overriding PYTHONPATH as shown ensures the tests run against the code in the src/ directory rather than the installed copy of the package.)

To run the CLI:

PYTHONPATH=src python -m spot_check_files ...

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/brokensandals/spot_check_files.

License

This tool is available as open source under the terms of the MIT License.

This package includes and uses a copy of the Monoid font, which is also MIT-licensed.
