Helps validate the integrity of data backups/exports.

spot_check_files

This is a tool to help validate the integrity of a set of files, e.g. data backups/exports.

  • Checks recognized file types for errors, e.g. invalid json.
  • Generates thumbnails of files when possible.
  • Displays statistics about file types and unrecognized files.

It produces a report like the following in the terminal (seeing images in the terminal requires iTerm2):

[Screenshot: sample output in iTerm2]

Or as HTML:

[Screenshot: rendered sample HTML output]

Usage

Install:

  1. Install python3 and pip
  2. pip3 install spot_check_files[imgcat]
    • imgcat is optional and enables support for displaying thumbnails in iTerm2 on OS X

Run:

spotcheck PATH

This will output basic stats and any errors the tool detects in the given files/directories. If you're using iTerm2 on Mac, it will also show thumbnails of files.

Alternatively, you can generate an HTML report:

spotcheck -H PATH > out.html
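If you want to produce the HTML report from a script (for example, after a nightly backup job), the shell redirect above can be reproduced in Python. This is a minimal sketch that uses only the `spotcheck` command and the `-H` flag shown above; the helper names `spotcheck_command` and `write_html_report` are made up for illustration and are not part of the package.

```python
import subprocess


def spotcheck_command(path, html=False):
    """Build the argument list for the spotcheck CLI (hypothetical helper)."""
    cmd = ["spotcheck"]
    if html:
        cmd.append("-H")  # the HTML flag documented above
    cmd.append(str(path))
    return cmd


def write_html_report(path, out_file="out.html"):
    """Run spotcheck and write its stdout to out_file, like `> out.html`."""
    with open(out_file, "w", encoding="utf-8") as out:
        result = subprocess.run(spotcheck_command(path, html=True), stdout=out)
    # The meaning of the exit code is not documented here, so it is only returned.
    return result.returncode
```

Calling `write_html_report("backups")` requires `spotcheck` to be installed and on your PATH.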

The full list of options can be seen by running spotcheck --help.

This tool can also be used programmatically. The main entry point for the library is the CheckerRunner class in spot_check_files.checker. You can add support for new file types by subclassing the Checker class from that module.

Supported file types

The command-line tool currently relies entirely on file extension to determine file types.
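To illustrate what extension-based detection involves, here is a minimal sketch. It is not the tool's actual implementation: the `EXTENSION_TYPES` mapping and the `detect_type` function are hypothetical, the extensions are copied from the list in this section, and the case-insensitive matching is an assumption. Note that compound extensions such as .tar.gz need to be matched before shorter ones.

```python
from pathlib import PurePath

# Hypothetical mapping; extensions are taken from the supported-types list.
EXTENSION_TYPES = {
    ".tar": "archive", ".tar.bz2": "archive", ".tar.gz": "archive",
    ".tar.xz": "archive", ".tbz": "archive", ".tgz": "archive",
    ".txz": "archive", ".zip": "archive",
    ".csv": "csv", ".tsv": "csv",
    ".bmp": "image", ".gif": "image", ".icns": "image", ".ico": "image",
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".tiff": "image",
    ".webp": "image",
    ".json": "json",
    ".md": "text", ".txt": "text",
    ".xml": "xml",
}


def detect_type(filename):
    """Return a type name based only on the file extension, or None."""
    name = PurePath(filename).name.lower()  # assumption: ignore case
    # Try the longest extensions first so ".tar.gz" is checked before ".gz"-like suffixes.
    for ext in sorted(EXTENSION_TYPES, key=len, reverse=True):
        if name.endswith(ext):
            return EXTENSION_TYPES[ext]
    return None  # unrecognized file
```

Because only the name is inspected, a file with a misleading extension would be classified by that extension, not by its contents.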

Each entry lists a file type, its extensions, and the support provided:
Archive files:
  • .tar
  • .tar.bz2
  • .tar.gz
  • .tar.xz
  • .tbz
  • .tgz
  • .txz
  • .zip
Recursively checks all the files in the archive (including other archives)
CSV files:
  • .csv
  • .tsv
Checks that the CSV dialect can be detected and read by Python, and builds a thumbnail
Image files:
  • .bmp
  • .gif
  • .icns
  • .ico
  • .jpg
  • .jpeg
  • .png
  • .tiff
  • .webp
Checks that the file can be loaded by the Python imaging library Pillow, and builds a thumbnail
JSON files:
  • .json
Checks that the json can be parsed, and builds a thumbnail of the pretty-printed json
Text files:
  • .md
  • .txt
Treating the file as plaintext, builds a thumbnail
XML files:
  • .xml
Checks that the xml can be parsed, and builds a thumbnail of the pretty-printed xml
Anything supported by OS X Quick Look (HTML, Office docs, ...):
OS X ONLY: generates thumbnails using Quick Look. This greatly increases the number of supported file types. However, it's slow.
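The kinds of checks described above (parse the file, and recurse into archives) can be sketched with the Python standard library alone. This is an illustrative sketch, not the package's code: `check_file` and `check_zip` are hypothetical names, only .zip, .json, .xml, .csv and .tsv are handled, UTF-8 is assumed for text, and thumbnails are left out entirely.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
import zipfile


def check_file(name, data):
    """Check one file given as bytes; return a list of (name, error) pairs."""
    lower = name.lower()
    if lower.endswith(".zip"):
        return check_zip(name, data)
    try:
        if lower.endswith(".json"):
            json.loads(data.decode("utf-8"))
        elif lower.endswith(".xml"):
            ET.fromstring(data)
        elif lower.endswith((".csv", ".tsv")):
            text = data.decode("utf-8")
            # Detect the dialect, then read every row with it.
            dialect = csv.Sniffer().sniff(text)
            list(csv.reader(io.StringIO(text, newline=""), dialect))
    except (ValueError, ET.ParseError, csv.Error) as exc:
        return [(name, str(exc))]
    return []  # no errors found (or file type not handled by this sketch)


def check_zip(name, data):
    """Check every member of a zip archive, including nested zip archives."""
    errors = []
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                member = f"{name}/{info.filename}"
                errors.extend(check_file(member, archive.read(info)))
    except zipfile.BadZipFile as exc:
        errors.append((name, str(exc)))
    return errors
```

Errors found inside nested archives are reported with a path such as outer.zip/inner.zip/bad.json, which makes it easier to locate the damaged file.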

Development

Setup:

  1. Install python3 and pip
  2. Clone the repo
  3. I recommend creating a venv:
    cd spot_check_files
    python3 -m venv venv
    source venv/bin/activate
    
  4. Install dependencies:
    pip install .
    pip install -r requirements-dev.txt
    

To run tests:

PYTHONPATH=src pytest

(Overriding PYTHONPATH as shown ensures the tests run against the code in the src/ directory rather than the installed copy of the package.)
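If you are unsure which copy of the package Python will pick up, you can ask the import system where it would load a module from. The sketch below uses the standard `importlib.util.find_spec`; the `module_origin` helper is a made-up name for illustration.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # With PYTHONPATH=src this should point into src/ rather than site-packages.
    print(module_origin("spot_check_files"))
```

Save this to a file and run it both with and without PYTHONPATH=src to compare the two locations.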

To run the CLI:

PYTHONPATH=src python -m spot_check_files ...

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/brokensandals/spot_check_files.

License

This is available as open source under the terms of the MIT License.

This package includes and uses a copy of the Monoid font, which is also MIT-licensed.
