Helps validate the integrity of data backups/exports.
spot_check_files
This is a tool to help validate the integrity of a set of files, e.g. data backups/exports.
- Checks recognized file types for errors, e.g. invalid JSON.
- Generates thumbnails of files when possible.
- Displays statistics about file types and unrecognized files.
It produces a report like the following in the terminal (seeing images in the terminal requires iTerm2):
Or as HTML:
Usage
Install:
- Install python3 and pip
pip3 install spot_check_files[imgcat]
- imgcat is optional and enables support for displaying thumbnails in iTerm2 on OS X
Run:
spotcheck PATH
This will output basic stats and any errors the tool detects in the given files/directories. If you're using iTerm2 on Mac, it will also show thumbnails of files.
Alternatively, you can generate an HTML report:
spotcheck -H PATH > out.html
The full list of options can be seen here or by running spotcheck --help.
This tool can also be used programmatically.
The main entry point for the library is the CheckerRunner class in spot_check_files.checker.
You can add support for new file types by subclassing the Checker class from that module.
Supported file types
The command-line tool currently relies entirely on file extension to determine file types.
Type | Support |
---|---|
Archive files | Recursively checks all the files in the archive (including other archives) |
CSV files | Checks that the CSV dialect can be detected and read by Python, and builds a thumbnail |
Image files | Checks that the file can be loaded by the Python imaging library Pillow, and builds a thumbnail |
JSON files: .json | Checks that the json can be parsed, and builds a thumbnail of the pretty-printed json |
Text files | Treating the file as plaintext, builds a thumbnail |
XML files: .xml | Checks that the xml can be parsed, and builds a thumbnail of the pretty-printed xml |
anything supported by OS X Quick Look (HTML, Office docs, ...) | OS X ONLY: generates thumbnails using Quick Look. This greatly increases the number of supported file types. However, it's slow. |
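
To make the idea of extension-based checking concrete, here is a minimal sketch using only the Python standard library. It is not the tool's actual implementation: the function names, the `CHECKS` mapping, and the `.csv` extension are assumptions made for illustration, and thumbnail generation is left out entirely.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def check_json(data: bytes) -> None:
    # Raises json.JSONDecodeError if the content is not valid JSON.
    json.loads(data.decode("utf-8"))


def check_xml(data: bytes) -> None:
    # Raises xml.etree.ElementTree.ParseError on malformed XML.
    ET.fromstring(data)


def check_csv(data: bytes) -> None:
    text = data.decode("utf-8")
    # Raises csv.Error if no dialect can be detected from the sample.
    dialect = csv.Sniffer().sniff(text[:4096])
    # Reading every row surfaces errors the sniffer alone would miss.
    for _ in csv.reader(io.StringIO(text), dialect):
        pass


# Hypothetical mapping from file extension to check function.
CHECKS = {".json": check_json, ".xml": check_xml, ".csv": check_csv}


def spot_check(name: str, data: bytes) -> str:
    """Return 'ok', 'unrecognized', or 'error: <exception name>'."""
    check = CHECKS.get(Path(name).suffix.lower())
    if check is None:
        return "unrecognized"
    try:
        check(data)
    except Exception as exc:
        return f"error: {type(exc).__name__}"
    return "ok"


if __name__ == "__main__":
    print(spot_check("good.json", b'{"x": 1}'))
    print(spot_check("bad.json", b'{"x": '))
    print(spot_check("notes.bin", b"\x00"))
```

Because the type is chosen purely from the file name, a file with a wrong or missing extension is reported as unrecognized rather than being checked, which matches the limitation described above.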
Development
Setup:
- Install python3 and pip
- Clone the repo
- I recommend creating a venv:
cd spot_check_files
python3 -m venv venv
source venv/bin/activate
- Install dependencies:
pip install .
pip install -r requirements-dev.txt
To run tests:
PYTHONPATH=src pytest
(Overriding PYTHONPATH as shown ensures the tests run against the code in the src/ directory rather than the installed copy of the package.)
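
The reason this works is that Python searches directories in sys.path order, and PYTHONPATH entries are placed ahead of the site-packages directory where installed packages live. The sketch below demonstrates that ordering rule in isolation. The module name `demo_shadowed_module` and the `import_first_match` helper are made up for the demonstration and have nothing to do with this project's code.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_first_match(module_name: str, search_dirs: list) -> str:
    """Import module_name with search_dirs prepended to sys.path.

    Returns the path of the file that was actually imported, then
    restores sys.path and forgets the module again.
    """
    original = list(sys.path)
    sys.path[:0] = search_dirs
    try:
        importlib.invalidate_caches()
        sys.modules.pop(module_name, None)
        module = importlib.import_module(module_name)
        return module.__file__
    finally:
        sys.path[:] = original
        sys.modules.pop(module_name, None)


with tempfile.TemporaryDirectory() as tmp:
    # Two directories that each contain a module with the same name,
    # standing in for the src/ checkout and the installed copy.
    src = Path(tmp, "src")
    installed = Path(tmp, "installed")
    for directory in (src, installed):
        directory.mkdir()
        (directory / "demo_shadowed_module.py").write_text(
            f"ORIGIN = {directory.name!r}\n"
        )

    # "src" comes first in the search order, so it shadows "installed".
    found = import_first_match(
        "demo_shadowed_module", [str(src), str(installed)]
    )
    print(Path(found).parent.name)  # prints "src"
```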
To run the CLI:
PYTHONPATH=src python -m spot_check_files ...
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/brokensandals/spot_check_files.
License
This is available as open source under the terms of the MIT License.
This package includes and uses a copy of the Monoid font, which is also MIT-licensed.