OpenSpending Model/Data Validation
This command-line tool will help to check the validity of an OpenSpending dataset model before loading the data into the system. For this purpose, both model files and data files can be checked to see if they would pass the input validation of an OpenSpending import.
To validate a JSON model file, use the ‘model’ subcommand:
osvalidate model mymodel.json
Or, to check that a CSV sheet satisfies the requirements of an existing model, use the ‘data’ subcommand:
osvalidate data --model mymodel.json BigData.csv
Both commands will emit error messages whenever they find an inconsistency but otherwise try to continue the validation. Note, however, that a valid model file is required to run the data validator.
The model schema changes from time to time, so a migration option is provided so that existing files can be upgraded:
osvalidate migrate old.json >new.json
This will attempt to execute pending migrations and set an appropriate model version.
The JSON model format is described in further detail in the documentation.
You can generate a bare mapping from a JSON model file using:
osvalidate mapping mymodel.json
If you don’t have a JSON model file, you can generate one from a CSV file as follows:
osvalidate mapgen data.csv
You will need to edit the result of this to add information (like textual explanations of the fields) can’t be programmatically inferred from the contents of the CSV file.
Installation is as a conventional Python tool:
virtualenv pyenv . pyenv/bin/activate python setup.py install
Each new version of osvalidate needs to be published in a few steps:
How to write a migration - Migrations of model formats are simple functions, usually named mYYYY_MM_DD_purpose and stored in the migrations module. They must both accept and return a model file and be registered in openspending.validation.model.migration:MIGRATIONS with an increasing version stamp (i.e. the current date). The version stamp of the latest executed migration will automatically be saved to the model and used as a minimum for the next run.
In general, migrations should make as few assumptions about the input they receive as possible and execute idempotently. Migrations cannot change the dataset section of the model.