GWAS summary statistics file validator
Summary Statistics TSV file Validator
A validator for GWAS summary statistics TSV files, both before and after harmonisation, built on pandas_schema. Its purpose is to check files before they are converted to HDF5.
Installation
Python package:
- Requires Python 3
pip install ss-validate
Alternatively, use the docker image:
docker run ebispot/gwas-sumstats-validator ss-validate --help
Running the validator
To run the validator on a file:
ss-validate -f <file_to_validate.tsv> --logfile <logfile_name>
Information and errors are logged to the console; errors are also written to the specified log file. Console output might look like:
(INFO): Filename is good!
(INFO): Validating file...
(ERROR): Length of row 7 is: 16 instead of 15
(ERROR): Please fix the table. Some rows have different numbers of columns to the header
(INFO): Rows with different numbers of columns to the header are not validated
(ERROR): {row: 1, column: "p_value"}: "-99" was not in the range [0, 1)
The errors in this output tell us that row seven has too many columns (16 instead of the header's 15) and that row one does not have a valid p-value.
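The two kinds of error shown above can be illustrated with a minimal sketch using only the Python standard library. This is not the package's implementation (which is built on pandas_schema); the function name `check_rows`, the message wording, and the choice to number data rows from 1 after the header are assumptions made for illustration.

```python
import csv
import io


def check_rows(tsv_text):
    """Return a list of error strings for a tab-separated table.

    Hypothetical helper: mimics the squareness and p-value range
    checks described above, not the ss-validate internals.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    p_index = header.index("p_value") if "p_value" in header else None
    errors = []
    # Assumption: data rows are numbered from 1, excluding the header.
    for row_number, row in enumerate(reader, start=1):
        if len(row) != len(header):
            errors.append(
                f"Length of row {row_number} is: {len(row)} instead of {len(header)}"
            )
            continue  # rows of the wrong length are not validated further
        if p_index is None:
            continue
        raw = row[p_index]
        try:
            p_value = float(raw)
        except ValueError:
            p_value = None
        # Accept only values in the half-open range [0, 1).
        if p_value is None or not 0 <= p_value < 1:
            errors.append(
                f'row {row_number}, column "p_value": "{raw}" was not in the range [0, 1)'
            )
    return errors
```

Calling `check_rows` on a small table with one out-of-range p-value and one over-long row returns one message for each problem, in row order.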
Additional options
- --linelimit: int, default 1000. Stop looking for more errors once this number of erroneous rows has been reached.
- --minrows: int, default 100000. The minimum number of rows the file must have in order to validate successfully.
- --drop-bad-lines: bool, default False. Drop the lines with errors and write the remaining lines to a new file called <file_to_validate.tsv.valid>.
- --stage: {'standard', 'harmonised', 'curated'}, default 'standard'. The stage the file is in: standard format ('standard'), harmonised ('harmonised'), or pre-standard in the custom curated format ('curated'). It is recommended to leave this as the default.
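When the validator is called from a pipeline, it can help to assemble the command line in one place. The sketch below builds an argument list from the options listed above. The helper name `build_ss_validate_command` is hypothetical, and the sketch assumes that `--drop-bad-lines` is a bare switch that takes no value; check `ss-validate --help` before relying on that.

```python
VALID_STAGES = {"standard", "harmonised", "curated"}


def build_ss_validate_command(path, logfile, linelimit=None, minrows=None,
                              drop_bad_lines=False, stage=None):
    """Build an argument list for the ss-validate command line tool.

    Hypothetical helper; options left as None are omitted so that
    the tool's own defaults apply.
    """
    if stage is not None and stage not in VALID_STAGES:
        raise ValueError(f"stage must be one of {sorted(VALID_STAGES)}")
    command = ["ss-validate", "-f", str(path), "--logfile", str(logfile)]
    if linelimit is not None:
        command += ["--linelimit", str(linelimit)]
    if minrows is not None:
        command += ["--minrows", str(minrows)]
    if drop_bad_lines:
        # Assumption: this option is a flag without an argument.
        command.append("--drop-bad-lines")
    if stage is not None:
        command += ["--stage", stage]
    return command
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with unusual file names.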
Importing ss-validate into another Python script
- Install as above
- Import and use in your Python file
import ss_validate.validator as ssv
# initialise a validator object for your summary statistics and settings
validator = ssv.Validator(file='sumstats.tsv.gz', filetype='gwas-upload', error_limit=1, logfile='logfile.log')
# validate the headers
validator.validate_headers()
# validate the squareness
validator.validate_file_squareness()
# validate the data
validator.validate_data()
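The three calls above can be wrapped so that a script gets one summary of which checks passed. The sketch below is a hypothetical helper, not part of ss-validate. It assumes that each method returns a truthy value on success and a falsy value on failure, which this README does not state; if the methods return nothing, every step would be reported as failed.

```python
def run_validation_steps(validator):
    """Run the header, squareness and data checks in order.

    Returns a dict mapping a step name to True or False.
    Assumption: each validator method returns a truthy value on success.
    """
    steps = [
        ("headers", validator.validate_headers),
        ("squareness", validator.validate_file_squareness),
        ("data", validator.validate_data),
    ]
    results = {}
    for name, step in steps:
        # Every step is run, so one report covers all three checks.
        results[name] = bool(step())
    return results


def all_passed(results):
    """Return True only if every recorded step succeeded."""
    return all(results.values())
```

With a validator object created as above, `all_passed(run_validation_steps(validator))` gives a single pass or fail value that a calling script can act on.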
Hashes for ss_validate-1.0.0.dev0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | dd360fecca4e7dd386ca0a59faee26f8b57cdfa5847337dc4da470a310463f06
MD5 | 8fd9e571c9e853b41058089c406ca47a
BLAKE2b-256 | e7e74fe3733a3d7ce4e3f29fe6dc0dccb334460cc5c7e3160834558ccef03d83