GWAS summary statistics file validator

Summary Statistics TSV file Validator

A validator for GWAS summary statistics TSV files, for use both before and after harmonisation. It is built on pandas_schema. Its purpose is to validate files before they are converted to HDF5.

Requirements

  • python3

Installation

  • pip install ss-validate

Running the validator

To run the validator on a file:

  • ss-validate -f <file_to_validate.tsv> --logfile <logfile_name>
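If the validator needs to be called from a Python script, the command above can be wrapped with the standard library. This is a minimal sketch, not part of ss-validate: the helper names build_command and run_validator are hypothetical, it assumes ss-validate is installed and on the PATH, and it does not interpret the exit code, since its meaning is not documented here.

```python
import subprocess


def build_command(tsv_path, logfile):
    """Return the argument list for the ss-validate call shown above."""
    return ["ss-validate", "-f", str(tsv_path), "--logfile", str(logfile)]


def run_validator(tsv_path, logfile):
    """Run ss-validate and return (exit code, captured console output).

    Assumption: ss-validate is installed and available on the PATH.
    """
    result = subprocess.run(
        build_command(tsv_path, logfile),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output as str
    )
    return result.returncode, result.stdout + result.stderr
```

Calling run_validator("study.tsv", "study.log") would then return the console messages as one string, which can be searched for lines starting with (ERROR).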

Information and error messages are logged to the console, and errors are also written to the specified log file. Console output might look like:

(INFO): Filename is good!
(INFO): Validating file...
(ERROR): Length of row 7 is: 16 instead of 15
(ERROR): Please fix the table. Some rows have different numbers of columns to the header
(INFO): Rows with different numbers of columns to the header are not validated
(ERROR): {row: 1, column: "p_value"}: "-99" was not in the range [0, 1)

The errors in this output tell us that row 7 has too many columns (16 instead of 15) and that row 1 does not have a valid p-value.
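The two kinds of error in that output can be illustrated with a short standard-library sketch. It is not the validator's implementation (which uses pandas_schema) and only mimics the messages shown above. The function name check_rows, the sample column variant_id, and the choice to number data rows from 1 are assumptions made for illustration.

```python
import csv
import io


def check_rows(tsv_text, pvalue_column="p_value"):
    """Return error messages for rows with the wrong number of columns
    or with a p-value outside the range [0, 1)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    p_index = header.index(pvalue_column)
    errors = []
    # Assumption: data rows are numbered from 1, excluding the header.
    for row_number, row in enumerate(reader, start=1):
        if len(row) != len(header):
            errors.append(
                f"Length of row {row_number} is: {len(row)} instead of {len(header)}"
            )
            continue  # rows with a column mismatch are not validated further
        value = row[p_index]
        try:
            in_range = 0 <= float(value) < 1
        except ValueError:
            in_range = False  # non-numeric values are treated as out of range
        if not in_range:
            errors.append(
                f'{{row: {row_number}, column: "{pvalue_column}"}}: '
                f'"{value}" was not in the range [0, 1)'
            )
    return errors


# Hypothetical two-column sample: row 1 has a bad p-value, row 3 has an extra column.
sample = "variant_id\tp_value\nrs1\t-99\nrs2\t0.5\nrs3\t0.1\textra\n"
for message in check_rows(sample):
    print(message)
```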

Additional options

  • --linelimit : int, default 1000

    Once this number of erroneous rows has been reached, stop looking for more.

  • --minrows : int, default 100000

    The minimum number of rows the file is required to have in order to validate successfully.

  • --drop-bad-lines : bool, default False

    Drops the lines with errors from the file and writes the remaining lines to a new file called <file_to_validate.tsv.valid>

  • --stage : {'standard', 'harmonised', 'curated'}, default 'standard'

    The stage the file is in: standard format ('standard'), harmonised ('harmonised'), or pre-standard in the custom curated format ('curated'). It is recommended to leave this as the default.
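As a rough illustration of what --linelimit, --minrows and --drop-bad-lines describe, the sketch below applies the same ideas to rows held in memory. It is an assumption-laden sketch rather than the tool's code: the names scan_rows and drop_bad_lines and the is_valid callback are hypothetical, and how ss-validate combines these options internally is not documented here.

```python
def scan_rows(rows, is_valid, linelimit=1000, minrows=100000):
    """Collect bad row numbers, stopping once `linelimit` bad rows are found."""
    bad_rows = []
    for number, row in enumerate(rows, start=1):
        if not is_valid(row):
            bad_rows.append(number)
            if len(bad_rows) >= linelimit:
                break  # like --linelimit: stop looking for more errors
    enough_rows = len(rows) >= minrows  # like --minrows
    return {
        "bad_rows": bad_rows,
        "enough_rows": enough_rows,
        "passed": not bad_rows and enough_rows,
    }


def drop_bad_lines(rows, is_valid):
    """Return only the rows that pass, i.e. what a .valid file would contain."""
    return [row for row in rows if is_valid(row)]


# Hypothetical data: each row is [variant, p-value as text].
rows = [["rs1", "0.5"], ["rs2", "-99"], ["rs3", "2"], ["rs4", "0.01"]]


def p_value_ok(row):
    return 0 <= float(row[1]) < 1


report = scan_rows(rows, p_value_ok, linelimit=1, minrows=3)
kept = drop_bad_lines(rows, p_value_ok)
```

With linelimit=1 the scan stops at the first bad row, so report["bad_rows"] holds only that row even though a later row is also invalid. This is why a low limit can under-report the number of problems in a file.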

Files for ss-validate, version 0.4.7
  • ss_validate-0.4.7-py3-none-any.whl (13.4 kB): Wheel, Python version py3
  • ss-validate-0.4.7.tar.gz (7.8 kB): Source, Python version None
