GWAS summary statistics file validator

Summary Statistics TSV file Validator

A validator for GWAS summary statistics TSV files, built on pandas_schema. It can be run both before and after harmonisation, and its purpose is to check files before they are converted to HDF5.

Installation

Python package:

  • Requires python3
  • pip install ss-validate

Alternatively, use the docker image:

  • docker run ebispot/gwas-sumstats-validator ss-validate --help

Running the validator

To run the validator on a file:

  • ss-validate -f <file_to_validate.tsv> --logfile <logfile_name>

Information and errors are logged to the console; errors are also written to the specified log file. Console output might look like:

(INFO): Filename is good!
(INFO): Validating file...
(ERROR): Length of row 7 is: 16 instead of 15
(ERROR): Please fix the table. Some rows have different numbers of columns to the header
(INFO): Rows with different numbers of columns to the header are not validated
(ERROR): {row: 1, column: "p_value"}: "-99" was not in the range [0, 1)

The errors in this output tell us that row 7 has too many columns (16 instead of 15) and that row 1 does not have a valid p_value ("-99" is outside the range [0, 1)).
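
When the validator is run from a script, it can help to turn console lines like those above into structured records. The following is a minimal sketch that assumes the console format is exactly as in the sample output; the function name `parse_console_line` and both regular expressions are illustrative and not part of ss-validate.

```python
import re

# Assumed shape of a console line, e.g. '(ERROR): some message'.
LINE_RE = re.compile(r'^\((?P<level>[A-Z]+)\): (?P<message>.*)$')
# Assumed shape of a cell-level message, e.g. '{row: 1, column: "p_value"}: ...'.
CELL_RE = re.compile(
    r'^\{row: (?P<row>\d+), column: "(?P<column>[^"]+)"\}: (?P<detail>.*)$'
)


def parse_console_line(line):
    """Return a dict describing one console line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = {"level": match.group("level"), "message": match.group("message")}
    # Cell-level errors carry a row number and a column name.
    cell = CELL_RE.match(record["message"])
    if cell is not None:
        record["row"] = int(cell.group("row"))
        record["column"] = cell.group("column")
        record["detail"] = cell.group("detail")
    return record


sample = [
    '(INFO): Validating file...',
    '(ERROR): Length of row 7 is: 16 instead of 15',
    '(ERROR): {row: 1, column: "p_value"}: "-99" was not in the range [0, 1)',
]
records = [parse_console_line(line) for line in sample]
errors = [r for r in records if r is not None and r["level"] == "ERROR"]
```

Here `errors` holds the two ERROR records, and the second one also carries the `row` and `column` keys.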

Additional options

  • --linelimit : int, default 1000

    Once this number of erroneous rows has been reached, stop looking for more.

  • --minrows : int, default 100000

    The minimum number of rows the file is required to have in order to validate successfully.

  • --drop-bad-lines : bool, default False

    Drops the lines with errors and writes the remaining lines to a new file called <file_to_validate.tsv.valid>

  • --stage : {'standard', 'harmonised', 'curated'}, default 'standard'

    The stage the file is in: standard format ('standard'), harmonised ('harmonised'), or the pre-standard custom curated format ('curated'). Leaving this as the default is recommended.
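
The options above can be combined in a single invocation. The sketch below assembles the argument list in Python using only the flags documented here; the helper `build_command` is hypothetical, and it assumes that `--drop-bad-lines` is passed as a bare switch, which this page does not confirm.

```python
# Stage values listed in the options above.
VALID_STAGES = {"standard", "harmonised", "curated"}


def build_command(path, logfile, linelimit=1000, minrows=100000,
                  stage="standard", drop_bad_lines=False):
    """Build an ss-validate argument list (hypothetical helper, not part of the package)."""
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    cmd = [
        "ss-validate",
        "-f", path,
        "--logfile", logfile,
        "--linelimit", str(linelimit),
        "--minrows", str(minrows),
        "--stage", stage,
    ]
    if drop_bad_lines:
        # Assumption: the option is a bare switch that takes no value.
        cmd.append("--drop-bad-lines")
    return cmd


cmd = build_command("sumstats.tsv", "validation.log", linelimit=50)
```

The resulting list can be passed to `subprocess.run(cmd)` once ss-validate is installed; building it as a list avoids shell quoting problems with file names.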

Import ss-validate into another Python script

  • Install as above
  • Import and use in your Python file
import ss_validate.validator as ssv

# initialise a validator object for your summary statistics and settings 
validator = ssv.Validator(file='sumstats.tsv.gz', filetype='gwas-upload', error_limit=1, logfile='logfile.log')

# validate the headers
validator.validate_headers()

# validate the squareness
validator.validate_file_squareness()

# validate the data
validator.validate_data()
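
To illustrate what the squareness check is looking for, here is a minimal standalone sketch that reports rows whose column count differs from the header. It does not use ss-validate and is not its implementation; the function name `find_ragged_rows` and the convention that the first data row is row 1 are assumptions.

```python
import csv
import io


def find_ragged_rows(handle, delimiter="\t"):
    """Return (row_number, length, expected) for rows that do not match the header width."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to compare against
    expected = len(header)
    # Assumption: data rows are numbered from 1, excluding the header.
    return [
        (number, len(row), expected)
        for number, row in enumerate(reader, start=1)
        if len(row) != expected
    ]


sample = io.StringIO("a\tb\tc\n1\t2\t3\n4\t5\t6\t7\n")
ragged = find_ragged_rows(sample)
```

In this sample the second data row has four fields against a three-column header, so it is the only row reported.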

Download files

Source Distribution

ss-validate-1.0.0.dev3.tar.gz (12.7 kB)

Uploaded Source

Built Distribution

ss_validate-1.0.0.dev3-py3-none-any.whl (13.3 kB)

Uploaded Python 3
