GWAS summary statistics file validator

Summary Statistics TSV file Validator

A validator for GWAS summary statistics TSV files, built on pandas_schema. It checks files both before and after harmonisation, so that problems are caught before the files are converted to HDF5.

Requirements

  • python3

Installation

  • pip install ss-validate

Running the validator

To run the validator on a file:

  • ss-validate -f <file_to_validate.tsv> --logfile <logfile_name>

Information and errors are logged to the console; errors are also written to the specified log file. Console output might look like:

(INFO): Filename is good!
(INFO): Validating file...
(ERROR): Length of row 7 is: 16 instead of 15
(ERROR): Please fix the table. Some rows have different numbers of columns to the header
(INFO): Rows with different numbers of columns to the header are not validated
(ERROR): {row: 1, column: "p_value"}: "-99" was not in the range [0, 1)

The errors in the output tell us that row seven has too many columns (16 instead of 15) and that row one does not have a valid p-value: -99 is outside the range [0, 1).
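When many files are validated in a batch, it can help to turn this output into structured records. The following is a minimal sketch, not part of ss-validate: it assumes every log line follows the "(LEVEL): message" shape of the sample above, and the names parse_log, LINE_RE and CELL_RE are hypothetical.

```python
import re

# Matches lines such as "(ERROR): Length of row 7 is: 16 instead of 15".
LINE_RE = re.compile(r"^\((?P<level>[A-Z]+)\):\s*(?P<message>.*)$")
# Matches cell-level messages such as
# '{row: 1, column: "p_value"}: "-99" was not in the range [0, 1)'.
CELL_RE = re.compile(
    r'^\{row: (?P<row>\d+), column: "(?P<column>[^"]+)"\}: (?P<detail>.*)$'
)


def parse_log(text):
    """Return (records, cell_errors) extracted from validator output.

    records is a list of (level, message) pairs; cell_errors is a list of
    (row, column, detail) tuples for ERROR lines that name a specific cell.
    """
    records, cell_errors = [], []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like a log line
        level, message = match["level"], match["message"]
        records.append((level, message))
        cell = CELL_RE.match(message)
        if level == "ERROR" and cell:
            cell_errors.append((int(cell["row"]), cell["column"], cell["detail"]))
    return records, cell_errors


# The sample console output from this page, used as test input.
sample = """\
(INFO): Filename is good!
(INFO): Validating file...
(ERROR): Length of row 7 is: 16 instead of 15
(ERROR): Please fix the table. Some rows have different numbers of columns to the header
(INFO): Rows with different numbers of columns to the header are not validated
(ERROR): {row: 1, column: "p_value"}: "-99" was not in the range [0, 1)
"""

records, cell_errors = parse_log(sample)
error_count = sum(1 for level, _ in records if level == "ERROR")
print(error_count, cell_errors)
```

Other versions of the tool may format messages differently, so the patterns should be checked against real log files before being relied on.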

Additional options

  • --linelimit : int, default 1000

    Once this number of erroneous rows has been reached, stop looking for more.

  • --minrows : int, default 100000

    The minimum number of rows the file is required to have in order to validate successfully.

  • --drop-bad-lines : bool, default False

    Drops the lines with errors and writes the remaining lines to a new file called <file_to_validate.tsv.valid>

  • --stage : {'standard', 'harmonised', 'curated'}, default 'standard'

    The stage the file is in: standard format ('standard'), harmonised ('harmonised'), or the pre-standard custom curated format ('curated'). Leaving this as the default is recommended.
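To call the validator from a script, the options listed above can be assembled into an argument list. This is a sketch under stated assumptions: build_command and run_validator are hypothetical helpers, the file names are placeholders that may not pass the validator's own filename check, and --drop-bad-lines is assumed to be a bare switch that takes no value.

```python
import shlex
import subprocess

STAGES = {"standard", "harmonised", "curated"}


def build_command(tsv_path, logfile, linelimit=None, minrows=None,
                  drop_bad_lines=False, stage=None):
    """Assemble an ss-validate argument list; unset options keep their defaults."""
    cmd = ["ss-validate", "-f", str(tsv_path), "--logfile", str(logfile)]
    if linelimit is not None:
        cmd += ["--linelimit", str(linelimit)]
    if minrows is not None:
        cmd += ["--minrows", str(minrows)]
    if drop_bad_lines:
        # Assumption: the option is a switch with no argument.
        cmd.append("--drop-bad-lines")
    if stage is not None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        cmd += ["--stage", stage]
    return cmd


def run_validator(cmd):
    """Run the command and return the completed process (needs ss-validate on PATH)."""
    return subprocess.run(cmd, capture_output=True, text=True)


# Placeholder file names, for illustration only.
cmd = build_command("study.tsv", "study.log", linelimit=50, minrows=1000)
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file paths contain spaces.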
