GWAS SumStats Tools

A simple toolkit for reading and formatting GWAS sumstats files from the GWAS Catalog.

There are four commands: validate, read, format and gen_meta.

validate is for:

  • Validating a summary statistic file using a dynamically generated schema

read is for:

  • Previewing a data file: no options
  • Extracting the field headers: -h
  • Extracting all the metadata: -M
  • Extracting specific field, value pairs from the metadata: -m <field name>
  • More functionality is to come...

format is for:

  • Converting a minimally formatted sumstats data file to the standard format: -s. This is not guaranteed to return a valid standard file, because mandatory data fields could be missing in the input. It simply does the following:
    • Renames variant_id -> rsid
    • Reorders the fields
    • Converts NA missing values to #NA
    • It is memory efficient and will take approx. 30s per 1 million records
  • Generate metadata for a data file: -m
    • Read metadata in from existing file: --meta-in <file>
    • Create metadata from the GWAS Catalog (internal use, requires authenticated API): -g
    • Edit/add the values to the metadata: -e with --<FIELD>=<VALUE>
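
The three transformations listed for -s (rename, reorder, NA to #NA) can be sketched in plain Python. This is only an illustration of the idea, not the tool's implementation: the function name minimal_to_standard and the field_order parameter are hypothetical, and the actual standard column order is not stated here, so it is left to the caller.

```python
import csv
import io


def minimal_to_standard(src, dst, field_order=None):
    """Stream a TSV from `src` to `dst` (both text file objects).

    Assumption (not from the tool's source): `field_order` is a
    hypothetical list of columns to place first; all other columns
    follow in their original order.
    """
    reader = csv.DictReader(src, delimiter="\t")
    # Rename variant_id -> rsid in the header.
    header = ["rsid" if name == "variant_id" else name
              for name in reader.fieldnames]
    # Reorder: preferred fields first, the rest in input order.
    preferred = [f for f in (field_order or []) if f in header]
    ordered = preferred + [f for f in header if f not in preferred]
    writer = csv.DictWriter(dst, fieldnames=ordered, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:  # only one row is held in memory at a time
        if "variant_id" in row:
            row["rsid"] = row.pop("variant_id")
        # Convert NA missing values to #NA.
        writer.writerow({k: ("#NA" if v == "NA" else v)
                         for k, v in row.items()})


if __name__ == "__main__":
    src = io.StringIO("variant_id\tp_value\tbeta\nrs1\t0.05\tNA\n")
    dst = io.StringIO()
    minimal_to_standard(src, dst, field_order=["p_value", "rsid"])
    print(dst.getvalue(), end="")
```

Processing row by row rather than loading the whole table is what keeps memory use flat for files with millions of records.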

Requirements

  • python >= 3.9

Installation

Local installation with pip

$ pip3 install gwas-sumstats-tools

Run with Docker

The following Docker command is equivalent to running gwas-ssf.

$ docker run -it -v ${PWD}:/application ebispot/gwas-sumstats-tools:latest

Just append any subcommands or arguments e.g.:

$ docker run -it -v ${PWD}:/application ebispot/gwas-sumstats-tools:latest validate

Usage

$ gwas-ssf [OPTIONS] COMMAND [ARGS]...

Options:

  • --help: Show this message and exit.

Commands:

  • validate: Validate a sumstats file
  • format: Format a sumstats file
  • gen_meta: Generate a meta-yaml file
  • read: Read a sumstats file

gwas-ssf validate

Validate a sumstats file

Usage:

$ gwas-ssf validate [OPTIONS] FILENAME

Arguments:

  • FILENAME: Input sumstats file. Must be TSV (may be gzipped) [required]

Options:

  • -e, --errors-out: Output errors to a CSV file, .err.csv.gz
  • -z, --p-zero: Force p-values of zero to be allowable. Takes precedence over inferred value (-i)
  • -m, --min-rows: Minimum rows acceptable for the file [default: 100000]
  • -i, --infer-from-metadata: Infer validation options from the metadata file <filename>-meta.yaml. For example, a populated field for analysis software makes p-values of zero allowable.
  • --help: Show this message and exit.
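
To make the effect of -m/--min-rows and -z/--p-zero concrete, here is a minimal sketch of the two checks they control. It is an assumption-laden illustration: the function basic_checks is hypothetical, the exact rules the validator applies are not documented above, and the real tool validates against a full schema that is not reproduced here.

```python
import csv
import io


def basic_checks(handle, min_rows=100_000, allow_p_zero=False):
    """Return a list of error strings for a TSV text file object.

    Hypothetical helper: it only mimics the minimum-row and
    zero p-value options, nothing else from the validator.
    """
    errors = []
    n_rows = 0
    reader = csv.DictReader(handle, delimiter="\t")
    for n_rows, row in enumerate(reader, start=1):
        try:
            p = float(row["p_value"])
        except (KeyError, TypeError, ValueError):
            errors.append(f"row {n_rows}: p_value missing or not numeric")
            continue
        # A p-value of exactly zero is accepted only when allowed.
        lower_ok = p >= 0 if allow_p_zero else p > 0
        if not (lower_ok and p <= 1):
            errors.append(f"row {n_rows}: p_value {p} out of range")
    if n_rows < min_rows:
        errors.append(f"only {n_rows} rows, fewer than the minimum {min_rows}")
    return errors


if __name__ == "__main__":
    data = "rsid\tp_value\nrs1\t0\nrs2\t0.5\n"
    print(basic_checks(io.StringIO(data), min_rows=2))
    print(basic_checks(io.StringIO(data), min_rows=2, allow_p_zero=True))
```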

gwas-ssf read

Read (preview) a sumstats file

Usage:

$ gwas-ssf read [OPTIONS] FILENAME

Arguments:

  • FILENAME: Input sumstats file [required]

Options:

  • -h, --get-header: Just return the headers of the file [default: False]
  • --meta-in PATH: Specify a metadata file to read in, defaulting to <filename>-meta.yaml
  • -M, --get-all-metadata: Return all metadata [default: False]
  • -m, --get-metadata TEXT: Get metadata for the specified fields e.g. -m genomeAssembly -m isHarmonised
  • --help: Show this message and exit.
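
For scripts that need the same information without calling the CLI, the header and a short preview of a TSV sumstats file (plain or gzipped) can be read with the standard library alone. This is a sketch, not the tool's code: the helper names open_sumstats, read_header and preview_rows are hypothetical, and gzip detection by the .gz suffix is an assumption.

```python
import csv
import gzip
from itertools import islice
from pathlib import Path


def open_sumstats(path):
    """Open a TSV file as text, using gzip when the name ends in .gz."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    return opener(path, "rt", encoding="utf-8", newline="")


def read_header(path):
    """Return the column names, similar in spirit to `read -h`."""
    with open_sumstats(path) as handle:
        return next(csv.reader(handle, delimiter="\t"), [])


def preview_rows(path, n=5):
    """Return up to `n` data rows without reading the whole file."""
    with open_sumstats(path) as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header line
        return list(islice(reader, n))
```

Calling read_header("GCST90278188.tsv") on a local file would return its column names as a list of strings.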

gwas-ssf format

Format a sumstats file, creating a new one. Add/edit metadata.

Usage:

$ gwas-ssf format [OPTIONS] FILENAME

Arguments:

  • FILENAME: Input sumstats file. Must be TSV or CSV and may be gzipped [required]

Options:

  • -o, --ss-out PATH: Output sumstats file
  • -s, --minimal2standard: Try to convert a valid, minimally formatted file to the standard format. This assumes the file has at least p_value, combined with either an rsid in the variant_id field or chromosome and base_pair_location. Validity of the new file is not guaranteed because mandatory data could be missing from the original file. [default: False]
  • -m, --generate-metadata: Create the metadata file [default: False]
  • --meta-out PATH: Specify the metadata output file
  • --meta-in PATH: Specify a metadata file to read in
  • -e, --meta-edit: Enable metadata edit mode. Then provide params to edit in the --<FIELD>=<VALUE> format e.g. --GWASID=GCST123456 to edit/add that value [default: False]
  • -g, --meta-gwas: Populate metadata from GWAS Catalog [default: False]
  • -c, --custom-header-map: Provide a custom header mapping using the --<FROM>:<TO> format e.g. --chr:chromosome [default: False]
  • --help: Show this message and exit.
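
The -e and -c options both rely on extra pass-through arguments: --<FIELD>=<VALUE> for metadata edits and --<FROM>:<TO> for header mappings. How the tool parses these internally is not shown above, so the following is only a sketch of one way to split such arguments; the function parse_extra_args is hypothetical.

```python
def parse_extra_args(args):
    """Split pass-through arguments into (metadata edits, header map).

    Hypothetical helper: `--FIELD=VALUE` becomes a metadata edit and
    `--FROM:TO` becomes a header mapping.
    """
    edits, header_map = {}, {}
    for arg in args:
        if not arg.startswith("--"):
            raise ValueError(f"expected an argument starting with '--': {arg!r}")
        body = arg[2:]
        if "=" in body:
            field, value = body.split("=", 1)
            edits[field] = value
        elif ":" in body:
            source, target = body.split(":", 1)
            header_map[source] = target
        else:
            raise ValueError(f"expected --FIELD=VALUE or --FROM:TO, got {arg!r}")
    return edits, header_map


if __name__ == "__main__":
    print(parse_extra_args(["--GWASID=GCST123456", "--chr:chromosome"]))
```

Checking for = before : means a metadata value may itself contain a colon without being mistaken for a header mapping.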

gwas-ssf gen_meta

Generate a meta-yaml file for the existing sumstats file OR edit the existing meta-yaml file.

Usage:

$ gwas-ssf gen_meta [OPTIONS] FILENAME

Example:

# Generate a meta-yaml file from the GWAS API (-g) with customised fields (-e --file_type=pre-gwas-ssf) for the file GCST90278188.tsv
$ gwas-ssf gen_meta --meta-out GCST90278188.tsv-meta.yaml -g GCST90278188.tsv -e --file_type=pre-gwas-ssf

Arguments:

  • FILENAME: Input sumstats file. Must be TSV or CSV and may be gzipped [required]

Options:

  • --meta-out PATH: Specify the metadata output file
  • -g, --meta-gwas: Populate metadata from GWAS Catalog [default: False]
  • -e, --meta-edit: Enable metadata edit mode. Then provide params to edit in the --<FIELD>=<VALUE> format e.g. --GWASID=GCST123456 to edit/add that value [default: False]
  • --help: Show this message and exit.
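
When gen_meta is called from a pipeline, it can help to assemble the argument list in code rather than by string concatenation. The sketch below only builds the command using the options documented above and mirrors the argument order of the example; the helper gen_meta_command is hypothetical and nothing is executed.

```python
import shlex


def gen_meta_command(sumstats, meta_out=None, from_gwas=False, edits=None):
    """Build the argv list for `gwas-ssf gen_meta` (hypothetical helper)."""
    cmd = ["gwas-ssf", "gen_meta"]
    if meta_out:
        cmd += ["--meta-out", meta_out]
    if from_gwas:
        cmd.append("-g")
    cmd.append(sumstats)
    if edits:
        # -e enables edit mode; each edit follows as --<FIELD>=<VALUE>.
        cmd.append("-e")
        cmd += [f"--{field}={value}" for field, value in edits.items()]
    return cmd


if __name__ == "__main__":
    cmd = gen_meta_command(
        "GCST90278188.tsv",
        meta_out="GCST90278188.tsv-meta.yaml",
        from_gwas=True,
        edits={"file_type": "pre-gwas-ssf"},
    )
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run as is, which avoids shell quoting problems with file names.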

Development

This repository uses poetry for dependency and packaging management.

To run the tests:

  1. install poetry

  2. git clone https://github.com/EBISPOT/gwas-sumstats-tools.git

  3. cd gwas-sumstats-tools

  4. poetry install

  5. poetry run pytest

To make a change: branch from master -> PR to master -> poetry version -> git add pyproject.toml -> git commit -> git tag -> git push origin master --tags. If all the tests pass, this will publish to PyPI.
