GWAS SumStats Tools

[!TIP] We are also developing schemas and validation tools for CNV and gene-level data in a separate repository, gwas-pysumstats. We welcome suggestions and feedback. Please note that this package is still under development.

Comprehensive documentation for gwas-sumstats-tools is available in the GWAS SumStats Tools Documentation.

Overview:

There are four commands: read, format, validate and gen_meta (gen_meta is currently only accessible to internal GWAS Catalog users).

read is for:

  • Previewing a data file: no options
  • Extracting the field headers: -h
  • Extracting all the metadata: -M
  • Extracting specific field-value pairs from the metadata: -m <field name>

format is for:

  • Converting a sumstats data file to the standard format, gwas-ssf. This is not guaranteed to return a valid standard file, because mandatory data fields could be missing from the input.
    • Generate a configuration file, which serves as a blueprint for the formatting options.
    • Test the configuration file on the first five rows of the input file.
    • Apply the configuration file to the entire input file and generate the formatted output file.

    [!NOTE] Formatting is memory efficient and takes approx. 30 s per 1 million records.
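
As a rough planning aid, the throughput quoted above can be turned into a run-time estimate. This is a back-of-the-envelope sketch only: the function name `estimate_format_seconds` is hypothetical, and the default rate is simply the approximate figure from the note, which will vary with hardware and input.

```python
def estimate_format_seconds(n_records: int, seconds_per_million: float = 30.0) -> float:
    """Rough run-time estimate from the quoted ~30 s per 1 million records."""
    if n_records < 0:
        raise ValueError("n_records must be non-negative")
    return n_records / 1_000_000 * seconds_per_million


# Example: a file with 8 million records
print(f"{estimate_format_seconds(8_000_000):.0f} s")
```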

gen_meta is for:

  • Generate metadata for a data file: -m
    • Read metadata in from existing file: --meta-in <file>
    • Create metadata from the GWAS Catalog (internal use, requires authenticated API): -g
    • Edit/add the values to the metadata: -e with --<FIELD>=<VALUE>

validate is for:

  • Validating a summary statistic file using a dynamically generated schema

Requirements

  • Python >= 3.9 and < 3.12

Installation

Local installation with pip

$ pip3 install gwas-sumstats-tools

Run with Docker

The following Docker command is equivalent to running gwas-ssf.

$ docker run -it -v ${PWD}:/application ebispot/gwas-sumstats-tools:latest

Append any subcommands or arguments, e.g.:

$ docker run -it -v ${PWD}:/application ebispot/gwas-sumstats-tools:latest validate

Usage

$ gwas-ssf [OPTIONS] COMMAND [ARGS]...

Options:

  • --help: Show this message and exit.

Commands:

  • validate: Validate a sumstats file
  • format: Format a sumstats file
  • gen_meta: Generate a meta-yaml file
  • read: Read a sumstats file

gwas-ssf validate

Validate a sumstats file

Usage:

$ gwas-ssf validate [OPTIONS] FILENAME

Arguments:

  • FILENAME: Input sumstats file. Must be TSV (may be gzipped) [required]

Options:

  • -e, --errors-out: Output errors to a CSV file, .err.csv.gz
  • -z, --p-zero: Force p-values of zero to be allowable. Takes precedence over inferred value (-i)
  • -m, --min-rows: Minimum rows acceptable for the file [default: 100000]
  • -i, --infer-from-metadata: Infer validation options from the metadata file <filename>-meta.yaml. E.g. a populated field for analysis software makes p-values of zero allowable.
  • --help: Show this message and exit.
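
Before running validation, it can be useful to check a file against the -m/--min-rows threshold. The sketch below is a hypothetical pre-check, not the validator's own code: the names `count_data_rows` and `meets_min_rows` are assumptions, and it assumes the first line of the file is a header.

```python
import gzip
from pathlib import Path


def count_data_rows(path: str) -> int:
    """Count non-empty lines after the header in a TSV file (optionally gzipped)."""
    opener = gzip.open if Path(path).suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        next(handle, None)  # skip the header line
        return sum(1 for line in handle if line.strip())


def meets_min_rows(path: str, min_rows: int = 100_000) -> bool:
    """True if the file has at least `min_rows` data rows (default mirrors -m)."""
    return count_data_rows(path) >= min_rows
```

The file is streamed line by line, so memory use stays flat even for large sumstats files.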

gwas-ssf read

Read (preview) a sumstats file

Usage:

$ gwas-ssf read [OPTIONS] FILENAME

Arguments:

  • FILENAME: Input sumstats file [required]

Options:

  • -h, --get-header: Just return the headers of the file [default: False]
  • --meta-in PATH: Specify a metadata file to read in, defaulting to <filename>-meta.yaml
  • -M, --get-all-metadata: Return all metadata [default: False]
  • -m, --get-metadata TEXT: Get metadata for the specified fields e.g. `-m genomeAssembly -m isHarmonised`
  • --help: Show this message and exit.

gwas-ssf format

Format a sumstats file, creating a new one. Add/edit metadata.

Usage:

$ gwas-ssf format [OPTIONS] FILENAME

Arguments:

  • FILENAME: Input sumstats file. Must be TSV or CSV and may be gzipped [required]

Options:

  • Options for reading the input file
    • -d, --delimiter Text: Specify the delimiter used in the file. If not specified, the delimiter is detected automatically: whitespace if your file is *.txt, comma if your file is *.csv, or tab if your file is *.tsv.gz. Otherwise, specify the delimiter so that the columns are recognised correctly.
    • -r, --remove_comments Text: Remove lines that start with the given character
  • Options for generating configuration file
    • -g, --generate_config Boolean: Generate a configuration file for the file to be formatted
    • --config_out Path: Specify the configuration JSON output file
  • Options for applying configuration file
    • -o, --ss-out PATH: Output sumstats file
    • -a, --apply_config Boolean: Apply the given configuration file to the file
    • -t, --test_config Boolean: Test the given configuration file on the first 5 rows of the file
    • --config_in Path: Specify a configuration JSON file to read in
    • -f, --analysis_software Text: Specify the analysis software used for generating the summary statistics data
    • -s, --minimal2standard: Try to convert a valid, minimally formatted file to the standard format. This assumes the file has at least p_value combined with either an rsID in the variant_id field or chromosome and base_pair_location. Validity of the new file is not guaranteed because mandatory data could be missing from the original file. [default: False]
  • Options for batch applying configuration file
    • -b, --batch_apply Boolean: Apply configuration files to a batch of summary statistics files
    • --lsf Boolean: Run the batch process by submitting jobs via LSF
    • --slurm Boolean: Run the batch process by submitting jobs via Slurm
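
Two of the rules described in the options above can be sketched in a few lines: the extension-based delimiter guess for -d/--delimiter and the minimal-field assumption behind -s/--minimal2standard. This is an illustrative sketch, not the tool's implementation. The names `guess_delimiter` and `has_minimal_fields` are hypothetical, only the three file patterns named in the option text are handled, and the field check looks at column names only (it does not confirm that variant_id actually holds rsIDs).

```python
from typing import Iterable, Optional

# Suffix -> delimiter, following the -d/--delimiter description above.
_SUFFIX_DELIMITERS = (
    (".tsv.gz", "\t"),
    (".csv", ","),
    (".txt", r"\s+"),  # whitespace, written as a regular expression
)


def guess_delimiter(filename: str) -> Optional[str]:
    """Return the delimiter implied by the file name, or None if unknown."""
    name = filename.lower()
    for suffix, delimiter in _SUFFIX_DELIMITERS:
        if name.endswith(suffix):
            return delimiter
    return None  # caller should pass the delimiter explicitly


def has_minimal_fields(header: Iterable[str]) -> bool:
    """Check for p_value plus variant_id, or p_value plus chromosome and position."""
    fields = set(header)
    has_position = {"chromosome", "base_pair_location"} <= fields
    return "p_value" in fields and ("variant_id" in fields or has_position)
```

Returning None for unrecognised extensions keeps the fallback explicit: the caller then has to supply the delimiter, as the option description recommends.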

gwas-ssf gen_meta

Generate a meta-yaml file for the existing sumstats file OR edit the existing meta-yaml file.

Usage:

$ gwas-ssf gen_meta [OPTIONS] FILENAME

Example:

# Generate a meta-yaml file from GWAS API (-g) with customised fields (-e --file_type=pre-gwas-ssf) for GCST90278188.tsv files
$ gwas-ssf gen_meta --meta-out GCST90278188.tsv-meta.yaml -g GCST90278188.tsv -e --file_type=pre-gwas-ssf

Arguments:

  • FILENAME: Input sumstats file. Must be TSV or CSV and may be gzipped [required]

Options:

  • --meta-out PATH: Specify the metadata output file
  • -g, --meta-gwas: Populate metadata from GWAS Catalog [default: False]
  • -e, --meta-edit: Enable metadata edit mode. Then provide params to edit in the --<FIELD>=<VALUE> format e.g. --GWASID=GCST123456 to edit/add that value [default: False]
  • --help: Show this message and exit.
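
The --<FIELD>=<VALUE> convention used in edit mode maps naturally onto a dictionary of field edits. The sketch below only illustrates that convention; `parse_field_edits` is a hypothetical helper and does not reflect how gwas-ssf parses its arguments internally.

```python
from typing import Dict, Iterable


def parse_field_edits(args: Iterable[str]) -> Dict[str, str]:
    """Turn ['--FIELD=VALUE', ...] into {'FIELD': 'VALUE', ...}."""
    edits: Dict[str, str] = {}
    for arg in args:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --<FIELD>=<VALUE>, got {arg!r}")
        # Split on the first '=' only, so values may themselves contain '='.
        field, value = arg[2:].split("=", 1)
        if not field:
            raise ValueError(f"missing field name in {arg!r}")
        edits[field] = value
    return edits


print(parse_field_edits(["--GWASID=GCST123456", "--file_type=pre-gwas-ssf"]))
```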

Development

This repository uses poetry for dependency and packaging management.

To run the tests:

  1. install poetry
  2. git clone https://github.com/EBISPOT/gwas-sumstats-tools.git
  3. cd gwas-sumstats-tools
  4. python3 -m venv env
  5. pip install poetry
  6. poetry install
  7. poetry run pytest -s

To make a change: branch from master -> PR to master -> poetry version -> git add pyproject.toml -> git commit -> git tag -> git push origin master --tags. If all the tests pass, this will publish to PyPI.

A simple toolkit for reading and formatting GWAS sumstats files from the GWAS Catalog.

Citation:

If you use the NHGRI-EBI GWAS Catalog tool in your research, please refer to the "How to Cite the NHGRI-EBI GWAS Catalog, Data, or Diagrams" section on our website for proper citation guidelines.

Download files

Source Distribution

gwas_sumstats_tools-1.0.25.tar.gz (39.6 kB)

Uploaded Source

Built Distribution

gwas_sumstats_tools-1.0.25-py3-none-any.whl (44.1 kB)

Uploaded Python 3

File details

Details for the file gwas_sumstats_tools-1.0.25.tar.gz.

File metadata

  • Download URL: gwas_sumstats_tools-1.0.25.tar.gz
  • Upload date:
  • Size: 39.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gwas_sumstats_tools-1.0.25.tar.gz
Algorithm Hash digest
SHA256 f08d83e7109897e78719ae4862e94094e52609605f51fd9548b510e2f066dac6
MD5 ac14525ccf1b7e86ee8824b8feddff29
BLAKE2b-256 75a33759a1e82a4ce2b4ecc498c1ddfd0ebd4f091a6a2c00bec69e4f7df8e18d
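
A downloaded archive can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch: the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the expected value is copied from the table for the source distribution.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 listed for gwas_sumstats_tools-1.0.25.tar.gz
EXPECTED_SDIST_SHA256 = "f08d83e7109897e78719ae4862e94094e52609605f51fd9548b510e2f066dac6"


def matches_published_hash(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```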

Provenance

The following attestation bundles were made for gwas_sumstats_tools-1.0.25.tar.gz:

Publisher: publish-pypi.yml on EBISPOT/gwas-sumstats-tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gwas_sumstats_tools-1.0.25-py3-none-any.whl.

File hashes

Hashes for gwas_sumstats_tools-1.0.25-py3-none-any.whl
Algorithm Hash digest
SHA256 989d42bd3d0dc03ae6e0e08df9fdd962d15936acab947e55903703b5871bad38
MD5 f39ceaa220935a51e3e62602ae9b070b
BLAKE2b-256 6ef9636fa872b3b3117e7ecd4cd2f425a2bd3cf641fd3d1e750b64e2bb25d707

Provenance

The following attestation bundles were made for gwas_sumstats_tools-1.0.25-py3-none-any.whl:

Publisher: publish-pypi.yml on EBISPOT/gwas-sumstats-tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
