GWAS SumStats Tools
A basic toolkit for reading and formatting GWAS sumstats files from the GWAS Catalog.
There are three commands: read, validate and format.
read is for:
- Previewing a data file: no options
- Extracting the field headers: -h
- Extracting all the metadata: -M
- Extracting specific field, value pairs from the metadata: -m <field name>
validate is for:
- Validating a summary statistic file using a dynamically generated schema
format is for:
- Converting a minimally formatted sumstats data file to the standard format: -s. This is not guaranteed to return a valid standard file, because mandatory data fields could be missing in the input. It simply does the following:
  - Renames variant_id -> rsid
  - Reorders the fields
  - Converts NA missing values to #NA
  - It is memory efficient and will take approx. 30s per 1 million records
- Generating metadata for a data file: -m
  - Read metadata in from existing file: --meta-in <file>
  - Create metadata from the GWAS Catalog (internal use, requires authenticated API): -g
  - Edit/add the values to the metadata: -e with --<FIELD>=<VALUE>
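The three conversion steps listed for format -s (rename, reorder, replace NA) can be illustrated with a small streaming sketch. This is not the tool's own implementation: the function name minimal_to_standard and the column order in ASSUMED_ORDER are hypothetical, chosen only to show the idea of processing one row at a time so that memory use does not grow with file size.

```python
import csv
import io

# Assumed (hypothetical) target column order. The real standard order is
# defined by the GWAS Catalog format, not by this sketch.
ASSUMED_ORDER = ["chromosome", "base_pair_location", "p_value", "rsid"]


def minimal_to_standard(src, dst, order=ASSUMED_ORDER):
    """Stream tab-separated rows from src to dst, applying the three steps."""
    reader = csv.DictReader(src, delimiter="\t")
    # Step 1: rename variant_id -> rsid in the header.
    renamed = ["rsid" if name == "variant_id" else name
               for name in (reader.fieldnames or [])]
    # Step 2: known fields first in the assumed order, any others afterwards.
    header = [f for f in order if f in renamed]
    header += [f for f in renamed if f not in order]
    writer = csv.DictWriter(dst, fieldnames=header, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:  # one row at a time, so memory use stays flat
        if "variant_id" in row:
            row["rsid"] = row.pop("variant_id")
        # Step 3: convert NA missing values to #NA.
        writer.writerow({k: ("#NA" if v == "NA" else v) for k, v in row.items()})


if __name__ == "__main__":
    source = io.StringIO(
        "variant_id\tp_value\tchromosome\tbase_pair_location\n"
        "rs1\tNA\t1\t100\n"
    )
    target = io.StringIO()
    minimal_to_standard(source, target)
    print(target.getvalue(), end="")
```

As with the real command, a file produced this way is only as complete as its input: missing mandatory fields are not filled in.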
Installation
$ pip install gwas-sumstats-tools
Usage
$ gwas-ssf [OPTIONS] COMMAND [ARGS]...
Options:
--help: Show this message and exit.
Commands:
format: Format a sumstats file and...
read: Read a sumstats file
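For scripted use, the command line can be assembled and run from Python. The sketch below is one possible wrapper, not part of the package: build_command and run_if_installed are hypothetical helper names, and the only assumption about the CLI is the documented pattern of a command name, its options, and a FILENAME argument.

```python
import shutil
import subprocess


def build_command(command, filename, *options):
    """Return the argument list for `gwas-ssf COMMAND [OPTIONS] FILENAME`."""
    return ["gwas-ssf", command, *options, filename]


def run_if_installed(argv):
    """Run argv only if its executable is on PATH; otherwise return None."""
    if shutil.which(argv[0]) is None:
        return None
    # check=False: leave it to the caller to inspect returncode and stderr.
    return subprocess.run(argv, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # data.tsv.gz is a placeholder file name.
    argv = build_command("read", "data.tsv.gz", "-h")
    result = run_if_installed(argv)
    if result is None:
        print("gwas-ssf is not installed; would have run:", " ".join(argv))
    else:
        print(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.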
gwas-ssf read
Read (preview) a sumstats file
Usage:
$ gwas-ssf read [OPTIONS] FILENAME
Arguments:
FILENAME: Input sumstats file [required]
Options:
-h, --get-header: Just return the headers of the file [default: False]
--meta-in PATH: Specify a metadata file to read in, defaulting to -meta.yaml
-M, --get-all-metadata: Return all metadata [default: False]
-m, --get-metadata TEXT: Get metadata for the specified fields e.g. `-m genomeAssembly -m isHarmonised`
--help: Show this message and exit.
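To make concrete what extracting the field headers involves, here is a minimal standard-library sketch that reads only the first line of a TSV or CSV file, gzipped or not. It is an illustration, not the code behind gwas-ssf read -h; read_header is a hypothetical name, and guessing the delimiter from the presence of a tab is an assumption of this sketch.

```python
import csv
import gzip


def read_header(path):
    """Return the column names from the first line of a TSV or CSV file."""
    # Assumption: a .gz suffix means the file is gzip-compressed.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        first_line = handle.readline()
    # Assumption: a tab in the first line means TSV, otherwise CSV.
    delimiter = "\t" if "\t" in first_line else ","
    return next(csv.reader([first_line], delimiter=delimiter), [])
```

Because only one line is read, the cost is independent of the size of the file.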
gwas-ssf validate
Validate a sumstats file
Usage:
$ gwas-ssf validate [OPTIONS] FILENAME
Arguments:
FILENAME: Input sumstats file. Must be TSV or CSV and may be gzipped [required]
Options:
-e, --errors-out: Output errors to a csv file, .err.csv.gz
-z, --p-zero: Force p-values of zero to be allowable. Takes precedence over inferred value (-i)
-n, --p-neg-log: Force p-values to be validated as -log10. Takes precedence over inferred value (-i)
-m, --min-rows: Minimum rows acceptable for the file [default: 100000]
-i, --infer-from-metadata: Infer validation options from the metadata file -meta.yaml. E.g. fields for analysis software and negative log10 p-values affect the data validation behaviour.
--help: Show this message and exit.
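The way the -z, -n and -m options change what counts as acceptable can be pictured with a small sketch. The rules below are assumptions made for illustration only (the actual schema is generated dynamically by the tool and may differ), and p_value_ok and enough_rows are hypothetical names.

```python
import math


def p_value_ok(value, allow_zero=False, neg_log=False):
    """Check one p-value; allow_zero and neg_log stand in for -z and -n."""
    try:
        p = float(value)
    except (TypeError, ValueError):
        return False
    if math.isnan(p) or math.isinf(p):
        return False
    if neg_log:
        # Assumption: -log10(p) for p in (0, 1] is any finite number >= 0.
        return p >= 0
    if p == 0:
        # Assumption: zero is rejected unless explicitly allowed.
        return allow_zero
    return 0 < p <= 1


def enough_rows(row_count, min_rows=100000):
    """Mirror the documented default minimum of 100000 rows."""
    return row_count >= min_rows
```

For example, under these assumed rules the value 7.3 fails as a plain p-value but passes when treated as -log10, and 0 passes only when zero is explicitly allowed.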
gwas-ssf format
Format a sumstats file and create a new one. Add/edit metadata.
Usage:
$ gwas-ssf format [OPTIONS] FILENAME
Arguments:
FILENAME: Input sumstats file. Must be TSV or CSV and may be gzipped [required]
Options:
-o, --ss-out PATH: Output sumstats file
-s, --minimal2standard: Try to convert a valid, minimally formatted file to the standard format. This assumes the file has at least p_value combined with rsid in the variant_id field, or chromosome and base_pair_location. Validity of the new file is not guaranteed because mandatory data could be missing from the original file. [default: False]
-m, --generate-metadata: Create the metadata file [default: False]
--meta-out PATH: Specify the metadata output file
--meta-in PATH: Specify a metadata file to read in
-e, --meta-edit: Enable metadata edit mode. Then provide params to edit in the --<FIELD>=<VALUE> format e.g. --GWASID=GCST123456 to edit/add that value [default: False]
-g, --meta-gwas: Populate metadata from GWAS Catalog [default: False]
-c, --custom-header-map: Provide a custom header mapping using the --<FROM>:<TO> format e.g. --chr:chromosome [default: False]
--help: Show this message and exit.
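The two free-form argument styles accepted by format, --<FIELD>=<VALUE> for metadata edits and --<FROM>:<TO> for header mappings, can be told apart by their separator. The sketch below shows one way to split such tokens into two dictionaries. It is an assumption about how such parsing could work, not the tool's actual parser, and parse_extra_args is a hypothetical name.

```python
def parse_extra_args(args):
    """Split extra CLI tokens into metadata edits and header mappings."""
    edits, header_map = {}, {}
    for token in args:
        if not token.startswith("--"):
            continue  # ignore anything that is not a --key style token
        body = token[2:]
        if "=" in body:
            # --<FIELD>=<VALUE>: split on the first "=" only,
            # so the value itself may contain "=" or ":".
            field, value = body.split("=", 1)
            edits[field] = value
        elif ":" in body:
            # --<FROM>:<TO>: map an input column name to a standard one.
            source, target = body.split(":", 1)
            header_map[source] = target
    return edits, header_map


if __name__ == "__main__":
    print(parse_extra_args(["--GWASID=GCST123456", "--chr:chromosome"]))
```

Checking for "=" before ":" means a metadata value that contains a colon, such as a URL, is still treated as an edit.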
Development
This repository uses poetry for dependency and packaging management.
To run the tests:
- git clone https://github.com/EBISPOT/gwas-sumstats-tools.git
- cd gwas-sumstats-tools
- poetry install
- poetry run pytest