A collection of handy tools for GWAS SumStats
gwaslab
- A handy python toolkit for handling GWAS sumstats.
- Each process is modularized and can be customized to your needs.
- Sumstats-specific manipulations are designed as methods of a python object, gwaslab.Sumstats.
Please see the GWASLab documentation at https://cloufield.github.io/gwaslab/
Note: gwaslab is currently being updated very frequently. I will release the first stable version soon. Please stay tuned.
Install
pip install gwaslab==3.4.4
import gwaslab as gl
# load plink2 output
mysumstats = gl.Sumstats("t2d_bbj.txt.gz", fmt="plink2")
# or you can specify the columns:
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
                         snpid="SNP",
                         chrom="CHR",
                         pos="POS",
                         ea="ALT",
                         nea="REF",
                         neaf="Frq",
                         beta="BETA",
                         se="SE",
                         p="P",
                         direction="Dir",
                         n="N",
                         build="19")
# manhattan and qq plot
mysumstats.plot_mqq()
...
Functions
Loading and Formatting
- Loading sumstats by simply specifying the software name or format name, or specifying each column name.
- Converting GWAS sumstats to specific formats:
- LDSC / MAGMA / METAL / PLINK / SAIGE / REGENIE / MR-MEGA / GWAS-SSF / FUMA / GWAS-VCF / BED...
- check available formats
- Optional filtering of variants in commonly used genomic regions: Hapmap3 SNPs / High-LD regions / MHC region
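To make the region-based filtering concrete, here is a minimal sketch in plain Python. It is not gwaslab's implementation: the helper names `in_region` and `exclude_region` are hypothetical, and the MHC boundaries (chr6, 25 to 34 Mb) are an assumed, commonly used approximation that depends on the genome build.

```python
# Assumption: a frequently used approximate MHC window on chromosome 6.
# The exact boundaries vary between studies and genome builds.
MHC_REGION = (6, 25_000_000, 34_000_000)


def in_region(chrom, pos, region):
    """Return True if (chrom, pos) falls inside region = (chrom, start, end)."""
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end


def exclude_region(variants, region=MHC_REGION):
    """Drop variants (dicts with CHR and POS keys) located inside the region."""
    return [v for v in variants if not in_region(v["CHR"], v["POS"], region)]


variants = [
    {"SNPID": "var1", "CHR": 6, "POS": 30_000_000},  # inside the window
    {"SNPID": "var2", "CHR": 6, "POS": 40_000_000},  # same chromosome, outside
    {"SNPID": "var3", "CHR": 1, "POS": 30_000_000},  # different chromosome
]
kept = exclude_region(variants)
print([v["SNPID"] for v in kept])
```

The same pattern extends to high-LD regions by looping over a list of `(chrom, start, end)` tuples.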
Standardization & Normalization
- Variant ID standardization
- CHR and POS notation standardization
- Variant POS and allele normalization
- Genome build : Inference and Liftover
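As an illustration of what CHR notation standardization and allele normalization involve, the sketch below uses plain Python. The function names and the integer coding (X=23, Y=24, MT=25) are assumptions made for this example, not gwaslab's API. The normalization shown is simple parsimony trimming of shared bases; full left-alignment of indels additionally requires a reference genome and is not covered here.

```python
def standardize_chrom(chrom):
    """Map a chromosome label to an integer code.

    Assumed convention for this sketch: X=23, Y=24, M/MT=25.
    Returns None when the label is not recognized.
    """
    label = str(chrom).strip().upper()
    if label.startswith("CHR"):
        label = label[3:]
    special = {"X": 23, "Y": 24, "M": 25, "MT": 25}
    if label in special:
        return special[label]
    if label.isdigit() and 1 <= int(label) <= 25:
        return int(label)
    return None


def normalize_variant(pos, ref, alt):
    """Trim bases shared by both alleles, keeping at least one base each."""
    ref, alt = ref.upper(), alt.upper()
    # Right-trim: drop identical trailing bases.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Left-trim: drop identical leading bases and shift the position.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(standardize_chrom("chrX"))
print(normalize_variant(100, "ATCG", "ATG"))
```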
Quality control, Value conversion & Filtering
- Statistics sanity check
- Extreme value removal
- Equivalent statistics conversion
- BETA/SE , OR/OR_95L/OR_95U
- P, Z, CHISQ, MLOG10P
- Customizable value filtering
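The equivalent statistics listed above are linked by standard formulas. The sketch below writes them out with the Python standard library only; the function names are hypothetical and this is not gwaslab's own code. It assumes a two-sided test on a normally distributed Z score and a 95% confidence interval for the odds ratio.

```python
import math

Z_CRIT_95 = 1.959964  # two-sided 95% critical value of the standard normal


def z_from_beta_se(beta, se):
    """Z = BETA / SE."""
    return beta / se


def p_from_z(z):
    """Two-sided P for a standard normal Z: P = erfc(|Z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def chisq_from_z(z):
    """CHISQ (1 degree of freedom) = Z^2."""
    return z * z


def mlog10p_from_p(p):
    """MLOG10P = -log10(P). Very small P may underflow to 0 in floating point."""
    return -math.log10(p)


def or_ci_from_beta_se(beta, se, z_crit=Z_CRIT_95):
    """Return (OR, OR_95L, OR_95U) from BETA and SE on the log-odds scale."""
    return (math.exp(beta),
            math.exp(beta - z_crit * se),
            math.exp(beta + z_crit * se))


def beta_se_from_or_ci(odds_ratio, or_95l, or_95u, z_crit=Z_CRIT_95):
    """Recover BETA and SE from an odds ratio and its 95% confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(or_95u) - math.log(or_95l)) / (2 * z_crit)
    return beta, se


z = z_from_beta_se(0.12, 0.02)
print(z, p_from_z(z), chisq_from_z(z), mlog10p_from_p(p_from_z(z)))
```

For extremely significant variants it is safer to carry MLOG10P directly instead of deriving it from P, because P can underflow to zero.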
Harmonization
- rsID assignment based on CHR, POS, and REF/ALT
- CHR POS assignment based on rsID using a reference text file
- Palindromic SNPs and indels strand inference using a reference VCF
- Check allele frequency discrepancy using a reference VCF
- Reference allele alignment using a reference genome sequence FASTA file
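Two ideas behind the harmonization steps above can be sketched in a few lines: palindromic SNPs (A/T or C/G) look identical on both strands, which is why their strand has to be inferred from a reference, and aligning to a reference allele means swapping the alleles and flipping the sign of the effect when the effect allele turns out to be the reference base. The helpers below are hypothetical and cover single-base SNPs only; gwaslab's actual procedure works against reference VCF and FASTA files and is more involved.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(allele):
    """Return the reverse complement of an allele string (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(allele.upper()))


def is_palindromic(ea, nea):
    """True for SNPs whose two alleles are complementary (A/T or C/G)."""
    ea, nea = ea.upper(), nea.upper()
    return len(ea) == 1 and len(nea) == 1 and COMPLEMENT.get(ea) == nea


def align_to_reference(ea, nea, beta, eaf, ref_base):
    """Make the non-effect allele match the reference base where possible.

    Returns (ea, nea, beta, eaf, status). When EA equals the reference base,
    the alleles are swapped, BETA changes sign and EAF becomes 1 - EAF.
    """
    ea, nea, ref_base = ea.upper(), nea.upper(), ref_base.upper()
    if nea == ref_base:
        return ea, nea, beta, eaf, "aligned"
    if ea == ref_base:
        return nea, ea, -beta, 1 - eaf, "flipped"
    # Neither allele matches; a strand flip may be needed (not handled here).
    return ea, nea, beta, eaf, "no_match"


print(is_palindromic("A", "T"))
print(align_to_reference("A", "G", 0.1, 0.3, "A"))
```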
Visualization
- Mqq plot : Manhattan plot, QQ plot, or combined MQQ plot, with many customizable features, including automatic annotation of nearest gene names
- Miami plot : mirrored Manhattan plots
- Brisbane plot: GWAS hits density plot
- Regional plot : GWAS regional plot
- Heatmap : ldsc-rg genetic correlation matrix
- Scatter Plot : variant effect size comparison with sumstats
- Scatter Plot : allele frequency comparison
- Forest Plot : forest plots for meta-analysis of SNPs
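For readers unfamiliar with what a QQ plot displays, the sketch below computes the points it is drawn from: observed -log10(P) values, sorted from most to least significant, paired with the values expected under a uniform null. The function name `qq_points` is hypothetical, and the expected quantiles use (i + 0.5) / n, which is one common convention among several. No plotting library is used; the pairs can be passed to any scatter plot routine.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(P) pairs, most significant first."""
    # Keep only valid P values and sort observed -log10(P) in descending order.
    observed = sorted((-math.log10(p) for p in pvalues if 0 < p <= 1),
                      reverse=True)
    n = len(observed)
    # Expected quantiles of a uniform(0, 1) null, on the -log10 scale.
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))


points = qq_points([0.5, 0.05, 0.005])
for expected, observed in points:
    print(round(expected, 3), round(observed, 3))
```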
Other Utilities
- Read ldsc h2 or rg outputs directly as DataFrames (auto-parsing).
- Extract lead variants given a sliding window size.
- Extract novel loci given a list of known lead variants or known loci obtained from GWAS Catalog.
- Logging : keep a complete record of manipulations applied to the sumstats.
- Sumstats summary: give you a quick overview of the sumstats.
- ...
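Window-based lead variant extraction can be sketched as a greedy, distance-based selection: take the most significant variant, discard everything within the window on the same chromosome, and repeat. This is an illustration under assumptions, not necessarily the algorithm gwaslab uses; the function name, the tuple layout, and the defaults (500 kb window, 5e-8 threshold) are hypothetical choices for this example.

```python
def extract_leads(variants, window_kb=500, sig_level=5e-8):
    """Pick lead variants from (snpid, chrom, pos, p) tuples.

    Greedy rule: visit significant variants from smallest to largest P and
    keep one only if no already chosen lead lies within the window on the
    same chromosome.
    """
    window = window_kb * 1000
    significant = sorted((v for v in variants if v[3] < sig_level),
                         key=lambda v: v[3])
    leads = []
    for snpid, chrom, pos, p in significant:
        if all(chrom != lead_chrom or abs(pos - lead_pos) > window
               for _, lead_chrom, lead_pos, _ in leads):
            leads.append((snpid, chrom, pos, p))
    # Report leads in genomic order.
    return sorted(leads, key=lambda v: (v[1], v[2]))


variants = [
    ("var1", 1, 1_000_000, 1e-10),
    ("var2", 1, 1_200_000, 1e-12),  # strongest signal in the first locus
    ("var3", 1, 3_000_000, 1e-9),   # separate locus on chromosome 1
    ("var4", 2, 500_000, 1e-6),     # not genome-wide significant
    ("var5", 2, 900_000, 3e-8),
]
leads = extract_leads(variants)
print([v[0] for v in leads])
```

A novel-locus check follows the same logic: a lead is treated as novel when no known variant lies within the chosen window on the same chromosome.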
Requirements
Python >= 3.8
pySAM >0.18,<0.20
pyensembl >=2.2.3
scikit-allel
Biopython >= 1.79
liftover >= 1.1.13
pandas >= 1.3,<1.5
numpy >= 1.21.2
matplotlib >=3.5
seaborn >=0.11.1
scipy >=1.6.2
statsmodels >=0.13
adjustText
Citation
- GWASLab manuscript is in preparation and will be released soon.
- Sample GWAS data used in gwaslab is obtained from: http://jenger.riken.jp/ (Suzuki, Ken, et al. "Identification of 28 new susceptibility loci for type 2 diabetes in the Japanese population." Nature genetics 51.3 (2019): 379-386.).
Contacts
- Github: https://github.com/Cloufield/gwaslab
- Blog (in Chinese): https://gwaslab.com/
- Email: gwaslab@gmail.com
- Stats: https://pypistats.org/packages/gwaslab