A collection of handy tools for GWAS
gwaslab
Note: Some parts of the docs are outdated. I am currently updating the documentation.
- A simple Python package for handling GWAS sumstats.
- Each process is modularized and can be customized to your needs.
- Most manipulations are designed as methods of a Python object, gwaslab.Sumstats.
Please see the GWASLab documentation at https://cloufield.github.io/gwaslab/
Install
pip install gwaslab==3.3.3
import gwaslab as gl
# load plink2 output
mysumstats = gl.Sumstats("t2d_bbj.txt.gz", fmt="plink2")
# or you can specify the columns:
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
                         snpid="SNP",
                         chrom="CHR",
                         pos="POS",
                         ea="ALT",
                         nea="REF",
                         neaf="Frq",
                         beta="BETA",
                         se="SE",
                         p="P",
                         direction="Dir",
                         n="N",
                         build="19")
# manhattan and qq plot
mysumstats.plot_mqq()
...
Functions
Loading and Formatting
- Loading sumstats by simply specifying the software name
- Optional filtering of HapMap3 variants / high-LD regions / the HLA region when outputting sumstats
- Converting GWAS sumstats to specific formats
- LDSC / MAGMA / METAL / MR-MEGA / FUMA / VCF / BED...
- Check available formats
Standardization & Normalization
- Variant ID standardization
- CHR and POS notation standardization
- Variant POS and allele normalization
- Genome build : Infer and Liftover
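As an illustration of what CHR notation standardization involves, here is a minimal sketch in plain Python. It is not gwaslab's implementation: the function name `standardize_chrom` is hypothetical, and the numeric codes used for X, Y, and MT (23, 24, 25) are an assumed convention.

```python
def standardize_chrom(raw):
    """Map a raw chromosome label to an integer code, or None if unrecognized.

    Hypothetical helper; the codes X=23, Y=24, MT=25 are an assumed convention.
    """
    label = str(raw).strip()
    # Drop a leading "chr" prefix regardless of case (e.g. "chr1", "CHR1").
    if label.lower().startswith("chr"):
        label = label[3:]
    label = label.upper()
    special = {"X": 23, "Y": 24, "M": 25, "MT": 25}
    if label in special:
        return special[label]
    # Accept plain autosome numbers, including zero-padded ones such as "01".
    if label.isdecimal() and 1 <= int(label) <= 25:
        return int(label)
    return None
```

Labels that cannot be mapped (for example unplaced contigs) return None so that they can be reviewed or filtered instead of being silently coerced.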
Quality control, Value conversion & Filtering
- General statistics sanity check
- Extreme value removal
- Equivalent statistics conversion
- BETA/SE , OR/OR_95L/OR_95U
- P, Z, CHISQ, MLOG10
- Customized value filtering.
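The equivalent statistics listed above are related by standard formulas. The sketch below shows those relationships using only the Python standard library. It is not gwaslab's code: the function name `convert_stats` is hypothetical, and it assumes a two-sided test with a normally distributed Z statistic.

```python
import math

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def convert_stats(beta, se):
    """Derive equivalent statistics from BETA and SE (hypothetical helper)."""
    z = beta / se                             # Z = BETA / SE
    chisq = z * z                             # CHISQ (1 d.f.) = Z^2
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided P under a standard normal
    mlog10p = -math.log10(p) if p > 0 else float("inf")  # P can underflow to 0
    return {
        "Z": z,
        "CHISQ": chisq,
        "P": p,
        "MLOG10P": mlog10p,
        "OR": math.exp(beta),                 # OR = exp(BETA)
        "OR_95L": math.exp(beta - Z_95 * se),
        "OR_95U": math.exp(beta + Z_95 * se),
    }
```

For extremely significant variants P can underflow to zero in floating point, which is one reason a -log10(P) column is useful in its own right; this sketch simply returns infinity in that case.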
Harmonization
- rsID assignment based on CHR, POS, and REF/ALT
- CHR POS assignment based on rsID using a reference text file
- Palindromic SNPs and indels strand inference using a reference VCF
- Check allele frequency discrepancy using a reference VCF
- Reference allele alignment using a reference genome sequence FASTA file
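Palindromic SNPs need special handling because their two alleles are complementary bases (A/T or C/G), so the strand cannot be determined from the alleles alone. A minimal sketch of how such variants and indels might be flagged is shown below; the function names are hypothetical and this is not gwaslab's implementation.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(ea, nea):
    """True for single-base variants whose alleles are complementary (A/T, C/G)."""
    ea, nea = ea.upper(), nea.upper()
    return len(ea) == 1 and len(nea) == 1 and COMPLEMENT.get(ea) == nea


def is_indel(ea, nea):
    """True when the two alleles differ in length (insertion or deletion)."""
    return len(ea) != len(nea)


def flip_strand(allele):
    """Return the reverse complement of an allele made of A, C, G, and T."""
    return "".join(COMPLEMENT[base] for base in reversed(allele.upper()))
```

For a non-palindromic SNP such as A/G, comparing the alleles with their strand-flipped form (T/C) against a reference is enough to detect a strand mismatch. For A/T and C/G variants the flipped pair looks identical to a swapped pair, which is why additional reference information such as allele frequency is needed.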
Visualization
- MQQ plot: Manhattan plot, QQ plot, or combined MQQ plot (with many customizable features, including automatic annotation of nearest gene names)
- Miami plot: mirrored Manhattan plots for two sets of sumstats
- Brisbane plot: GWAS hits density plot
- Regional plot: GWAS regional plot
- Heatmap: ldsc-rg genetic correlation matrix
- Scatter plot: variant effect size comparison between sumstats
- Scatter plot: allele frequency comparison
- Forest plot: forest plots for meta-analysis of SNPs
Visualization Examples
Other Utilities
- Read ldsc h2 or rg outputs directly as DataFrames (auto-parsing).
- Extract lead variants given a sliding window size.
- Extract novel loci given a list of known lead variants.
- Logging : keep a complete record of manipulations from raw data to munged data.
- Sumstats summary function: know your data better.
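To illustrate the idea behind window-based lead variant extraction, here is a minimal distance-based sketch in plain Python. It is not gwaslab's algorithm: the function name `extract_leads`, the dictionary keys, the default 500 kb window, and the 5e-8 threshold are all assumptions made for the example.

```python
def extract_leads(variants, window_kb=500, sig_level=5e-8):
    """Greedy distance-based lead selection (hypothetical helper).

    `variants` is an iterable of dicts with "CHR", "POS", and "P" keys.
    The most significant variant is taken first; any other significant
    variant within `window_kb` of an already chosen lead on the same
    chromosome is skipped.
    """
    window = window_kb * 1000
    # Keep genome-wide significant variants, most significant first.
    candidates = sorted(
        (v for v in variants if v["P"] < sig_level),
        key=lambda v: v["P"],
    )
    leads = []
    for v in candidates:
        far_from_all_leads = all(
            v["CHR"] != lead["CHR"] or abs(v["POS"] - lead["POS"]) > window
            for lead in leads
        )
        if far_from_all_leads:
            leads.append(v)
    # Report leads in genomic order.
    return sorted(leads, key=lambda v: (v["CHR"], v["POS"]))
```

Novel loci could then be found in a similar spirit by keeping only the leads that lie farther than a chosen distance from every known lead variant.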
Requirements:
- Python >= 3.6
- pySAM
- pyensembl
- scikit-allel
- Biopython >= 1.79
- liftover >= 1.1.13
- pandas >= 1.2.4
- numpy >= 1.21.2
- matplotlib >= 3.5
- seaborn >= 0.11.1
- scipy >= 1.6.2
- statsmodels >= 0.13
- adjustText
Contacts
- Github: https://github.com/Cloufield/gwaslab
- Blog (in Chinese): https://gwaslab.com/
- Email: gwaslab@gmail.com