Package for working with GWAS summary statistics

Patch notes

16-09-2020 (v0.4.4)

Bugfix: Chromosome X now properly converted to 23

Older patch notes can be found in PATCHNOTES.md

Description

A Python package for working with GWAS summary statistics data.
This package is designed to make it easy to read summary statistics, perform QC, merge summary statistics and perform meta-analysis.
Meta-analysis can be performed with .meta_analyze() on merged summary statistics, using either the inverse-variance weighted or the sample-size weighted method.
GWAMA as described in Baselmans, et al. (2019) can be performed with the .gwama() function on merged summary statistics.
The plotting functions use matplotlib.pyplot for generating figures, so they are generally compatible with matplotlib.pyplot colors, and with Figure and Axes objects.
Warning: merging with low_memory enabled is still highly experimental.
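
To make the two meta-analysis methods mentioned above concrete, here is a minimal sketch of the textbook formulas for a single SNP, written in plain Python. It is not taken from the pysumstats source: the function names ivw_meta and n_weighted_meta are made up for this illustration, and the package may handle details (allele alignment, missing values) differently.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted estimate for one SNP."""
    weights = [1.0 / se ** 2 for se in ses]        # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided normal p-value
    return beta, se, z, p


def n_weighted_meta(betas, ses, ns):
    """Sample-size weighted Z-score combination for one SNP."""
    zs = [b / se for b, se in zip(betas, ses)]
    weights = [math.sqrt(n) for n in ns]           # weight = sqrt(sample size)
    z = sum(w * zi for w, zi in zip(weights, zs)) / math.sqrt(
        sum(w ** 2 for w in weights))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Two hypothetical studies reporting the same SNP
print(ivw_meta([0.10, 0.20], [0.10, 0.10]))
print(n_weighted_meta([0.10, 0.20], [0.10, 0.10], [100, 100]))
```

The inverse-variance method returns a pooled effect size and standard error, while the sample-size weighted method only returns a Z-score and p-value, which is why it is useful when effect sizes are not on the same scale across studies.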

Reference

Using the pysumstats package for a publication, or something similar? That is awesome!
There is no publication attached to this package, and I am not going to force anyone to reference me or make me a co-author: I want this to remain easily accessible. I would greatly appreciate it, though, if you add a link to this GitHub repository, or mention the package in the acknowledgements.
If you have any questions, want to help add methods, or want to let me know you are planning a publication with this, you can get in touch via the PyPI page of this project.
If you use the .gwama() method, please cite the original publication: Baselmans, et al. (2019).

Installation

This package was made for Python 3.7. Clone the package directly from the GitHub repository, or install it with pip:

pip3 install --upgrade pysumstats

Usage

import pysumstats as sumstats

Reading files

s1 = sumstats.SumStats("sumstats1.csv.gz", phenotype='GWASsummary1')

Reading data without a sample size column: you have to specify the GWAS sample size manually

s2 = sumstats.SumStats("sumstats2.txt.gz", phenotype='GWASsummary2', gwas_n=350492)

Reading data with column names not automatically recognized:

s3 = sumstats.SumStats("sumstats3.csv", phenotype='GWASsummary3',
                              column_names={
                                    'rsid': 'weird_name_for_rsid',
                                    'chr': 'weird_name_for_chr',
                                    'bp': 'weird_name_for_bp',
                                    'ea': 'weird_name_for_ea',
                                    'oa': 'weird_name_for_oa',
                                    'maf': 'weird_name_for_maf',
                                    'b': 'weird_name_for_b',
                                    'se': 'weird_name_for_se',
                                    'p': 'weird_name_for_p',
                                    'hwe': 'weird_name_for_p_hwe',
                                    'info': 'weird_name_for_info',
                                    'n': 'weird_name_for_n',
                                    'eaf': 'weird_name_for_eaf',
                                    'oaf': 'weird_name_for_oaf'})

Performing QC

s1.qc(maf=.01)
s2.qc(maf=.01, hwe=1e-6, info=.9)
s3.qc()  # MAF .01 is the default
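
As a rough illustration of what thresholds like these mean, the sketch below filters a list of SNP records by minor allele frequency, Hardy-Weinberg p-value and imputation info score. It is an assumption about the general idea of such a QC step, not the actual .qc() implementation; the function qc_filter and the rsids rs1 to rs4 are hypothetical.

```python
def qc_filter(snps, maf=0.01, hwe=None, info=None):
    """Keep SNPs that pass every threshold that is set (None disables a filter)."""
    kept = []
    for snp in snps:
        if maf is not None and snp["maf"] < maf:
            continue  # variant is too rare
        if hwe is not None and snp["hwe"] < hwe:
            continue  # deviates from Hardy-Weinberg equilibrium
        if info is not None and snp["info"] < info:
            continue  # imputation quality is too low
        kept.append(snp)
    return kept


# Made-up records for illustration only
snps = [
    {"rsid": "rs1", "maf": 0.20, "hwe": 0.50, "info": 0.99},
    {"rsid": "rs2", "maf": 0.005, "hwe": 0.50, "info": 0.99},  # too rare
    {"rsid": "rs3", "maf": 0.10, "hwe": 1e-9, "info": 0.99},   # fails HWE
    {"rsid": "rs4", "maf": 0.10, "hwe": 0.50, "info": 0.50},   # poorly imputed
]

passed = qc_filter(snps, maf=0.01, hwe=1e-6, info=0.9)
print([snp["rsid"] for snp in passed])
```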

Merging sumstats (the low_memory option is still experimental, so be careful with it)

merge1 = s1.merge(s2)

Meta analysis

n_weighted_meta = merge1.meta_analyze(name='meta1', method='samplesize')  # N-weighted meta analysis
ivw_meta = merge1.meta_analyze(name='meta1', method='ivw')  # Standard inverse-variance weighted meta analysis
gwama = merge1.gwama(name='meta1', method='ivw')  # GWAMA as described in Baselmans, et al. (2019)

Additionally supports adding SNP heritabilities as weights (the keys should match your phenotype names)

h2_gwama = merge1.gwama(h2_snp={'GWASsummary1': .01, 'GWASsummary2': .02}, name='meta1', method='ivw')

And your own covariance matrix (called cov_Z in most R scripts)

# Either read it from a file:
import pandas as pd
# It should be a pandas DataFrame whose column names and index labels equal your phenotypes;
# index_col=0 assumes the first column of the file holds the row labels
cov_z = pd.read_csv('my_cov_z.csv', index_col=0)

# Or generate it from a phenotype file yourself:
phenotypes = pd.read_csv('my_phenotype_file.csv')
cov_z = sumstats.cov_matrix_from_phenotype_file(phenotypes, phenotypes=['GWASsummary1', 'GWASsummary2'])

gwama = merge1.gwama(cov_matrix=cov_z, h2_snp={'GWASsummary1': .01, 'GWASsummary2': .02}, name='meta1', method='ivw')
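
To show why the covariance matrix matters, here is a minimal sketch of a weighted Z-score combination that corrects for correlated test statistics (for example from overlapping samples). The variance of a weighted sum of Z-scores is w'Cw, so dividing by its square root keeps the combined statistic on the standard normal scale. This is a generic formula and not necessarily the exact computation in .gwama() or in Baselmans, et al. (2019); the helper names and the sqrt(N * h2) weights are assumptions made for this example.

```python
import math


def gwama_weights(ns, h2s):
    """Hypothetical weighting: sqrt(sample size * SNP heritability)."""
    return [math.sqrt(n * h2) for n, h2 in zip(ns, h2s)]


def weighted_z(zs, weights, cov_z):
    """Combine Z-scores, accounting for their covariance (1s on the diagonal)."""
    k = len(zs)
    numerator = sum(w * z for w, z in zip(weights, zs))
    # Variance of the weighted sum: sum over all i, j of w_i * w_j * cov_z[i][j]
    variance = sum(weights[i] * weights[j] * cov_z[i][j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)


independent = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.5], [0.5, 1.0]]

# Same inputs, but the correlated studies give a smaller combined Z
print(weighted_z([1.0, 2.0], [1.0, 1.0], independent))
print(weighted_z([1.0, 2.0], [1.0, 1.0], correlated))
print(gwama_weights([100, 400], [0.01, 0.01]))
```

Ignoring a positive correlation between studies would overstate the evidence, which is the reason to supply a covariance matrix when the input GWASs share participants.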

See a summary of the result

gwama.describe()

See head of the data

gwama.head()

See head of all chromosomes

gwama.head(n_chromosomes=23)

QQ and Manhattan plots of the result

gwama.manhattan(filename='meta_manhattan.png')
gwama.qqplot(filename='meta_qq.png')
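
For readers unfamiliar with QQ plots, the sketch below computes the points such a plot shows: observed -log10 p-values against the values expected under the null hypothesis of uniformly distributed p-values. It is a generic illustration in plain Python, not the code behind .qqplot(); the function name qq_points and the (i + 0.5) / n plotting positions are choices made for this example.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs, most significant first."""
    observed = sorted(pvalues)  # smallest p-value first
    n = len(observed)
    points = []
    for i, p in enumerate(observed):
        expected_p = (i + 0.5) / n  # expected quantile under a uniform null
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# Made-up p-values for illustration only
for expected, obs in qq_points([0.5, 0.05, 0.8, 0.001]):
    print(round(expected, 3), round(obs, 3))
```

Points that rise above the diagonal (observed larger than expected) indicate more small p-values than chance alone would produce.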

Save the result as csv

gwama.save('meta_sumstats.csv')

Save the result as a pickle file (much faster to save and load back into Python)

gwama.save('meta_sumstats.pickle')

Merge gwama results with another file:

merged = gwama.merge(s3)

Save prepped files for MR analysis in R:

merged.prep_for_mr(exposure='GWASsummary3', outcome='meta1',
                   filename=['GWAS3-Meta.csv', 'Meta-GWAS3.csv'],
                   p_cutoff=5e-8, bidirectional=True, index=False)

The resulting files will have the following column names, per specification of the MendelianRandomization package in R:

rsid chr bp exposure.A1 exposure.A2 outcome.A1 outcome.A2 exposure.se exposure.b outcome.se outcome.b
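
To illustrate the file layout listed above, here is a small sketch that writes rows with exactly those column names using the standard csv module, keeping only SNPs below a p-value cutoff. It is not the prep_for_mr() implementation: the function write_mr_file, the exposure.p key used for filtering, and all example values are assumptions for this illustration.

```python
import csv
import io

MR_COLUMNS = ["rsid", "chr", "bp", "exposure.A1", "exposure.A2",
              "outcome.A1", "outcome.A2", "exposure.se", "exposure.b",
              "outcome.se", "outcome.b"]


def write_mr_file(rows, handle, p_cutoff=5e-8):
    """Write SNPs whose exposure p-value is below p_cutoff; return the count."""
    # extrasaction="ignore" drops keys (such as exposure.p) that are not columns
    writer = csv.DictWriter(handle, fieldnames=MR_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    n_written = 0
    for row in rows:
        if row["exposure.p"] < p_cutoff:
            writer.writerow(row)
            n_written += 1
    return n_written


def make_row(rsid, exposure_p):
    """Build a made-up SNP row; every value here is illustrative only."""
    row = {name: 0.1 for name in MR_COLUMNS}
    row.update({"rsid": rsid, "chr": 1, "bp": 12345,
                "exposure.A1": "A", "exposure.A2": "G",
                "outcome.A1": "A", "outcome.A2": "G",
                "exposure.p": exposure_p})
    return row


rows = [make_row("rs_example_1", 1e-10), make_row("rs_example_2", 0.03)]
buffer = io.StringIO()  # stands in for an open file
n_written = write_mr_file(rows, buffer)
print(buffer.getvalue())
```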

Some other stuff:

# See column names of the file
s1.columns

# SumStats support for standard indexing is growing:
s1[0]  # Get the full output of the first SNP
s1[:10]  # Get the full output of the first 10 SNPs
s1[:10, 'p']  # Get the p-values of the first 10 SNPs
s1['p']  # Get the p-values of all SNPs
s1['rs78948828']  # Get the full output of 1 specific rsid
s1[['rs78948828', 'rs6057089', 'rs55957973']]  # Get the full output of multiple specific rsids
s1[['rs78948828', 'rs6057089', 'rs55957973'], 'p']  # Get the p-values for specific rsids

# If for whatever reason you want to process each SNP individually, you can also loop over the entire file
for snp_output in s1:
    if snp_output['p'] < 5e-8:  # test the current SNP, not the whole p column
        print('Yay significant SNP!')
    # do something

# If you only want to loop over some specific columns, you can
for rsid, b, se, p in s1[['rsid', 'b', 'se', 'p']].values:
    if p < 5e-8:
        print('Yay significant SNP!')

Release history

0.5.2 (Jan 14, 2021)
0.5.1 (Jan 14, 2021)
0.5.0 (Jan 4, 2021)
0.4.6 (Nov 30, 2020)
0.4.5 (Nov 30, 2020)
0.4.4 (Sep 16, 2020), this version
0.4.3 (Sep 4, 2020)
0.4.2 (Jul 28, 2020)
0.4.1 (Jun 2, 2020)
0.4 (May 15, 2020)
0.3.1 (May 13, 2020)
0.3 (May 12, 2020)
0.2.3 (May 11, 2020)
0.2.2.4 (May 11, 2020)
0.2.2.3 (May 11, 2020)
0.2.2.2 (May 11, 2020)
0.2.2.1 (May 11, 2020), yanked
0.2.2 (May 11, 2020)
0.2 (May 8, 2020)
0.1 (May 8, 2020)

Download files

Source Distribution

pysumstats-0.4.4.tar.gz (30.5 kB)

Uploaded Sep 16, 2020 Source

Built Distribution

pysumstats-0.4.4-py3-none-any.whl (30.0 kB)

Uploaded Sep 16, 2020 Python 3

File details

Details for the file pysumstats-0.4.4.tar.gz.

File metadata

Download URL: pysumstats-0.4.4.tar.gz
Upload date: Sep 16, 2020
Size: 30.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for pysumstats-0.4.4.tar.gz
Algorithm	Hash digest
SHA256	`c896b679317f16a56e5be9481ce3fc9ca748f7b1a12aa3f6969c45bd6e941085`
MD5	`71c6f5c75da3863cc231339e73ff1644`
BLAKE2b-256	`7ae90b1015463aa2fe100bbdda8e28918807b1879f508c0f511874d871305d33`

File details

Details for the file pysumstats-0.4.4-py3-none-any.whl.

File metadata

Download URL: pysumstats-0.4.4-py3-none-any.whl
Upload date: Sep 16, 2020
Size: 30.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for pysumstats-0.4.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e7f6b2d323ff675716fd55551d79531f01865cafbca4b179d111787ebb832226`
MD5	`68dfa75d9415b0fc569b285560b91485`
BLAKE2b-256	`427364f12da5ba72d387ee02562cf3503be6cd4f206d4b882e045f6674913610`
