A package for reading GWAS summary statistics stored in VCF/BCF format

Project description

GWAS-VCF Python parser

The package provides a thin wrapper around pysam and rsidx to parse VCF files containing GWAS summary statistics and trait metadata. See also gwasvcf, an R package for parsing GWAS-VCF files.

The parser supports version 1.0 of the GWAS-VCF specification.

Install

pip install git+https://github.com/mrcieu/pygwasvcf

Data

Download over 10,000 GWAS-VCF files from https://gwas.mrcieu.ac.uk

Examples

Read GWAS trait/study metadata

import pygwasvcf
with pygwasvcf.GwasVcf("/path/to/gwas.vcf.gz") as g:
    # print dictionary of GWAS metadata
    print(g.get_metadata())
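
In GWAS-VCF, trait and study metadata are stored in the VCF header. To illustrate the kind of information involved, here is a minimal standard-library sketch that parses a single `##SAMPLE` header line into a dictionary. The header line, the function name `parse_sample_header`, and the field values are illustrative assumptions. The sketch does not reproduce how pygwasvcf reads the header, and the exact shape of the dictionary returned by `get_metadata()` should be checked against the API documentation.

```python
import re


def parse_sample_header(line):
    """Parse a '##SAMPLE=<key=value,...>' header line into a dict of strings.

    Hypothetical helper for illustration only. It splits naively on commas,
    so it does not handle quoted values that themselves contain commas.
    """
    match = re.fullmatch(r"##SAMPLE=<(.*)>", line.strip())
    if match is None:
        raise ValueError("not a ##SAMPLE header line")
    fields = {}
    for item in match.group(1).split(","):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


# Illustrative header line; the values are made up
header_line = "##SAMPLE=<ID=trait_name,TotalVariants=1000,StudyType=Continuous>"
metadata = parse_sample_header(header_line)
print(metadata["ID"], metadata["StudyType"])
```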

Query variant-trait association(s) by chromosome and position

import pygwasvcf
with pygwasvcf.GwasVcf("/path/to/gwas.vcf.gz") as g:
    # query by chromosome and position interval
    for variant in g.query(contig="1", start=1, stop=1):
        print(variant)
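
A note on coordinates: the VCF POS column is 1-based, whereas pysam's `fetch` interprets `start` and `stop` as a 0-based, half-open interval. Whether `query` forwards its arguments to pysam unchanged is an assumption here, so confirm against the API documentation before relying on it. If it does, a 1-based inclusive range could be converted with a small helper like the hypothetical `to_fetch_interval` below.

```python
def to_fetch_interval(first_pos, last_pos):
    """Convert a 1-based inclusive range to a 0-based half-open (start, stop) pair.

    Hypothetical helper; not part of pygwasvcf.
    """
    if first_pos < 1 or last_pos < first_pos:
        raise ValueError("expected 1 <= first_pos <= last_pos")
    # Shift the start down by one; the inclusive end becomes the exclusive stop
    return first_pos - 1, last_pos


# A single variant at 1-based position 1000 (made-up position)
start, stop = to_fetch_interval(1000, 1000)
print(start, stop)
```

Under that assumption, the resulting values would be passed as `g.query(contig="1", start=start, stop=stop)`.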

Query variant-trait association(s) by dbSNP rsID

import pygwasvcf
with pygwasvcf.GwasVcf("/path/to/gwas.vcf.gz") as g:
    # build an index on dbSNP identifiers using rsidx (https://github.com/bioforensics/rsidx)
    # this only needs to be done once; afterwards, pass the index path to the constructor,
    # i.e. GwasVcf("/path/to/gwas.vcf.gz", rsidx_path="/path/to/gwas.vcf.gz.rsidx")
    g.index_rsid()

    # rapid query by rsID
    for variant in g.query(variant_id="rs1245"):
        print(variant)
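
Indexing by rsID is worthwhile because a VCF is sorted by position, so finding an identifier without an index means scanning the file. The sketch below illustrates the general idea of such an index with an in-memory SQLite table that maps rsIDs to coordinates. The table name, the function `lookup`, and the variant data are hypothetical, and the sketch does not reproduce rsidx's actual schema or API.

```python
import sqlite3

# Made-up (chromosome, position, rsID) triples standing in for VCF records
variants = [("1", 1000, "rs1245"), ("1", 2000, "rs99"), ("2", 500, "rs7")]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rsid_to_coord (rsid INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER)"
)
# Store the numeric part of each rsID so lookups use the integer primary key
conn.executemany(
    "INSERT INTO rsid_to_coord VALUES (?, ?, ?)",
    [(int(rsid[2:]), chrom, pos) for chrom, pos, rsid in variants],
)


def lookup(rsid):
    """Return (chrom, pos) for an rsID such as 'rs1245', or None if absent."""
    return conn.execute(
        "SELECT chrom, pos FROM rsid_to_coord WHERE rsid = ?", (int(rsid[2:]),)
    ).fetchone()


print(lookup("rs1245"))
```

Once the coordinates are known, the record itself can be retrieved with an ordinary position query.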

Extract summary statistics from a variant object

import pygwasvcf
with pygwasvcf.GwasVcf("/path/to/gwas.vcf.gz") as g:
    # query by chromosome and position interval
    for variant in g.query(contig="1", start=1, stop=1):
        # print variant-trait P value
        print(pygwasvcf.VariantRecordGwasFuns.get_pval(variant, "trait_name"))
        # print variant-trait SE
        print(pygwasvcf.VariantRecordGwasFuns.get_se(variant, "trait_name"))
        # print variant-trait beta
        print(pygwasvcf.VariantRecordGwasFuns.get_beta(variant, "trait_name"))
        # print variant-trait allele frequency
        print(pygwasvcf.VariantRecordGwasFuns.get_af(variant, "trait_name"))
        # print variant-trait ID
        print(pygwasvcf.VariantRecordGwasFuns.get_id(variant, "trait_name"))
        # create and print ID on-the-fly if missing
        print(pygwasvcf.VariantRecordGwasFuns.get_id(variant, "trait_name", create_if_missing=True))
        # print variant-trait sample size
        print(pygwasvcf.VariantRecordGwasFuns.get_ss(variant, "trait_name"))
        # print variant-trait total sample size from header if per-variant is missing
        print(pygwasvcf.VariantRecordGwasFuns.get_ss(variant, "trait_name", g.get_metadata()))
        # print variant-trait number of cases
        print(pygwasvcf.VariantRecordGwasFuns.get_nc(variant, "trait_name"))
        # print variant-trait total cases from header if per-variant is missing
        print(pygwasvcf.VariantRecordGwasFuns.get_nc(variant, "trait_name", g.get_metadata()))
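
For context, the GWAS-VCF specification stores the P value on a -log10 scale (the LP field) alongside the effect size and its standard error. The following sketch shows the arithmetic for recovering a P value from LP and for deriving a Z score from beta and SE. The function names and numbers are illustrative, and whether `get_pval` already returns the back-transformed value should be confirmed in the API documentation.

```python
def lp_to_pval(lp):
    """Convert a -log10 P value (LP) back to a P value."""
    return 10.0 ** (-lp)


def z_score(beta, se):
    """Z score as effect size divided by its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return beta / se


# Made-up values for illustration
pval = lp_to_pval(8.0)
z = z_score(0.5, 0.1)
print(pval, z)
```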

Documentation

API documentation is available at https://mrcieu.github.io/pygwasvcf

Citation

Lyon MS, Andrews SJ, Elsworth B, Gaunt TR, Hemani G, Marcora E. The variant call format provides efficient and robust storage of GWAS summary statistics. Genome Biol. 22, 32 (2021).
