
Utilities for working with the PGS Catalog API and scoring files


PGS Catalog utilities


This repository is a collection of tools for downloading and working with scoring files from the PGS Catalog. These tools are mostly used internally by the PGS Catalog Calculator (PGScatalog/pgsc_calc), but other users may find some of them helpful.

Overview

  • download_scorefiles: Download scoring files by PGS ID (accession) in genome build GRCh37 or GRCh38
  • combine_scorefiles: Combine multiple scoring files into a single scoring file in 'long' format
  • match_variants: Match target variants (bim or pvar files) against the output of combine_scorefiles to produce scoring files for PLINK 2
  • ancestry_analysis: Use genetic PCA loadings to compare samples to population reference panels, and report PGS adjusted for these axes of genetic ancestry. The PCs will usually have been generated with FRAPOSA (the PGS Catalog version)
  • validate_scorefiles: Validate that scoring files and harmonized scoring files match the PGS Catalog scoring file formats
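The ancestry_analysis entry mentions reporting PGS adjusted for axes of genetic ancestry. One common way to do this is to regress the score on the principal components in a reference panel and report each target sample's residual. The sketch below shows that idea only; it is not the ancestry_analysis implementation, and the function names (`fit_pc_model`, `adjust_score`) and toy data are hypothetical. It assumes PCs for both reference and target samples have already been computed (for example by projecting with the PCA loadings) and uses only the standard library, so the least-squares fit is solved with a small Gaussian elimination.

```python
from __future__ import annotations

# Hypothetical sketch of PC-based PGS adjustment; not the ancestry_analysis code.


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_pc_model(ref_pcs: list[list[float]], ref_scores: list[float]) -> list[float]:
    """OLS fit of score ~ intercept + PC1 + ... + PCk on reference-panel samples."""
    design = [[1.0, *pcs] for pcs in ref_pcs]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ref_scores)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X'X) beta = X'y


def adjust_score(pcs: list[float], score: float, beta: list[float]) -> float:
    """Return the score minus the value predicted from the sample's PCs (the residual)."""
    predicted = beta[0] + sum(b * p for b, p in zip(beta[1:], pcs))
    return score - predicted


# Toy reference panel with one PC, where score = 2 + 3 * PC1 exactly (made-up data).
ref_pcs = [[0.0], [1.0], [2.0], [3.0]]
ref_scores = [2.0, 5.0, 8.0, 11.0]
beta = fit_pc_model(ref_pcs, ref_scores)
adjusted = adjust_score([1.5], 7.0, beta)  # a target sample on the same PC axis
print(beta, adjusted)
```

Taking residuals removes the part of the score that the PCs predict; a fuller approach might also rescale by the residual variance. Solving the normal equations directly is fine for a handful of PCs but is less numerically robust than an SVD-based solver such as `numpy.linalg.lstsq`.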

Installation

$ pip install pgscatalog-utils

Quickstart

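# download two scoring files, in genome build GRCh37, to the current directory
$ download_scorefiles -i PGS000922 PGS001229 -o . -b GRCh37
# combine the downloaded scoring files into a single 'long' format file
$ combine_scorefiles -s PGS*.txt.gz -o combined.txt
# match the combined variants against a target variant file; --min_overlap sets the minimum proportion of variants that must match
$ match_variants -s combined.txt -t <example.pvar> --min_overlap 0.75 --outdir .
# validate the scoring files in a directory, writing logs to a separate directory
$ validate_scorefiles -t formatted --dir <scoringfiles_directory> --log_dir <logs_directory>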

More details are available using the --help parameter.
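To illustrate what the 'long' format and a minimum-overlap threshold mean, here is a minimal, hypothetical sketch in Python: rows from several scoring files are stacked into one table (one row per variant per score), matched to target variants by chromosome, position and alleles, and each score's match rate is compared with a threshold. The column names, the (chr, pos) keying, the allele rule and the per-score overlap check are simplifying assumptions, not code taken from match_variants.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Hypothetical sketch for illustration; not taken from pgscatalog_utils.


@dataclass(frozen=True)
class ScoreRow:
    accession: str  # score (PGS ID) that this weight belongs to
    chr_name: str
    chr_position: int
    effect_allele: str
    other_allele: str
    effect_weight: float


def combine_long(scorefiles: dict[str, list[tuple]]) -> list[ScoreRow]:
    """Stack rows from several scoring files into one 'long' list: one row per variant per score."""
    return [
        ScoreRow(accession, chr_name, pos, effect, other, weight)
        for accession, rows in scorefiles.items()
        for chr_name, pos, effect, other, weight in rows
    ]


def match_long(rows: list[ScoreRow], target: dict[tuple[str, int], tuple[str, str, str]]):
    """Match rows to target variants keyed by (chr, pos).

    target maps (chr, pos) -> (variant_id, ref, alt), e.g. parsed from a .pvar file.
    A row matches when its two alleles equal {ref, alt} in either orientation.
    Returns matched (variant_id, effect_allele, accession, weight) tuples and the
    fraction of each score's rows that matched.
    """
    matched = []
    for row in rows:
        hit = target.get((row.chr_name, row.chr_position))
        if hit is None:
            continue
        variant_id, ref, alt = hit
        if {row.effect_allele, row.other_allele} == {ref, alt}:
            matched.append((variant_id, row.effect_allele, row.accession, row.effect_weight))
    total = Counter(row.accession for row in rows)
    hits = Counter(accession for _, _, accession, _ in matched)
    overlap = {accession: hits[accession] / n for accession, n in total.items()}
    return matched, overlap


# Toy inputs: hypothetical accessions and weights, not real PGS Catalog data.
scorefiles = {
    "PGS_A": [("1", 100, "A", "G", 0.5), ("1", 200, "C", "T", -0.2)],
    "PGS_B": [("1", 100, "G", "A", 0.1), ("2", 300, "T", "C", 0.3)],
}
target = {
    ("1", 100): ("1:100:G:A", "G", "A"),
    ("1", 200): ("1:200:C:T", "C", "T"),
}

combined = combine_long(scorefiles)
matched, overlap = match_long(combined, target)
min_overlap = 0.75  # same value as the Quickstart's --min_overlap
failing = [accession for accession, frac in overlap.items() if frac < min_overlap]
print(overlap, failing)
```

Keeping every score in one long table lets a single pass match all of them. Whether a score below the threshold is dropped or stops the run is a separate design choice, and this sketch only reports it. A practical matcher would also need to consider cases ignored here, such as strand flips, multi-allelic sites and duplicate variant IDs.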

Install from source

Requirements: git, Poetry, and pip.

$ git clone https://github.com/PGScatalog/pgscatalog_utils.git
$ cd pgscatalog_utils
$ poetry install
$ poetry build
$ pip install --user dist/*.whl 

Credits

The pgscatalog_utils package is developed as part of the Polygenic Score (PGS) Catalog (www.PGSCatalog.org) project, a collaboration between the University of Cambridge’s Department of Public Health and Primary Care (Michael Inouye, Samuel Lambert, Laurent Gil) and the European Bioinformatics Institute (Helen Parkinson, Aoife McMahon, Ben Wingfield, Laura Harris).

A manuscript describing the tool and the larger PGS Catalog Calculator pipeline (PGScatalog/pgsc_calc) is in preparation. In the meantime, if you use these tools, we ask that you cite the repository (or repositories) and the paper describing the PGS Catalog resource:

This work has received funding from EMBL-EBI core funds, the Baker Institute, the University of Cambridge, Health Data Research UK (HDRUK), and the European Union's Horizon 2020 research and innovation programme under grant agreement No 101016775 INTERVENE.



Download files


Source Distribution

pgscatalog_utils-0.5.0.tar.gz (68.1 kB)

Built Distribution

pgscatalog_utils-0.5.0-py3-none-any.whl (87.5 kB)

File details

Details for the file pgscatalog_utils-0.5.0.tar.gz.

File metadata

  • Download URL: pgscatalog_utils-0.5.0.tar.gz
  • Size: 68.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.10.13 Linux/6.2.0-1019-azure

File hashes

Hashes for pgscatalog_utils-0.5.0.tar.gz
Algorithm Hash digest
SHA256 6f50092b12c4a046ab8a807329a41e03ba701ba54a7b60c71a26964a12e97cbd
MD5 e2eb1519bd0f16773b57de19508a170d
BLAKE2b-256 70ea507b3dc796d96b89ae47e04976e23cb6734260148e3b74235f919e6be6fd
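The published SHA256 digest can be used to check a downloaded file before installing it. A minimal sketch with Python's standard `hashlib` module; the function names are illustrative, and the file location (the source distribution saved in the current directory) is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """True if the file's SHA256 digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    # Published SHA256 for pgscatalog_utils-0.5.0.tar.gz, as listed on this page.
    expected = "6f50092b12c4a046ab8a807329a41e03ba701ba54a7b60c71a26964a12e97cbd"
    tarball = Path("pgscatalog_utils-0.5.0.tar.gz")  # assumed download location
    if tarball.exists():
        print("OK" if verify(tarball, expected) else "MISMATCH")
```

The same check works for the wheel by swapping in its file name and SHA256 digest.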


File details

Details for the file pgscatalog_utils-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: pgscatalog_utils-0.5.0-py3-none-any.whl
  • Size: 87.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.10.13 Linux/6.2.0-1019-azure

File hashes

Hashes for pgscatalog_utils-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1f598bd1fe66e262082b2040d0505cd5c6cffb1fa52b4ccca61abfe85f2f4987
MD5 b83cf254b187a14b0d549dc3da352b88
BLAKE2b-256 48ceffa6fa2948dd58c22a69932bead42eff93b40cf9265ac439835d52debd97

