Utilities for working with PGS Catalog API and scoring files

PGS Catalog utilities

This repository is a collection of tools for downloading and working with scoring files from the PGS Catalog. The tools are mostly used internally by the PGS Catalog Calculator (PGScatalog/pgsc_calc); however, other users may find some of them helpful.

Overview

  • download_scorefiles: Download scoring files by PGS ID (accession) in genome builds GRCh37 or GRCh38
  • combine_scorefiles: Combine multiple scoring files into a single scoring file in 'long' format
  • match_variants: Match target variants (bim or pvar files) against the output of combine_scorefiles to produce scoring files for plink 2
  • ancestry_analysis: Use genetic PCA loadings to compare samples to population reference panels, and report PGS adjusted for these axes of genetic ancestry. The PCs will likely have been generated with FRAPOSA (PGS Catalog version)
  • validate_scorefiles: Check/validate that the scoring files and harmonized scoring files match the PGS Catalog scoring file formats.
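
To illustrate what a 'long' format combined scoring file might look like, the sketch below stacks the rows of two scoring files and tags each row with the score it came from. This is not the package's implementation: the `accession` column name, the `PGS_A`/`PGS_B` identifiers, the helper functions, and all variants and weights are assumptions made up for the example. Only the `#`-prefixed header lines and the `rsID`, `effect_allele`, and `effect_weight` column names follow the PGS Catalog scoring file format.

```python
import csv

# Two tiny in-memory scoring files (hypothetical data).
# Lines starting with '#' are metadata headers and are skipped when parsing.
SCORE_A = (
    "#pgs_id=PGS_A\n"
    "rsID\teffect_allele\teffect_weight\n"
    "rs1\tA\t0.10\n"
    "rs2\tG\t-0.20\n"
)
SCORE_B = (
    "#pgs_id=PGS_B\n"
    "rsID\teffect_allele\teffect_weight\n"
    "rs2\tG\t0.05\n"
    "rs3\tT\t0.30\n"
)


def read_scorefile(text, accession):
    """Parse one scoring file and tag every row with its accession."""
    body = [line for line in text.splitlines() if not line.startswith("#")]
    reader = csv.DictReader(body, delimiter="\t")
    return [{"accession": accession, **row} for row in reader]


def combine_long(files):
    """Stack the rows of several scoring files: one row per (score, variant)."""
    rows = []
    for accession, text in files.items():
        rows.extend(read_scorefile(text, accession))
    return rows


combined = combine_long({"PGS_A": SCORE_A, "PGS_B": SCORE_B})
for row in combined:
    print(row)
```

In this layout a variant shared by several scores (here `rs2`) appears once per score, so each row carries exactly one weight.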

Installation

$ pip install pgscatalog-utils

Quickstart

$ download_scorefiles -i PGS000922 PGS001229 -o . -b GRCh37
$ combine_scorefiles -s PGS*.txt.gz -o combined.txt 
$ match_variants -s combined.txt -t <example.pvar> --min_overlap 0.75 --outdir .
$ validate_scorefiles -t formatted --dir <scoringfiles_directory> --log_dir <logs_directory>

More details are available using the --help parameter.
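
The `--min_overlap 0.75` argument in the quickstart can be read as a threshold on how many of a score's variants are found in the target data. The sketch below shows that idea under a simplifying assumption: variants are compared as plain identifier strings, and overlap is the fraction of scoring file variants present in the target. The function names and variant IDs are hypothetical, and the real matching logic in match_variants is likely more involved.

```python
def overlap_fraction(score_variants, target_variants):
    """Fraction of scoring-file variants that are also present in the target."""
    score = set(score_variants)
    if not score:
        raise ValueError("scoring file has no variants")
    return len(score & set(target_variants)) / len(score)


def passes_min_overlap(score_variants, target_variants, min_overlap=0.75):
    """True when the matched fraction reaches the chosen threshold."""
    return overlap_fraction(score_variants, target_variants) >= min_overlap


# Hypothetical variant identifiers: three of the four score variants
# are present in the target, so the overlap is 3/4.
score = ["1:1000:A:G", "1:2000:C:T", "2:500:G:A", "2:900:T:C"]
target = ["1:1000:A:G", "1:2000:C:T", "2:500:G:A", "3:42:A:C"]

print(overlap_fraction(score, target))    # 0.75
print(passes_min_overlap(score, target))  # True
```

A score whose overlap falls below the threshold would be calculated from an incomplete set of variants, which is why a minimum is worth enforcing before scoring.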

Install from source

Requirements:

$ git clone https://github.com/PGScatalog/pgscatalog_utils.git
$ cd pgscatalog_utils
$ poetry install
$ poetry build
$ pip install --user dist/*.whl 

Credits

The pgscatalog_utils package is developed as part of the Polygenic Score (PGS) Catalog (www.PGSCatalog.org) project, a collaboration between the University of Cambridge’s Department of Public Health and Primary Care (Michael Inouye, Samuel Lambert, Laurent Gil) and the European Bioinformatics Institute (Helen Parkinson, Aoife McMahon, Ben Wingfield, Laura Harris).

A manuscript describing the tool and the larger PGS Catalog Calculator pipeline (PGSCatalog/pgsc_calc) is in preparation. In the meantime, if you use these tools, we ask you to cite the repo(s) and the paper describing the PGS Catalog resource:

This work has received funding from EMBL-EBI core funds, the Baker Institute, the University of Cambridge, Health Data Research UK (HDRUK), and the European Union's Horizon 2020 research and innovation programme under grant agreement No 101016775 INTERVENE.
