
Utilities for working with PGS Catalog API and scoring files


PGS Catalog utilities


This repository is a collection of tools for downloading and working with scoring files from the PGS Catalog. The tools are mostly used internally by the PGS Catalog Calculator (PGScatalog/pgsc_calc); however, other users may find some of them helpful.


  • download_scorefiles: Download scoring files by PGS ID (accession) in genome builds GRCh37 or GRCh38
  • combine_scorefiles: Combine multiple scoring files into a single scoring file in 'long' format
  • match_variants: Match target variants (bim or pvar files) against the output of combine_scorefiles to produce scoring files for plink 2
  • validate_scorefiles: Check/validate that the scoring files and harmonized scoring files match the PGS Catalog scoring file formats.
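
To illustrate what a 'long' format can look like, the sketch below stacks the rows of several scoring tables into one list and tags each row with the accession it came from. It is not the package's implementation: the helper `combine_long`, the `accession` column, and the toy rows are assumptions made for illustration, and the real output of combine_scorefiles may use a different schema.

```python
import csv

def combine_long(scorefiles):
    """Stack rows from several scoring tables into one 'long' table.

    `scorefiles` maps an accession to an iterable of tab-separated text lines.
    Lines starting with '#' are treated as header comments and skipped.
    (Hypothetical helper, not part of pgscatalog-utils.)
    """
    rows = []
    for accession, lines in scorefiles.items():
        data = [line for line in lines if not line.startswith("#")]
        for record in csv.DictReader(data, delimiter="\t"):
            # One output row per variant per score, keyed by accession
            rows.append({"accession": accession, **record})
    return rows

# Toy input: made-up variants and weights, for illustration only
example = {
    "PGS000922": [
        "#toy header comment",
        "rsID\teffect_allele\teffect_weight",
        "rs1\tA\t0.1",
        "rs2\tG\t-0.2",
    ],
    "PGS001229": [
        "rsID\teffect_allele\teffect_weight",
        "rs1\tA\t0.3",
    ],
}

combined = combine_long(example)
for row in combined:
    print(row["accession"], row["rsID"], row["effect_allele"], row["effect_weight"])
```

In this layout a variant shared by two scores appears twice, once per accession, rather than as one row with a weight column per score.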


$ pip install pgscatalog-utils


$ download_scorefiles -i PGS000922 PGS001229 -o . -b GRCh37
$ combine_scorefiles -s PGS*.txt.gz -o combined.txt 
$ match_variants -s combined.txt -t <example.pvar> --min_overlap 0.75 --outdir .
$ validate_scorefiles -t formatted --dir <scoringfiles_directory> --log_dir <logs_directory>
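
When these commands are driven from a script, it can help to check the inputs before calling the tool. The sketch below builds the download_scorefiles command shown above as an argument list, using only the flags that appear in that example (-i, -o, -b). The helper `download_command` is hypothetical, and the accession pattern (PGS followed by six digits) is an assumption inferred from the example IDs.

```python
import re
import shlex

# Assumed accession shape, inferred from IDs such as PGS000922
PGS_ID = re.compile(r"PGS\d{6}")
# Genome builds named in the tool description
BUILDS = {"GRCh37", "GRCh38"}

def download_command(pgs_ids, outdir=".", build="GRCh37"):
    """Return an argv list for download_scorefiles (hypothetical helper)."""
    bad = [pgs_id for pgs_id in pgs_ids if not PGS_ID.fullmatch(pgs_id)]
    if bad:
        raise ValueError(f"unexpected accession format: {bad}")
    if build not in BUILDS:
        raise ValueError(f"build must be one of {sorted(BUILDS)}")
    return ["download_scorefiles", "-i", *pgs_ids, "-o", outdir, "-b", build]

argv = download_command(["PGS000922", "PGS001229"])
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` once the package is installed; the sketch only constructs the command and does not run it.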

Run any of these tools with the --help flag for more details.
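
The --min_overlap 0.75 option in the match_variants example suggests a threshold on how many scoring-file variants are found in the target data. A minimal sketch of that kind of check follows, assuming overlap is defined as the number of matched variants divided by the number of variants in the scoring file. The function names and variant ID strings are hypothetical, and the package may define or compute overlap differently.

```python
def overlap_fraction(score_variants, target_variants):
    """Fraction of scoring-file variants that also appear in the target set."""
    score = set(score_variants)
    if not score:
        raise ValueError("scoring file has no variants")
    return len(score & set(target_variants)) / len(score)

def passes_min_overlap(score_variants, target_variants, min_overlap=0.75):
    """True if the matched fraction meets the threshold (assumed inclusive)."""
    return overlap_fraction(score_variants, target_variants) >= min_overlap

# Toy variant IDs, for illustration only
score_ids = ["1:100:A:G", "1:200:C:T", "2:300:G:A", "2:400:T:C"]
target_ids = ["1:100:A:G", "1:200:C:T", "2:300:G:A", "3:500:A:C"]

fraction = overlap_fraction(score_ids, target_ids)
print(fraction, passes_min_overlap(score_ids, target_ids))
```

Here three of the four scoring-file variants are present in the target, so the fraction equals the 0.75 threshold and the toy score would be kept under the inclusive comparison assumed above.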

Install from source


$ git clone
$ cd pgscatalog_utils
$ poetry install
$ poetry build
$ pip install --user dist/*.whl 


The pgscatalog_utils package is developed as part of the Polygenic Score (PGS) Catalog project, a collaboration between the University of Cambridge’s Department of Public Health and Primary Care (Michael Inouye, Samuel Lambert, Laurent Gil) and the European Bioinformatics Institute (Helen Parkinson, Aoife McMahon, Ben Wingfield, Laura Harris).

A manuscript describing the tool and the larger PGS Catalog Calculator pipeline (PGScatalog/pgsc_calc) is in preparation. In the meantime, if you use these tools, we ask you to cite the repo(s) and the paper describing the PGS Catalog resource.

This work has received funding from EMBL-EBI core funds, the Baker Institute, the University of Cambridge, Health Data Research UK (HDRUK), and the European Union's Horizon 2020 research and innovation programme under grant agreement No 101016775 INTERVENE.
