
Utilities for working with PGS Catalog API and scoring files

PGS Catalog utilities

This repository is a collection of tools for downloading and working with scoring files from the PGS Catalog. The tools are mostly used internally by the PGS Catalog Calculator (PGScatalog/pgsc_calc), but other users may find some of them helpful.

Overview

  • download_scorefiles: Download scoring files by PGS ID (accession) in genome builds GRCh37 or GRCh38
  • combine_scorefiles: Combine multiple scoring files into a single scoring file in 'long' format
  • match_variants: Match target variants (bim or pvar files) against the output of combine_scorefiles to produce scoring files for plink 2
  • ancestry_analysis: Use genetic PCA loadings to compare samples to population reference panels, and report PGS adjusted for these axes of genetic ancestry. The PCs will likely have been generated with FRAPOSA (PGS Catalog version)
  • validate_scorefiles: Validate that scoring files and harmonized scoring files match the PGS Catalog scoring file formats
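
To illustrate what 'long' format means for combined scoring files, here is a minimal sketch, not the package's implementation. It stacks the rows of several scoring tables into one table and tags each row with the accession it came from. The function name combine_long, the column subset, the added accession column, and the weights are all assumptions made up for this example; the real combine_scorefiles output may differ.

```python
import csv
import io


def combine_long(scorefiles):
    """Stack several scoring tables into one 'long' table (illustrative only).

    `scorefiles` maps an accession to tab-separated text with a header row.
    Every output row records which accession it came from.
    """
    rows = []
    for accession, text in scorefiles.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            # Hypothetical 'accession' column: keeps scores apart after stacking
            rows.append({"accession": accession, **row})
    return rows


# Two tiny made-up scoring files (weights are invented for the example)
header = "chr_name\tchr_position\teffect_allele\teffect_weight\n"
demo = {
    "PGS000922": header + "1\t1000\tA\t0.12\n",
    "PGS001229": header + "1\t1000\tA\t-0.05\n2\t2000\tG\t0.30\n",
}

combined = combine_long(demo)
for row in combined:
    print(row)
```

In this layout a variant shared by two scores appears once per score, which is why each row needs to carry its source accession.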

Installation

$ pip install pgscatalog-utils

Quickstart

$ download_scorefiles -i PGS000922 PGS001229 -o . -b GRCh37
$ combine_scorefiles -s PGS*.txt.gz -o combined.txt 
$ match_variants -s combined.txt -t <example.pvar> --min_overlap 0.75 --outdir .
$ validate_scorefiles -t formatted --dir <scoringfiles_directory> --log_dir <logs_directory>

More details are available using the --help parameter.
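
The first three Quickstart steps can also be chained from Python. The sketch below only reuses the commands and flags shown above; the helper names (download_cmd, combine_cmd, match_cmd, run_quickstart) and the dry_run switch are hypothetical, and the target file name example.pvar stands in for your own data.

```python
import glob
import shlex
import subprocess


def download_cmd(accessions, build, outdir="."):
    return ["download_scorefiles", "-i", *accessions, "-o", outdir, "-b", build]


def combine_cmd(scorefiles, combined="combined.txt"):
    return ["combine_scorefiles", "-s", *scorefiles, "-o", combined]


def match_cmd(target, combined="combined.txt", min_overlap=0.75, outdir="."):
    return ["match_variants", "-s", combined, "-t", target,
            "--min_overlap", str(min_overlap), "--outdir", outdir]


def run_quickstart(accessions, build, target, dry_run=True):
    """Run the steps in order; with dry_run=True only print the commands."""
    def step(cmd):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step

    step(download_cmd(accessions, build))
    # The shell would expand PGS*.txt.gz; here it is expanded after the download.
    # In a dry run nothing was downloaded, so the pattern is shown unexpanded.
    pattern = "PGS*.txt.gz"
    scorefiles = [pattern] if dry_run else sorted(glob.glob(pattern))
    step(combine_cmd(scorefiles))
    step(match_cmd(target))


# Dry run: prints the three commands without executing them
run_quickstart(["PGS000922", "PGS001229"], "GRCh37", "example.pvar")
```

Passing dry_run=False executes each command and raises an error as soon as one step fails, so later steps never run on incomplete output.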

Install from source

Requirements: git, Poetry, and pip (all used in the commands below).

$ git clone https://github.com/PGScatalog/pgscatalog_utils.git
$ cd pgscatalog_utils
$ poetry install
$ poetry build
$ pip install --user dist/*.whl 
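
After installing, one quick way to confirm that the package is visible to your Python environment is to query its installed version with the standard library. This is a small sketch; installed_version is a hypothetical helper, and the distribution name pgscatalog-utils is taken from the pip command in the Installation section.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("pgscatalog-utils")
print(found or "pgscatalog-utils is not installed in this environment")
```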

Credits

The pgscatalog_utils package is developed as part of the Polygenic Score (PGS) Catalog (www.PGSCatalog.org) project, a collaboration between the University of Cambridge’s Department of Public Health and Primary Care (Michael Inouye, Samuel Lambert, Laurent Gil) and the European Bioinformatics Institute (Helen Parkinson, Aoife McMahon, Ben Wingfield, Laura Harris).

A manuscript describing the tool and the larger PGS Catalog Calculator pipeline (PGSCatalog/pgsc_calc) is in preparation. In the meantime, if you use these tools, we ask you to cite the repo(s) and the paper describing the PGS Catalog resource.

This work has received funding from EMBL-EBI core funds, the Baker Institute, the University of Cambridge, Health Data Research UK (HDRUK), and the European Union's Horizon 2020 research and innovation programme under grant agreement No 101016775 INTERVENE.
