Polygenic score toolkit

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 5 - Production/Stable
License
- Free for non-commercial use
Operating System
- MacOS
- Unix
Programming Language
- Python :: 3
Topic
- Software Development :: Libraries :: Python Modules

Project description

polygenic - the polygenic scores toolkit

Basic info

![Maintainer]

Downloads

pip
docker with data
docker without data

Index

Summary
Diplotyping Algorithm
Installation
Quick start guide
Manual
- Tools
- Docker images
- Building models
- Example models
- Usecases
  - pgx
License
Updates

Summary

Polygenic is a toolkit for a wide range of polygenic scores analysis tasks. The most important use cases include computing scores for samples in vcf files, building scores for GWAS results or fetching scores from repositories.

Diplotyping Algorithm

We begin by reading individual genetic variants (genotypes) from the patient's VCF file, where each genotype carries two alleles — one per chromosome. The system supports phasing using a custom reference panel to resolve which alleles sit on the same chromosome; when phased data is available, the algorithm preserves this linkage information, while for unphased data both chromosomes are treated symmetrically. We define haplotypes as specific combinations of co-inherited variants that form recognized gene versions, such as pharmacogenomic star alleles, where each haplotype definition distinguishes core defining variants (weighted at 1.0) from supportive sub-lineage variants (weighted at 0.05). The algorithm scores every candidate haplotype against the patient's alleles, then selects the best-matching haplotype for the first chromosome — keeping all candidates within a 2% margin of the top score. The matched alleles are then "claimed" by that haplotype, and the remaining unmatched alleles (leftovers) are passed into a second round, where up to 100 candidate haplotypes are re-scored against only those residual alleles to identify the second haplotype. Each first/second haplotype pair is ranked by combined match percentage and filtered by total missing data, producing the final diplotype call — e.g., CYP2D6 *1/*4. Every variant in the result carries a source label — direct genotyping, LD proxy, imputation flag from VCF, allele-frequency-based imputation, reference, or missing — enabling granular quality control and full traceability of the diplotype call.

Missing variant handling (`--ref-fallback`)

By default, variants called ./. in the input VCF are treated as missing (source: "missing") — they are subtracted from both the numerator and the denominator of the haplotype match score, so they neither support nor penalize any candidate. This matches the approach taken by PharmCAT's Named Allele Matcher, which explicitly drops missing positions rather than imputing them: "If the sample data has missing positions that are required by a named allele definition, the position will be dropped from consideration."

Passing --ref-fallback restores the legacy behavior of filling ./. with homozygous reference (source: "reference"). This is almost always the wrong default for pharmacogenomic panels. The canonical example is CYP2C19: the variant rs3758581 at chr10:94842866 has GRCh38 reference A, but G is the major allele in every population and PharmVar v5 defines every "real" star allele (*1, *2, *17 …) as requiring G at this position, reserving CYP2C19*38 (= old *1.001) for the minority carrying the reference A (see PharmVar GeneFocus: CYP2C19). If rs3758581 is not on the panel and the code fills it with the reference, every sample drifts toward *38/*38. Default off avoids this trap. Enable --ref-fallback only when your panel's ./. genuinely means "0/0 and I chose not to spell it out" — for example when re-running test fixtures authored under the old default.

Match-confidence gate (`--top-n`)

After scoring all candidate haplotype pairs, the algorithm returns a haplotype_id when either of these holds:

the best pair's per-chromosome max_percent_match is ≥ 50% (high-confidence call), or
the total scored-candidate pool size is ≤ top_n (default 15) — a small, non-ambiguous candidate space is itself a confidence signal.

Otherwise haplotype_id is None (the caller refuses to guess). This is more conservative than PharmCAT, which always returns the top-scoring diplotype with deterministic tie-breaking, and roughly comparable to Aldy 4, which reports "no more than three diplotypes" when it cannot fully disambiguate. Tune --top-n higher (laxer) or lower (stricter) depending on how much ambiguity is acceptable downstream. Set --top-n 0 to rely solely on the 50% threshold.

CYP2C19*38 worked example

On a clinical panel that does not assay rs3758581, running pgs-compute with the CYP2C19 PharmVar model on four real samples (three carrying non-reference CYP2C19 variants, one all-reference) produces these calls:

Sample	Non-ref CYP2C19 variants	Default (no `--ref-fallback`)	Legacy (`--ref-fallback`)
Sample A	none	`1.001/1.001` (≡ 1/1)	`1.001/1.001`
Sample B	rs12769205, rs4244285 (het)	`2/1.001` ✓	`1.001/1.001` ✗
Sample C	rs12248560 (hom)	`17/17` ✓	`1.001/1.001` ✗
Sample D	rs12769205, rs4244285 (het)	`2/1.001` ✓	`1.001/1.001` ✗

With the legacy --ref-fallback on, the missing rs3758581 becomes A/A (genomic reference), which does not match the G required by *1, *2, *17 etc., so every sample collapses to the empty-haplotype *1.001 (equivalent to *38 in PharmVar v5 terms). With the new default, rs3758581 stays missing, the scoring ignores it, and the panel's actual informative variants drive the call.

Installation

With pip

Install for user account

python3 -m pip install --upgrade polygenic

Install globally

sudo -H python3 -m pip install polygenic

With conda

Run conda image

docker run -it conda/miniconda3 /bin/bash

Create python3.8 environment and install polygenic

yes | conda create --name py38 python=3.8
eval "$(conda shell.bash hook)"
conda activate py38
### should be 3.8
python --version

### gcc is missing to build pytabix
apt -qq update
apt -y install build-essential tabix

pip install polygenic

With docker

Large image with all data included

docker run intelliseq:polygenictk:2.1.0 *command*

Thin image with just polygenic package installed

docker run intelliseq:polygenic:2.1.0 *command*

Quick start guide

mkdir polygenic && cd polygenic # create working directory
wget https://downloads.intelliseq.com/public/polygenic/gbe-INI78-bone-density.yml # download model
wget https://downloads.intelliseq.com/public/polygenic/illu_merged-imputed.vcf.gz # download genotypes
wget https://downloads.intelliseq.com/public/polygenic/illu_merged-imputed.vcf.gz.tbi # download position index
wget https://downloads.intelliseq.com/public/polygenic/illu_merged-imputed.vcf.gz.idx.db # download rsid index
docker run -v $(pwd):/data intelliseq/polygenic:latest --vcf /data/illu_merged-imputed.vcf.gz --model /data/gbe-INI78-bone-density.yml --output-directory /data # compute model

Manual

Tools

pgs-compute

usage: pgstk [-h] -i VCF [-m MODEL [MODEL ...]] [-p PARAMETERS] [-s SAMPLE_NAME] [-o OUTPUT_DIRECTORY] [-n OUTPUT_NAME_APPENDIX] [-l LOG_FILE] [--af AF] [--af-field AF_FIELD]
             [-v] [--print]

pgs-compute computes polygenic scores for genotyped sample in vcf format

optional arguments:
  -h, --help            show this help message and exit
  -i, --vcf VCF         vcf.gz file with genotypes
  -m, --model MODEL [MODEL ...]
                        path to .yml model (can be specified multiple times with space as separator)
  -p, --parameters PARAMETERS
                        parameters json (to be used in formula models)
  -s, --sample-name SAMPLE_NAME
                        sample name in vcf.gz to calculate
  -o, --output-directory OUTPUT_DIRECTORY
                        output directory
  -n, --output-name-appendix OUTPUT_NAME_APPENDIX
                        appendix for output file names
  -l, --log-file LOG_FILE
                        path to log file
  --af AF               vcf file containing allele freq data
  --af-field AF_FIELD   name of the INFO field to be used as allele frequency
  -v, --version         show program's version number and exit
  --print               Print output to stdout

Arguments

Required

--vcf vcf.gz file with genotypes (tabix index should be available)
--model path to model file

Optional

--log_file log file
--out_dir directory for result jsons
--population population code
--models_path path to a directory containing models
--af an indexed vcf.gz file containing allele freq data
--version prints version of package

Building models in yml

Index: Model structure Model types Parameters

Model structure

Core structure

Models have two properties which is model and description. model is a specification of computation to be performed and description is additional information to be included in the result.

model:
description:

Object keys

Each object that is not collection has a set of predefined keys (required or optional) that can be used for computation. For example: diplotype_model object has a required diplotypes key.

diplotype_model:
  diplotypes:

The computation is first delegated to key specified objects and later aggregated by the top level object itself.

Collections

There is special category of objects that don't have predefined keys but are collections. Each key within collection becomes element of collection. Collections are easy to recognize, because they are specified in plural form like diplotypes or variants. Each element of collection will be defined as singular object of collection type. For example key in variants collection will becomes objects of variant type.

      variants:
        rs7041: {diplotype: C/C}
        rs4588: {diplotype: T/T}

Variants

Variants can be identified by rsid. Variant value will be computed basing on information provided: diplotype or effect_allele. Accepted sets of fields are:

diplotypes
- diplotype
- symbol
score
- effect_allele
- effect_size
- symbol

Model types

There are currently implemented four types of models:

score_model
diplotype_model
haplotype_model
formula_model The type of model can be specified at the top of yml structure or within the model field.

Specification of model type at the top of yml structure

diplotype_model:
description:

Specification of model type within the `model` field

model:
  diplotype_model:
description:

Parameters

External parameters can be used in formula_model through @parameters keyword.
Example parameters file in .json format:

{"sex": "F"}

Path to file can be provided as argument to polygenic tool:

--parameters /path/to/parameters.json

Example of use of parameters in the formula_model:

formula_model:
  formula:
    value: "@female.score_model.value if @parameters.sex == 'F' else @male.score_model.value"
  male:
    score_model:
      variants:
        ...
  female:
    score_model:
      variants:

Example models

Example diplotype model

This example diplotype model is based on Randolph 2014.

diplotype_model:
  diplotypes:
    1/1:
      variants:
        rs7041: {diplotype: C/C}
        rs4588: {diplotype: T/T}
    1/1s:
      variants:
        rs7041: {diplotype: C/C}
        rs4588: {diplotype: T/G}
    1/1f:
      variants:
        rs7041: {diplotype: C/A}
        rs4588: {diplotype: T/G}
    1/2:
      variants:
        rs7041: {diplotype: C/A}
        rs4588: {diplotype: T/T}
    1s/1s:
      variants:
        rs7041: {diplotype: C/C}
        rs4588: {diplotype: G/G}
    1s/1f:
      variants: 
        rs7041: {diplotype: C/A}
        rs4588: {diplotype: G/G}
    1s/2:
      variants: 
        rs7041: {diplotype: C/A}
        rs4588: {diplotype: G/T}
    1f/1f: 
      variants: 
        rs7041: {diplotype: A/A}
        rs4588: {diplotype: G/G}
    1f/2: 
      variants: 
        rs7041: {diplotype: A/A}
        rs4588: {diplotype: G/T}
    2/2: 
      variants: 
        rs7041: {diplotype: A/A}
        rs4588: {diplotype: T/T}
description:
  pmid: 24447085
  genes: [GC]
  result_diplotype_choice:
    1/1: Moderate
    1/1s: High
    1/1f: High
    1/2: Low
    1s/1s: Very high
    1s/1f: Very high
    1s/2: Moderate
    1f/1f: Very high
    1f/2: Moderate
    2/2: Very low

Example haplotype model

Haplotype model can be used for HLA and PGx.
To define haplotype models a list of alleles is required (called variants in this case, to be consistent with othe rypes of models). Each allele has associated list of defining mutations (alternative SNV alles) defined by Gnomad ID along with ref, alt and effect_allele properties. One star allele should be empty (containing only reference SNV alleles). The algorithm will utilised any phasing information in the vcf.

haplotype_model:
  variants:
    CYP2D6*1.001:
    CYP2D6*1.002:
      22-42126963-C-T: {ref: "C", alt: "T", effect_allele: "T"}
    CYP2D6*1.003:
      22-42128813-G-A: {ref: "G", alt: "A", effect_allele: "A"}
    CYP2D6*1.004:
      22-42128216-G-T: {ref: "G", alt: "T", effect_allele: "T"}
    CYP2D6*1.005:
      22-42128922-A-G: {ref: "A", alt: "G", effect_allele: "G"}
    CYP2D6*1.006:
      22-42129726-A-C: {ref: "A", alt: "C", effect_allele: "C"}
      22-42129950-A-C: {ref: "A", alt: "C", effect_allele: "C"}
      22-42130482-C-A: {ref: "C", alt: "A", effect_allele: "A"}

For copy-number star alleles (CYP2D6 *5/*1xN, CYP2C19 *36/*37) the model gains a copy_number: block and structural: haplotypes — see docs/pgx-cnv.md for the VCF contract and YAML schema.

Example score model with categories rescaling

score_model:
  variants:
    rs10012: {effect_allele: G, effect_size: 0.369215857410143}
    rs1014971: {effect_allele: T, effect_size: 0.075546961392531}
    rs10936599: {effect_allele: C, effect_size: 0.086359830674748}
    rs11892031: {effect_allele: C, effect_size: -0.552841968657781}
    rs1495741: {effect_allele: A, effect_size: 0.05307844348342}
    rs17674580: {effect_allele: C, effect_size: 0.187520720836463}
    rs2294008: {effect_allele: T, effect_size: 0.08278537031645}
    rs798766: {effect_allele: T, effect_size: 0.093421685162235}
    rs9642880: {effect_allele: G, effect_size: 0.093421685162235}
  categories:
    High risk: {from: 1.371624087, to: 2.581880425, scale_from: 2, scale_to: 3}
    Potential risk: {from: 1.169616034, to: 1.371624087, scale_from: 1, scale_to: 2}
    Average risk: {from: -0.346748358, to: 1.169616034, scale_from: 0, scale_to: 1}
    Low risk: {from: -1.657132197, to: -0.346748358, scale_from: -1, scale_to: 0}
description:
  about: 
  genes: []
  result_statement_choice:
    Average risk: Avg
    Potential risk: Pot
    High risk: Hig
    Low risk: Low
  science_behind_the_test:
  test_type: Polygenic Risk Score
  trait: Breast cancer
  trait_authors:
    - taken from the PGS catalog
  trait_copyright: Intelliseq all rights reserved
  trait_explained: None
  trait_heritability: None
  trait_pgs_id: PGS000001
  trait_pmids:
    - 25855707
  trait_snp_heritability: None
  trait_title: Breast_Cancer
  trait_version: 1.0
  what_you_can_do_choice:
    Average risk:
    High risk:
    Low risk:
  what_your_result_means_choice:
    Average risk:
    High risk:
    Low risk:

Example Formula Model

formula_model:
  formula:
    brownexp: "math.exp(@brown.score_model.value - 2.0769)"
    redexp: "math.exp(@red.score_model.value - 6.3953)"
    blackexp: "math.exp(@black.score_model.value - 2.4029)"
    sumexp: "@brownexp + @redexp + @blackexp"
    brown_prob: "@brownexp / (1 + @sumexp)"
    red_prob: "@redexp / (1 + @sumexp)"
    black_prob: "@blackexp / (1 + @sumexp)"
    blonde_prob: "1 - (@brown_prob + @red_prob + @black_prob)"
  brown:
    score_model:
      variants:
        rs796296176: {effect_allele: CA, effect_size: 1.2522}
        rs11547464: {effect_allele: A, effect_size: -0.61155}
        rs885479: {effect_allele: T, effect_size: 0.2937}
        rs1805008: {effect_allele: T, effect_size: -0.50143}
        rs1805005: {effect_allele: T, effect_size: 0.21172}
        rs1805006: {effect_allele: A, effect_size: 1.9293}
        rs1805007: {effect_allele: T, effect_size: -0.32318}
        rs1805009: {effect_allele: C, effect_size: 0.60861}
        rs1805009: {effect_allele: A, effect_size: 0.25624}
        rs2228479: {effect_allele: A, effect_size: -0.054143}
        rs1110400: {effect_allele: C, effect_size: -0.56315}
        rs28777: {effect_allele: C, effect_size: 0.52168}
        rs16891982: {effect_allele: C, effect_size: 0.75284}
        rs12821256: {effect_allele: G, effect_size: -0.34957}
        rs4959270: {effect_allele: A, effect_size: -0.19171}
        rs12203592: {effect_allele: T, effect_size: 1.6475}
        rs1042602: {effect_allele: T, effect_size: 0.16092}
        rs1800407: {effect_allele: A, effect_size: -0.19111}
        rs2402130: {effect_allele: G, effect_size: 0.35821}
        rs12913832: {effect_allele: T, effect_size: 1.214}
        rs2378249: {effect_allele: C, effect_size: 0.12669}
        rs683: {effect_allele: C, effect_size: 0.21172}
  red:
    score_model:
      variants:
        rs796296176: {effect_allele: CA, effect_size: 25.508}
        rs11547464: {effect_allele: A, effect_size: 2.5381}
        rs885479: {effect_allele: T, effect_size: -0.20889}
        rs1805008: {effect_allele: T, effect_size: 2.801}
        rs1805005: {effect_allele: T, effect_size: 0.93493}
        rs1805006: {effect_allele: A, effect_size: 3.65}
        rs1805007: {effect_allele: T, effect_size: 3.4408}
        rs1805009: {effect_allele: C, effect_size: 4.5868}
        rs1805009: {effect_allele: A, effect_size: 22.107}
        rs2228479: {effect_allele: A, effect_size: 0.62307}
        rs1110400: {effect_allele: C, effect_size: 1.4453}
        rs28777: {effect_allele: C, effect_size: 0.70401}
        rs16891982: {effect_allele: C, effect_size: -0.41869}
        rs12821256: {effect_allele: G, effect_size: -0.57964}
        rs4959270: {effect_allele: A, effect_size: 0.24861}
        rs12203592: {effect_allele: T, effect_size: 0.90233}
        rs1042602: {effect_allele: T, effect_size: 0.45003}
        rs1800407: {effect_allele: A, effect_size: -0.27606}
        rs2402130: {effect_allele: G, effect_size: 0.28313}
        rs12913832: {effect_allele: T, effect_size: -0.093776}
        rs2378249: {effect_allele: C, effect_size: 0.76634}
        rs683: {effect_allele: C, effect_size: -0.053427}
  black:
    score_model:
      variants:
        rs796296176: {effect_allele: CA, effect_size: 2.732}
        rs11547464: {effect_allele: A, effect_size: -16.969}
        rs885479: {effect_allele: T, effect_size: 0.39983}
        rs1805008: {effect_allele: T, effect_size: -0.86062}
        rs1805005: {effect_allele: T, effect_size: -0.0029013}
        rs1805006: {effect_allele: A, effect_size: -16.088}
        rs1805007: {effect_allele: T, effect_size: -1.3757}
        rs1805009: {effect_allele: C, effect_size: 0.060631}
        rs1805009: {effect_allele: A, effect_size: 3.9824}
        rs2228479: {effect_allele: A, effect_size: 0.17012}
        rs1110400: {effect_allele: C, effect_size: 0.29143}
        rs28777: {effect_allele: C, effect_size: 0.82228}
        rs16891982: {effect_allele: C, effect_size: 1.1617}
        rs12821256: {effect_allele: G, effect_size: -0.89824}
        rs4959270: {effect_allele: A, effect_size: -0.36359}
        rs12203592: {effect_allele: T, effect_size: 1.997}
        rs1042602: {effect_allele: T, effect_size: 0.065432}
        rs1800407: {effect_allele: A, effect_size: -0.49601}
        rs2402130: {effect_allele: G, effect_size: 0.26536}
        rs12913832: {effect_allele: T, effect_size: 1.9391}
        rs2378249: {effect_allele: C, effect_size: -0.089509}
        rs683: {effect_allele: C, effect_size: 0.15796}
description:
  name: HirisPlex

Description

Model keys glossary

model - generic model that can aggregate results of other model types
diplotype_model Required keys:
- diplotypes
description - all properties to be included in the final results

Usecases

PGX

python3 -m pip install polygenic
pgstk pgs-compute --vcf [PATH_TO_VCF_GZ] --model cyp2d6-pharmvar.yml --print | jq .haplotype_model.haplotypes.match

License

Proprietary (contact@intelliseq.pl)

Updates

2.5.1

PACKAGING: render the README on the PyPI project page (long_description from README.md, text/markdown).

2.5.0

FEATURE: copy-number (CNV) star-allele calling for CYP2D6 (*5, *1xN) and CYP2C19 (*36/*37). polygenic consumes copy number resolved by an upstream caller (symbolic <DEL>/<DUP> ALT + range + FORMAT/CN + phase) and emits non-diploid calls (*5/*5, *1/*5, *1/*4x2). It does no depth/ratio math — allele-specific duplication side is read from phase; unphased input yields *a/*b (CNx) + allele_specific_unphased. See docs/pgx-cnv.md.
FEATURE: haplotype_model gains a copy_number: block, structural: haplotype definitions (scope: whole|partial), and a CYP2D6-only multiplication: block.
TEST: structural VcfRecord unit tests and CNV deletion/duplication/regression integration tests (test_vcfrecord_structural.py, test_haplotypemodel_cnv.py).

2.4.0

FEATURE: --ref-fallback flag (opt-in; default off). ./. genotypes now stay source="missing" by default — matches PharmCAT behavior and fixes spurious CYP2C19*38 calls on panels that don't assay rs3758581.
FEATURE: --top-n flag (default 15). Haplotype caller returns a call when either the match ≥50% OR the scored candidate pool is ≤top_n, enabling calls on sparse panels without ambiguity risk.
BUG: ScoreModel.compute no longer KeyErrors on adjusted_score when a variant has no af field (model.py:549).
BUG: Diplotype QC now correctly counts variant sources (previously miscounted every variant as "missing" due to structural mismatch in compute_qc).
BUG: ScoreModel handles models without args section (previously AttributeError on self.get("args").get("prevalence")).
TEST: Added PGx truth-set integration tests against 1000 Genomes phased GRCh38 slices (NA12878, NA18507, NA19240 × {CYP2C19, CYP2C9, CYP2B6, CYP2D6, SLCO1B1}); rebuildable via scripts/build_pgx_fixtures.py.
TEST: Added PharmVar-named haplotype tests and NA18507 CYP2B6/SLCO1B1 regression tests.

2.3.17

BUG: resolved bug with low weight for missing genotypes

2.3.16

BUG: resolved bug with weight of genotypes in haplotypes

2.3.15

BUG: resolved bug with not enough haplotypes to check

2.3.14

BUG: resolved bug with wrong leftover genotypes

2.3.12

FEATURE: added gene names to genotypes if available

2.3.11

BUG: resolved bug with wrong genotype sources counts

2.3.10

BUG: resolved bug with missing genotype sources counts

2.3.9

FEATURE: add reference as a genotyping source

2.3.8

BUG: resolved bugs inside mobigen wdl task

2.3.7

FEATURE: added ldproxy imputation source

2.3.6

BUG: resolved bug with missing polars package after installation

2.3.5

BUG: resolved bug with 'type' object is not subscriptable running pgstk

2.3.4

BUG: resolved bug with where model does not provide to or from category fields

2.3.3

BUG: resolved bug with missing pyarrow package after installation

2.3.2

BUG: renamed jpg to jpeg outputs from vcfstat

2.3.1

BUG: resolved bug with missing importlib-resources package after installation

2.3.0

FEATURE: added vcf stat tool for zygosities
FEATURE: added vcf stat tool for baf computation

2.2.15

UPDATE: updated parsing for new version of pan biobankuk
DEV: updated numpy version to 1.23.4

2.2.14

FEATURE: added module for ldproxy imputing
FEATURE: added option for merging output as an array instead of dictionary in pgs-compute

2.2.13

BUG: resolved bug with missing score in haplotype model
DEV: cleaned up test resources

2.2.12

BUG: resolved bug with empty argument in executable

2.2.11

BUG: resolved bug with naming of multiple models in one file

2.2.10

DOC: improved diploty model documentation

2.2.9

BUG: missing effect allele in diplotyp models

2.2.8

BUG: imputed source is based on IMP tag in the INFO field or GT:DS in format field

2.2.7

BUG: repaired bug with missing math library in eval

2.2.6

FEATURE: added qc to model results

2.2.5

ENHACEMENT: libraries updates

2.2.0

ENHANCEMENT: better computing of haplotype models. First one haplotype is identified and further the second haplotype is identified from leftover genotypes
ENHANCEMENT: moved argparse from tools to pgstk

2.1.10

BUG: resolved bug with wrong plink.clumped path in clumping

2.1.9

BUG: resolved bug with missing index in biobankuk model

2.1.8

BUG: resolved bug with biobankuk model for codenames with special characters

2.1.7

BUG: resolved bug with haplotype model where none of haplotypes matched genotype. Most probable genotype is provided

2.1.6

DOC: added docker badges
FEATURE: added posibility to output all pgs results in one json file --merge-outputs
FEATURE: added category to diplotype model
FEATURE: added caching in genotyping module

2.1.5

BUG: biobankuk model output files now contain only alphanumeric characters
BUG: biobankuk model code names with special characters are now being downloaded

2.1.4

FEATURE: added model_name and sample_name to description

2.1.3

FEATURE: added support for multiple models in pgs-compute
FEATURE: added missing variants count to haplotype in haplotype model
BUG: id field in haplotype model

2.1.2

FEATURE: allow gnomadid for variant in yml models
FEATURE: added printing output option in pgs-compute

2.1.1

BUG: resolved NoneType bug with empty haplotype

2.1.0

FEATURE: haplotype model now works with phased data

2.0.0

FEATURE: switched to yaml model definitions
FEATURE: implemented formula, score, haplotype and diplotype model types
FEATURE: added gene symbols to description
DEVOPS: prepared docker image with resources for building models

Project details

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 5 - Production/Stable
License
- Free for non-commercial use
Operating System
- MacOS
- Unix
Programming Language
- Python :: 3
Topic
- Software Development :: Libraries :: Python Modules

Release history Release notifications | RSS feed

This version

2.5.1

Jun 10, 2026

2.5.0

Jun 10, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polygenic_pgx-2.5.1.tar.gz (77.4 kB view details)

Uploaded Jun 10, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

polygenic_pgx-2.5.1-py3-none-any.whl (77.7 kB view details)

Uploaded Jun 10, 2026 Python 3

File details

Details for the file polygenic_pgx-2.5.1.tar.gz.

File metadata

Download URL: polygenic_pgx-2.5.1.tar.gz
Upload date: Jun 10, 2026
Size: 77.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for polygenic_pgx-2.5.1.tar.gz
Algorithm	Hash digest
SHA256	`8729554f4077667703289f32f9931a1aebda104497fba9e4b543bd1358a7ef30`
MD5	`dc3fd90171c29ef86872b9c383fd0280`
BLAKE2b-256	`f63f64afe27c9db9d96435a331016e669a432f5dc85ae66f40ed8e70c3025960`

See more details on using hashes here.

File details

Details for the file polygenic_pgx-2.5.1-py3-none-any.whl.

File metadata

Download URL: polygenic_pgx-2.5.1-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 77.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for polygenic_pgx-2.5.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`850f61bdf3fe3b60a5ae5c618d4918e9542b7656e2debd1a9035b408eed7745e`
MD5	`83025d2a7349c2b99c9a768a6e11082b`
BLAKE2b-256	`bdaaf330e4f9538c1574aa7b349ff13a920e4d283ac5eee9ac6a5269fef1e320`

See more details on using hashes here.

polygenic-pgx 2.5.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

polygenic - the polygenic scores toolkit

Basic info

Downloads

Index

Summary

Diplotyping Algorithm

Missing variant handling (--ref-fallback)

Match-confidence gate (--top-n)

CYP2C19*38 worked example

Installation

With pip

Install for user account

Install globally

With conda

With docker

Large image with all data included

Thin image with just polygenic package installed

Quick start guide

Manual

Tools

pgs-compute

Arguments

Required

Optional

Building models in yml

Model structure

Core structure

Object keys

Collections

Variants

Model types

Specification of model type at the top of yml structure

Specification of model type within the model field

Parameters

Example models

Example diplotype model

Example haplotype model

Example score model with categories rescaling

Example Formula Model

Description

Model keys glossary

Usecases

PGX

License

Updates

2.5.1

2.5.0

2.4.0

2.3.17

2.3.16

2.3.15

2.3.14

2.3.12

2.3.11

2.3.10

2.3.9

2.3.8

2.3.7

2.3.6

2.3.5

2.3.4

2.3.3

2.3.2

2.3.1

2.3.0

2.2.15

2.2.14

2.2.13

2.2.12

2.2.11

2.2.10

Missing variant handling (`--ref-fallback`)

Match-confidence gate (`--top-n`)

Specification of model type within the `model` field