CLI tool to run a PheWAS

Project description

PyPheWAS package

A Python script I use to run PheWAS analyses. Full documentation can be found in the online docs.

Summary

This repository contains a CLI tool implemented in Python for running a PheWAS analysis. The tool supports both PheCode 1.2 and PheCode X. It is based on the PheTK package but offers the model flexibility that I wanted and produces more verbose output by reporting the betas and standard errors for all predictors. The PyPheWAS package supports both logistic and linear regression. Additionally, it uses Firth regression when a perfect separation error is encountered in the logistic model.
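For readers unfamiliar with the term, perfect separation means that a predictor splits cases from controls with no overlap, so the ordinary logistic likelihood has no finite maximum. The sketch below only illustrates the idea for a single numeric predictor. The function name is hypothetical and this is not how the package detects the condition.

```python
def is_perfectly_separated(x, y):
    """Return True if one threshold on x splits the 0/1 outcomes in y with no overlap."""
    cases = [xi for xi, yi in zip(x, y) if yi == 1]
    controls = [xi for xi, yi in zip(x, y) if yi == 0]
    if not cases or not controls:
        # Only one outcome class is present, so there is nothing to separate.
        return False
    return max(controls) < min(cases) or max(cases) < min(controls)


# Every control has a smaller value than every case: separated.
print(is_perfectly_separated([1, 2, 3, 10, 11], [0, 0, 0, 1, 1]))  # True
# Cases and controls overlap: not separated.
print(is_perfectly_separated([1, 2, 3, 10, 11], [0, 1, 0, 1, 0]))  # False
```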

Installation

This package is hosted on PyPI and can be installed with pip, either inside a virtual environment created with venv or inside a conda environment. If using pip, it is recommended to first create a virtual environment with venv and then install the program into it. Use one of the following sets of commands to install the program.

Using venv and pip:

python3 -m venv pyphewas-venv
source pyphewas-venv/bin/activate
pip install pyphewas-package

Using conda:

conda create -n pyphewas_env python=3.13 -y
conda activate pyphewas_env
pip install pyphewas-package
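After installing, one way to confirm that the package landed in the active environment is to ask the standard library for the installed distribution's version. This is a minimal sketch. The helper name installed_version is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="pyphewas-package"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found else "pyphewas-package is not installed in this environment")
```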

If you would like to install the PyPheWAS package from source, it is available on GitHub. It is recommended to use PDM to install the project. To do so, run the following command from the root of the cloned repository:

pdm install

If you want to install the program from source without PDM, first use pip to install the dependencies listed in the pyproject.toml file. You can then run the source file directly, which is located at './src/pyphewas/run_PheWAS.py'.

Required Inputs

--counts: filepath to a comma-separated file in which each row has an individual ID, a phecode ID, and the number of times that individual has that phecode in their medical record.
--covariate-file: filepath to a comma-separated file that lists the covariates and the predictor for each individual. The individuals listed in the covariate file define the cohort. Note: if the '--flip-predictor-and-outcome' flag is used, the predictor variable is treated as the outcome in the model.
--covariate-list: space-separated list of covariates to use in the model. Every covariate must be present in the covariate file and spelled exactly as it appears there; otherwise the code will crash.
--phecode-version: string indicating which version of phecodes to use. This argument is used to map each phecode ID to a description. The allowed values are "phecodeX", "phecode1.2", and "phecodeX_who". Most users will only need "phecodeX" or "phecode1.2".
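To make the two input files concrete, the sketch below writes a toy counts file and a toy covariate file with the standard csv module. Several details are assumptions for illustration only: the counts header names "phecode" and "count", the column order, and all of the values. The names person_id, status, and EHR_GENDER come from the defaults and example commands in this README.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write a header and rows to a comma-separated file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def write_example_inputs(directory):
    """Create toy counts.csv and covariates.csv files and return their paths."""
    directory = Path(directory)
    counts_path = directory / "counts.csv"
    covariates_path = directory / "covariates.csv"
    # Assumed header names: the README only says each row holds an ID,
    # a phecode ID, and a count. The phecode IDs are placeholder values.
    write_csv(counts_path, ["person_id", "phecode", "count"],
              [[1001, "250.2", 3], [1001, "401.1", 1], [1002, "250.2", 2]])
    # One row per individual in the cohort: ID, status, then covariates.
    write_csv(covariates_path, ["person_id", "status", "age", "EHR_GENDER"],
              [[1001, 1, 54, 0], [1002, 0, 61, 1]])
    return counts_path, covariates_path


with tempfile.TemporaryDirectory() as tmp:
    counts_path, covariates_path = write_example_inputs(tmp)
    with open(counts_path, newline="") as handle:
        counts_rows = list(csv.DictReader(handle))
    with open(covariates_path, newline="") as handle:
        covariate_columns = next(csv.reader(handle))

print(counts_rows[0])
print(covariate_columns)
```

With files like these, the covariates passed to --covariate-list would be age and EHR_GENDER, spelled exactly as in the header.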

Optional Inputs

Although these arguments are not required, some combination of them will generally be used to make the analysis more rigorous, more robust, or more finely tuned to the exact question being asked.

--min-phecode-count: minimum number of times a phecode must occur in an individual's record for that individual to be considered a case for that phecode. Default value is 2. Under the default setting, individuals with exactly 1 occurrence of the phecode are excluded from the regression. If this value is set to 1, no individuals are excluded.
--min-case-count: minimum number of cases a phecode must have to be included in the analysis. Default value is 20. This value is a convention and has not been rigorously tested. For more rigorous results, a more conservative value of 100 may be preferable.
--status-col: name of the column in the covariate file that holds the predictor case/control status. Default value is "status".
--sample-col: name of the column in the covariate file that holds the individual IDs. Default value is "person_id".
--output: filename to write the output to. The output is written as a tab-separated file. If the filename ends in gz the file is gzipped; otherwise it is written uncompressed. Default value is test_output.txt.
--phecode-descriptions: filepath to a comma-separated file that lists each phecode ID and the corresponding phecode name. The phecode ID is expected in the first column and the phecode description in the 4th column. The default description files used by the code are stored in the './src/phecode_maps/' folder and can serve as examples.
--cpus: number of CPUs to use during the analysis. Default value is 1.
--max-iterations: maximum number of iterations allowed for the regression to converge. If the model has not converged when this threshold is reached, a ConvergenceWarning is raised. If many phecodes fail to converge, it is recommended to increase this value. Default value is 200.
--flip-predictor-and-outcome: depending on the analysis, you may want the status column in the covariate file to be either the predictor or the outcome. Supply this flag as '--flip-predictor-and-outcome' to make the status the outcome. The case/control status for each individual phecode then becomes the predictor.
--run-sex-specific: restricts the analysis to a sex-stratified cohort. This is one of three arguments that must be used together to stratify the analysis; the other two are '--male-as-one' and '--sex-col'. Allowed values are 'male-only' and 'female-only'.
--male-as-one: must be passed when '--run-sex-specific' is used, to indicate whether males are coded as 1 and females as 0 or vice versa. Passing '--male-as-one' indicates that males are coded as 1. Default value is True; the flag is ignored if '--run-sex-specific' is not provided.
--sex-col: name of the column in the covariate file that contains sex or gender information. Required if '--run-sex-specific' is used. Values should be coded numerically as 0 or 1.
--model: whether to run a linear model or a logistic model for the regression. Allowed values are 'linear' and 'logistic'. Default value is 'logistic'.
--firth-max-iterations: maximum number of iterations allowed for the Firth regression model to converge. Default value is 50.
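The interplay between --min-phecode-count and --min-case-count can be sketched as follows. This is an illustration of the rules as described in the list above, not the package's own code. The function names are hypothetical, and treating a count of 0 as a control is an assumption.

```python
from collections import Counter


def phecode_status(count, min_phecode_count=2):
    """Classify one individual for one phecode from their occurrence count."""
    if count >= min_phecode_count:
        return "case"
    if count == 0:
        # Assumption: individuals who never have the phecode are controls.
        return "control"
    # Some occurrences, but fewer than the threshold: dropped from the regression.
    return "excluded"


def keep_phecode(statuses, min_case_count=20):
    """Return True if the phecode has enough cases to be analysed."""
    return Counter(statuses)["case"] >= min_case_count


counts = [0, 1, 2, 5]
statuses = [phecode_status(c) for c in counts]
print(statuses)  # ['control', 'excluded', 'case', 'case']

# With a threshold of 1, nobody falls into the excluded group.
print([phecode_status(c, min_phecode_count=1) for c in counts])
# ['control', 'case', 'case', 'case']

print(keep_phecode(statuses))  # False: 2 cases is below the default of 20
```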

Example Command

Non-sex-stratified with parallelization:

pyphewas \
    --counts counts.csv \
    --covariate-file covariates.csv \
    --min-phecode-count 2 \
    --status-col status \
    --sample-col person_id \
    --covariate-list EHR_GENDER age unique_phecode_count \
    --min-case-count 100 \
    --cpus 25 \
    --output output.txt.gz \
    --phecode-version phecodeX
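The output of a run like this is a tab-separated file that is gzipped when the filename ends in gz. A minimal sketch for loading it with the standard library is shown below. The helper name read_results is hypothetical, and the demo columns (phecode, beta, se) and values are made up for illustration; the real column names should be checked against the header of an actual output file.

```python
import csv
import gzip
import os
import tempfile


def read_results(path):
    """Load a tab-separated results file, transparently handling gzip."""
    opener = gzip.open if str(path).endswith("gz") else open
    with opener(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Demo on a tiny stand-in file with made-up columns and values.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "output.txt.gz")
    with gzip.open(demo_path, "wt", newline="") as handle:
        handle.write("phecode\tbeta\tse\n250.2\t0.41\t0.08\n")
    results = read_results(demo_path)

print(results)  # [{'phecode': '250.2', 'beta': '0.41', 'se': '0.08'}]
```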

Sex-stratified with parallelization:

pyphewas \
    --counts counts.csv \
    --covariate-file covariates.csv \
    --min-phecode-count 2 \
    --status-col status \
    --sample-col person_id \
    --covariate-list age unique_phecode_count \
    --min-case-count 100 \
    --cpus 25 \
    --output output.txt.gz \
    --phecode-version phecodeX \
    --flip-predictor-and-outcome \
    --run-sex-specific female-only \
    --male-as-one True \
    --sex-col EHR_GENDER
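The three sex-stratification arguments work together to pick which rows of the covariate file stay in the cohort. The sketch below shows that selection logic under the 0/1 coding described for --sex-col. The function name and the toy rows are hypothetical, and this is not the package's implementation.

```python
def sex_specific_rows(rows, sex_col, run_sex_specific, male_as_one=True):
    """Keep only the rows whose sex code matches the requested stratum."""
    if run_sex_specific not in ("male-only", "female-only"):
        raise ValueError("run_sex_specific must be 'male-only' or 'female-only'")
    male_code = 1 if male_as_one else 0
    # Females carry whichever of 0/1 is not used for males.
    keep_code = male_code if run_sex_specific == "male-only" else 1 - male_code
    return [row for row in rows if int(row[sex_col]) == keep_code]


cohort = [
    {"person_id": "1001", "EHR_GENDER": "0"},
    {"person_id": "1002", "EHR_GENDER": "1"},
    {"person_id": "1003", "EHR_GENDER": "0"},
]

females = sex_specific_rows(cohort, "EHR_GENDER", "female-only", male_as_one=True)
print([row["person_id"] for row in females])  # ['1001', '1003']
```

Note that the stratified command above leaves EHR_GENDER out of --covariate-list, which is consistent with that column being constant within a single-sex cohort.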

Note on parallelization: Logistic regression is generally faster than the linear model, and that holds in this package, where the logistic model is both faster and more memory efficient. In testing the linear model, each process (one per CPU requested on the command line) used 10-16 GB of RAM and the full run took ~60 minutes. The logistic model ran with 15 CPUs on 30 GB of RAM in total and finished in 30 minutes. Both tests were run on a set of ~1.6 million individuals. To gauge how the linear model will perform on your machine, run it with 2 CPUs on about 250 phecodes and check the memory used by each Python process.
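As a rough planning aid, the per-process figure for the linear model can be multiplied by the number of CPUs requested. The sketch below assumes memory grows linearly with the number of processes and that the 10-16 GB range measured on the ~1.6 million individual cohort applies to your data; both are assumptions, and the helper name is hypothetical.

```python
def estimated_ram_gb(cpus, per_process_gb=(10, 16)):
    """Return a (low, high) estimate of total RAM in GB for the linear model."""
    low, high = per_process_gb
    return cpus * low, cpus * high


print(estimated_ram_gb(2))   # (20, 32): the small trial run suggested above
print(estimated_ram_gb(25))  # (250, 400): the --cpus value used in the example commands
```

Replacing per_process_gb with the figure observed in your own 2-CPU trial gives an estimate tailored to your cohort size.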

Project details

Release history

Version	Release date	Notes
0.5.0	Jan 13, 2026	this version, yanked
0.4.4	Jan 22, 2026
0.4.3	Jan 12, 2026
0.4.2	Dec 21, 2025
0.4.2b3	Dec 20, 2025	pre-release
0.4.2b2	Dec 20, 2025	pre-release
0.4.2b1	Dec 20, 2025	pre-release
0.4.2a2	Dec 18, 2025	pre-release
0.4.2a1	Dec 18, 2025	pre-release
0.4.1	Dec 13, 2025
0.4.1a1	Dec 18, 2025	pre-release
0.4.0	Dec 13, 2025
0.3.1	Dec 11, 2025
0.3.1b0	Dec 9, 2025	pre-release
0.3.0b0	Dec 8, 2025	pre-release
0.3.0a0	Jul 23, 2025	pre-release
0.2.1a0	Jul 17, 2025	pre-release

Download files

Source Distribution

pyphewas_package-0.5.0.tar.gz (2.1 MB view details)

Uploaded Jan 13, 2026 Source

Built Distribution

pyphewas_package-0.5.0-py3-none-any.whl (1.3 MB view details)

Uploaded Jan 13, 2026 Python 3

File details

Details for the file pyphewas_package-0.5.0.tar.gz.

File metadata

Download URL: pyphewas_package-0.5.0.tar.gz
Upload date: Jan 13, 2026
Size: 2.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: pdm/2.26.3 CPython/3.10.10 Darwin/24.6.0

File hashes

Hashes for pyphewas_package-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`410d2a281ff1cdcdf150fdf9f918929498a94d46a480306b749334e0b7896a4c`
MD5	`fb782201ff9b0b122944062ce2c8c09c`
BLAKE2b-256	`c792131202bbd7b949676ff2f7af2c84da216aadcd3e088110a47de002b4bfd1`

File details

Details for the file pyphewas_package-0.5.0-py3-none-any.whl.

File metadata

Download URL: pyphewas_package-0.5.0-py3-none-any.whl
Upload date: Jan 13, 2026
Size: 1.3 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: pdm/2.26.3 CPython/3.10.10 Darwin/24.6.0

File hashes

Hashes for pyphewas_package-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`96781a99658c7bd5d297c2891b55202755b5de1591d339de3bdc3c96b7ef2fc4`
MD5	`2bd18ee822ffa9a3b4f041fd847e182c`
BLAKE2b-256	`fe14bb0b3729789c4fd03a59231220b042c6e6fc97a58cb7fe8888dca6e6421d`

pyphewas-package 0.5.0
