Models nonlinear interactions between covariates and phenotypes
DeepNull: Modeling non-linear covariate effects improves phenotype prediction and association power
This repository contains code implementing nonlinear covariate modeling to increase power in genome-wide association studies, as described in "DeepNull: Modeling non-linear covariate effects improves phenotype prediction and association power" (Hormozdiari et al., 2021). The code is written using Python 3.7 and TensorFlow 2.4.
Installation
Installation is not required to run DeepNull end-to-end; you can just open DeepNull_e2e.ipynb in Colab to try it out.
To install DeepNull locally, run
pip install --upgrade pip
pip install --upgrade deepnull
on a machine with Python 3.7+. This installs a CPU-only version, as there are typically few enough covariates that using accelerators does not provide meaningful speedups.
Verify that the installation is working properly by executing all tests:
python -m deepnull.config_test
python -m deepnull.data_test
python -m deepnull.metrics_test
python -m deepnull.main_test
python -m deepnull.model_test
python -m deepnull.train_eval_test
How to run DeepNull
To run locally, there is a single required input file. This file contains the phenotype of interest and covariates used to predict the phenotype, formatted as a tab-separated file suitable for GWAS analysis with PLINK or BOLT-LMM.
Briefly, the file must contain a single header line. The first two columns must be FID and IID, and all IID values must be unique.
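The layout rules above can be checked before training. The following is a minimal sketch using only the Python standard library; the function name `check_phenocovar_tsv` is hypothetical and not part of the DeepNull package, which may perform its own validation.

```python
import csv
import io


def check_phenocovar_tsv(handle):
    """Checks the basic layout rules for a phenotype+covariate TSV.

    Hypothetical helper, not part of the DeepNull package.
    Returns the list of column names if all checks pass.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        raise ValueError("File is empty; a single header line is required.")
    if header[:2] != ["FID", "IID"]:
        raise ValueError("The first two columns must be FID and IID.")
    seen_iids = set()
    for row in reader:
        if not row:
            continue  # Skip blank lines.
        if len(row) < 2:
            raise ValueError(f"Row is missing the IID column: {row}")
        iid = row[1]
        if iid in seen_iids:
            raise ValueError(f"Duplicate IID value: {iid}")
        seen_iids.add(iid)
    return header


# Small in-memory example; the values are made up for illustration.
example = io.StringIO(
    "FID\tIID\tage\tsex\tgenotyping_array\tpheno\n"
    "1\t1\t54\t0\t1\t0.31\n"
    "2\t2\t61\t1\t0\t-1.20\n"
)
print(check_phenocovar_tsv(example))
```

To check a file on disk, pass an open file handle, for example `open(path, newline="")`, instead of the in-memory example.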
An example command to train DeepNull to predict the phenotype pheno from covariates age, sex, and genotyping_array is the following:
python -m deepnull.main \
--input_tsv=/input/YOUR_PHENOCOVAR_TSV \
--output_tsv=/output/YOUR_OUTPUT_TSV \
--target=pheno \
--covariates="age,sex,genotyping_array"
To see all available flags, run
python -m deepnull.main --help 2> /dev/null
Of particular note is the --model_config flag. DeepNull uses the ml_collections library to specify all parameters related to the model and training regimen. The supported configuration code is located in config.py, and parameters can be modified as described in detail in the ml_collections README.
As a brief example, to use the DeepNull architecture with the elu activation and train with batch size 4096, the above example command would be modified as follows:
python -m deepnull.main \
--input_tsv=/input/ORIGINAL_PHENOCOVAR_TSV \
--output_tsv=/output/PHENOCOVAR_WITH_DEEPNULL_PREDICTION_TSV \
--target=pheno \
--covariates="age,sex,genotyping_array" \
--model_config=/path/to/config.py:deepnull \
--model_config.model_config.mlp_activation=elu \
--model_config.training_config.batch_size=4096
where /path/to/config.py provides the path to config.py on your machine.
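As an illustration of how the dotted `--model_config.*` flags address nested configuration fields, the sketch below applies the two overrides from the command above to a plain nested dictionary. It does not use ml_collections and is not how that library is implemented; `apply_override` is a hypothetical helper, and the starting values (`relu`, `1024`) are placeholders rather than DeepNull's actual defaults.

```python
def apply_override(config, dotted_key, value):
    """Sets a nested field addressed by a dotted key such as 'a.b'.

    Hypothetical helper for illustration only. This sketch rejects keys
    that do not already exist, so a typo cannot silently add a new field.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node[name]  # Raises KeyError if a parent is unknown.
    if leaf not in node:
        raise KeyError(dotted_key)
    node[leaf] = value


# Placeholder configuration; the values are NOT DeepNull's real defaults.
config = {
    "model_config": {"mlp_activation": "relu"},
    "training_config": {"batch_size": 1024},
}

# Mirrors --model_config.model_config.mlp_activation=elu
apply_override(config, "model_config.mlp_activation", "elu")
# Mirrors --model_config.training_config.batch_size=4096
apply_override(config, "training_config.batch_size", 4096)

print(config)
```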
Incorporating DeepNull into a GWAS analysis
The above section, "How to run DeepNull", shows that the DeepNull software adds a single column to the phenotype+covariate file: a nonlinear prediction of the target phenotype. To incorporate this into a GWAS analysis, include that new column as an additional covariate. A concrete example with BOLT-LMM, using the same file, phenotype pheno, and covariates age, sex, and genotyping_array as above, is shown below:
Original example GWAS command
# N.B. Data loading flags are omitted for brevity.
bolt \
--phenoFile /input/ORIGINAL_PHENOCOVAR_TSV \
--covarFile /input/ORIGINAL_PHENOCOVAR_TSV \
--qCovarCol age \
--qCovarCol sex \
--qCovarCol genotyping_array \
--phenoCol pheno
After running DeepNull on /input/ORIGINAL_PHENOCOVAR_TSV to create the new TSV /output/PHENOCOVAR_WITH_DEEPNULL_PREDICTION_TSV, which includes the column pheno_deepnull, the updated command is given below:
Updated GWAS command to incorporate DeepNull
# N.B. Data loading flags are omitted for brevity.
# Note the addition of the single `--qCovarCol pheno_deepnull` line.
bolt \
--phenoFile /output/PHENOCOVAR_WITH_DEEPNULL_PREDICTION_TSV \
--covarFile /output/PHENOCOVAR_WITH_DEEPNULL_PREDICTION_TSV \
--qCovarCol age \
--qCovarCol sex \
--qCovarCol genotyping_array \
--qCovarCol pheno_deepnull \
--phenoCol pheno
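Because the two commands differ only by one covariate, the change can be scripted. The sketch below assembles the BOLT-LMM argument list from a covariate list. `bolt_args` is a hypothetical helper, and the `<target>_deepnull` column naming is an assumption generalized from the `pheno_deepnull` column in the example above. Data loading flags are omitted, as in the commands above.

```python
def bolt_args(tsv_path, target, covariates, use_deepnull=False):
    """Builds a BOLT-LMM argument list mirroring the example commands.

    Hypothetical helper for illustration. Only flags that appear in the
    example commands are used; data loading flags are omitted.
    """
    covars = list(covariates)
    if use_deepnull:
        # Assumed naming: the prediction column is '<target>_deepnull'.
        covars.append(f"{target}_deepnull")
    args = ["bolt", "--phenoFile", tsv_path, "--covarFile", tsv_path]
    for name in covars:
        args += ["--qCovarCol", name]
    args += ["--phenoCol", target]
    return args


args = bolt_args(
    "/output/PHENOCOVAR_WITH_DEEPNULL_PREDICTION_TSV",
    target="pheno",
    covariates=["age", "sex", "genotyping_array"],
    use_deepnull=True,
)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` once the omitted data loading flags have been appended.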
Data
Datasets used to reproduce the results from the above publication are available to researchers with approved access to the UK Biobank.
NOTE: the content of this research code repository (i) is not intended to be a medical device; and (ii) is not intended for clinical use of any kind, including but not limited to diagnosis or prognosis.
This is not an officially supported Google product.