EIR-auto-GP


EIR-auto-GP: Automated genomic prediction (GP) using deep learning models with EIR.

WARNING: This project is in alpha phase. Expect backwards-incompatible changes, including changes to the API.

NOTE: This project is specifically for genomic prediction. For more general and configurable deep learning tasks, please refer to EIR.

Overview

EIR-auto-GP is a comprehensive framework for genomic prediction (GP) tasks, built on top of the EIR deep learning framework. EIR-auto-GP streamlines the process of preparing data, training, and evaluating models on genomic data, automating much of the process from raw input files to results analysis. Key features include:

  • Support for .bed/.bim/.fam PLINK files as input data.
  • Automated data processing and train/test splitting.
  • Automated launching of a configurable number of deep learning training runs.
  • SNP feature selection based on GWAS, deep learning-based attributions, or a combination of both.
  • Ensemble prediction from multiple training runs.
  • Analysis and visualization of results.
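
Since the input is a PLINK .bed/.bim/.fam fileset, it can help to confirm that all three files are present before starting a run. The following is a minimal sketch, assuming the three files share a common path prefix (the usual PLINK convention); `missing_plink_files` is a hypothetical helper for illustration and is not part of EIR-auto-GP.

```python
from pathlib import Path

# The three files that make up a PLINK binary fileset.
PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def missing_plink_files(prefix: str) -> list[str]:
    """Return the fileset members that do not exist for the given prefix.

    Hypothetical helper: assumes the files are named <prefix>.bed,
    <prefix>.bim and <prefix>.fam.
    """
    return [
        f"{prefix}{suffix}"
        for suffix in PLINK_SUFFIXES
        if not Path(f"{prefix}{suffix}").is_file()
    ]
```

An empty list means the fileset is complete; otherwise the returned paths name what is missing.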

Installation

First, ensure that plink2 is installed and available in your PATH.

Then, install EIR-auto-GP using pip:

pip install eir-auto-gp

Important: The latest version of EIR-auto-GP supports Python 3.12. Using an older version of Python will install an outdated version of EIR-auto-GP, which will likely be incompatible with the current documentation and might contain bugs. Please ensure that you are installing EIR-auto-GP in a Python 3.12 environment.
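
The two prerequisites above (plink2 on your PATH and a suitable Python version) can be checked before installing. This is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, and treating 3.12 as a minimum version is an assumption based on the note above.

```python
import shutil
import sys


def check_environment(min_python=(3, 12), executable="plink2"):
    """Return a list of human-readable problems; empty if none were found.

    Hypothetical helper: min_python=(3, 12) is an assumption taken from the
    installation note, not a value read from the package metadata.
    """
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(executable) is None:
        problems.append(f"'{executable}' was not found on PATH")
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} is expected, found {found}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(f"WARNING: {problem}")
```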

Usage

Please refer to the Documentation for examples and information.

Workflow

The rough workflow can be visualized as follows:

[Figure: EIR auto GP workflow]

  1. Data processing: EIR-auto-GP processes the input .bed/.bim/.fam PLINK files and .csv label file, preparing the data for model training and evaluation.
  2. Train/test split: The processed data is automatically split into training and testing sets, with the option of manually specifying splits.
  3. Training: A configurable number of training runs is set up and executed using EIR's deep learning models.
  4. SNP feature selection: GWAS-based feature selection, deep learning-based feature selection with Bayesian optimization, and mixed strategies are supported.
  5. Test set prediction: Predictions are made on the test set using all training run folds.
  6. Ensemble prediction: An ensemble prediction is created from the individual predictions.
  7. Results analysis: Performance metrics, visualizations, and analysis are generated to assess the model's performance.
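
To make steps 2 and 6 more concrete, here is a minimal, library-free sketch of a seeded train/test split and of an ensemble prediction formed across runs. It illustrates the general idea only and is not EIR-auto-GP's implementation: the function names, the 10% test fraction, and the use of an unweighted mean are all assumptions made for this example.

```python
import random
from statistics import fmean


def split_ids(sample_ids, test_fraction=0.1, seed=0):
    """Split sample IDs into (train, test) lists, reproducibly.

    Hypothetical helper; test_fraction=0.1 is an arbitrary illustrative value.
    """
    ids = sorted(sample_ids)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def ensemble_mean(run_predictions):
    """Average per-sample predictions over several training runs.

    run_predictions: list of dicts mapping sample ID -> predicted value,
    one dict per run. Only samples present in every run are kept.
    Assumption: an unweighted mean; other aggregation schemes are possible.
    """
    if not run_predictions:
        raise ValueError("at least one run is required")
    shared = set(run_predictions[0]).intersection(*run_predictions[1:])
    return {
        sample_id: fmean(run[sample_id] for run in run_predictions)
        for sample_id in sorted(shared)
    }
```

For example, `ensemble_mean([{"a": 1.0}, {"a": 3.0}])` averages the two runs' predictions for sample `a`.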

Citation

If you use EIR-auto-GP in a scientific publication, we would appreciate it if you cited one of the following:

@article{10.1093/nar/gkad373,
    author    = {Sigurdsson, Arn{\'o}r I and Louloudis, Ioannis and Banasik, Karina and Westergaard, David and Winther, Ole and Lund, Ole and Ostrowski, Sisse Rye and Erikstrup, Christian and Pedersen, Ole Birger Vesterager and Nyegaard, Mette and DBDS Genomic Consortium and Brunak, S{\o}ren and Vilhj{\'a}lmsson, Bjarni J and Rasmussen, Simon},
    title     = {{Deep integrative models for large-scale human genomics}},
    journal   = {Nucleic Acids Research},
    month     = {05},
    year      = {2023}
}

@article{sigurdsson2024non,
  title={Non-linear genetic regulation of the blood plasma proteome},
  author={Sigurdsson, Arnor I and Gr{\"a}f, Justus F and Yang, Zhiyu and Ravn, Kirstine and Meisner, Jonas and Thielemann, Roman and Webel, Henry and Smit, Roelof AJ and Niu, Lili and Mann, Matthias and others},
  journal={medRxiv},
  pages={2024--07},
  year={2024},
  publisher={Cold Spring Harbor Laboratory Press}
}

@article{sigurdsson2022improved,
    author    = {Sigurdsson, Arnor Ingi and Ravn, Kirstine and Winther, Ole and Lund, Ole and Brunak, S{\o}ren and Vilhjalmsson, Bjarni J and Rasmussen, Simon},
    title     = {Improved prediction of blood biomarkers using deep learning},
    journal   = {medRxiv},
    pages     = {2022--10},
    year      = {2022},
    publisher = {Cold Spring Harbor Laboratory Press}
}
