EIR-auto-GP

EIR-auto-GP: Automated genomic prediction (GP) using deep learning models with EIR.

WARNING: This project is in alpha phase. Expect backwards incompatible changes and API changes.

Overview

EIR-auto-GP is a comprehensive framework for genomic prediction (GP) tasks, built on top of the EIR deep learning framework. EIR-auto-GP streamlines the process of preparing data, training, and evaluating models on genomic data, automating much of the process from raw input files to results analysis. Key features include:

Support for .bed/.bim/.fam PLINK files as input data.
Automated data processing and train/test splitting.
Automated launching of a configurable number of deep learning training runs.
SNP-based feature selection based on GWAS, deep learning-based attributions, and a combination of both.
Ensemble prediction from multiple training runs.
Analysis and visualization of results.
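
As an illustration of the expected input, the following is a minimal sketch of a sanity check on a PLINK file set before handing it to any pipeline. It is not part of EIR-auto-GP; the function name `summarize_plink_fileset` and the example prefix are hypothetical. It relies only on the PLINK 1 binary format: a variant-major .bed file starts with the bytes 0x6C 0x1B 0x01, the .fam file has one line per sample, and the .bim file has one line per variant.

```python
from pathlib import Path

# First three bytes of a PLINK 1 .bed file in the default variant-major layout.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])


def summarize_plink_fileset(prefix: str) -> dict:
    """Check that <prefix>.bed/.bim/.fam exist and report basic counts.

    Hypothetical helper for illustration; not an EIR-auto-GP function.
    """
    base = Path(prefix)
    paths = {ext: base.with_name(base.name + ext) for ext in (".bed", ".bim", ".fam")}
    missing = [str(p) for p in paths.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing PLINK files: {missing}")

    # Verify the .bed header so that a mislabeled file is caught early.
    with paths[".bed"].open("rb") as handle:
        if handle.read(3) != BED_MAGIC:
            raise ValueError("Not a variant-major PLINK 1 .bed file")

    def count_lines(path: Path) -> int:
        # Count non-empty lines only, so a trailing blank line is ignored.
        with path.open() as handle:
            return sum(1 for line in handle if line.strip())

    return {
        "n_samples": count_lines(paths[".fam"]),   # one sample per .fam line
        "n_variants": count_lines(paths[".bim"]),  # one variant per .bim line
    }
```

Calling `summarize_plink_fileset("data/cohort")` (a made-up prefix) would return the sample and variant counts, or raise if one of the three files is missing.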

Installation

First, ensure that plink2 is installed and available in your PATH.

Then, install EIR-auto-GP using pip:

pip install eir-auto-gp

Important: The latest version of EIR-auto-GP supports Python 3.11. Using an older version of Python will install an outdated version of EIR-auto-GP, which will likely be incompatible with the current documentation and might contain bugs. Please ensure that you are installing EIR-auto-GP in a Python 3.11 environment.
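
The two prerequisites above can be checked before running pip. The following is a small sketch using only the Python standard library; `check_environment` is a hypothetical helper, not something shipped with EIR-auto-GP, and treating 3.11 as a minimum version (rather than the only supported one) is an assumption.

```python
import shutil
import sys


def check_environment(required_python=(3, 11), executable="plink2"):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper; the version comparison assumes 3.11 is a minimum.
    """
    problems = []
    if sys.version_info[:2] < tuple(required_python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in required_python)
        problems.append(f"Python {wanted} or newer expected, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(executable) is None:
        problems.append(f"'{executable}' was not found in PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Problem:", problem)
```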

Usage

Please refer to the Documentation for examples and information.

Workflow

The rough workflow can be visualized as follows:

[Figure: EIR auto GP Workflow]

Data processing: EIR-auto-GP processes the input .bed/.bim/.fam PLINK files and .csv label file, preparing the data for model training and evaluation.
Train/test split: The processed data is automatically split into training and testing sets, with the option of manually specifying splits.
Training: A configurable number of training runs is set up and executed using EIR's deep learning models.
SNP feature selection: GWAS-based feature selection, deep learning-based feature selection with Bayesian optimization, and mixed strategies are supported.
Test set prediction: Predictions are made on the test set using all training run folds.
Ensemble prediction: An ensemble prediction is created from the individual predictions.
Results analysis: Performance metrics, visualizations, and analysis are generated to assess the model's performance.
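
To make the train/test split and ensemble prediction steps more concrete, here is a minimal, framework-free sketch. It is not EIR-auto-GP's implementation: the function names are hypothetical, the split is a plain seeded shuffle, and the ensemble is assumed to be an unweighted mean over runs, which the description above does not specify.

```python
import random
from statistics import mean


def split_ids(sample_ids, test_fraction=0.1, seed=0):
    """Shuffle sample IDs and hold out a fraction as the test set.

    Returns (train_ids, test_ids). Hypothetical helper for illustration.
    """
    ids = sorted(sample_ids)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def ensemble_mean(run_predictions):
    """Average per-sample predictions over training runs.

    run_predictions: non-empty list of dicts mapping sample ID -> predicted
    value, one dict per run. Only IDs present in every run are kept.
    The unweighted mean is an assumption, not the documented method.
    """
    shared = set(run_predictions[0]).intersection(*run_predictions[1:])
    return {sid: mean(run[sid] for run in run_predictions) for sid in sorted(shared)}


if __name__ == "__main__":
    train_ids, test_ids = split_ids([f"sample_{i}" for i in range(20)], test_fraction=0.2)
    # Made-up predictions from two runs on the held-out samples.
    runs = [{sid: 1.0 for sid in test_ids}, {sid: 3.0 for sid in test_ids}]
    print(ensemble_mean(runs))
```

Averaging over runs is the simplest way to combine predictions; weighting runs by validation performance would be a natural variation.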

Citation

If you use EIR-auto-GP in a scientific publication, we would appreciate it if you could use the following citation:

@article{sigurdsson2021deep,
  title={Deep integrative models for large-scale human genomics},
  author={Sigurdsson, Arnor Ingi and Westergaard, David and Winther, Ole and Lund, Ole and Brunak, S{\o}ren and Vilhjalmsson, Bjarni J and Rasmussen, Simon},
  journal={bioRxiv},
  year={2021},
  publisher={Cold Spring Harbor Laboratory}
}

Release history

Version	Release date
0.0.6a0 (pre-release, this version)	Jan 30, 2024
0.0.5a0 (pre-release)	Jan 29, 2024
0.0.4a0 (pre-release)	Jan 16, 2024
0.0.3a0 (pre-release)	Apr 28, 2023
0.0.2a0 (pre-release)	Apr 3, 2023
0.0.1a0 (pre-release)	Mar 28, 2023

Download files

Source distribution: eir_auto_gp-0.0.6a0.tar.gz (58.0 kB), uploaded Jan 30, 2024
Built distribution: eir_auto_gp-0.0.6a0-py3-none-any.whl (71.5 kB, Python 3), uploaded Jan 30, 2024

Hashes for eir_auto_gp-0.0.6a0.tar.gz

Algorithm	Hash digest
SHA256	`6a9d6f6b50a7674fa3a747ede74ee62337ba9aaafe82584723404b776112be24`
MD5	`28bf4f2d2abc1d0a61963b80fce1c7b1`
BLAKE2b-256	`83e477360b7f7c5a38be5ddf0273158ab8a50f9ad4d05206784845fd9625c526`

Hashes for eir_auto_gp-0.0.6a0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`a2abbe6d817160931b4ff28e1a31fe2615ded05652f97e722c29b26dd4cd69e3`
MD5	`084747bdcfe3f20a73218e549d6996b5`
BLAKE2b-256	`6c45ee3731307e93c8d5c2281db0c1172b43ca16a192264ca0b60af7d9f20360`