DNA repeat annotations
DeepGRP is a Python package for predicting genomic repetitive elements with a deep learning model built from bidirectional gated recurrent units with attention. DeepGRP was initially based on dna-nn, but has been re-implemented and extended using TensorFlow 2.1. It was tested on the prediction of HSAT2/3, alphoid, Alu and LINE-1 elements.
Getting Started
Installation
For installation you can use the PyPI version with:
pip install deepgrp
or install from this repository with:
git clone https://github.com/fhausmann/deepgrp
cd deepgrp
pip install .
Alternatively, you can install the development version with poetry:
git clone https://github.com/fhausmann/deepgrp
cd deepgrp
poetry install
Data preprocessing
For training and hyperparameter optimization, the data have to be preprocessed. For inference / prediction, the FASTA sequences can be used directly and you can skip this step. The provided script parse_rm extracts repeat annotations from RepeatMasker output into a tab-separated format:
parse_rm GENOME.fa.out > GENOME.bed
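To illustrate what such a conversion involves, here is a minimal Python sketch that turns RepeatMasker .out lines into tab-separated, BED-like rows. It is not the parse_rm implementation: the function name rm_out_to_bed and the choice of output columns (sequence, start, end, repeat name, class/family, strand) are assumptions made for this example, and the real script may emit different columns.

```python
def rm_out_to_bed(lines):
    """Yield tab-separated BED-like rows from RepeatMasker .out lines.

    Hypothetical helper for illustration; column selection is an assumption.
    """
    for line in lines:
        fields = line.split()
        # Skip blank lines and header lines (their first field is not an integer score).
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        chrom, begin, end = fields[4], int(fields[5]), int(fields[6])
        # RepeatMasker marks the complement strand with "C".
        strand = "-" if fields[8] == "C" else "+"
        name, family = fields[9], fields[10]
        # RepeatMasker coordinates are 1-based inclusive; BED is 0-based half-open.
        yield "\t".join([chrom, str(begin - 1), str(end), name, family, strand])


example = """\
   SW   perc perc perc  query     position in query    matching  repeat       position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat    class/family begin  end (left)  ID

  463    1.3  0.6  1.7  chr1      10001 10468 (248945954) +  (TAACCC)n  Simple_repeat  1 471 (0)  1
 3612   11.4 21.5  1.3  chr1      10469 11447 (248944975) C  TAR1       Satellite/telo (399) 1712 483  2
"""

rows = list(rm_out_to_bed(example.splitlines()))
for row in rows:
    print(row)
```

The start coordinate is shifted by one because BED intervals are 0-based and half-open, while RepeatMasker reports 1-based inclusive positions.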
The FASTA sequences have to be converted to a one-hot-encoded representation, which can be done with:
preprocess_sequence FASTAFILE.fa.gz
preprocess_sequence writes the one-hot-encoded representation in NumPy compressed format to the same directory as the input file.
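For readers unfamiliar with one-hot encoding of DNA, the following Python sketch shows the general idea. It is an illustration only, not the preprocess_sequence implementation: the row order A, C, G, T, the all-zero encoding for N and other symbols, the function name one_hot_encode, and the array key "fwd" are all assumptions made for this example.

```python
import io

import numpy as np

ALPHABET = "ACGT"  # assumed row order; DeepGRP's own ordering may differ


def one_hot_encode(sequence):
    """Return a (4, len(sequence)) uint8 array, one row per base in ALPHABET.

    Symbols outside ALPHABET (for example N) become all-zero columns.
    """
    codes = np.frombuffer(sequence.upper().encode("ascii"), dtype=np.uint8)
    encoded = np.zeros((len(ALPHABET), codes.size), dtype=np.uint8)
    for row, base in enumerate(ALPHABET):
        encoded[row, codes == ord(base)] = 1
    return encoded


encoded = one_hot_encode("ACGTNacgt")

# Round-trip through NumPy's compressed format; an in-memory buffer stands in
# for a file on disk, and the key "fwd" is a hypothetical name.
buffer = io.BytesIO()
np.savez_compressed(buffer, fwd=encoded)
buffer.seek(0)
restored = np.load(buffer)["fwd"]
```

Storing the matrix as uint8 keeps the array small before compression, since each cell only ever holds 0 or 1.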
Hyperparameter optimization
For hyperparameter optimization, the GitHub repository provides a Jupyter notebook.
The optimization is based on the hyperopt package.
Training
A model can be trained with the provided Jupyter notebook.
Prediction
Prediction is run with the deepgrp command:
deepgrp <modelfile> <fastafile> [<fastafile>, ...]
where <modelfile> contains the trained model in HDF5 format and <fastafile> is a (multi-)FASTA file containing DNA sequences. Several FASTA files can be given at once.
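If you want to drive the command from a Python script, a thin wrapper around subprocess is one option. The sketch below only assembles the argument list in the form shown above; the helper names build_deepgrp_command and run_deepgrp and the file names are hypothetical, and it assumes the deepgrp executable is on your PATH.

```python
import shlex
import subprocess


def build_deepgrp_command(modelfile, fastafiles):
    """Return the argument list: deepgrp <modelfile> <fastafile> [<fastafile>, ...]."""
    fastafiles = [str(path) for path in fastafiles]
    if not fastafiles:
        raise ValueError("at least one FASTA file is required")
    return ["deepgrp", str(modelfile), *fastafiles]


def run_deepgrp(modelfile, fastafiles):
    """Run deepgrp and return whatever it writes to standard output.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        build_deepgrp_command(modelfile, fastafiles),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Hypothetical file names, shown only to illustrate the resulting command line.
command = build_deepgrp_command("model.hdf5", ["chr1.fa", "chr2.fa"])
print(shlex.join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.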
Requirements
Requirements are listed in pyproject.toml.
Additionally for compiling C/Cython code, a C compiler should be installed.
Further information
You can find material to reproduce the results in the repository deepgrp_reproducibility.