DNA repeat annotations

DeepGRP is a Python package for predicting genomic repetitive elements with a deep learning model consisting of bidirectional gated recurrent units with attention. DeepGRP was initially based on dna-nn, but was re-implemented and extended using TensorFlow 2.1. It was tested on the prediction of HSAT2,3, alphoid, Alu and LINE-1 elements.

Getting Started

Installation

For installation you can use the PyPI version with:

pip install deepgrp

or install from this repository with:

git clone https://github.com/fhausmann/deepgrp
cd deepgrp
pip install .

Additionally, you can install the development version with poetry:

git clone https://github.com/fhausmann/deepgrp
cd deepgrp
poetry install

Data preprocessing

For training and hyperparameter optimization, the data have to be preprocessed. For inference / prediction, the FASTA sequences can be used directly and you can skip this step. The provided script parse_rm extracts repeat annotations from RepeatMasker output into a TAB-separated format:

parse_rm GENOME.fa.out > GENOME.bed
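To illustrate the kind of conversion involved, here is a minimal sketch that turns RepeatMasker .out rows into TAB-separated records. It is not the parse_rm implementation: the function name rm_out_to_bed, the choice of output columns, and the two example rows are assumptions made for illustration, and the columns parse_rm actually writes may differ.

```python
import io


def rm_out_to_bed(lines):
    """Yield TAB-separated, BED-like records from RepeatMasker .out lines.

    Hypothetical helper for illustration; not part of deepgrp.
    """
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header, whose first field is not an integer score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        chrom, begin, end = fields[4], int(fields[5]), int(fields[6])
        repeat_name, repeat_class = fields[9], fields[10]
        # RepeatMasker coordinates are 1-based inclusive; BED is 0-based half-open.
        yield "\t".join([chrom, str(begin - 1), str(end), repeat_name, repeat_class])


# Illustrative input in the RepeatMasker .out layout (header plus two rows).
example = """\
   SW   perc perc perc  query     position in query    matching  repeat       position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat    class/family begin end (left) ID

  463   1.3  0.6  1.7  chr1      10001 10468 (248945954) +  (TAACCC)n  Simple_repeat  1 471 (0) 1
 3612  11.4 21.5  1.3  chr1      10469 11447 (248944975) C  TAR1       Satellite/telo (399) 1712 483 2
"""

for record in rm_out_to_bed(io.StringIO(example)):
    print(record)
```

The only non-obvious step is the coordinate shift: the start is decremented by one because BED intervals are 0-based and half-open, while the end stays unchanged.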

The FASTA sequences have to be converted to a one-hot-encoded representation, which can be done with:

preprocess_sequence FASTAFILE.fa.gz

preprocess_sequence creates a one-hot-encoded representation in numpy compressed format in the same directory.
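For readers unfamiliar with one-hot encoding, the following sketch shows the idea in plain Python. It is not the preprocess_sequence implementation: the row order A, C, G, T, the all-zero encoding of other symbols such as N, and the name one_hot_encode are assumptions, and the sketch does not write the compressed numpy file the script produces.

```python
# Assumed row order; the order used by preprocess_sequence is not stated here.
ALPHABET = "ACGT"


def one_hot_encode(sequence):
    """Return one row per base in ALPHABET and one column per sequence position."""
    sequence = sequence.upper()
    # Symbols outside ALPHABET (for example N) become all-zero columns.
    return [[1 if base == symbol else 0 for base in sequence] for symbol in ALPHABET]


encoded = one_hot_encode("ACGTNa")
for symbol, row in zip(ALPHABET, encoded):
    print(symbol, row)
```

Each column contains at most a single 1, so the matrix records exactly which base occupies each position.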

Hyperparameter optimization

For hyperparameter optimization, the GitHub repository provides a Jupyter notebook.

Hyperparameter optimization is based on the hyperopt package.

Training

Training of a model can be performed with the provided Jupyter notebook.

Prediction

Prediction is run with the deepgrp main function:

deepgrp <modelfile> <fastafile> [<fastafile>, ...]

where <modelfile> contains the trained model in HDF5 format and <fastafile> is a (multi-)FASTA file containing DNA sequences. Several FASTA files can be given at once.
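If you want to drive the prediction from a Python script, one option is to call the command line through subprocess. The sketch below assumes only the invocation shown above; the helper names build_deepgrp_command and run_deepgrp and the file names are hypothetical, and where deepgrp writes its predictions is not specified here, so the sketch simply captures standard output.

```python
import subprocess


def build_deepgrp_command(modelfile, fastafiles):
    """Assemble the argument list: deepgrp <modelfile> <fastafile> [<fastafile>, ...]."""
    fastafiles = list(fastafiles)
    if not fastafiles:
        raise ValueError("at least one FASTA file is required")
    return ["deepgrp", str(modelfile), *map(str, fastafiles)]


def run_deepgrp(modelfile, fastafiles):
    """Run deepgrp and return its standard output (requires deepgrp to be installed)."""
    command = build_deepgrp_command(modelfile, fastafiles)
    # check=True raises CalledProcessError if deepgrp exits with a non-zero status.
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


# Hypothetical file names; only the command is built here, nothing is executed.
print(build_deepgrp_command("model.hdf5", ["chr1.fa", "chr2.fa"]))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.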

Requirements

Requirements are listed in pyproject.toml.

Additionally, a C compiler should be installed for compiling the C/Cython code.

Further information

You can find material to reproduce the results in the repository deepgrp_reproducibility.
