Predicting pathogenic potentials of novel DNA with reverse-complement neural networks.
DeePaC
DeePaC is a Python package for predicting labels (e.g. pathogenic potentials) from short DNA sequences (e.g. Illumina reads) with reverse-complement neural networks. For details, see our preprint on bioRxiv: https://www.biorxiv.org/content/10.1101/535286v2.
Documentation can be found here: https://rki_bioinformatics.gitlab.io/DeePaC/.
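To make the term "reverse-complement" concrete: a read and its reverse complement describe the same double-stranded DNA fragment, so a reverse-complement network is meant to treat both orientations consistently. The snippet below is only an illustration of the string operation itself; the helper name reverse_complement is hypothetical and is not part of the DeePaC API.

```python
# Illustrative helper, not part of DeePaC: compute the reverse complement
# of a DNA sequence (N is kept as N, case is preserved).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


read = "ATGCCGTTA"
print(read, "->", reverse_complement(read))
```

Applying the function twice returns the original read, which is why the two orientations can be regarded as one input.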
Installation
With conda
You can install DeePaC with conda and use it as a Python package or a CLI tool. Note that the TensorFlow team recommends using virtualenv and pip (see below). Set up the bioconda channel first, and remember to activate your conda environment before using DeePaC:
conda create -c bioconda -n my_env
conda activate my_env
conda install deepac
With pip
You can install DeePaC with pip and use it as a Python package or a CLI tool. Remember to activate your virtual environment before using DeePaC:
virtualenv --system-site-packages my_env
source my_env/bin/activate
pip install deepac
GPU support
To use GPUs, you need to replace the installed TensorFlow with its GPU build. This is straightforward with conda:
conda remove tensorflow
conda install tensorflow-gpu
If you're using pip, you need to install CUDA and cuDNN first (see the TensorFlow installation guide for details). Then you can do the same as above:
pip uninstall tensorflow
pip install tensorflow-gpu
Help
To see help, use:
deepac --help
deepac predict --help
deepac train --help
# Etc.
Prediction
You can predict pathogenic potentials with one of the built-in models out of the box:
# A rapid CNN (trained on IMG/M data)
deepac predict -r input.fasta
# A sensitive LSTM (trained on IMG/M data)
deepac predict -s input.fasta
# With GPU support
deepac predict -s -g 1 input.fasta
The rapid and the sensitive models are trained to predict pathogenic potentials of novel bacterial species. For details, see https://www.biorxiv.org/content/10.1101/535286v2.
To quickly filter your data according to predicted pathogenic potentials, you can use:
deepac predict -r input.fasta
deepac filter input.fasta input_predictions.npy -t 0.5
Note that after running predict, you can reuse input_predictions.npy to filter your fasta file with different thresholds. You can also add pathogenic potentials to the fasta headers in the output files:
deepac filter input.fasta input_predictions.npy -t 0.75 -p -o output-75.fasta
deepac filter input.fasta input_predictions.npy -t 0.9 -p -o output-90.fasta
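The idea behind threshold filtering can be sketched in a few lines of Python. This is not DeePaC's implementation. It assumes that the predictions contain one score per read, in the same order as the fasta file, and that reads scoring above the threshold are kept. The function names, the in-memory example data and the header suffix are all hypothetical; in practice the scores would come from the .npy file (for example via numpy.load).

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from fasta-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_reads(records: Iterable[Tuple[str, str]],
                 scores: Sequence[float],
                 threshold: float = 0.5,
                 add_potentials: bool = False) -> List[Tuple[str, str]]:
    """Keep reads whose score exceeds the threshold (assumed rule)."""
    records = list(records)
    if len(records) != len(scores):
        raise ValueError("expected exactly one score per read")
    kept = []
    for (header, seq), score in zip(records, scores):
        if score > threshold:
            if add_potentials:
                # Hypothetical header format, for illustration only.
                header = f"{header} | pp={score:.2f}"
            kept.append((header, seq))
    return kept


fasta = ">read1\nACGT\n>read2\nGGCC\n>read3\nTTAA\n".splitlines()
scores = [0.95, 0.40, 0.80]  # made-up scores standing in for the .npy content

# The same scores can be reused with several thresholds.
for t in (0.5, 0.9):
    print(t, [h for h, _ in filter_reads(parse_fasta(fasta), scores, t)])
```

Because the scores are computed once and only compared against the threshold afterwards, trying a stricter or looser cut-off does not require running the model again.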
Preprocessing
For more complex analyses, it can be useful to preprocess the fasta files by converting them to binary numpy arrays. Use:
deepac preproc preproc_config.ini
See the config_templates directory of the GitLab repository (https://gitlab.com/rki_bioinformatics/DeePaC/) for a sample configuration file.
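As a rough picture of what a binary representation of a read can look like, the sketch below one-hot encodes a sequence into a fixed-length matrix of zeros and ones. The exact encoding used by deepac preproc is not described in this text, so the channel order, the zero-padding and the handling of N are assumptions, and the function name one_hot is hypothetical. Plain Python lists are used to keep the example dependency-free; a real pipeline would store the result as a numpy array.

```python
from typing import List

ALPHABET = "ACGT"  # assumed channel order, for illustration only


def one_hot(seq: str, length: int) -> List[List[int]]:
    """Encode seq as a length x 4 binary matrix.

    Longer reads are truncated, shorter reads are padded with all-zero
    rows, and unknown bases such as N become all-zero rows.
    """
    rows = [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()[:length]]
    rows.extend([0] * len(ALPHABET) for _ in range(length - len(rows)))
    return rows


for row in one_hot("ACGN", length=6):
    print(row)
```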
Evaluation
To evaluate a trained model, use:
# Read-by-read performance
deepac eval -r eval_config.ini
# Species-by-species performance
deepac eval -s eval_species_config.ini
# Ensemble performance
deepac eval -e eval_ens_config.ini
See the configs directory for sample configuration files. Note that deepac eval -s requires precomputed predictions and a CSV file with the number of DNA reads for each species in each of the classes.
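For orientation, read-by-read performance of a binary classifier is typically summarised with metrics derived from the confusion matrix. The sketch below computes a few of them from per-read scores and true labels. Which metrics deepac eval actually reports is not stated in this text, so the selection, the 0.5 threshold and the function name binary_metrics are assumptions, and the example data are made up.

```python
from typing import Dict, Sequence


def binary_metrics(scores: Sequence[float],
                   labels: Sequence[int],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Compute confusion-matrix metrics for labels in {0, 1}."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


scores = [0.9, 0.8, 0.3, 0.6, 0.1, 0.4]  # made-up per-read predictions
labels = [1, 1, 1, 0, 0, 0]              # made-up true classes
print(binary_metrics(scores, labels))
```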
Training
To train a new model, use:
deepac train nn_train_config.ini
If you train an LSTM on a GPU, a CUDNNLSTM implementation will be used. To convert the resulting model to be CPU-compatible, use deepac convert. You can also use it to save the weights of a model, or recompile a model from a set of weights to use it with a different Python binary.
Dependencies
DeePaC requires TensorFlow, Keras, Biopython, scikit-learn and matplotlib. Python 3.4+ is supported.
File details
Details for the file deepac-0.9.1.tar.gz.
File metadata
- Download URL: deepac-0.9.1.tar.gz
- Upload date:
- Size: 34.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.9.1 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | dc6976b26bb46d8b1f725ab1202e3862ea405d3c664df4cc3a487d4f66d6cdd8
MD5 | 736b6037a65c3914c82095f3f7f1b9f2
BLAKE2b-256 | 1ba3846d8f6c77beabf246642cbb6bbf370dc785529f810fe82c9e19b779c488
File details
Details for the file deepac-0.9.1-py3-none-any.whl.
File metadata
- Download URL: deepac-0.9.1-py3-none-any.whl
- Upload date:
- Size: 34.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.9.1 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | b1d502a4e52c5388cc4b4a35e98b2a44681e3e683d18f870c3b9bd3d784a5129
MD5 | 41facb6a517f92243732d95443ce60fe
BLAKE2b-256 | 3338090b7be1f7741b0e0168fce96a828a0b3eacbff92d070ad3358f3d83af72