Predicting pathogenic potentials of novel DNA with reverse-complement neural networks.

DeePaC

DeePaC is a Python package for predicting labels (e.g. pathogenic potentials) from short DNA sequences (e.g. Illumina reads) with reverse-complement neural networks. For details, see our preprint on bioRxiv: https://www.biorxiv.org/content/10.1101/535286v2.

Documentation can be found here: https://rki_bioinformatics.gitlab.io/DeePaC/.
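To illustrate what "reverse-complement" means here: a DNA read and its reverse complement describe the same double-stranded fragment, so a prediction should not depend on which strand was sequenced. The sketch below is only an illustration of that property, not DeePaC's implementation. The helper names (`reverse_complement`, `rc_symmetric`, `toy_score`) are hypothetical, and averaging over both strands is one simple way to obtain a strand-independent score.

```python
# Illustrative sketch only; none of these names are part of the DeePaC API.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (other symbols, e.g. N, are kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def rc_symmetric(score_fn):
    """Wrap a scoring function so that a read and its reverse complement get the same score."""
    def wrapped(seq):
        return 0.5 * (score_fn(seq) + score_fn(reverse_complement(seq)))
    return wrapped


def toy_score(seq):
    """Toy, strand-dependent score: the fraction of G bases in the read."""
    return seq.count("G") / len(seq)


score = rc_symmetric(toy_score)
read = "GGGA"
# Both orientations of the same fragment now receive an identical score.
print(score(read), score(reverse_complement(read)))
```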

Installation

With conda

You can install DeePaC with conda and use it as a Python package or as a CLI tool. Note that the TensorFlow team recommends using virtualenv and pip (see below). Set up the bioconda channel first, and remember to activate your conda environment before using DeePaC:

conda create -c bioconda -n my_env
conda activate my_env
conda install deepac

With pip

You can install DeePaC with pip and use it as a Python package or as a CLI tool. Remember to activate your virtual environment before using DeePaC:

virtualenv --system-site-packages my_env
source my_env/bin/activate
pip install deepac

GPU support

To use GPUs, you need to replace TensorFlow with its GPU-enabled build. This is straightforward with conda:

conda remove tensorflow
conda install tensorflow-gpu

If you're using pip, you need to install CUDA and cuDNN first (see the TensorFlow installation guide for details). Then you can do the same as above:

pip uninstall tensorflow
pip install tensorflow-gpu

Help

To see help, use:

deepac --help
deepac predict --help
deepac train --help
# Etc.

Prediction

You can predict pathogenic potentials with one of the built-in models out of the box:

# A rapid CNN (trained on IMG/M data)
deepac predict -r input.fasta
# A sensitive LSTM (trained on IMG/M data)
deepac predict -s input.fasta
# With GPU support
deepac predict -s -g 1 input.fasta

The rapid and the sensitive models are trained to predict pathogenic potentials of novel bacterial species. For details, see https://www.biorxiv.org/content/10.1101/535286v2.

To quickly filter your data according to predicted pathogenic potentials, you can use:

deepac predict -r input.fasta
deepac filter input.fasta input_predictions.npy -t 0.5

Note that after running predict once, you can reuse input_predictions.npy to filter your fasta file with different thresholds. You can also add pathogenic potentials to the fasta headers in the output files:

deepac filter input.fasta input_predictions.npy -t 0.75 -p -o output-75.fasta
deepac filter input.fasta input_predictions.npy -t 0.9 -p -o output-90.fasta
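The same kind of thresholding can be sketched in Python, which may help when the predictions feed into a custom analysis. This is a minimal sketch under stated assumptions, not the code behind deepac filter: it assumes the predictions form a flat array with one score per read, in the same order as the reads in the fasta file, and that reads scoring strictly above the threshold are kept. The function name `filter_by_threshold` is hypothetical.

```python
import numpy as np


def filter_by_threshold(records, scores, threshold=0.5):
    """Keep (header, sequence) records whose score exceeds the threshold.

    Assumption: scores[i] belongs to records[i]. Whether deepac filter uses
    a strict or non-strict comparison is not stated in the text above.
    """
    records = list(records)
    scores = np.asarray(scores, dtype=float).ravel()
    if len(records) != scores.size:
        raise ValueError("number of reads and number of predictions differ")
    return [
        (header, seq, float(p))
        for (header, seq), p in zip(records, scores)
        if p > threshold
    ]


# Toy data standing in for the parsed fasta file and the loaded predictions.
reads = [("read1", "ACGT"), ("read2", "GGCC"), ("read3", "TTAA")]
predictions = np.array([0.20, 0.80, 0.95])

for header, seq, p in filter_by_threshold(reads, predictions, threshold=0.75):
    # Mimic adding the predicted potential to the fasta header.
    print(f">{header} | pp={p:.2f}\n{seq}")
```

With real data, the scores would come from `np.load("input_predictions.npy")`, and the records could be read with Biopython, for example via `Bio.SeqIO.parse("input.fasta", "fasta")`.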

Preprocessing

For more complex analyses, it can be useful to preprocess the fasta files by converting them to binary numpy arrays. Use:

deepac preproc preproc_config.ini

See the config_templates directory of the GitLab repository (https://gitlab.com/rki_bioinformatics/DeePaC/) for a sample configuration file.
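For orientation, the sketch below shows one common way of turning DNA reads into a numpy array: one-hot encoding with a fixed read length. It is an assumption-laden illustration, not the encoding that deepac preproc is guaranteed to use. The base order (A, C, G, T), the all-zero rows for N and padding, the `uint8` dtype, and the function names `one_hot` and `encode_reads` are all choices made for this example.

```python
import numpy as np

# Assumed column order for the one-hot encoding (hypothetical choice).
_BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot(seq, length):
    """Encode one read as a (length, 4) array; unknown bases and padding stay all-zero."""
    arr = np.zeros((length, 4), dtype=np.uint8)
    for i, base in enumerate(seq[:length].upper()):
        j = _BASE_INDEX.get(base)
        if j is not None:
            arr[i, j] = 1
    return arr


def encode_reads(seqs, length):
    """Stack encoded reads into a single (n_reads, length, 4) array."""
    return np.stack([one_hot(s, length) for s in seqs])


x = encode_reads(["ACGT", "NN"], length=5)
print(x.shape)  # (2, 5, 4)
# np.save("reads.npy", x) would store the array in numpy's binary .npy format.
```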

Evaluation

To evaluate a trained model, use:

# Read-by-read performance
deepac eval -r eval_config.ini
# Species-by-species performance
deepac eval -s eval_species_config.ini
# Ensemble performance
deepac eval -e eval_ens_config.ini

See the configs directory for sample configuration files. Note that deepac eval -s requires precomputed predictions and a CSV file listing the number of DNA reads for each species in each of the classes.
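As a rough illustration of what read-by-read performance means for a binary classifier, the following sketch computes a few standard metrics from true labels and predicted scores. It does not reproduce the output of deepac eval. The function name `binary_metrics`, the 0.5 default threshold, and the selection of metrics are assumptions made for this example.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Compute basic read-level metrics for binary labels (1 = positive class)."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted_positive = score > threshold
        if predicted_positive and truth:
            tp += 1
        elif predicted_positive and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Toy example: two positive and two negative reads.
metrics = binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(metrics)
```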

Training

To train a new model, use:

deepac train nn_train_config.ini

If you train an LSTM on a GPU, the CuDNNLSTM implementation will be used. To convert the resulting model to a CPU-compatible one, use deepac convert. You can also use it to save the weights of a model, or to recompile a model from a set of weights so that it can be used with a different Python binary.

Dependencies

DeePaC requires TensorFlow, Keras, Biopython, scikit-learn and matplotlib. Python 3.4+ is supported.
