
Utilities and datasets for deep learning in genomics

Project description

Janggu is a Python package that facilitates deep learning in the context of genomics. The package is freely available under a GPL-3.0 license.

Janggu motivation

In particular, the package provides easy access to typical genomics data formats and out-of-the-box evaluation, so that you can concentrate on designing the neural network architecture and quickly test biological hypotheses. Comprehensive documentation is available here.

Hallmarks of Janggu:

  1. Janggu provides special genomics datasets that allow you to access raw data in FASTA, BAM, BIGWIG, BED and GFF file formats.
  2. Various normalization procedures are supported for the genomics datasets, including ‘TPM’, ‘zscore’ or custom normalizers.
  3. The datasets are directly consumable by neural networks implemented in keras.
  4. Numpy format output of a keras model can be converted to represent genomic coverage tracks, which allows exporting the predictions as BIGWIG files and visualizing them as genome browser-like plots.
  5. Genomic datasets can be stored in various ways, including as numpy arrays, as sparse datasets or in hdf5 format.
  6. Caching of genomic datasets avoids time-consuming preprocessing steps and facilitates fast reloading.
  7. Janggu provides a wrapper for keras models with built-in logging functionality and automated result evaluation.
  8. Janggu provides a special keras layer for scanning both DNA strands for motif occurrences.
  9. Janggu provides keras model constructors that automatically infer input and output layer shapes to reduce code redundancy.
  10. Janggu provides a web application that allows you to browse through the results.
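
To make the idea of scanning both DNA strands (item 8) more concrete, here is a minimal sketch in plain Python. It is not Janggu's keras layer and does not use the Janggu API. The helper names `reverse_complement`, `match_scores` and `scan_both_strands` are hypothetical, and a simple count of matching bases stands in for a learned convolution filter. The sketch relies on the fact that a motif occurring on the reverse strand appears on the forward strand as its reverse complement.

```python
# Illustrative sketch only: plain Python, not Janggu's keras layer.
# All function names here are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def match_scores(seq, motif):
    """Count the bases that agree with the motif at every offset of seq."""
    width = len(motif)
    return [
        sum(a == b for a, b in zip(seq[i:i + width], motif))
        for i in range(len(seq) - width + 1)
    ]


def scan_both_strands(seq, motif):
    """Score each offset on both strands and keep the better score."""
    forward = match_scores(seq, motif)
    # A reverse-strand occurrence shows up on the forward strand
    # as the reverse complement of the motif.
    reverse = match_scores(seq, reverse_complement(motif))
    return [max(f, r) for f, r in zip(forward, reverse)]


if __name__ == "__main__":
    # "TATC" is the reverse complement of "GATA", so offset 2 is a
    # perfect reverse-strand match.
    print(scan_both_strands("CCTATCCC", "GATA"))  # prints [2, 0, 4, 1, 2]
```

Taking the maximum over the two strands is one simple way to make the score independent of strand orientation; other ways of combining the two scans are equally possible.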

Why the name Janggu?

Janggu is a Korean percussion instrument that looks like an hourglass.

Like the two ends of the instrument, the philosophy of the Janggu package is to help with the two ends of a deep learning application in genomics, namely data acquisition and evaluation.

Installation

The simplest way to install janggu is via the conda package management system. Assuming you have already installed conda, create a new environment and type

pip install janggu

The janggu neural network model depends on tensorflow, which you have to install yourself, choosing either the CPU-only or the GPU-enabled build. To install tensorflow, type

conda install tensorflow  # or tensorflow-gpu

Further information regarding the installation of tensorflow can be found on the official tensorflow webpage.

To verify that the installation works, try running the example contained in the janggu package as follows

git clone https://github.com/BIMSBbioinfo/janggu
cd janggu
python ./src/examples/classify_fasta.py single
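
Before running the example script, it can help to confirm that both packages are importable in the active environment. The following is a minimal sketch using only the Python standard library. The helper `missing_packages` is a hypothetical name introduced for illustration and is not part of janggu.

```python
# Illustrative sketch: check that required packages can be found
# in the current environment without importing them.
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["janggu", "tensorflow"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("janggu and tensorflow are both available.")
```

This only checks that the packages can be located. It does not confirm that tensorflow can use a GPU or that the example script will run to completion.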

Changelog

0.8.4 (2018-12-11)

  • Updated installation instructions in the readme

0.8.3 (2018-12-05)

  • Fixed issues for loading SparseGenomicArray
  • Made GenomicIndexer.filter_by_region aware of flank
  • Fixed a BedLoader issue with partially overlapping ROI and bed files by using filter_by_region.
  • Adapted classifier, license and keywords in setup.py
  • Fixed hyperlinks

0.8.2 (2018-12-04)

  • Bugfix for zero-padding functionality
  • Added ndim for keras compatibility

0.8.1 (2018-12-03)

  • Bugfix in GenomicIndexer.create_from_region

0.8.0 (2018-12-02)

  • Improved test coverage
  • Improved linter issues
  • Bugs fixed
  • Improved documentation for scorers
  • Removed kwargs for scorers and exporters
  • Adapted exporters to classes

0.7.0 (2018-12-01)

  • First public version
