Build Keras generators for genomic applications

Keras_dna: simplifying deep genomics

Description:

Keras_dna is an API that enables quick experimentation in applying deep learning to genomics. It makes it possible to feed a Keras model (TensorFlow) with genomic data quickly, without laborious file conversion or storing huge amounts of converted data. It reads the most common bioinformatics file types and creates a generator adapted to a Keras model.

Use Keras_dna if you need a library that:

Allows fast use of standard bioinformatics data to feed a Keras model (Keras is now the standard high-level API for TensorFlow).
Adapts to the data format that the model needs.
Facilitates standard evaluation of a model on genomic data (correlation, AUPRC, AUROC).
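
To make two of these metrics concrete, here is a minimal NumPy-only sketch of Pearson correlation and AUROC. It is not Keras_dna's own evaluation code, which is not shown in this README; the function names `pearson_correlation` and `auroc` are hypothetical and chosen for illustration. AUPRC is left out to keep the sketch short.

```python
import numpy as np


def pearson_correlation(y_true, y_pred):
    """Pearson correlation between two 1-D arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def auroc(labels, scores):
    """Area under the ROC curve, computed from its pairwise definition.

    It is the fraction of (positive, negative) pairs in which the positive
    example gets the higher score; ties count as half a correct pair.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    positives = scores[labels]
    negatives = scores[~labels]
    if positives.size == 0 or negatives.size == 0:
        raise ValueError("need at least one positive and one negative label")
    # Compare every positive score with every negative score.
    diff = positives[:, None] - negatives[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / diff.size)
```

The pairwise formulation is quadratic in memory, so it only suits small arrays; it is used here because it states the definition directly.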

Read the documentation at keras_dna.

Keras_dna is compatible with: Python 3.6.

Guiding principles:

Providing a simplified API to create generators of genomic data.
Reading the DNA sequence directly and efficiently from a fasta file, removing the need to store huge amounts of data.
Generating the DNA sequences corresponding to the desired annotation (sparse or continuous), passed as standard bioinformatics files (gff, bed, bigWig, bedGraph).
Adapting easily to the type of annotation, the number of annotations, and the number of cell types or species.

Getting started:

The core data structures of Keras_dna are a Generator, which feeds the Keras model with genomic data, and a ModelWrapper, which attaches a Keras model to its Keras_dna generator.

A Generator creates batches of DNA sequences corresponding to the desired annotation.

First example: a Generator that returns DNA subsequences corresponding to a given function (here, binding sites) as the positive class and subsequences far from them as the negative class. The DNA sequence is provided as a fasta file and the annotation as a gff file (a bed file would also work). The DNA is one-hot encoded, and the names of the functions to target must be passed in a list.

from keras_dna import Generator

generator = Generator(batch_size=64,
                      fasta_file='species.fa',
                      annotation_files=['annotation.gff'],
                      annotation_list=['binding site'])
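
As an illustration of what one-hot encoding of DNA means, here is a minimal sketch. It is not Keras_dna's internal implementation: the function name `one_hot_encode`, the A, C, G, T channel order, and the treatment of unknown bases as all-zero rows are assumptions made for this example.

```python
import numpy as np

# Assumed channel order; the ordering used by Keras_dna is not stated here.
ALPHABET = "ACGT"


def one_hot_encode(sequence):
    """Return an array of shape (len(sequence), 4).

    Each row has a single 1.0 in the column of its base. Bases outside
    the alphabet (for example N) are encoded as an all-zero row.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(sequence), len(ALPHABET)), dtype=np.float32)
    for position, base in enumerate(sequence.upper()):
        channel = index.get(base)
        if channel is not None:
            encoded[position, channel] = 1.0
    return encoded
```

Calling `one_hot_encode("ACGTN")` gives a 5 x 4 array whose first four rows form an identity matrix and whose last row is all zeros.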

Second example: a Generator for a continuous annotation. This time the annotation file is a bigWig file (a wig or bedGraph file can also be used, but a file containing the chromosome sizes must then be passed as size), and the length of the desired window must be specified. This generator produces every window of length 100 in the DNA and labels each one with the coverage at its center nucleotide.

from keras_dna import Generator

generator = Generator(batch_size=64,
                      fasta_file='species.fa',
                      annotation_files=['annotation.bw'],
                      window=100)
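
The windowing scheme described above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the library's code: the helper `windows_with_center_labels` is hypothetical, and for an even window length the center is assumed to be at offset `window // 2` from the window start.

```python
import numpy as np


def windows_with_center_labels(sequence, coverage, window):
    """Yield (subsequence, label) for every window of length `window`.

    `coverage` holds one value per nucleotide of `sequence`. The label of
    a window is the coverage at its center position, taken here as
    start + window // 2 (an assumption for even window lengths).
    """
    coverage = np.asarray(coverage, dtype=float)
    if len(sequence) != len(coverage):
        raise ValueError("sequence and coverage must have the same length")
    for start in range(len(sequence) - window + 1):
        center = start + window // 2
        yield sequence[start:start + window], float(coverage[center])


# Toy data: a 5-nucleotide sequence with one coverage value per position.
for subsequence, label in windows_with_center_labels("ACGTA", [0, 1, 2, 3, 4], 3):
    print(subsequence, label)
```

A sequence of length n yields n - window + 1 windows, so the number of training examples is close to the genome length for a window as short as 100.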

Generator accepts many keyword arguments to adapt the data format both to the Keras model and to the task (predicting the sequence function in different cell types, choosing between several functions, adding a secondary input, adding a secondary target, and so on).

ModelWrapper is a class designed to bind a Keras model to its generator in order to simplify later use of the model (prediction, evaluation).

from keras_dna import ModelWrapper, Generator
from tensorflow.keras.models import Sequential

generator = Generator(batch_size=64,
                      fasta_file='species.fa',
                      annotation_files=['annotation.bw'],
                      window=100)

model = Sequential()
# Add the layers of your architecture here with model.add(...).
# The model needs to be compiled before it is wrapped.
model.compile(loss='mse', optimizer='adam')

wrapper = ModelWrapper(model=model,
                       generator_train=generator)

Train the model with .train()

wrapper.train(epochs=10)

Evaluate the model on a chromosome with .evaluate()

wrapper.evaluate(incl_chromosomes=['chr1'])

Predict on a chromosome with .predict()

wrapper.predict(incl_chromosomes=['chr1'], chrom_size='species.chrom.sizes')
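
The `chrom_size` argument points to a file listing chromosome lengths. This README does not specify the exact format Keras_dna expects; the sketch below assumes the common two-column layout (chromosome name, then length, separated by whitespace) used by many genomics tools. The helper `read_chrom_sizes` is hypothetical and the values are toy data.

```python
import io


def read_chrom_sizes(handle):
    """Parse lines of the form <chromosome> <length> into a dict.

    Blank lines and lines starting with '#' are skipped.
    """
    sizes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, length = line.split()[:2]
        sizes[name] = int(length)
    return sizes


# Toy content standing in for a file such as 'species.chrom.sizes'.
example = io.StringIO("chr1\t1000\nchr2\t500\n")
print(read_chrom_sizes(example))
```

Inspecting the file this way before calling .predict() is a quick check that the chromosome names match those in the fasta file.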

Save the wrapper in hdf5 with .save()

wrapper.save(path='./path/to/wrapper', save_model=True)

Installation:

Dependencies:

pandas
numpy
pybedtools
pyBigWig
kipoiseq
tensorflow 2

We also strongly advise installing genomelake for fast reading of fasta files.
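
Before installing, it can help to see which of these dependencies are already importable. The following sketch uses only the standard library; the import names are assumed to match the package names listed above, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names assumed to match the packages listed above.
REQUIRED = ["pandas", "numpy", "pybedtools", "pyBigWig", "kipoiseq", "tensorflow"]
OPTIONAL = ["genomelake"]


def missing_modules(names):
    """Return the top-level module names that cannot be found.

    find_spec locates a module without importing it, so this check stays
    fast even for heavy packages.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```

This only confirms that a module can be located; it does not check versions, so a TensorFlow 1 installation would still pass.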

Install Keras_dna from PyPI:

Note: These installation steps assume that you are on a Linux or Mac environment. If you are on Windows, you will need to remove sudo to run the commands below.

sudo pip install keras_dna

If you are using a virtualenv, you may want to avoid using sudo:

pip install keras_dna

Note that libcurl (and the curl-config command) are required for installation. They are typically already installed on many Linux and macOS systems, and are easily available in a conda environment.

Alternatively: install Keras_dna from the GitHub source:

First, clone Keras_dna using git:

git clone https://github.com/etirouthier/keras_dna.git

Then, cd to the Keras_dna folder and run the install command:

cd keras_dna
sudo python setup.py install

Release history

Version	Release date
0.0.46	Feb 18, 2021
0.0.45	Feb 17, 2021
0.0.44	Feb 16, 2021
0.0.43	Feb 16, 2021
0.0.42	Feb 15, 2021
0.0.41	Feb 12, 2021
0.0.40	Jan 27, 2021
0.0.39	Jan 25, 2021
0.0.38	Jan 25, 2021
0.0.37	Jan 25, 2021
0.0.36	Jan 5, 2021
0.0.35	Dec 21, 2020
0.0.34	Dec 17, 2020
0.0.33	Dec 17, 2020
0.0.32	Dec 17, 2020
0.0.31	Dec 17, 2020
0.0.30	Dec 15, 2020
0.0.29	Dec 14, 2020
0.0.28	Dec 14, 2020
0.0.27	Dec 14, 2020
0.0.26	Dec 14, 2020
0.0.25	Dec 11, 2020
0.0.24	Dec 11, 2020
0.0.23	Dec 11, 2020
0.0.22	Dec 4, 2020
0.0.21	Nov 30, 2020
0.0.20	Nov 30, 2020
0.0.19	Nov 23, 2020
0.0.18	Nov 20, 2020
0.0.17	Jul 24, 2020
0.0.16	Jun 30, 2020
0.0.15	May 28, 2020
0.0.14	May 18, 2020 (this version)
0.0.13	May 14, 2020
0.0.12	May 13, 2020
0.0.11	May 5, 2020
0.0.10	May 4, 2020
0.0.9	Apr 30, 2020
0.0.8	Apr 24, 2020
0.0.7	Apr 24, 2020
0.0.6	Apr 24, 2020
0.0.5	Apr 23, 2020
0.0.4	Apr 22, 2020
0.0.3	Apr 8, 2020

Download files

Download the file for your platform.

Source Distribution

keras_dna-0.0.14.tar.gz (32.0 kB)

Uploaded May 18, 2020 Source

Built Distribution

keras_dna-0.0.14-py3-none-any.whl (36.8 kB)

Uploaded May 18, 2020 Python 3

File details

Details for the file keras_dna-0.0.14.tar.gz.

File metadata

Download URL: keras_dna-0.0.14.tar.gz
Upload date: May 18, 2020
Size: 32.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0.post20200209 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.10

File hashes

Hashes for keras_dna-0.0.14.tar.gz
Algorithm	Hash digest
SHA256	`10cf1b5b2735583cc47bcab11d59fdf33e35c84cac93f9636f652fcc80fef4c3`
MD5	`7b771cc1a8557d3e364dd58500f186bf`
BLAKE2b-256	`5f6ee4f80f0746f2523bfa791882e884c4a6d26904bdafaa3041d95a00114cde`

File details

Details for the file keras_dna-0.0.14-py3-none-any.whl.

File metadata

Download URL: keras_dna-0.0.14-py3-none-any.whl
Upload date: May 18, 2020
Size: 36.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0.post20200209 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.10

File hashes

Hashes for keras_dna-0.0.14-py3-none-any.whl
Algorithm	Hash digest
SHA256	`70221364ce5cf9e9661a242eaf398196f42239d458db8a3e84a9228b71e3ca81`
MD5	`7bdf9960401373d6cd399531848f8c71`
BLAKE2b-256	`32dbcec2e85a5396e3580f65b9cae71a0f7a402505192dd366c1e7181721838a`
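
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. The sketch below is one way to do it; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file is assumed to be in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 equals the published digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `matches_published_hash("keras_dna-0.0.14.tar.gz", "10cf1b5b2735583cc47bcab11d59fdf33e35c84cac93f9636f652fcc80fef4c3")`; a result of False means the file should not be installed.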