Synthetic rule-based biological sequence data generation for architecture evaluation and search

Project description

seqgra: Principled Selection of Neural Network Architectures for Genomics Prediction Tasks

License: MIT

https://kkrismer.github.io/seqgra/

What is seqgra?

Sequence models based on deep neural networks have achieved state-of-the-art performance on regulatory genomics prediction tasks, such as chromatin accessibility and transcription factor binding. Despite their high accuracy, their contribution to a mechanistic understanding of the biology of regulatory elements is often hindered by the complexity of the predictive model and the resulting poor interpretability of its decision boundaries. To address this, we introduce seqgra, a deep learning pipeline that combines rule-based simulation of biological sequence data with the training and evaluation of models whose decision boundaries mirror the rules of the simulation process. The method can be used to (1) generate data under the assumption of a hypothesized model of genome regulation, (2) identify neural network architectures capable of recovering the rules of said model, and (3) analyze a model's predictive performance as a function of training set size, noise level, and the complexity of the rules behind the simulated data.
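To make the idea of rule-based simulation concrete, here is a minimal toy sketch in plain Python. It is not seqgra's API or implementation: the function names, the single-motif rule, and the uniform background distribution are all assumptions chosen for illustration. Positive examples contain a planted motif; negative examples are pure background.

```python
import random

ALPHABET = "ACGT"


def simulate_sequence(length, motif=None, rng=random):
    """Draw a uniform random DNA sequence and optionally plant a motif in it.

    Hypothetical helper for illustration only; not part of seqgra.
    """
    background = [rng.choice(ALPHABET) for _ in range(length)]
    if motif is not None:
        # Pick a start position such that the motif fits entirely.
        start = rng.randrange(length - len(motif) + 1)
        background[start:start + len(motif)] = motif
    return "".join(background)


def simulate_dataset(n_per_class, length, motif, seed=0):
    """Return a shuffled list of (sequence, label) pairs.

    Label 1: the motif was planted (the "rule" holds).
    Label 0: background only (the motif may still occur by chance).
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(n_per_class):
        examples.append((simulate_sequence(length, motif, rng), 1))
        examples.append((simulate_sequence(length, None, rng), 0))
    rng.shuffle(examples)
    return examples


if __name__ == "__main__":
    for sequence, label in simulate_dataset(3, 40, "GATTACA"):
        print(label, sequence)
```

Because the rule that generated each label is known, a trained model's decision boundary can in principle be compared against it, which is the kind of comparison the pipeline described above is built around.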

Installation

seqgra is a Python package available from both conda-forge and PyPI, the package repositories behind conda and pip, respectively.

To install seqgra with conda, run:

conda install -c conda-forge seqgra

To install seqgra with pip, run:

pip install seqgra

To install seqgra directly from this repository, run:

git clone https://github.com/gifford-lab/seqgra
cd seqgra
pip install .

System requirements

  • Python 3.7 (or higher)
  • R 3.5 (or higher)
    • R package ggplot2 3.3.0 (or higher)
    • R package gridExtra 2.3 (or higher)
    • R package scales 1.1.0 (or higher)

The tensorflow package is only required if TensorFlow models are used and is not installed automatically by pip install seqgra. The same is true for the torch and pytorch-ignite packages, which are only required if PyTorch models are used.

R is a soft dependency: it is used to create a number of plots (grammar-model-agreement plots, grammar heatmaps, and motif similarity matrix plots). If R is not available, these plots are skipped.
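Since these dependencies are optional, it can help to check which of them are present before running an analysis. The following is a small sketch using only the Python standard library; the function name is hypothetical, and testing for Rscript on the PATH is an assumption about how R availability might be detected, not a description of what seqgra does internally.

```python
import importlib.util
import shutil

# Import names of the optional Python packages named above.
# Note: the pytorch-ignite distribution is imported as "ignite".
OPTIONAL_MODULES = {
    "tensorflow": "TensorFlow models",
    "torch": "PyTorch models",
    "ignite": "PyTorch models (pytorch-ignite)",
}


def check_optional_dependencies():
    """Return a dict mapping each optional dependency to True or False."""
    status = {}
    for module_name in OPTIONAL_MODULES:
        # find_spec returns None when a top-level module cannot be found.
        status[module_name] = importlib.util.find_spec(module_name) is not None
    # Assumption: R counts as available if Rscript is on the PATH.
    status["Rscript"] = shutil.which("Rscript") is not None
    return status


if __name__ == "__main__":
    for name, available in check_optional_dependencies().items():
        print(f"{name}: {'found' if available else 'missing'}")
```

The check does not import the packages, so it stays fast even when large frameworks are installed.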

seqgra depends upon the Python package lxml, which in turn depends on system libraries that are not always present. On a Debian/Ubuntu machine you can satisfy those requirements using:

sudo apt-get install libxml2-dev libxslt-dev

Usage

Check out the help pages in the documentation at https://kkrismer.github.io/seqgra/.

Citation

If you use seqgra in your work, please cite:

seqgra: Principled Selection of Neural Network Architectures for Genomics Prediction Tasks
Konstantin Krismer, Jennifer Hammelman, and David K. Gifford
journal name TODO, Volume TODO, Issue TODO, date TODO, Page TODO; DOI: https://doi.org/TODO

Funding

We gratefully acknowledge funding from NIH grants 1R01HG008754 and 1R01NS109217.
