Skip to main content

Gnocis is a system for the analysis and the modelling of cis-regulatory DNA sequences.

Project description

Gnocis

Gnocis

Bjørn André Bredesen-Aa, 2018-

Stability


About

Large parts of complex genomes of multi-cellular organisms are non-coding. Cis-regulatory elements (CREs) are non-coding sequences that establish or modify gene transcription by multiple mechanisms. Multiple classes of CREs have been identified, including promoters, enhancers, silencers, insulators and Polycomb/Trithorax Response Elements. CREs can be identified experimentally or by means of in silico prediction. Experimental identification of CREs can depend on the cells that are used. Genome-wide in silico prediction, on the other hand, can potentially comprehensively predict CREs in a genome. In order to use machine learning for CRE prediction, a variety of functionality is required. A variety of packages exist for Python 3 for machine learning and sequence analysis, but successfully combining them requires the implementation of interfacing between them. Ensuring that the solution is efficient is important for large genomes, but can be challenging for end-users.

Gnocis is a system in Python 3 for the interactive and reproducible analysis and modelling of CRE DNA sequences. A broad suite of tools is implemented for data preparation, feature set definition, model formulation, training, cross-validation and genome-wide prediction. Gnocis employs Cython and a variety of techniques in order to optimally implement the glue necessary in order to apply machine learning for CRE analysis and prediction.


Installing

The recommended way to install Gnocis is through the PyPI package manager. In order to install via the PyPI package manager, open a terminal and execute:

pip install gnocis

Alternatively, Gnocis can be built from source. To build a wheel and install it, run:

make wheel
pip install dist/*.whl

Finally, Gnocis can be used by building from source and including the entire gnocis directory in the source tree. In order to do so, run

make all

Installing dependencies for tutorial

sudo apt-get install python3-sphinx
sudo pip3 install notebook pandas numpy matplotlib cupy sklearn tensorflow

Documentation

For the complete manual, see: https://bjornbredesen.github.io/gnocis/

For an in-depth tutorial, see the Jupyter Notebooks in the tutorial/ folder.


Features

  • DNA sequence handling
    • File format support - Loading and streaming
      • FASTA
      • 2bit
    • File format support - Saving
      • FASTA
    • Operations
      • Printing
      • Sliding window extraction
      • Reverse complement generation
  • Sequence region handling
    • File format support - Loading and saving
      • GFF
      • BED
      • Coordinate lists (chromosome:start..end)
    • Operations
      • Overlap acquisition
      • Non-overlap acquisition
      • Merged set generation
      • Exclusion set generation
      • Sequence region extraction
  • Modelling
    • Generative DNA sequence models, with training and sequence generation
      • I.i.d.
      • N'th order Markov chains
    • Confusion matrices
      • Generation from model statistics
      • Printing
      • Receiver Operating Characteristic curve generation
      • Precision Recall Curve generation
      • Area Under the Curve calculation
    • Feature models
      • Log-odds
      • Dummy
      • Support Vector Machines (via sklearn)
      • Random Forest (via sklearn)
    • Features
      • k-mer spectrum
      • Motif occurrence spectrum
      • Motif pair occurrence spectrum
  • Motifs
    • Types
      • IUPAC nucleotide motifs
      • Position Weight Matrices
      • k-mer spectra
  • Feature networks
    • Directed acyclic graphs of features
    • Transformations of feature sets: filtering; concatenation; scaling; square; ...
    • Feature network nodes for constructing models
    • Application to sequences
  • Optionally integrates with established packages
    • Numpy – for integration with external methods
    • Pandas – for integration with external methods
    • Scikit-learn – for extended analyses and classic machine learning
    • TensorFlow – for neural networks
    • Jupyter Notebooks – for interactive and reproducible analysis and modelling
  • Easy to use
  • Objects are represented by classes, with human-readable descriptions
  • Optimized with Cython
  • ...

Requirements

  • Python 3.6, 3.7, 3.8, 3.9
  • Windows, MacOS or Linux
  • C++ compiler when installing on Linux
  • Optional: Cython – required only when building from source
  • Optional: sklearn – for SVM and RF modelling
  • Optional: CuPy and CUDA – for CUDA-optimized SVM
  • Optional: TensorFlow – for neural networks

Citing

If you use Gnocis in published research, Gnocis must be cited. An article for Gnocis is in the process of being submitted for peer review. Please check back for an updated citation policy.


License

MIT License

Copyright (c) 2018- Bjørn André Bredesen-Aa

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Logo

Logo: Copyright Bjørn André Bredesen-Aa

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gnocis-0.9.12.tar.gz (1.6 MB view hashes)

Uploaded Source

Built Distributions

gnocis-0.9.12-cp39-cp39-win_amd64.whl (1.2 MB view hashes)

Uploaded CPython 3.9 Windows x86-64

gnocis-0.9.12-cp39-cp39-win32.whl (1.0 MB view hashes)

Uploaded CPython 3.9 Windows x86

gnocis-0.9.12-cp39-cp39-macosx_10_14_x86_64.whl (1.6 MB view hashes)

Uploaded CPython 3.9 macOS 10.14+ x86-64

gnocis-0.9.12-cp38-cp38-win_amd64.whl (1.2 MB view hashes)

Uploaded CPython 3.8 Windows x86-64

gnocis-0.9.12-cp38-cp38-win32.whl (1.0 MB view hashes)

Uploaded CPython 3.8 Windows x86

gnocis-0.9.12-cp38-cp38-macosx_10_14_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.8 macOS 10.14+ x86-64

gnocis-0.9.12-cp37-cp37m-win_amd64.whl (1.2 MB view hashes)

Uploaded CPython 3.7m Windows x86-64

gnocis-0.9.12-cp37-cp37m-win32.whl (1.0 MB view hashes)

Uploaded CPython 3.7m Windows x86

gnocis-0.9.12-cp37-cp37m-macosx_10_14_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.7m macOS 10.14+ x86-64

gnocis-0.9.12-cp36-cp36m-win_amd64.whl (1.2 MB view hashes)

Uploaded CPython 3.6m Windows x86-64

gnocis-0.9.12-cp36-cp36m-win32.whl (1.0 MB view hashes)

Uploaded CPython 3.6m Windows x86

gnocis-0.9.12-cp36-cp36m-macosx_10_14_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.6m macOS 10.14+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page