Simple Python MotEvo wrapper.

motevowrapper

Simple Python parser for MotEvo files.

To install, run:

pip install motevowrapper

MotEvo

MotEvo (Arnold et al. 2012) is a Bayesian probabilistic model for predicting transcription factor binding sites (TFBSs) for a given set of position weight matrices (PWMs) and DNA sequences. It was developed by the van Nimwegen lab at the Biozentrum (University of Basel, Switzerland) and can be obtained from the SwissRegulon website.

This repository contains the source code for a simple Python package that allows you to:

Run MotEvo with given parameters
Parse MotEvo output files
Visualize site density per motif

Installing MotEvo

The MotEvo source code can be downloaded from the SwissRegulon website. You can either download the source and compile it, or download binaries for macOS or Linux. Remember to add the directory containing the executables to your PATH, for example by adding the following line to your .bashrc or .bash_profile:

export PATH=$PATH:/path/to/motevo/bin
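Before calling the wrapper, it can help to confirm from Python that the MotEvo binary is reachable. This is a minimal sketch using the standard library's `shutil.which`; the executable name `motevo` and the helper `find_motevo` are assumptions for illustration, not part of `motevowrapper`.

```python
import shutil
from typing import Optional


def find_motevo(executable: str = "motevo") -> Optional[str]:
    """Return the full path of the MotEvo executable, or None if it is not on PATH.

    The default name "motevo" is an assumption; adjust it to match
    the binary shipped with your MotEvo build.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_motevo()
    if path is None:
        print("MotEvo not found on PATH; check your .bashrc or .bash_profile")
    else:
        print(f"Using MotEvo at {path}")
```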

Running MotEvo from `motevowrapper`

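MotEvo is run with `run_motevo(...)`. The parameter descriptions below are copied from the MotEvo source code for better visibility. The listing is an overview of the available parameters and their meaning rather than a runnable call; parameters shown with a value have a default.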

sites_file, priors_file = mw.run_motevo(
    # Command line parameters
    sequences_file,             # Sequences or alignments file
    wm_path,                    # Path to the position weight matrix (PWM) of a given motif
    working_directory="./",     # Working directory

    # General
    Mode="TFBS",                # (word) Mode of running. This can be TFBS (TFBS predictions; default), ENH (enhancer finding), or WMREF (weight matrix refinement)
    refspecies,                 # (word) The identifier of the reference species (as found in the sequence identifier and in the phylogenetic tree).
    TREE,                       # (tree string) Phylogenetic tree in Newick format.
    restrictparses,             # (binary) When 1 only use sites that have a reference weight matrix score bigger than 0. Default: 0. Only used for testing.
    singlestrand,               # (binary) When 1 predict sites only on the positive strand.

    # Priors
    bgprior,                    # (real number) Prior probability for putting down a background at each position.
    EMprior=0,                  # (binary) Use the expectation maximization algorithm to find the priors that maximize the probability of the observed alignment.
    priordiff,                  # (real value) Convergence criterion for prior estimation, e.g. at 0.01 iteration stops when priors change by less than 1%.
    UFEwmprior,                 # (real number) The prior weight of the UFE model relative to the other weight matrices.

    # Background model
    markovorderBG,              # (integer) Markov order of the background model.
    bgA=0.25,                   # (real number) background probability for A (for the zeroth order model)
    bgC=0.25,                   # (real number) background probability for C (for the zeroth order model)
    bgG=0.25,                   # (real number) background probability for G (for the zeroth order model)
    bgT=0.25,                   # (real number) background probability for T (for the zeroth order model)
    mybgfile,                   # (file name) Input file containing a higher order background model.

    # UFE
    UFEwmfile,                  # (file name) Input file containing the UFE model (run 'runUFE' to create it for a given tree and background model.)
    UFEwmlen,                   # (integer) The length of UFE model sites.
    UFEprint,                   # (binary) When set to zero UFE sites are not reported in the site file.
    UFEwmproffile,              # (file name) Output file containing UFE model probabilities at each position.

    # TFBS output
    sitefile,                   # (file name) Output file name of the file containing the predicted sites.
    priorfile,                  # (file name) Output file containing information like site density, final priors, and the total number of sites for each WM.
    loglikfile,                 # (file name) Output file containing log-likelihood of each sequence (or alignment) in the input data.
    minposterior=0.1,           # (real number) When printing sites, only print sites with a posterior bigger than this cut-off.
    printsiteals=1,             # (binary) When set to zero sequence alignments are not printed in the output file.

    # WM refinement
    minposteriorWM,             # (real number) When doing weight matrix refinement, only include sites to refine that have a minimal posterior bigger than this cut-off.
    wmdiff,                     # (real number) Convergence criterion for WM refinement, e.g. at 0.01 iteration stops when WM entries change by less than 1%.

    # Enhancer prediction
    CRMfile,                    # (file name) Output file containing the results when running MotEvo in the enhancer prediction mode.
    winlen,                     # (integer) Length of the enhancer window used in enhancer prediction mode.
    steplen,                    # (integer) Number of positions by which the window is moved at each step during enhancer prediction.

    # Additional parameters
    try_until_succeeding=False, # Rerun MotEvo until the `sites` and `priors` files are created
    verbose=False,              # Print more details during MotEvo run
)

Two of these parameters, `try_until_succeeding` and `verbose`, are not MotEvo options; they were added for the needs of this Python wrapper.
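The idea behind `try_until_succeeding` can be sketched as a retry loop that reruns a command until all expected output files exist. This is a hypothetical helper (`run_until_outputs`), not the wrapper's own code, and the command you pass in is up to you. Unlike the description of `try_until_succeeding`, it gives up after `max_attempts` so that a persistent failure cannot loop forever.

```python
import os
import subprocess
from typing import Sequence


def run_until_outputs(cmd: Sequence[str], outputs: Sequence[str], max_attempts: int = 5) -> int:
    """Run `cmd` repeatedly until every path in `outputs` exists.

    Returns the number of attempts used; raises RuntimeError after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        # The exit code is ignored: success is judged by the output files alone.
        subprocess.run(list(cmd), check=False)
        if all(os.path.exists(path) for path in outputs):
            return attempt
    raise RuntimeError(f"outputs {list(outputs)} not created after {max_attempts} attempts")
```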

Parameters that have a default value are always passed to MotEvo. These include:

TREE, which is set to the species tree if no phylogenetic tree is provided.
UFEwmlen, which is set to the length of the PWM in use if "auto" is passed to it.
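The default handling described above can be illustrated with a hypothetical helper that fills in TREE and UFEwmlen and renders the parameters as `name value` lines. It assumes that MotEvo reads a plain-text parameter file with one such pair per line, that the weight matrix is in a TRANSFAC-like format with numbered position rows terminated by `//`, and that the fallback species tree is a single-species Newick string such as `(danRer11: 1.0);`. The names `pwm_length` and `build_params` are illustrative, not the package's API.

```python
from typing import Dict, List, Optional, Union

Value = Union[str, int, float]


def pwm_length(wm_text: str) -> int:
    """Count numbered position rows (e.g. "01 ...") in an assumed TRANSFAC-like matrix."""
    length = 0
    for line in wm_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("//"):  # assumed end-of-matrix marker
            break
        if fields[0].isdigit():  # position rows start with a numeric index
            length += 1
    return length


def build_params(
    wm_text: str,
    refspecies: str,
    tree: Optional[str] = None,
    ufe_wm_len: Union[int, str] = "auto",
    **extra: Optional[Value],
) -> List[str]:
    """Return hypothetical parameter-file lines of the form "name value"."""
    params: Dict[str, Value] = {"refspecies": refspecies}
    # Assumed fallback: a single-species Newick tree for the reference species.
    params["TREE"] = tree if tree is not None else f"({refspecies}: 1.0);"
    # "auto" means: use the length of the weight matrix in use.
    params["UFEwmlen"] = pwm_length(wm_text) if ufe_wm_len == "auto" else ufe_wm_len
    # Parameters left as None are omitted so MotEvo falls back to its own defaults.
    params.update({name: value for name, value in extra.items() if value is not None})
    return [f"{name} {value}" for name, value in params.items()]
```

Under these assumptions, writing `"\n".join(lines) + "\n"` to a file would produce a parameter file for a single run.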

For example (note that the keyword names used here, such as `tree`, `ref_species` and `background_prior`, differ from the MotEvo parameter names listed above):

import motevowrapper.motevowrapper as mw

sites_file, priors_file = mw.run_motevo(
    sequences_file="zebrafish_promoters.fa",
    working_directory="./",
    wm_path="REST.wm",
    tree="(danRer11: 1.0);",
    ref_species="danRer11",
    background_prior=0.8,
)

For more information on MotEvo's options, see the MotEvo source code and the MotEvo paper (Arnold et al. 2012).

Parsing MotEvo files with `motevowrapper`

MotEvo produces two output files: a sites file and a priors file. Parsing them is simple:

import motevowrapper.motevowrapper as mw

df_sites = mw.parse_sites('/path/to/sites_file') # Motif binding sites
df_priors = mw.parse_sites('/path/to/priors_file') # Final file with priors

Each call returns a pandas DataFrame containing the parsed data from the MotEvo run. Further manipulation of the data frame gives, for example, motif binding density across all sequences, the number of binding sites, or the number of different species in the alignments used.
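The exact columns of the DataFrame returned by `parse_sites` are not documented here, so the following standard-library sketch works on the raw sites file instead. It assumes that each site is reported on a line of the form `start-end strand posterior motif sequence_id`, skips any other lines (such as alignment rows printed when printsiteals=1), and sums the posteriors per sequence, which gives the expected number of sites under the model. `Site`, `parse_site_lines` and `expected_sites_per_sequence` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Site(NamedTuple):
    start: int
    end: int
    strand: str
    posterior: float
    motif: str
    seq_id: str


def parse_site_lines(lines: Iterable[str]) -> List[Site]:
    """Parse lines of the assumed form 'start-end strand posterior motif seq_id'."""
    sites: List[Site] = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5 or fields[1] not in ("+", "-"):
            continue  # not a site line (e.g. an alignment row)
        start, sep, end = fields[0].partition("-")
        if not (sep and start.isdigit() and end.isdigit()):
            continue
        try:
            posterior = float(fields[2])
        except ValueError:
            continue
        sites.append(Site(int(start), int(end), fields[1], posterior, fields[3], fields[4]))
    return sites


def expected_sites_per_sequence(sites: Iterable[Site]) -> Dict[str, float]:
    """Sum posteriors per sequence: the expected number of sites under the model."""
    totals: Dict[str, float] = defaultdict(float)
    for site in sites:
        totals[site.seq_id] += site.posterior
    return dict(totals)
```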

Visualizing site density per motif with `motevowrapper`

df = mw.parse_sites("sites_REST.wm")
mw.plot_site_distribution("REST", df)
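How `plot_site_distribution` builds its plot is not described here. As a hypothetical sketch of the underlying data, the site positions of one motif can be binned by their midpoints; the resulting counts could then be drawn with any plotting library. `site_position_histogram` and the default bin size are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def site_position_histogram(sites: Iterable[Tuple[int, int]], bin_size: int = 50) -> Dict[int, int]:
    """Count sites per position bin using each site's midpoint.

    `sites` holds (start, end) pairs for a single motif; keys of the
    returned dict are bin start positions, in ascending order.
    """
    counts: Counter = Counter()
    for start, end in sites:
        midpoint = (start + end) // 2
        counts[(midpoint // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))
```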

References

Arnold, Phil, et al. "MotEvo: integrated Bayesian probabilistic methods for inferring regulatory sites and motifs on multiple alignments of DNA sequences." Bioinformatics 28.4 (2012): 487-494.

Project details

Release history

0.0.6 (this version): Mar 17, 2021
0.0.5: Mar 12, 2021
0.0.4: Mar 9, 2021
0.0.3: Feb 8, 2021
0.0.2: Feb 8, 2021
0.0.1: Feb 8, 2021

Download files

Source Distribution

motevowrapper-0.0.6.tar.gz (10.5 kB), uploaded Mar 17, 2021

Built Distribution

motevowrapper-0.0.6-py3-none-any.whl (8.8 kB), uploaded Mar 17, 2021, Python 3

File details

Details for the file motevowrapper-0.0.6.tar.gz.

File metadata

Download URL: motevowrapper-0.0.6.tar.gz
Upload date: Mar 17, 2021
Size: 10.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2

File hashes

Hashes for motevowrapper-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`85be4fbf072b4bc4b8b1748675f373c94f591e59be4ee31b73aadeb834c436fb`
MD5	`6ec371ba75d2fed2aff664f908d6daf6`
BLAKE2b-256	`e89f82c84a2c9297b97593ea9ad75e2a28267ae7159deb23988322b9c42a28da`

File details

Details for the file motevowrapper-0.0.6-py3-none-any.whl.

File metadata

Download URL: motevowrapper-0.0.6-py3-none-any.whl
Upload date: Mar 17, 2021
Size: 8.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2

File hashes

Hashes for motevowrapper-0.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6f75711aea46690d7dc2a3080001e0814244f73b7424636365f121baaf14ab59`
MD5	`9365c84a7206b1e1379bc69203bebea4`
BLAKE2b-256	`0a8284f257ec1a5e145cbc0944d08ae4b9b89b2c5ead6b4316c2f8e0a6d211cd`
