Heuristic cotranscriptional folding using the nearest neighbor energy model.
DrTransformer -- heuristic cotranscriptional folding.
DrTransformer (short for "DNA-to-RNA transformer") is a program for heuristic and deterministic cotranscriptional folding simulations of RNA molecules. The code of this project is available under the MIT license; however, it depends on the ViennaRNA package, which is distributed under the ViennaRNA license.
Installation
If you already have the Python bindings of the ViennaRNA package installed, then the latest stable release of DrTransformer can be installed from PyPI:
~$ pip install drtransformer
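Before running pip, it can help to confirm that the ViennaRNA bindings are visible to the Python interpreter you intend to use. The following is a minimal sketch, not part of DrTransformer: it only checks that a module named RNA (the import name of the ViennaRNA Python bindings) can be located. The function name has_viennarna_bindings is a hypothetical helper introduced for illustration.

```python
import importlib.util


def has_viennarna_bindings() -> bool:
    """Return True if a module named RNA can be located by this interpreter."""
    # find_spec only looks the module up; it does not import it.
    return importlib.util.find_spec("RNA") is not None


if __name__ == "__main__":
    if has_viennarna_bindings():
        print("ViennaRNA bindings found.")
    else:
        print("ViennaRNA bindings not found; install ViennaRNA first or use bioconda.")
```

If the check fails, the bioconda route described next resolves the dependency for you.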
DrTransformer can also be installed with bioconda to resolve the ViennaRNA dependency automatically. First, make sure bioconda is set up properly with:
~$ conda config --add channels defaults
~$ conda config --add channels bioconda
~$ conda config --add channels conda-forge
~$ conda config --set channel_priority strict
Second, install or update your DrTransformer installation.
~$ conda install drtransformer
Testing/Contributing
To install the latest development version of DrTransformer, clone the repository and run:
~$ pip install .[dev]
Use the following command to run all unit tests:
~$ python -m pytest
Please provide unit tests if you are submitting a pull request with a new feature.
Usage
Until further documentation is available, please use the --help options of the command line executables:
~$ DrTransformer --help
~$ DrPlotter --help
An example cotranscriptional folding simulation
We show simulations of three sequences designed by Xayaphoummine et
al. (2006). Briefly, two sequences are composed of the same palindromic
subsequences (A, B, C, D) in forward and reverse order (ABCD and DCBA); the
third sequence (DCMA) has a point mutation which changes B to M. The
experiment demonstrates how the order of helix formation determines which
structures are formed at the end of transcription, an effect that cannot be
observed with a thermodynamic equilibrium prediction, because the free energies
of, for example, the helices A:B and B:A are almost the same due to their
palindromic subsequences. The three input files ABCD.fa, DCBA.fa and DCMA.fa
contain a fasta header and the respective sequence from the original
publication. Those files can be found in the subfolder examples/.
~$ cat ABCD.fa | DrTransformer --name ABCD --o-prune 0.01 --logfile
This command line call of DrTransformer produces two files:
- ABCD.log contains a human-readable summary of the cotranscriptional folding process.
- ABCD.drf contains the details of the cotranscriptional folding simulation in the DrForna file format.
Structure-based data analysis
DrPlotter supports different types of visual analysis for the .drf file
format. The following command line call reads the previously generated file
ABCD.drf and produces a plot called ABCD.png.
~$ cat ABCD.drf | DrPlotter --name ABCD --format png
The legend of ABCD.png must be interpreted in combination with the ABCD.log
file. Note that the structure IDs from your newly generated files might not
match the ones shown here. For example, to see which structures are shown
for the simulation at nucleotide 73, read the log file entries for this
transcript length:
73 1 .(..(((((((((((((((....)))))))))))))))..).(((((((((.......)))))))))...... -42.60 +[0.0213 -> 0.9876] ID = 24
73 2 ....(((((((((((((((....))))))))))))))).(..(((((((((....)).)))))))..)..... -39.90 -[0.9787 -> 0.0124] ID = 25
The logfile lists two structures (in order of their free energy). The square
brackets show each structure's occupancy at the start and at the end of the
simulation, and the ID lets you follow a specific structure through the
transcription process (+/- indicate a change in occupancy). The IDs are used
as labels in the plot ABCD.png.
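The log entries above can also be read programmatically. The following is a minimal sketch that assumes the line layout seen in the two example entries (transcript length, rank, dot-bracket structure, free energy, bracketed start and end occupancy, ID); it is inferred from those lines only and is not an official description of the log format. The names LogEntry and parse_log_line are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    length: int        # transcript length
    rank: int          # position in the energy-sorted list
    structure: str     # dot-bracket string
    energy: float      # free energy as printed in the log
    occ_start: float   # occupancy at the start of the simulation
    occ_end: float     # occupancy at the end of the simulation
    struct_id: int     # ID used as label in the plot


# Assumed layout, taken from the example lines in this README.
_LOG_LINE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([().]+)\s+(-?\d+(?:\.\d+)?)\s+"
    r"[+-]?\[\s*([0-9.eE+-]+)\s*->\s*([0-9.eE+-]+)\s*\]\s+ID\s*=\s*(\d+)\s*$"
)


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not look like an entry."""
    m = _LOG_LINE.match(line)
    if m is None:
        return None
    n, rank, ss, en, o0, o1, sid = m.groups()
    return LogEntry(int(n), int(rank), ss, float(en), float(o0), float(o1), int(sid))


if __name__ == "__main__":
    example = ("73 1 .(..(((((((((((((((....)))))))))))))))..)."
               "(((((((((.......)))))))))...... -42.60 +[0.0213 -> 0.9876] ID = 24")
    entry = parse_log_line(example)
    if entry is not None and entry.occ_end > entry.occ_start:
        print(f"ID {entry.struct_id} gains occupancy at length {entry.length}")
```

Lines that do not match the assumed layout, such as the parameter header of the logfile, yield None and can simply be skipped.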
Motif-based data analysis
Instead of following specific structures, it is often more helpful to visualize
when specific helical motifs are formed in the ensemble. Generally, we refer to
a helix formed from sequences A and B as A:B, etc. All potential helices
plotted here are provided in dot-bracket notation in the files ABCD.motifs,
DCBA.motifs and DCMA.motifs.
~$ cat ABCD.drf | DrPlotter --name ABCD-motifs --molecule ABCD --format png --motiffile ABCD.motifs --motifs A:B C:D A:D B:C
~$ cat DCBA.drf | DrPlotter --name DCBA-motifs --molecule DCBA --format png --motiffile DCBA.motifs --motifs B:A D:C D:A C:B
~$ cat DCMA.drf | DrPlotter --name DCMA-motifs --molecule DCMA --format png --motiffile DCMA.motifs --motifs M:A D:C D:A C:M
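The three calls above follow one pattern, so they can be scripted. The following is a minimal sketch that rebuilds exactly those command lines in Python and feeds each .drf file to DrPlotter on standard input. It uses only the options shown above; the helper names plot_command and run_all are hypothetical, and the sketch assumes the .drf and .motifs files are in the working directory and that DrPlotter is on your PATH.

```python
import subprocess
from pathlib import Path
from typing import List

# Motif names per molecule, copied from the command lines above.
MOTIFS = {
    "ABCD": ["A:B", "C:D", "A:D", "B:C"],
    "DCBA": ["B:A", "D:C", "D:A", "C:B"],
    "DCMA": ["M:A", "D:C", "D:A", "C:M"],
}


def plot_command(molecule: str) -> List[str]:
    """Build the DrPlotter argument list for one molecule."""
    return [
        "DrPlotter",
        "--name", f"{molecule}-motifs",
        "--molecule", molecule,
        "--format", "png",
        "--motiffile", f"{molecule}.motifs",
        "--motifs", *MOTIFS[molecule],
    ]


def run_all(workdir: str = ".") -> None:
    """Run DrPlotter once per molecule, reading <molecule>.drf from stdin."""
    for molecule in MOTIFS:
        with open(Path(workdir) / f"{molecule}.drf") as drf:
            subprocess.run(plot_command(molecule), stdin=drf, cwd=workdir, check=True)
```

Calling run_all() from the directory that holds the simulation output is intended to be equivalent to the three shell commands.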
ABCD forms only structures A:B and C:D but not A:D and B:C. Also, helix C:D is
not formed "immediately", because there is a competing structure which
is cotranscriptionally favored (see ID 25 from the previous analysis).
DCBA forms structures with all motifs. The helical structures C:B and
D:A dominate with more than 90%, the helices D:C and B:A are
below 10% of the population. Eventually, D:C and B:A will be
dominant, but not on the time scale simulated here. (Can you repeat the analysis
to see how much time it needs until D:C and B:A dominate the ensemble?)
As shown in the publication, a single point mutation (from DCBA to DCMA) is
sufficient to drastically shift occupancy of helices: M:A and D:C
are more occupied at the end of transcription than D:A and C:M.
Tips and tricks
- The header of the logfile contains all relevant DrTransformer parameters that generated the file.
- You can use the parameter --plot-minh to group similar structures (separated by energy barriers < plot-minh) together. In contrast to the --t-fast parameter, this will not affect the accuracy of the model.
- Use --pause-sites to see the effects of pausing at specific nucleotides on cotranscriptional folding.
- Motifs for DrPlotter can also contain 'x' in the dot-bracket notation for positions that must be unpaired.
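To make the idea of a motif with 'x' positions concrete, the following is a minimal sketch of how a structure could be tested against such a motif. It is not DrPlotter's implementation. It assumes that '(' and ')' in the motif demand exactly that base pair, that 'x' demands an unpaired position, and that '.' places no constraint; the last point in particular is an assumption of this sketch. The names pair_table and matches_motif are hypothetical.

```python
from typing import Dict


def pair_table(dot_bracket: str) -> Dict[int, int]:
    """Map each paired position to its partner; other characters are unpaired."""
    stack, pairs = [], {}
    for i, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(i)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced opening bracket")
    return pairs


def matches_motif(structure: str, motif: str) -> bool:
    """Check a dot-bracket structure against a motif of the same length."""
    if len(structure) != len(motif):
        raise ValueError("structure and motif must have the same length")
    s_pairs = pair_table(structure)
    m_pairs = pair_table(motif)
    for i, char in enumerate(motif):
        # 'x': the position must be unpaired in the structure.
        if char == "x" and i in s_pairs:
            return False
        # '(' or ')': the structure must contain exactly this base pair.
        if char in "()" and s_pairs.get(i) != m_pairs[i]:
            return False
        # '.': treated as "no constraint" in this sketch (assumption).
    return True


if __name__ == "__main__":
    print(matches_motif("((...))..", "(.....)xx"))
    print(matches_motif("((...))..", "(x...x).."))
```

Under these assumptions, a motif without 'x' only asks whether the listed base pairs are present, while 'x' additionally excludes structures that pair the marked positions.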
Version
v0.12 -- preparing for official release
- changed --t-lin, --t-log defaults and fixed --t-lin=1, --t-log=1
- fixed potential issues with --t-end = --t-ext
- adapted README example to publication
v0.11 -- using lonely base-pairs
- removed the --noLP default (added parameter setting)
- added profiling option for runtime optimization
- using --cg-auto default parameter
- using k0=1e5, t-ext=0.04 default parameter
- added new visualization types and fixed motif file input
- added epsilon to t-fast sanity check
v0.10 -- moved to beta status (first official release)
- changes in parameter defaults
- bugfix in linalg
- new DrPlotter simulation layout and motif plotting
- repaired code to enable plotting including pause sites
v0.9 -- standalone package (no official release)
- extraction from the [ribolands] package to a standalone Python package.
- using scipy and numpy for matrix exponentials (instead of [treekin])
- implemented lookahead to skip pruning of potentially relevant future structures
Cite
Stefan Badelt, Ronny Lorenz, Ivo L Hofacker: DrTransformer: heuristic cotranscriptional RNA folding using the nearest neighbor energy model, Bioinformatics, Volume 39, Issue 1, January 2023, https://doi.org/10.1093/bioinformatics/btad034