integrative probing analysis of nucleic acids empowered by multiple accessibility profiles
Project description
IPANEMAP
Integrative Probing Analysis of Nucleic Acids Empowered by Multiple Accessibility Profiles.
IPANEMAP is a software for predicting stable RNA secondary structures compatible from multiple chemical probing (SHAPE, DMS, Enzymatic...) reactivities profiles. From one or several input sequences, along with several reactivity profiles, it computes and outputs one or several secondary structures, corresponding to the conformers best supported by experimental data and thermodynamics.
Installing IPANEMAP
IPANEMAP consists in a set of Python 2.7+ scripts, and requires the prior installation, and accessibility from the command line, of the following dependencies:
ViennaRNA
package, version posterior to 2.0, downloadable from the TBIscikit-learn
, version 0.2 for Python 2.7+scipy
andnumpy
.
On a standard python
installation, all dependencies except for the ViennaRNA
package can be solved using pip
:
pip install cython scipy numpy sklearn
Executing IPANEMAP
Once all dependencies are satisfied, IPANEMAP can be invoked through:
python2.7 IPANEMAP.py [--RNA rnafile.fa] [--cond c1 c2 ...]
The method will run with a configuration specified within IPANEMAP.cfg
, optionally overriding the RNA using the --RNA
command-line option, and the list of conditions with the --cond
option (see below for more details).
Input files
Reactivity/soft constraints file format
IPANEMAP expects to find reactivities for a condition {Cond}
in a file {SoftConstraintsDir}/{RNA}{Cond}.txt
, where {RNA}
is the name of the chosen RNA (ie the name of the input FASTA file, minus its extension), and {SoftConstraintsDir}
is the general folder where reactivities are located.
The content of a reactivity file is simply a list of position/value pairs providing a reactivity for each position. Values are expected to be loosely normalized, and fall in the [0,1] interval (except for a few outliers), with negative numbers mainly indicating missing values.
Example:
1 0.568309
2 0.179692
3 -999
4 0.568389
....
Hard constraints file format
Hard constraints allow to force predictions to be consistent with prior partial knowledge. They should be expressed in a file {HardConstraintsDir}/{RNA}{Cond}.txt
in classic FASTA/DBN format (see example below), consisting of sequence/constraint mask in extended dot-bracket notation supported by the Vienna package syntax.
Example: The following file content
> Some RNA
CCCAAAUGGG
(x(....)x)
indicates that two base pairs, corresponding to matching parentheses (
and )
, should always be respected by the models.
Positions associated with x
will be forced to remain unpaired, but positions associated with a dot .
are not constrained in the folding.
More complex constraints are available, as described in the Vienna package documentation.
Outputs
Basic output
IPANEMAP typically produces many messages during execution, to keep the user informed of its progress. However, only the final (Pareto) structural models are output to the standard output, meaning that, after running
python2.7 IPANEMAP.py > output.dat
the output.dat
file will only consist of the final models.
Example: For an input sequence GGGAAACCCAAAGGGAAACCC
, and probing profile assigning high accessibilities to A
s, running the above command will lead to the production of a file output.dat
, having content
Structure dG #SupportingConditions BoltzmannProbability
(((...)))...(((...))) -4.3 1 0.5735037115553154
where each line represents a cluster, and consists of:
- Secondary structure model (centroid of the cluster)
- Free-energy, as recomputed using
RNAeval
; - Number of supporting conditions;
- Accumulated Boltzmann probability across conditions (aka stability in the companion manuscript), as computed using
RNAeval
.
In this example, a unique probing condition implies a single model, but multiple structures may be produced in a multi-probing setting.
Configuration
Most configuration options are set by modifying the content of the IPANEMAP.cfg
file.
Main options
RNA
: Specifies a path (relative to the working directory) to a FASTA file where the nucleotide sequence of the main RNA of interest can be found. Note that the filename is important, as it will be used as a base name for the other input files. Example:RNA: fasta_files/didymium.fa
will process the sequence found in the file, anddidymium
will be used as the base name of reactivities/hard contraints files (seeConditions
option)SoftConstraintsDir
andHardConstraintsDir
: Sets the directories used by IPANEMAP to locate soft (reactivities) and hard constraints files (if available)Conditions
: Can be used to specify the list of probing conditions used for the prediction. Should be set to a comma-separated list of conditions, i.e. the names of reactivity profiles/experiments to be considered for structure prediction
For an RNA having base name {RNA}
, and a condition name {Cond}
, IPANEMAP will attempt to locate files named {SoftConstraintsDir}/{RNA}{Cond}.txt
and {HardConstraintsDir}/{RNA}{Cond}.txt
. If none of these files is found, the method will rely on a purely thermodynamic sampling.
Example: Given a configuration
[Input]
RNA: fasta_files/5sRNA.fa
SoftConstraintsDir: soft
HardConstraintsDir: hard
Conditions: DMSMG,NMIA
...
the method will attempt to locate, and use for the sampling phase of the method, two files 5sRNADMSMG.txt
and 5sRNANMIA.txt
in each of the soft
and hard
directories.
Sampling options
DoSampling
: If set totrue
, IPANEMAP will always re-generate a representative structural sample (even if one can already be found)NumStructures
: Number of structures per condition, generated to approximate the pseudo-Boltzmann ensembleTemperature
: Temperature (in Celsius) used for the samplingm
andb
: Slope and intercept used in the reactivity to pseudo-energy conversion (see Deigan et al, PNAS 2009)
Misc options
WorkingDir
: Main output directory for temp files, and final results of the analysis. Directory will be created if non-existent.LogFile
: Name of file gathering the accumulated log. File will be created if non-existent.
Visualization options
IPANEMAP currently relies on VARNA to produce
DrawModels
: If set totrue
, uses VARNA to draw the final, Pareto-optimal, secondary structure models.DrawCentroids
: If set totrue
, uses VARNA to draw the centroids associated with all of the clusters.ShowProbing
: If set totrue
, uses the reactivities of the first probing condition (as specified to thecond
option, orConditions
section of the config file) to annotate the secondary structure drawings.
How to...
- How do I perform a pure thermodynamic/constraints-free prediction?
Simply make sure that no constraint file named{RNA}{Cond}.txt
is found in either{SoftConstraintsDir}
or{HardConstraintsDir}
, and IPANEMAP will default to a purely thermodynamic sampling (you may safely ignore the warning).
Example: Executing the commandpython2.7 IPANEMAP.py --RNA rna.fa --cond thermo
with no file namedrnathermo.txt
in either of the constraints directories will run a pure thermodynamic prediction. - How do I specify a different sequence for some specific condition?
This need arises when minor variants of the original sequence have been probed (eg Mutate-and-Map protocols), and must be used for the sampling.- When available, hard constraint files already specify a sequence, which is used instead of the main FASTA file for the sampling.
Example: For an RNA filemyRNA.fa
and a condition name ofSHAPE
, the sequence found in a{HardConstraintsDir}/myRNASHAPE.txt
file, will be used for the sampling instead of the one found inmyRNA.fa
. - For reactivity/SHAPE data files, if a FASTA file named
{RNA}{Cond}.fa
is found in either of the condition directories, then its sequence will be used instead of the main FASTA file.
Example: For an RNArib.fa
and a condition name1M7
, the sequence found at{SoftConstraintsDir}/rib1M7.fa
will be used for the sampling instead of the one found inrib.fa
.
- When available, hard constraint files already specify a sequence, which is used instead of the main FASTA file for the sampling.
Citation
Please cite: A. Saaidi, D. Allouche, M. Regnier, B. Sargueil, Y.Ponty. IPANEMAP: Integrative Probing Analysis of Nucleic Acids Empowered by Multiple Accessibility Profiles, NAR(2020), link
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.