A tool for genome-wide prediction of double-stranded RNA structures
Project description
dsRNAscan
dsRNAscan is a bioinformatics tool for genome-wide identification of double-stranded RNA (dsRNA) structures. It uses a sliding window approach to detect inverted repeats that can form dsRNA secondary structures, with special support for G-U wobble base pairing.
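As a rough illustration of the sliding-window idea, the sketch below lists the windows a scan with window size -w and step -s would visit. The function name and the choice to clip the last window at the sequence end are assumptions for illustration. dsRNAscan's own windowing, for example how it handles the final partial window, may differ.

```python
def sliding_windows(seq_len, window=10000, step=150):
    """Yield (start, end) pairs, 0-based half-open, covering a sequence.

    Hypothetical helper: windows advance by `step`, and the last window is
    clipped at the sequence end so that no position is skipped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while True:
        end = min(start + window, seq_len)
        yield start, end
        if end >= seq_len:
            break
        start += step


# A 25 nt sequence scanned with a 10 nt window and a 5 nt step
print(list(sliding_windows(25, window=10, step=5)))
# prints: [(0, 10), (5, 15), (10, 20), (15, 25)]
```

A small step relative to the window makes neighbouring windows overlap heavily, so a structure near a window edge is still seen whole in some other window, at the cost of more windows to fold.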
You can browse human genome results at dsrna.chpc.utah.edu
Install from PyPI
```bash
pip install dsrnascan
# Version 0.4.6+ includes standalone einverted binaries; no EMBOSS installation needed!
```
Basic Usage
```bash
# Scan a genome/sequence for dsRNA structures with the default settings
# (-w 10000 -s 150 -c 4; see Complete Parameter Reference for the other defaults)
dsrnascan input.fasta
# Process a specific chromosome using 8 CPUs (-c)
dsrnascan genome.fasta --only_seq chr21 -c 8
# Use custom parameters to detect smaller structures (e.g. a minimum of 15 bp)
dsrnascan sequence.fasta -w 5000 --min_bp 15
```
📋 Requirements
Platform Compatibility
- Linux: ✅ Python 3.8+
- macOS: ✅ Python 3.9+ (3.8 not supported)
- Windows: ❌ Not supported (use WSL or Docker)
Dependencies (automatically installed):
- numpy ≥1.19
- pandas ≥1.1
- biopython ≥1.78
- ViennaRNA ≥2.4
Important: einverted Binary
Version 0.4.6+ Update: dsRNAscan now includes standalone einverted binaries for all major platforms with our G-U wobble patch built in!
✅ No EMBOSS installation required!
Supported platforms:
- Linux x86_64
- Linux ARM64
- macOS x86_64 (Intel)
- macOS ARM64 (M1/M2)
- Windows x86_64 (via WSL)
The correct binary is automatically selected for your platform during installation.
**Note:** System-installed EMBOSS won't have the G-U patch. For full RNA functionality with G-U wobble pairs, compile from source:
```bash
# Compile with G-U patch (optional but recommended)
git clone https://github.com/Bass-Lab/dsRNAscan.git
cd dsRNAscan
DSRNASCAN_COMPILE_FULL=true pip install .
```
Detailed Usage
Command-Line Options
```bash
dsrnascan --help
```
Complete Parameter Reference
Core Parameters:
- -w: Window size for scanning (default: 10000)
- -s/--step: Step size between windows (default: 150)
- -t: Folding temperature in Celsius (default: 37)
Structure Requirements:
- --min_bp: Minimum number of base pairs required (default: 25); recommended
- --score: Minimum score threshold for inverted repeat (default: 75); deprecated, use --min_bp instead
- --paired_cutoff: Minimum percentage of paired bases (default: 70)
- --min: Minimum length of inverted repeat (default: 30)
- --max: Maximum length of inverted repeat (default: 10000)
- --max_span: Maximum span of inverted repeat (default: window size)
Region Selection:
- --only_seq: Process only this specific sequence/chromosome (based on FASTA header)
- --start: Starting coordinate for scan (default: 0, 1-based)
- --end: Ending coordinate for scan (default: 0 = end of sequence)
Strand Options:
- --forward-only: Process forward strand only
- --reverse-only: Process reverse strand only
- Default: both strands are processed
Scoring Parameters:
- --match: Match score (default: 3)
- -x/--mismatch: Mismatch score (default: -4)
- --gaps: Gap penalty (default: 12)
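To make the scoring parameters concrete, the sketch below scores an inverted-repeat alignment from counts of matches, mismatches and gap positions, using the defaults listed above (--match 3, --mismatch -4, --gaps 12). Treating the gap penalty as a flat cost per gap position is an assumption. einverted's internal alignment scoring may apply it differently, so read this as a back-of-the-envelope estimate only.

```python
def ir_score(matches, mismatches, gap_positions, match=3, mismatch=-4, gap=12):
    """Approximate inverted-repeat score (hypothetical helper).

    Assumes a linear model: each matched pair adds `match`, each mismatch
    adds `mismatch` (a negative number), and each gap position costs `gap`.
    """
    return matches * match + mismatches * mismatch - gap_positions * gap


# 20 matched pairs, 2 mismatches, no gaps: 20*3 + 2*(-4) = 52
score = ir_score(20, 2, 0)
print(score, score >= 30)  # prints: 52 True  (30 is the --score used in example 4)
```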
Algorithm Options:
- --algorithm: Inverted repeat algorithm (currently einverted only; more may be added in the future)
- --eliminate-nested: Remove nested dsRNAs (default: True)
- --chunk-size: Windows per chunk for DataFrame processing (default: 10000)
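The --eliminate-nested option is described only briefly. One plausible reading, assumed here rather than taken from dsRNAscan's source, is that a prediction whose full span (i_start to j_end) lies inside a larger prediction on the same chromosome and strand is dropped. A minimal sketch of that rule, with a hypothetical Hit record:

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    # Hypothetical record: chromosome, strand and outer span (i_start..j_end)
    chrom: str
    strand: str
    start: int
    end: int


def drop_nested(hits: List[Hit]) -> List[Hit]:
    """Keep hits that are not fully contained in another hit (same chrom/strand)."""
    # Sort by position; for equal starts the longer hit comes first
    ordered = sorted(hits, key=lambda h: (h.chrom, h.strand, h.start, -h.end))
    kept: List[Hit] = []
    for hit in ordered:
        last = kept[-1] if kept else None
        # Kept ends increase within a chrom/strand group, so comparing with
        # the last kept hit is enough to detect containment
        contained = (
            last is not None
            and last.chrom == hit.chrom
            and last.strand == hit.strand
            and hit.end <= last.end
        )
        if not contained:
            kept.append(hit)
    return kept


hits = [Hit("chr1", "+", 0, 100), Hit("chr1", "+", 10, 50), Hit("chr1", "+", 20, 150)]
print(drop_nested(hits))  # keeps the 0-100 and 20-150 hits; 10-50 is nested
```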
Output Options:
- --output-dir: Output directory (default: dsrnascan_YYYYMMDD_HHMMSS)
- --output_label: Label for output files (default: sequence header)
- --clean: Clean up temporary files after processing
Performance:
- -c/--cpus: Number of CPUs to use (default: 4)
Other Options:
- --version: Show program version
- -h/--help: Show help message
- --batch: DEPRECATED; only used with --legacy
- --legacy: DEPRECATED; use the legacy non-DataFrame approach (slower)
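The Structure Requirements options act as filters on each candidate. The sketch below applies --min_bp, --paired_cutoff, --min and --max to a hypothetical candidate record using the documented defaults. The field names, treating each minimum as inclusive, and measuring length on a single arm are all assumptions for illustration, not dsRNAscan's internal representation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical fields for one predicted dsRNA
    base_pairs: int        # number of base pairs in the duplex
    percent_paired: float  # percentage of bases that are paired
    arm_length: int        # length of one arm of the inverted repeat


def passes_filters(c, min_bp=25, paired_cutoff=70.0, min_len=30, max_len=10000):
    """Return True if a candidate meets all structure thresholds (inclusive)."""
    return (
        c.base_pairs >= min_bp
        and c.percent_paired >= paired_cutoff
        and min_len <= c.arm_length <= max_len
    )


print(passes_filters(Candidate(40, 82.5, 60)))  # True
print(passes_filters(Candidate(40, 65.0, 60)))  # False: below 70% paired
```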
Output Files
dsRNAscan writes several output files, by default to a timestamped directory (see --output-dir):
- `*_merged_results.txt`: Tab-delimited file with all predicted dsRNAs. Column groups:
  - Genomic coordinates (columns 1-6): Chromosome, Strand, i_start, i_end, j_start, j_end
  - einverted results (columns 7-10): Score, RawMatch, PercMatch, Gaps. These come from inverted repeat detection by einverted.
  - RNAduplex results (columns 11-20): dG(kcal/mol), percent_paired, longest_helix, eff_i_start, eff_i_end, eff_j_start, eff_j_end, i_seq, j_seq, structure. These come from RNA secondary structure prediction by RNAduplex.
  - The effective coordinates (eff_*) show the trimmed regions that form the optimal dsRNA structure.
  - Sequences are reported in 5' to 3' RNA orientation (reverse complement for the minus strand).
- `*.dsRNApredictions.bp`: IGV-compatible visualization file. Load it in IGV to visualize dsRNA locations on the genome.
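For quick inspection without pandas, the merged results file can be read with Python's standard csv module. This sketch assumes the file has a header row using the column names listed above and that the einverted coordinates are 1-based inclusive integers; the helper name and the -30 kcal/mol cut-off are arbitrary example choices.

```python
import csv


def strong_dsrnas(lines, max_dg=-30.0):
    """Yield rows whose duplex free energy is at or below max_dg (kcal/mol).

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    for row in csv.DictReader(lines, delimiter="\t"):
        if float(row["dG(kcal/mol)"]) <= max_dg:
            # Arm lengths from the einverted coordinates (assumed 1-based inclusive)
            row["i_len"] = int(row["i_end"]) - int(row["i_start"]) + 1
            row["j_len"] = int(row["j_end"]) - int(row["j_start"]) + 1
            yield row
```

Pass it a file handle opened with open(path, newline="") and iterate over the returned rows; lowering max_dg keeps only the more stable duplexes.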
Example Workflows
```bash
# 1. Basic genome-wide scan with 16 CPUs
dsrnascan genome.fa -c 16 --output-dir results/

# 2. Scan a specific genomic region (e.g., a 200 kb region on chr21)
dsrnascan hg38_chromosomes.fa.gz \
  --only_seq chr21 \
  --start 33455482 \
  --end 33655482 \
  -w 10000 -s 5000 \
  --score 75

# 3. Scan multiple chromosomes
dsrnascan genome.fa --only_seq chr1,chr2,chr3 -c 8

# 4. Sensitive scan for shorter dsRNAs
dsrnascan sequence.fa \
  -w 5000 -s 100 \
  --score 30 \
  --min 20 \
  --paired_cutoff 60

# 5. Process RNA-seq assembled transcripts
dsrnascan transcripts.fa \
  -w 1000 -s 50 \
  --paired_cutoff 60 \
  --min 25

# 6. Scan both strands (forward and reverse; this is the default, no flag needed)
dsrnascan sequence.fa

# 7. Scan only the reverse strand
dsrnascan sequence.fa --reverse-only

# 8. Quick test run on a small region
dsrnascan test.fa -w 100 -s 50 --score 15 --min 10
```
Region-Specific Scanning
dsRNAscan supports scanning specific genomic regions, which is useful for:
- Focusing on regions of interest (e.g., gene loci, QTL regions)
- Testing parameters on small regions before genome-wide runs
- Reducing computational time for targeted analysis
```bash
# Scan a 1 Mb region on chromosome 21
dsrnascan hg38.fa.gz \
  --only_seq chr21 \
  --start 30000000 \
  --end 31000000 \
  -w 10000 -s 1000

# Scan around a specific gene (e.g., 50 kb upstream and downstream)
# If the gene is at chr1:1000000-1050000
dsrnascan genome.fa \
  --only_seq chr1 \
  --start 950000 \
  --end 1100000
```
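The coordinates in the gene example follow from simple flank arithmetic. A minimal sketch, assuming 1-based inclusive coordinates (as used by --start) and clamping the start at position 1; the helper name is hypothetical, and it does not clamp at the sequence end because it does not know the chromosome length.

```python
def flanked_region(gene_start, gene_end, flank):
    """Return (start, end) for a region extended by `flank` bp on each side.

    Coordinates are 1-based and inclusive; the start is clamped at 1.
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    return max(1, gene_start - flank), gene_end + flank


start, end = flanked_region(1_000_000, 1_050_000, 50_000)
print(f"--only_seq chr1 --start {start} --end {end}")
# prints: --only_seq chr1 --start 950000 --end 1100000
```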
Installation Troubleshooting
Note: einverted with G-U Wobble Pairing Support
IMPORTANT: dsRNAscan requires a patched version of einverted that recognizes G-U wobble base pairs as matches. Standard EMBOSS einverted treats G-U as mismatches, which misses many RNA structures.
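To see why treating G-U as a mismatch matters, the sketch below counts pairs between the two arms of an inverted repeat with and without wobble pairs. It converts T to U, reads the left arm 5' to 3' and pairs it with the right arm read 3' to 5'. This illustrates the pairing rule only; it is not the patched einverted code, which also scores mismatches and gaps.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def count_pairs(left_arm, right_arm, allow_wobble=True):
    """Count paired positions between two arms of an inverted repeat.

    Both arms are given 5'->3'; the right arm is reversed so that the first
    base of the left arm faces the last base of the right arm.
    """
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    left = left_arm.upper().replace("T", "U")
    right = right_arm.upper().replace("T", "U")[::-1]
    return sum((a, b) in allowed for a, b in zip(left, right))


left, right = "GGGAAA", "UUUUCC"  # right arm reversed: CCUUUU
print(count_pairs(left, right, allow_wobble=False))  # 5: the G-U pair counts as a mismatch
print(count_pairs(left, right, allow_wobble=True))   # 6: fully paired with wobble
```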
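Option 1: Use the Pre-compiled Binary (included since version 0.4.6)
The PyPI package includes pre-compiled einverted binaries with the G-U patch for Linux (x86_64, ARM64) and macOS (Intel and Apple Silicon), so no separate compilation is needed on these platforms.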
Option 2: Compile Patched einverted (Recommended for other platforms)
```bash
# The repository includes the patch and compilation script
git clone https://github.com/Bass-Lab/dsRNAscan.git
cd dsRNAscan
./compile_patched_einverted.sh
```
This script will:
- Download EMBOSS 6.6.0 source code
- Apply the G-U wobble pairing patch (einverted.patch)
- Compile einverted with RNA-aware scoring
- Install it to dsrnascan/tools/einverted
Option 3: Manual Compilation
```bash
# Download and extract EMBOSS
wget ftp://emboss.open-bio.org/pub/EMBOSS/EMBOSS-6.6.0.tar.gz
tar -xzf EMBOSS-6.6.0.tar.gz
cd EMBOSS-6.6.0/emboss
# Apply the G-U patch (included in dsRNAscan package)
patch -p0 < /path/to/dsrnascan/einverted.patch
# Compile just einverted. The -L paths point at the EMBOSS library build
# directories, so the libraries must be built first (./configure && make).
gcc -O2 -o einverted einverted.c \
  -I../ajax/core -I../ajax/ajaxdb -I../ajax/acd \
  -L../ajax/core/.libs -L../ajax/ajaxdb/.libs -L../ajax/acd/.libs \
  -lajax -lajaxdb -lacd -lm -lz
# Copy to dsRNAscan tools directory
cp einverted /path/to/dsrnascan/tools/
```
Option 4: Use Standard EMBOSS (Not Recommended)
```bash
conda install -c bioconda emboss
```
Warning: Standard einverted will miss RNA structures with G-U wobble pairs, significantly reducing sensitivity for dsRNA detection.
"einverted binary not found" Error
If you get this error, einverted is not in your PATH. Solutions:
- Compile the patched version as shown above
- Set the EINVERTED_PATH environment variable:
```bash
export EINVERTED_PATH=/path/to/einverted
```
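When debugging this error, it can help to check what a lookup of EINVERTED_PATH first and PATH second would find. The sketch below uses only the standard library. That lookup order is an assumption for illustration; dsRNAscan itself may also check its bundled tools directory, and the helper name is hypothetical.

```python
import os
import shutil


def find_einverted(env_var="EINVERTED_PATH"):
    """Return a usable einverted path, or None if none is found.

    Assumed order: an explicit path in the environment variable, then PATH.
    """
    explicit = os.environ.get(env_var)
    if explicit:
        if os.path.isfile(explicit) and os.access(explicit, os.X_OK):
            return explicit
        raise FileNotFoundError(f"{env_var} is set but not executable: {explicit}")
    return shutil.which("einverted")


print(find_einverted() or "einverted not found via EINVERTED_PATH or PATH")
```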
"ModuleNotFoundError: No module named 'ViennaRNA'"
Install ViennaRNA Python bindings:
```bash
# Via conda (recommended)
conda install -c bioconda viennarna
# Via pip
pip install ViennaRNA
```
Installation on HPC/Cluster
Important: Cluster EMBOSS modules have standard einverted which lacks G-U wobble support. You need to compile the patched version:
```bash
# Load Python module
module load python/3.8  # or your cluster's Python module
# Install dsRNAscan
pip install --user dsrnascan
# Clone repo to get compilation script and patch
git clone https://github.com/Bass-Lab/dsRNAscan.git ~/dsRNAscan_source
# Compile patched einverted in your home directory
cd ~/dsRNAscan_source
./compile_patched_einverted.sh
# Copy the compiled einverted to a location in your PATH or set environment variable
mkdir -p ~/bin
cp dsrnascan/tools/einverted ~/bin/
export PATH=$HOME/bin:$PATH
# Or set EINVERTED_PATH environment variable
export EINVERTED_PATH=$HOME/dsRNAscan_source/dsrnascan/tools/einverted
# Add to your ~/.bashrc or job submission script
echo 'export EINVERTED_PATH=$HOME/dsRNAscan_source/dsrnascan/tools/einverted' >> ~/.bashrc
```
For job submission scripts:
```bash
#!/bin/bash
#SBATCH --job-name=dsrnascan
#SBATCH --cpus-per-task=16
module load python/3.8
export EINVERTED_PATH=$HOME/bin/einverted  # Use your compiled version
dsrnascan genome.fa -c 16 --output-dir results/
```
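To keep -c in step with the CPUs SLURM actually allocates, a Python driver could build the command from SLURM_CPUS_PER_TASK, which SLURM sets when --cpus-per-task is requested. A minimal sketch; the helper name and the fallback to os.cpu_count() outside a job are assumptions.

```python
import os


def dsrnascan_command(fasta, output_dir="results/"):
    """Build a dsrnascan argument list sized to the SLURM CPU allocation."""
    cpus = os.environ.get("SLURM_CPUS_PER_TASK")
    n = int(cpus) if cpus else (os.cpu_count() or 1)
    return ["dsrnascan", fasta, "-c", str(n), "--output-dir", output_dir]


print(dsrnascan_command("genome.fa"))
# Inside a job: subprocess.run(dsrnascan_command("genome.fa"), check=True)
```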
Using dsRNAscan as a Python Module
While primarily designed as a standalone tool, dsRNAscan can be imported and used in Python scripts:
```python
# Method 1: Simple usage
from dsrnascan import main
import sys

# Simulate command line arguments
sys.argv = ['dsrnascan', 'input.fasta', '-w', '1000', '--score', '30']
main()

# Method 2: Using subprocess for better control
import subprocess

result = subprocess.run(['dsrnascan', 'input.fasta', '--score', '30'],
                        capture_output=True, text=True)

# Method 3: Parse results programmatically
import glob
import subprocess
import pandas as pd

# Run dsRNAscan (check=True raises if it exits with an error)
subprocess.run(['dsrnascan', 'input.fasta'], check=True)

# Find the most recent timestamped output directory
output_dir = sorted(glob.glob('dsrnascan_*'))[-1]
# pd.read_csv does not expand wildcards, so resolve the file name with glob first
results_file = glob.glob(f"{output_dir}/*_merged_results.txt")[0]
results = pd.read_csv(results_file, sep='\t')
```
For more examples, see using_dsrnascan_as_module.py in the repository.
Citation
If you use dsRNAscan in your research, please cite: Comprehensive mapping of human dsRNAome reveals conservation, neuronal enrichment, and intermolecular interactions
https://doi.org/10.1101/2025.01.24.634786
Additional Tools
dsrna-browse - Interactive Results Viewer with RNA Editing Support
dsrna-browse is an interactive web-based viewer for dsRNAscan results, featuring:
- Fornac RNA secondary structure visualization
- Interactive dropdown selection of dsRNA predictions
- Detailed structure metrics (free energy, base pairs, helix length)
- RNA editing site annotation from BED or GFF3 files
Basic Usage
```bash
# Browse results in current directory
dsrna-browse
# Browse results in specific output directory
dsrna-browse dsrnascan_20250120_143022/
# Use custom port
dsrna-browse --port 8888
# Don't auto-open browser
dsrna-browse --no-browser
```
With RNA Editing Sites
```bash
# Annotate with editing sites from BED file
dsrna-browse dsrnascan_output/ --editing-file editing_sites.bed
# Annotate with editing sites from GFF3 file
dsrna-browse dsrnascan_output/ --editing-file editing_sites.gff3
```
The viewer supports both BED and GFF3 formats for editing sites:
- BED format: chr, start, end, name, score (0-1000), strand
- GFF3 format: Automatically detects editing-related features
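The strand-aware mapping of editing sites can be illustrated with a small coordinate conversion. This sketch assumes 1-based inclusive arm coordinates (as in i_start/i_end) and arm sequences reported 5' to 3', reverse complemented on the minus strand as described under Output Files. It also uses the BED convention that a single-base interval [start, start+1) is position start+1 in 1-based terms. Both helpers are hypothetical, not dsrna-browse's code.

```python
def arm_index(pos, arm_start, arm_end, strand):
    """0-based index of genomic position `pos` in a 5'->3' arm sequence.

    pos, arm_start and arm_end are 1-based inclusive. Returns None if the
    position falls outside the arm.
    """
    if not arm_start <= pos <= arm_end:
        return None
    if strand == "+":
        return pos - arm_start
    if strand == "-":
        # Minus-strand arms are reverse complemented, so count from the arm end
        return arm_end - pos
    raise ValueError(f"unknown strand: {strand!r}")


def bed_site_to_1based(bed_start):
    """A single-base BED interval [start, start+1) is 1-based position start+1."""
    return bed_start + 1


site = bed_site_to_1based(1099)  # BED line: chr1 1099 1100 ...
print(arm_index(site, 1000, 1199, "+"))  # 100
print(arm_index(site, 1000, 1199, "-"))  # 99
```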
Editing sites are visualized with green gradient coloring:
- Dark green: High-frequency sites (≥80%)
- Medium green: Medium-frequency sites (30-80%)
- Light green: Low-frequency sites (<30%)
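The color bins above amount to a simple threshold function. In the sketch below, converting a BED score (0-1000) to a percentage by dividing by 10 is an assumption made for illustration; how dsrna-browse actually derives editing frequency is not documented here.

```python
def editing_color_bin(frequency_percent):
    """Map an editing frequency (0-100%) to the viewer's green shade."""
    if frequency_percent >= 80:
        return "dark green"
    if frequency_percent >= 30:
        return "medium green"
    return "light green"


def bed_score_to_percent(score):
    """Hypothetical: treat a BED score of 0-1000 as tenths of a percent."""
    return max(0, min(score, 1000)) / 10


print(editing_color_bin(bed_score_to_percent(850)))  # dark green
```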
The viewer will:
- Process all `*_merged_results.txt` files in the directory
- Map editing sites to dsRNA structure positions (strand-aware)
- Start a local web server (default port 8080)
- Open your browser to display an interactive interface
- Show RNA structures with editing annotations
Press Ctrl+C to stop the server when done.
overlap_analyzer
Statistical enrichment analysis for genomic features overlapping with dsRNA predictions. See overlap_analyzer/README.md for details.
Note: overlap_analyzer is not included in the PyPI package to reduce size. Clone the repository to access it.
License
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Support
- Issues: GitHub Issues
- Documentation: GitHub Wiki
Acknowledgments
- EMBOSS team for the einverted tool
- ViennaRNA team for RNA folding algorithms
Note: This tool is for research purposes. Ensure you understand the parameters for your specific use case.
Download files
File details
Details for the file dsrnascan-0.4.6.tar.gz.
File metadata
- Download URL: dsrnascan-0.4.6.tar.gz
- Size: 642.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | af038bd2b9d531b033e23147079a04ad4eef01a43718190910a57f683c6c3c07 |
| MD5 | a9b53175761a7a59ed44f4411a228e32 |
| BLAKE2b-256 | 928cebfa0654509d62669a142a47a9f2bd86cece2b48dcd84f052b9bb3a45cff |
File details
Details for the file dsrnascan-0.4.6-py3-none-win_amd64.whl.
File metadata
- Download URL: dsrnascan-0.4.6-py3-none-win_amd64.whl
- Size: 635.2 kB
- Tags: Python 3, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f1e28b671f7e3c3a33db0821e5a1af20a88c9aea75bdceecf54e473e00acf207 |
| MD5 | fea461c08c238babe99345eb74f0ec9c |
| BLAKE2b-256 | 1e95427a29186b5c25f956e2925c36a50ae4eaf40c3b636326ebaf64a59d27bb |
File details
Details for the file dsrnascan-0.4.6-py3-none-manylinux2014_x86_64.whl.
File metadata
- Download URL: dsrnascan-0.4.6-py3-none-manylinux2014_x86_64.whl
- Size: 325.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 041095a1367d738ec1198abe00d331626ab38db8349cfd3b2a258274b4998c37 |
| MD5 | c054134be4c685175a3b16344de26f97 |
| BLAKE2b-256 | e28fa9fb4b851720962b5120b4f1f2fa67ee6a02d39dfa39d50efa3a248b9275 |
File details
Details for the file dsrnascan-0.4.6-py3-none-manylinux2014_aarch64.whl.
File metadata
- Download URL: dsrnascan-0.4.6-py3-none-manylinux2014_aarch64.whl
- Size: 594.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 975ec83dc827303025ba64af117ec881923bac40c31ed9bfef8eafa746c6bae8 |
| MD5 | 831c5ba7bcd459448af97f12134c6226 |
| BLAKE2b-256 | b6c76dbf500c12aad0deae7606ac8381652835b6af923bc7ed88d23c5cd5841f |
File details
Details for the file dsrnascan-0.4.6-py3-none-macosx_11_0_arm64.whl.
File metadata
- Download URL: dsrnascan-0.4.6-py3-none-macosx_11_0_arm64.whl
- Size: 609.6 kB
- Tags: Python 3, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d648b3e713c16a4926c1568f3af2558be0c9d37f2a93140af6bc57ab70c0332 |
| MD5 | 37d5fd1281fd0431a98b96ad309ede32 |
| BLAKE2b-256 | 95b6d5a43301bfc4c00887d4b61a9dc54ae0a27bab8ed3d9d9420833c092a0e2 |
File details
Details for the file dsrnascan-0.4.6-py3-none-macosx_10_9_x86_64.whl.
File metadata
- Download URL: dsrnascan-0.4.6-py3-none-macosx_10_9_x86_64.whl
- Size: 601.9 kB
- Tags: Python 3, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62d3fd9100d0f5060a17960a1752c769ebb6733b5b9e6976e3b0203cb753b5e2 |
| MD5 | 36beac0b72b16e07fd789dec0c606990 |
| BLAKE2b-256 | 730d01a5d19de2ca845465b8684d5108342740af7691e43273d9832a2e840827 |
File details
Details for the file dsrnascan-0.4.6-py3-none-any.whl.
File metadata
- Download URL: dsrnascan-0.4.6-py3-none-any.whl
- Size: 635.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 685a7921f365f913b47116722a69818e01739ea11194d7ff80619ae9ad9f4c86 |
| MD5 | 6760446cac64cdf07043797e8f29aa8a |
| BLAKE2b-256 | fe34a67cd8c8637c1c1e4a4ea4c66bf0092911bf5c46ed60456fc9e501a0e396 |