A tool for genome-wide prediction of double-stranded RNA structures

dsRNAscan

dsRNAscan is a bioinformatics tool for genome-wide identification of double-stranded RNA (dsRNA) structures. It uses a sliding window approach to detect inverted repeats that can form dsRNA secondary structures, with special support for G-U wobble base pairing.

You can browse human genome results at dsrna.chpc.utah.edu

Install from PyPI

pip install dsrnascan
# Version 0.4.6+ includes standalone einverted binaries - no EMBOSS needed!

Basic Usage

# Scan a genome/sequence for dsRNA structures
dsrnascan input.fasta # This uses defaults of -w 10000 -s 150 --score 50 -c 4 (cpus)

# Process specific chromosome, using 8 cpus (-c)
dsrnascan genome.fasta --only_seq chr21 -c 8

# Use custom parameters for detecting smaller structures (like a minimum of 15bp)
dsrnascan sequence.fasta -w 5000 --min_bp 15

📋 Requirements

Platform Compatibility

  • Linux: ✅ Python 3.8+
  • macOS: ✅ Python 3.9+ (3.8 not supported)
  • Windows: ❌ Not supported (use WSL or Docker)

Dependencies (automatically installed):

  • numpy ≥1.19
  • pandas ≥1.1
  • biopython ≥1.78
  • ViennaRNA ≥2.4

Important: einverted Binary

Version 0.4.6+ Update: dsRNAscan now includes standalone einverted binaries for all major platforms with our G-U wobble patch built-in!

✅ No EMBOSS installation required!

Supported platforms:

  • Linux x86_64
  • Linux ARM64
  • macOS x86_64 (Intel)
  • macOS ARM64 (M1/M2)
  • Windows x86_64 (via WSL)

The correct binary is automatically selected for your platform during installation.

Note: System-installed EMBOSS won't have the G-U patch. For full RNA functionality with G-U wobble pairs, compile from source:

# Compile with G-U patch (optional but recommended)
cd dsRNAscan
DSRNASCAN_COMPILE_FULL=true pip install .

Detailed Usage

Command-Line Options

dsrnascan --help

Complete Parameter Reference

Core Parameters:

  • -w: Window size for scanning (default: 10000)
  • -s/--step: Step size between windows (default: 150)
  • -t: Folding temperature in Celsius (default: 37)
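
To make the interaction between -w and -s concrete, here is a minimal sketch of how a sliding window could be laid over a sequence. The function name iter_windows, the 0-based half-open coordinates, and the handling of the final shortened window are assumptions for illustration; dsRNAscan's internal windowing may differ.

```python
def iter_windows(seq_len, window=10000, step=150):
    """Yield (start, end) pairs, 0-based half-open, one per scanning window.

    Hypothetical helper: it only illustrates the -w / -s arithmetic.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start < seq_len:
        end = min(start + window, seq_len)
        yield start, end
        if end == seq_len:
            break  # this window already reaches the end of the sequence
        start += step


# A 10,450 nt sequence scanned with the default window and step sizes
windows = list(iter_windows(seq_len=10450, window=10000, step=150))
for start, end in windows:
    print(start, end)
```

Because the step is much smaller than the window, neighbouring windows overlap heavily, so the same inverted repeat can be seen in several windows.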

Structure Requirements:

  • --min_bp: Minimum number of base pairs required (default: 25) - Recommended
  • --score: Minimum score threshold for inverted repeat (default: 75) - Deprecated, use --min_bp
  • --paired_cutoff: Minimum percentage of paired bases (default: 70)
  • --min: Minimum length of inverted repeat (default: 30)
  • --max: Maximum length of inverted repeat (default: 10000)
  • --max_span: Maximum span of inverted repeat (default: window size)
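
As a rough illustration of what --paired_cutoff filters on, the sketch below computes the percentage of paired bases from a dot-bracket string in which '&' separates the two strands (the notation RNAduplex prints). The exact definition dsRNAscan uses for percent_paired is not stated here, so treat percent_paired and passes_cutoff as hypothetical helpers.

```python
def percent_paired(structure):
    """Return the percentage of bases that are paired in a dot-bracket string.

    '(' and ')' count as paired, '.' as unpaired; '&' only separates
    the two strands and is not counted as a base.
    """
    bases = [c for c in structure if c != "&"]
    if not bases:
        return 0.0
    paired = sum(c in "()" for c in bases)
    return 100.0 * paired / len(bases)


def passes_cutoff(structure, paired_cutoff=70):
    """Apply a --paired_cutoff style threshold (assumed to be inclusive)."""
    return percent_paired(structure) >= paired_cutoff


example = "((((..((&))..))))"
print(percent_paired(example), passes_cutoff(example))
```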

Region Selection:

  • --only_seq: Process only this specific sequence/chromosome (based on fasta header)
  • --start: Starting coordinate for scan (default: 0, 1-based)
  • --end: Ending coordinate for scan (default: 0 = end of sequence)

Strand Options:

  • --forward-only: Process forward strand only
  • --reverse-only: Process reverse strand only
  • Default: both strands are processed

Scoring Parameters:

  • --match: Match score (default: 3)
  • -x/--mismatch: Mismatch score (default: -4)
  • --gaps: Gap penalty (default: 12)
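
The effect of the match and mismatch scores, and of counting G-U wobble pairs as matches, can be sketched with a simplified ungapped comparison of two arms. This is not einverted's algorithm, which performs a gapped alignment and applies the gap penalty; the function names below are hypothetical and only the default scores of 3 and -4 come from the list above.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(a, b, allow_gu=True):
    """True if bases a and b can pair; G-U counts only when allow_gu is set."""
    pair = (a, b)
    return pair in WATSON_CRICK or (allow_gu and pair in WOBBLE)


def ungapped_arm_score(left, right, match=3, mismatch=-4, allow_gu=True):
    """Score two equal-length arms (both written 5'->3') as an antiparallel duplex.

    Simplification: no gaps are allowed, so the gap penalty is not modelled.
    """
    left = left.upper().replace("T", "U")
    right = right.upper().replace("T", "U")
    if len(left) != len(right):
        raise ValueError("arms must have equal length in this ungapped sketch")
    score = 0
    # The 5' end of the left arm faces the 3' end of the right arm
    for a, b in zip(left, reversed(right)):
        score += match if can_pair(a, b, allow_gu) else mismatch
    return score


# One G-U wobble pair among six positions
print(ungapped_arm_score("GGGAUC", "GAUCUC", allow_gu=True))
print(ungapped_arm_score("GGGAUC", "GAUCUC", allow_gu=False))
```

A single wobble pair changes the score by the difference between a match and a mismatch, which is why an unpatched einverted can drop structures below the score threshold.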

Algorithm Options:

  • --algorithm: Inverted repeat algorithm (currently only einverted; more are planned)
  • --eliminate-nested: Remove nested dsRNAs (default: True)
  • --chunk-size: Windows per chunk for DataFrame processing (default: 10000)

Output Options:

  • --output-dir: Output directory (default: dsrnascan_YYYYMMDD_HHMMSS)
  • --output_label: Label for output files (default: sequence header)
  • --clean: Clean up temporary files after processing
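
The default directory name follows the pattern dsrnascan_YYYYMMDD_HHMMSS. The sketch below builds such a name and picks the newest run from a list of directory names. Both helper names are hypothetical, and whether dsRNAscan stamps directories in local time or UTC is not stated here.

```python
from datetime import datetime

PREFIX = "dsrnascan_"
STAMP_FORMAT = "%Y%m%d_%H%M%S"


def default_output_dir(now):
    """Format a directory name in the dsrnascan_YYYYMMDD_HHMMSS pattern."""
    return PREFIX + now.strftime(STAMP_FORMAT)


def newest_run(dir_names):
    """Return the most recent timestamped name, ignoring names that do not match."""
    stamped = []
    for name in dir_names:
        if not name.startswith(PREFIX):
            continue
        try:
            stamp = datetime.strptime(name[len(PREFIX):], STAMP_FORMAT)
        except ValueError:
            continue  # e.g. a custom --output-dir such as dsrnascan_output
        stamped.append((stamp, name))
    return max(stamped)[1] if stamped else None


print(default_output_dir(datetime(2025, 1, 20, 14, 30, 22)))
print(newest_run(["dsrnascan_20241231_235959", "dsrnascan_20250120_143022"]))
```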

Performance:

  • -c/--cpus: Number of CPUs to use (default: 4)

Other Options:

  • --version: Show program version
  • -h/--help: Show help message
  • --batch: DEPRECATED - only with --legacy flag
  • --legacy: DEPRECATED - use legacy non-DataFrame approach (slower)

Output Files

dsRNAscan generates several output files in a timestamped directory:

  1. *_merged_results.txt: Tab-delimited file with all predicted dsRNAs

    Column Groups:

    • Genomic Coordinates (Columns 1-6): Chromosome, Strand, i_start, i_end, j_start, j_end
    • einverted Results (Columns 7-10): Score, RawMatch, PercMatch, Gaps
      • These come from the inverted repeat detection by einverted
    • RNAduplex Results (Columns 11-20): dG(kcal/mol), percent_paired, longest_helix, eff_i_start, eff_i_end, eff_j_start, eff_j_end, i_seq, j_seq, structure
      • These come from RNA secondary structure prediction by RNAduplex
      • The effective coordinates show the trimmed regions that form the optimal dsRNA structure
      • Sequences are reported in 5' to 3' RNA orientation (reverse complement for minus strand)
  2. *.dsRNApredictions.bp: IGV-compatible visualization file

    • Load in IGV to visualize dsRNA locations on genome
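
The column layout above can be read with the standard library alone. The sketch below assumes the merged results file starts with a header row that uses exactly the column names listed above; that is an assumption worth checking against your own output. The sample rows are made up for illustration and are not real predictions, and the thresholds in filter_stable are arbitrary.

```python
import csv
import io

COLUMNS = [
    "Chromosome", "Strand", "i_start", "i_end", "j_start", "j_end",
    "Score", "RawMatch", "PercMatch", "Gaps",
    "dG(kcal/mol)", "percent_paired", "longest_helix",
    "eff_i_start", "eff_i_end", "eff_j_start", "eff_j_end",
    "i_seq", "j_seq", "structure",
]


def make_row(values):
    """Build one tab-delimited line; unspecified columns get a '.' placeholder."""
    return "\t".join(str(values.get(col, ".")) for col in COLUMNS)


# Made-up values for illustration only
SAMPLE = "\n".join([
    "\t".join(COLUMNS),
    make_row({"Chromosome": "chr21", "Strand": "+", "i_start": 100,
              "dG(kcal/mol)": -45.2, "percent_paired": 93.5}),
    make_row({"Chromosome": "chr21", "Strand": "-", "i_start": 900,
              "dG(kcal/mol)": -12.0, "percent_paired": 71.0}),
])


def load_predictions(handle):
    """Read a tab-delimited results table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_stable(rows, max_dg=-30.0, min_paired=80.0):
    """Keep rows at or below max_dg (kcal/mol) and at or above min_paired (%)."""
    return [
        row for row in rows
        if float(row["dG(kcal/mol)"]) <= max_dg
        and float(row["percent_paired"]) >= min_paired
    ]


rows = load_predictions(io.StringIO(SAMPLE))
stable = filter_stable(rows)
print(len(rows), len(stable))
```

For a real run, pass an open file handle instead of the in-memory sample, for example by opening the *_merged_results.txt file with newline="".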

Example Workflows

# 1. Basic genome-wide scan with 16 CPUs
dsrnascan genome.fa -c 16 --output-dir results/

# 2. Scan specific genomic region (e.g., 200kb region on chr21)
dsrnascan hg38_chromosomes.fa.gz \
    --only_seq chr21 \
    --start 33455482 \
    --end 33655482 \
    -w 10000 -s 5000 \
    --score 75

# 3. Scan multiple chromosomes
dsrnascan genome.fa --only_seq chr1,chr2,chr3 -c 8

# 4. Sensitive scan for shorter dsRNAs
dsrnascan sequence.fa \
    -w 5000 -s 100 \
    --score 30 \
    --min 20 \
    --paired_cutoff 60

# 5. Process RNA-seq assembled transcripts
dsrnascan transcripts.fa \
    -w 1000 -s 50 \
    --paired_cutoff 60 \
    --min 25

# 6. Scan both strands (default behavior, no flag needed)
dsrnascan sequence.fa

# 7. Scan only reverse strand
dsrnascan sequence.fa --reverse-only

# 8. Quick test run on small region
dsrnascan test.fa -w 100 -s 50 --score 15 --min 10

Region-Specific Scanning

dsRNAscan supports scanning specific genomic regions, which is useful for:

  • Focusing on regions of interest (e.g., gene loci, QTL regions)
  • Testing parameters on small regions before genome-wide runs
  • Reducing computational time for targeted analysis

# Scan a 1 Mb region on chromosome 21
dsrnascan hg38.fa.gz \
    --only_seq chr21 \
    --start 30000000 \
    --end 31000000 \
    -w 10000 -s 1000

# Scan around a specific gene (e.g., 50kb upstream and downstream)
# If gene is at chr1:1000000-1050000
dsrnascan genome.fa \
    --only_seq chr1 \
    --start 950000 \
    --end 1100000
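
The flank arithmetic in the second example can be scripted when many loci need scanning. The helper below is a hypothetical sketch that only uses the --only_seq, --start and --end options documented above. It clamps the start at 1 on the assumption that coordinates are 1-based, and it does not clamp the end to the sequence length.

```python
def region_args(seq_name, feature_start, feature_end, flank=50000):
    """Build --only_seq/--start/--end arguments for a feature plus flanks."""
    if feature_start > feature_end:
        raise ValueError("feature_start must not exceed feature_end")
    start = max(1, feature_start - flank)  # assumed 1-based lower bound
    end = feature_end + flank
    return ["--only_seq", seq_name, "--start", str(start), "--end", str(end)]


# Gene at chr1:1000000-1050000 with 50 kb on each side
cmd = ["dsrnascan", "genome.fa"] + region_args("chr1", 1000000, 1050000)
print(" ".join(cmd))
```

The resulting list can be passed directly to subprocess.run.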

Installation Troubleshooting

Note: einverted with G-U Wobble Pairing Support

IMPORTANT: dsRNAscan requires a patched version of einverted that recognizes G-U wobble base pairs as matches. Standard EMBOSS einverted treats G-U as mismatches, which misses many RNA structures.

Option 1: Use Pre-compiled Binary (macOS ARM64 only)

The PyPI package includes a pre-compiled einverted for macOS ARM64 (Apple Silicon).

Option 2: Compile Patched einverted (Recommended for other platforms)

# The package includes the patch and compilation script
git clone https://github.com/Bass-Lab/dsRNAscan.git
cd dsRNAscan
./compile_patched_einverted.sh

This script will:

  1. Download EMBOSS 6.6.0 source code
  2. Apply the G-U wobble pairing patch (einverted.patch)
  3. Compile einverted with RNA-aware scoring
  4. Install it to dsrnascan/tools/einverted

Option 3: Manual Compilation

# Download and extract EMBOSS
wget ftp://emboss.open-bio.org/pub/EMBOSS/EMBOSS-6.6.0.tar.gz
tar -xzf EMBOSS-6.6.0.tar.gz
cd EMBOSS-6.6.0/emboss

# Apply the G-U patch (included in dsRNAscan package)
patch -p0 < /path/to/dsrnascan/einverted.patch

# Compile just einverted
gcc -O2 -o einverted einverted.c \
    -I../ajax/core -I../ajax/ajaxdb -I../ajax/acd \
    -L../ajax/core/.libs -L../ajax/ajaxdb/.libs -L../ajax/acd/.libs \
    -lajax -lajaxdb -lacd -lm -lz

# Copy to dsRNAscan tools directory
cp einverted /path/to/dsrnascan/tools/

Option 4: Use Standard EMBOSS (Not Recommended)

conda install -c bioconda emboss

Warning: Standard einverted will miss RNA structures with G-U wobble pairs, significantly reducing sensitivity for dsRNA detection.

"einverted binary not found" Error

If you get this error, einverted is not in your PATH. Solutions:

  1. Compile the patched version as shown above
  2. Set environment variable: export EINVERTED_PATH=/path/to/einverted
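
Before starting a long run, it can help to check which einverted would be picked up. The sketch below looks at EINVERTED_PATH first and then at PATH. That lookup order is an assumption for illustration rather than a description of dsRNAscan's own resolution logic, and locate_einverted is a hypothetical helper.

```python
import os
import shutil


def locate_einverted(env=None, search_path=None):
    """Return the path of an executable einverted, or None if none is found.

    env defaults to os.environ; search_path defaults to the PATH variable.
    """
    env = os.environ if env is None else env
    candidate = env.get("EINVERTED_PATH")
    if candidate and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # Fall back to a normal PATH lookup
    return shutil.which("einverted", path=search_path)


found = locate_einverted()
print(found or "einverted not found: set EINVERTED_PATH or add it to PATH")
```

Note that finding a binary this way does not tell you whether it carries the G-U wobble patch.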

"ModuleNotFoundError: No module named 'ViennaRNA'"

Install ViennaRNA Python bindings:

# Via conda (recommended)
conda install -c bioconda viennarna

# Via pip
pip install ViennaRNA

Installation on HPC/Cluster

Important: Cluster EMBOSS modules have standard einverted which lacks G-U wobble support. You need to compile the patched version:

# Load Python module
module load python/3.8  # or your cluster's Python module

# Install dsRNAscan
pip install --user dsrnascan

# Clone repo to get compilation script and patch
git clone https://github.com/Bass-Lab/dsRNAscan.git ~/dsRNAscan_source

# Compile patched einverted in your home directory
cd ~/dsRNAscan_source
./compile_patched_einverted.sh

# Copy the compiled einverted to a location in your PATH or set environment variable
mkdir -p ~/bin
cp dsrnascan/tools/einverted ~/bin/
export PATH=$HOME/bin:$PATH

# Or set EINVERTED_PATH environment variable
export EINVERTED_PATH=$HOME/dsRNAscan_source/dsrnascan/tools/einverted

# Add to your ~/.bashrc or job submission script
echo 'export EINVERTED_PATH=$HOME/dsRNAscan_source/dsrnascan/tools/einverted' >> ~/.bashrc

For job submission scripts:

#!/bin/bash
#SBATCH --job-name=dsrnascan
#SBATCH --cpus-per-task=16

module load python/3.8
export EINVERTED_PATH=$HOME/bin/einverted  # Use your compiled version

dsrnascan genome.fa -c 16 --output-dir results/

Using dsRNAscan as a Python Module

While primarily designed as a standalone tool, dsRNAscan can be imported and used in Python scripts:

# Method 1: Simple usage
from dsrnascan import main
import sys

# Simulate command line arguments
sys.argv = ['dsrnascan', 'input.fasta', '-w', '1000', '--score', '30']
main()

# Method 2: Using subprocess for better control
import subprocess
result = subprocess.run(['dsrnascan', 'input.fasta', '--score', '30'], 
                       capture_output=True, text=True)

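# Method 3: Parse results programmatically
import glob
import subprocess

import pandas as pd

# Run dsRNAscan (check=True raises an error if the run fails)
subprocess.run(['dsrnascan', 'input.fasta'], check=True)

# Find the most recent timestamped output directory
output_dir = sorted(glob.glob('dsrnascan_*'))[-1]

# read_csv does not expand wildcards, so resolve the results file first
results_file = glob.glob(f"{output_dir}/*_merged_results.txt")[0]
results = pd.read_csv(results_file, sep='\t')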

For more examples, see using_dsrnascan_as_module.py in the repository.

Citation

If you use dsRNAscan in your research, please cite: Comprehensive mapping of human dsRNAome reveals conservation, neuronal enrichment, and intermolecular interactions

https://doi.org/10.1101/2025.01.24.634786

Additional Tools

dsrna-browse - Interactive Results Viewer with RNA Editing Support

dsrna-browse is an interactive web-based viewer for dsRNAscan results, featuring:

  • Fornac RNA secondary structure visualization
  • Interactive dropdown selection of dsRNA predictions
  • Detailed structure metrics (free energy, base pairs, helix length)
  • RNA editing site annotation from BED or GFF3 files

Basic Usage

# Browse results in current directory
dsrna-browse

# Browse results in specific output directory
dsrna-browse dsrnascan_20250120_143022/

# Use custom port
dsrna-browse --port 8888

# Don't auto-open browser
dsrna-browse --no-browser

With RNA Editing Sites

# Annotate with editing sites from BED file
dsrna-browse dsrnascan_output/ --editing-file editing_sites.bed

# Annotate with editing sites from GFF3 file
dsrna-browse dsrnascan_output/ --editing-file editing_sites.gff3

The viewer supports both BED and GFF3 formats for editing sites:

  • BED format: chr, start, end, name, score (0-1000), strand
  • GFF3 format: Automatically detects editing-related features

Editing sites are visualized with green gradient coloring:

  • Dark green: High-frequency sites (≥80%)
  • Medium green: Medium-frequency sites (30-80%)
  • Light green: Low-frequency sites (<30%)
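
The three colour bands can be written as a small classifier. In this sketch the 30-80% band is taken to include 30% and exclude 80%, following the thresholds above. The conversion from a BED score (0-1000) to a percentage is purely an assumption for illustration, and both function names are hypothetical.

```python
def editing_shade(frequency_percent):
    """Map an editing frequency in percent to the colour band listed above."""
    if not 0 <= frequency_percent <= 100:
        raise ValueError("frequency must be between 0 and 100")
    if frequency_percent >= 80:
        return "dark green"
    if frequency_percent >= 30:
        return "medium green"
    return "light green"


def bed_score_to_percent(score):
    """Assumption: a BED score of 0-1000 scales linearly to 0-100%."""
    return score / 10.0


for score in (850, 300, 120):
    print(score, editing_shade(bed_score_to_percent(score)))
```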

The viewer will:

  1. Process all *_merged_results.txt files in the directory
  2. Map editing sites to dsRNA structure positions (strand-aware)
  3. Start a local web server (default port 8080)
  4. Open your browser to display an interactive interface
  5. Show RNA structures with editing annotations

Press Ctrl+C to stop the server when done.

overlap_analyzer

Statistical enrichment analysis for genomic features overlapping with dsRNA predictions. See overlap_analyzer/README.md for details.

Note: overlap_analyzer is not included in the PyPI package to reduce size. Clone the repository to access it.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Acknowledgments

  • EMBOSS team for the einverted tool
  • ViennaRNA team for RNA folding algorithms

Note: This tool is for research purposes. Ensure you understand the parameters for your specific use case.
