Multi-aligner consensus framework for nanopore RNA-seq splice junction detection

These details have not been verified by PyPI

Project links

Project description

Nanopore COMPASS

COMPASS (Comparison Of Multiple alignment Programs for Alternative Splice site discovery) adapted for Oxford Nanopore long-read RNA sequencing data.

Overview

Nanopore COMPASS is a multi-aligner consensus framework for detecting splice junctions and alternative splice sites (ASS) in nanopore direct RNA sequencing data. It addresses nanopore-specific challenges including high error rates (~5-15%), homopolymer errors, and alignment artifacts.

Key Features

4-aligner consensus strategy for robust junction detection
Nanopore error-tolerant scoring (reduced penalties for indels, homopolymer errors)
Quality-based filtering using perfect flanking sequence matches
Ambiguous junction resolution for homopolymer regions
RPG (ribosomal protein gene) analysis comparing highly-expressed genes vs genome-wide
Organism-specific configuration (yeast, human, extensible)

Original COMPASS

This implementation builds on the original COMPASS framework for short-read data:

Paper: Roy et al., RNA (2023), PMID: 37956322
GitHub: https://github.com/k-roy/COMPASS

Multi-Aligner Strategy

Nanopore COMPASS uses 4 complementary aligners with orthogonal algorithms:

Aligner	Algorithm Type	Primary Strength	Speed
Minimap2	Seed-and-chain	High sensitivity, general-purpose	Fast (9-40x vs GraphMap2)
uLTRA	Graph + collinear chaining	Small exons (11-20nt), annotation-guided	Medium
deSALT	De Bruijn graph	Complex isoforms, alternative splicing	Medium-slow
GraphMap2	3-phase mapping	High accuracy validation	Slow

Consensus Selection Hierarchy:

Lowest error-weighted alignment score
Annotation match (if tie)
Most junctions detected per read
Aligner precedence: GraphMap2 > uLTRA > deSALT > Minimap2
Random (if still tied)

Installation

Dependencies

Python (≥3.8):

pysam
pandas
numpy
pyyaml
matplotlib
seaborn

Aligners:

minimap2 - Fast splice-aware aligner
uLTRA - Annotation-guided alignment
deSALT - De Bruijn graph aligner
GraphMap2 - High-accuracy 3-phase mapping

R packages (optional, for plots):

ggplot2
UpSetR
ComplexHeatmap

Setup

# Clone or navigate to repository
cd /oak/stanford/groups/larsms/Users/kevinroy/software/nanopore-compass

# Add to Python path
export PYTHONPATH=$PYTHONPATH:/oak/stanford/groups/larsms/Users/kevinroy/software/nanopore-compass/src

# Install Python dependencies (conda recommended)
conda create -n nanopore-compass python=3.10 pysam pandas pyyaml matplotlib seaborn
conda activate nanopore-compass

# Install aligners (via conda or from source)
conda install -c bioconda minimap2 graphmap2
# uLTRA and deSALT: see respective GitHub repositories for installation

Quick Start

1. Prepare Configuration

Edit config/yeast.yaml to specify:

Reference genome FASTA path
Annotation GFF path
Introns file path (COMPASS format)
Aligner paths and parameters
Filtering thresholds

2. Run Alignments

# Option A: Run all aligners sequentially
./bin/align_and_process.sh \
  --config config/yeast.yaml \
  --reads sample.fastq \
  --output output_dir \
  --sample sample_name

# Option B: Run individual aligners
minimap2 -ax splice -uf -k14 -G 5000 ref.fa reads.fq > minimap2.sam
ultra -i annotations.gtf --ont ref.fa reads.fq -o ultra.sam
deSALT aln -r ref.fa -a annotations.gtf -k 15 reads.fq > desalt.sam
graphmap2 align -r ref.fa -d reads.fq -o graphmap2.sam --extcigar

3. Extract and Merge Junctions

from pathlib import Path
from compass_functions import JunctionExtractor
from junction_comparison import aggregate_multi_aligner_junctions
from utils import load_intron_annotations

# Extract junctions from each aligner
aligners = ['minimap2', 'ultra', 'desalt', 'graphmap2']
junction_tables = {}

for aligner in aligners:
    bam_path = Path(f"output_dir/{aligner}.sorted.bam")
    fasta_path = Path("ref.fa")

    extractor = JunctionExtractor(bam_path, fasta_path)
    junctions = extractor.extract_all_junctions()
    junction_tables[aligner] = junctions
    extractor.close()

# Aggregate across aligners
aggregated = aggregate_multi_aligner_junctions(junction_tables)

# Load annotations
annotated_introns = load_intron_annotations(Path("introns.tsv"))

# Filter for high confidence
from junction_comparison import filter_high_confidence_junctions

high_confidence = filter_high_confidence_junctions(
    aggregated,
    min_reads=5,
    min_aligners=2,
    min_perfect_matches=5
)

print(f"Total junctions: {len(aggregated)}")
print(f"High-confidence: {len(high_confidence)}")

4. Analyze Results

from junction_comparison import compare_aligner_performance

# Compare aligner performance
stats = compare_aligner_performance(junction_tables, annotated_introns)

for aligner, metrics in stats.items():
    print(f"\n{aligner}:")
    print(f"  Total junctions: {metrics['total_junctions']}")
    print(f"  Exact annotated: {metrics['exact_annotated_pct']:.1f}%")
    print(f"  Novel: {metrics['novel_pct']:.1f}%")

Filtering Strategy

Multi-Tier Filtering for Alternative Splice Sites

Tier 1: Minimum Read Support

Threshold: ≥5 reads per junction
Eliminates random alignment errors

Tier 2: Multi-Sample Consistency

Threshold: Junction in ≥2 biological replicates
Eliminates sample-specific artifacts

Tier 3: Flanking Sequence Quality (PRIMARY FILTER)

Threshold: ≥5-6bp perfect matches on BOTH sides of junction
Distinguishes true junctions from alignment artifacts
Key insight: True ASS even 2bp from annotation should have clean flanks

Tier 4: Splice Site Motif Scoring (for classification, not exclusion)

Score motifs but don't exclude based on non-canonical
Non-canonical 3'SS (non-YAG) can be genuine (Meyer et al. 2023)

Tier 5: Multi-Aligner Consensus

High confidence: ≥3 aligners OR ≥2 aligners including GraphMap2
Medium confidence: ≥2 aligners with orthogonal algorithms
Low confidence: Single aligner only

Recommended Thresholds

Conservative (default):

Min reads: 5
Min samples: 2
Perfect matches: 5bp both sides
Min aligners: 2

Publication-quality (stringent):

Min reads: 10
Min samples: 3
Perfect matches: 6bp both sides
Min aligners: 2 (with GraphMap2 validation)

Project Structure

nanopore-compass/
├── bin/                          # Executable scripts
│   ├── nanopore-compass          # Main pipeline
│   ├── align_and_process.sh      # Multi-aligner wrapper
│   └── combine_junctions.py      # Merge sample junctions
│
├── src/                          # Core modules
│   ├── utils.py                  # Shared utilities
│   ├── sequence_analysis.py      # Splice site scoring, motifs
│   ├── compass_functions.py      # Junction extraction, ambiguity resolution
│   └── junction_comparison.py    # Multi-aligner consensus
│
├── scripts/                      # Analysis scripts
│   ├── rpg_analysis.py           # RPG vs non-RPG comparison
│   ├── ass_detection.py          # Alternative splice site detection
│   ├── quality_control.R         # QC plots
│   └── aligner_comparison.R      # Multi-aligner visualization
│
├── config/                       # Configuration files
│   ├── yeast.yaml                # S. cerevisiae parameters
│   └── human.yaml                # H. sapiens parameters
│
├── reference/                    # Reference data
│   ├── S_cerevisiae_all_introns.tsv
│   └── S_cerevisiae_rpg_genes.txt
│
└── test/                         # Test data and scripts
    └── test_data/

Configuration

Configuration files use YAML format with organism-specific parameters:

Key Sections:

reference files: Genome FASTA, annotations, intron catalog
junction_parameters: Intron size constraints, quality thresholds
splice_sites: Organism-specific consensus sequences
aligners: Paths, parameters, and settings for each aligner
filtering: Multi-tier thresholds for junction validation
error_weights: Nanopore-specific error tolerance

See config/yeast.yaml for complete example.

Advanced Usage

RPG (Ribosomal Protein Gene) Analysis

Compare splicing characteristics between highly-expressed RPG genes and genome-wide:

from scripts.rpg_analysis import compare_rpg_vs_nonrpg

results = compare_rpg_vs_nonrpg(
    junctions_file="high_confidence_junctions.tsv",
    rpg_genes_file="reference/S_cerevisiae_rpg_genes.txt"
)

# Results include:
# - Fidelity comparison (exact annotated %)
# - Alternative splicing frequency
# - Splice site motif strength
# - Condition-specific effects (WT vs mutant)

Alternative Splice Site Detection

from scripts.ass_detection import detect_alternative_splice_sites

ass_events = detect_alternative_splice_sites(
    junctions_file="high_confidence_junctions.tsv",
    annotated_introns="reference/S_cerevisiae_all_introns.tsv",
    min_distance=10  # bp from annotated sites
)

# ASS types:
# - alt_5ss: Alternative 5' splice site
# - alt_3ss: Alternative 3' splice site
# - cryptic: Cryptic introns (within exons)
# - exon_skip: Exon skipping events

Ambiguous Junction Resolution

from compass_functions import resolve_ambiguous_junctions
from sequence_analysis import SpliceMotifScorer

# Initialize motif scorer
scorer = SpliceMotifScorer(
    five_ss_consensus="GTATGT",
    five_ss_penalties=[4, 3, 1, 1, 2, 1],
    three_ss_consensus="YYYAG",
    three_ss_penalties=[1, 1, 1, 3, 4]
)

# Resolve ambiguous junctions
resolved = resolve_ambiguous_junctions(
    junction_dict=raw_junctions,
    fasta_path=Path("ref.fa"),
    motif_scorer=scorer,
    max_shift=10
)

Nanopore-Specific Adaptations

Error-Weighted Scoring

# Nanopore error model
score = (
    mismatches * 1.0 +                # Full weight
    indels * 0.5 +                    # Reduced (common in nanopore)
    indels_in_homopolymer * 0.2 +     # Minimal (expected errors)
    soft_clips * 0.3                  # Moderate penalty
)

Quality-Based Filtering

Instead of distance-based filtering, use perfect flanking sequence matches:

# Extract from BAM alignment
upstream_matches, downstream_matches = count_perfect_matches_at_junction(read, junction_pos)

# Filter criteria
if upstream_matches >= 5 and downstream_matches >= 5:
    # High-quality junction (true biological event)
    pass
else:
    # Likely alignment artifact
    reject()

This approach correctly identifies true ASS even when only 2-3bp from annotated sites.

Parallel Processing for Large Datasets

For datasets with 100M+ reads, use the parallelization pipeline for ~25x speedup.

Design Principle

Pre-alignment FASTQ chunking is required for COMPASS 4-way consensus:

Same read can map to different genomic locations in different aligners
COMPASS compares the SAME read across all 4 aligners
Pre-alignment chunking guarantees same read order in all 4 output BAMs

Workflow

BAM → FASTQ → chunk (25x) → align with 4 aligners → 4-way consensus → merge

Usage

# Run complete parallel pipeline
bash scripts/run_parallel_pipeline.sh \
    --bam input.bam \
    --sample sample_name \
    --chunks 25 \
    --output parallel_output/

# This will:
# 1. Convert BAM to FASTQ (bedtools bamtofastq)
# 2. Split FASTQ into 25 chunks
# 3. Submit SLURM array job to align each chunk with all 4 aligners
# 4. Submit SLURM array job to process each chunk with 4-way consensus
# 5. Merge all chunks into final junction table

Performance

Single aligner (100M reads):

Serial: 250 min (4.2 hours)
Parallel (25 cores): 10 minutes
Speedup: ~25x

4 aligners (100M reads):

Serial: 1000 min (16.7 hours)
Parallel (25 cores): 40-50 minutes
Speedup: ~20-25x

Manual Workflow (Step-by-Step)

For more control, run steps individually:

# 1. Convert BAM to FASTQ
bedtools bamtofastq -i input.bam -fq input.fastq

# 2. Split FASTQ
python scripts/split_fastq.py \
    --fastq input.fastq \
    --chunks 25 \
    --output fastq_chunks/

# 3. Align each chunk (SLURM array job)
sbatch --array=1-25 slurm/align_chunk_4way.sh

# 4. Process with consensus (SLURM array job)
sbatch --array=1-25 slurm/process_chunk_consensus.sh

# 5. Merge results
python scripts/merge_junction_chunks.py \
    --input junction_chunks/ \
    --output final_junctions.tsv

See docs/PARALLELIZATION.md for complete technical details.

Output Files

Junction Tables

Per-aligner junctions: {aligner}_junctions.tsv

Columns: chrom, five_ss, three_ss, count, five_dinuc, three_dinuc, is_canonical, is_exact_annotated, etc.

Aggregated junctions: consensus_junctions.tsv

Additional columns: num_aligners, aligner_support, confidence, validators

Filtered junctions: high_confidence_junctions.tsv

Subset passing all filtering criteria

Statistics

Aligner comparison: aligner_performance.tsv

Metrics per aligner: total junctions, exact annotated %, novel %

Filtering summary: filtering_stats.tsv

Junction counts at each filtering tier

RPG analysis (if enabled): rpg_vs_nonrpg_summary.tsv

Comparative metrics between RPG and non-RPG introns

Performance

Computational Requirements (yeast genome, 1M reads):

Memory: 16-32 GB RAM
Storage: ~50 GB for intermediate files
Time: 4-8 hours (4 aligners, 8 threads each)

Speed by Aligner:

Minimap2: ~5-10 minutes
uLTRA: ~30-60 minutes
deSALT: ~1-2 hours
GraphMap2: ~4-6 hours (slowest but most accurate)

Optimization:

Run aligners in parallel (separate SLURM jobs)
Use Minimap2 + uLTRA for fast initial analysis
Add GraphMap2 for high-confidence validation

Citation

If you use Nanopore COMPASS, please cite:

Original COMPASS:

Roy KR, et al. (2023) COMPASS: Comparison Of Multiple Alignment Programs for Alternative Splice site discovery. RNA, 29(11):1833-1847. PMID: 37956322

Nanopore Adaptations:

[Your publication using this framework]

Aligners:

Minimap2: Li H. (2018) Bioinformatics 34(18):3094-3100
uLTRA: Sahlin K & Mäkinen V. (2021) Genome Biol 22:142
deSALT: Liu B, et al. (2019) Genome Biol 20:274
GraphMap2: Sović I, et al. (2016) Nat Commun 7:11307

License

[Specify license - consider MIT or GPL to match original COMPASS]

Contact

Kevin R. Roy Larsmon Lab, Stanford University [Contact information]

References

Original COMPASS: https://github.com/k-roy/COMPASS
Non-canonical splice sites: Meyer et al. (2023), PMC10711555
Yeast genome: Saccharomyces Genome Database (SGD)
Nanopore basecalling: Oxford Nanopore Technologies

Version History

v1.0.0 (2026-03-09)

Initial implementation
4-aligner consensus strategy
Nanopore error model
Yeast configuration
RPG analysis module
Alternative splice site detection

Last updated: 2026-03-09

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

Mar 20, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nanocompass-0.1.0.tar.gz (9.4 kB view details)

Uploaded Mar 20, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

nanocompass-0.1.0-py3-none-any.whl (8.8 kB view details)

Uploaded Mar 20, 2026 Python 3

File details

Details for the file nanocompass-0.1.0.tar.gz.

File metadata

Download URL: nanocompass-0.1.0.tar.gz
Upload date: Mar 20, 2026
Size: 9.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for nanocompass-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`c38b1df97d5a04b30b8bb6dcc2d9bb657d805cccbd5736c11663b25f56df4cc5`
MD5	`45f5e56646217a975ff26fd29019b1f3`
BLAKE2b-256	`b62670ab5f73f6015f7ec729c74f3c04020485d1a7be05350aed966ff9c2da92`

See more details on using hashes here.

File details

Details for the file nanocompass-0.1.0-py3-none-any.whl.

File metadata

Download URL: nanocompass-0.1.0-py3-none-any.whl
Upload date: Mar 20, 2026
Size: 8.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for nanocompass-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9ea01b61949f0c5c7bed7931a35a1ed97740bc811984840870e8ebf79b6112cb`
MD5	`395be29fac6a780b0ef716ce5590f70d`
BLAKE2b-256	`ab18d992e3452f6c645ca2b89d9759c771f002dc2d68a0471312aa9c2a83fb17`

See more details on using hashes here.

nanocompass 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Nanopore COMPASS

Overview

Key Features

Original COMPASS

Multi-Aligner Strategy

Installation

Dependencies

Setup

Quick Start

1. Prepare Configuration

2. Run Alignments

3. Extract and Merge Junctions

4. Analyze Results

Filtering Strategy

Multi-Tier Filtering for Alternative Splice Sites

Recommended Thresholds

Project Structure

Configuration

Advanced Usage

RPG (Ribosomal Protein Gene) Analysis

Alternative Splice Site Detection

Ambiguous Junction Resolution

Nanopore-Specific Adaptations

Error-Weighted Scoring

Quality-Based Filtering

Parallel Processing for Large Datasets

Design Principle

Workflow

Usage

Performance

Manual Workflow (Step-by-Step)

Output Files

Junction Tables

Statistics

Performance

Citation

License

Contact

References

Version History

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes