Rank Ordering of Super-Enhancers - A tool for identifying super-enhancers from ChIP-seq data
ROSE2: Rank Ordering of Super-Enhancers
A fast, modern tool for identifying super-enhancers and their target genes from ChIP-seq data.
What Problem Does ROSE2 Solve?
The Biological Challenge
Cells maintain their identity and function through gene regulatory networks controlled by enhancers, DNA elements that activate gene expression. Among these, super-enhancers are especially important:
- Cell identity genes: Super-enhancers drive the expression of genes that define cell type (e.g., MYC in cancer cells, OCT4 in stem cells)
- Disease-associated genes: Many disease genes are controlled by super-enhancers, making them therapeutic targets
- Master regulators: Super-enhancers regulate transcription factors that control entire gene programs
However, identifying which genomic regions are super-enhancers is challenging because:
- Enhancers are scattered across the genome, often far from their target genes
- Multiple enhancer elements often work together as clusters
- Standard analysis treats each enhancer separately, missing the biological reality of collaborative regulation
Why Super-Enhancers Matter
Super-enhancers are:
- 🎯 Therapeutic targets: BET inhibitors and other drugs target super-enhancer components
- 🔬 Biomarkers: Super-enhancer landscapes distinguish cell types and disease states
- 🧬 Mechanistic insights: Reveal how transcription factors and cofactors coordinate gene expression
- 💊 Drug development: Understanding super-enhancers helps design selective therapies
Example applications:
- Cancer biology: Identify oncogene-driving super-enhancers (MYC, NOTCH1, RUNX1)
- Stem cell research: Map pluripotency-associated super-enhancers (OCT4, NANOG, SOX2)
- Immunology: Discover immune cell identity super-enhancers (CD4, IL2RA, FOXP3)
- Development: Track super-enhancer dynamics during differentiation
How ROSE2 Solves It
ROSE2 identifies super-enhancers through a biologically motivated, four-step algorithm:
1. Stitch nearby enhancers (the biological reality)
- Combines enhancer peaks within 12.5kb (default)
- Groups neighboring peaks that likely act together as one regulatory cluster (based on linear genomic distance)
- Creates "stitched regions" representing functional enhancer units
2. Rank by regulatory activity (quantitative measurement)
- Calculates ChIP-seq signal density for each stitched region
- Accounts for input/control background
- Normalizes for region size
3. Identify super-enhancers (data-driven threshold)
- Plots signal distribution (hockey stick curve)
- Super-enhancers are the regions above the inflection point of the curve, where signal is exceptionally high
- Typically the top 5-10% of stitched enhancers
4. Map to target genes (biological function)
- Links enhancers to nearby genes (overlapping, proximal, or closest)
- Enables functional interpretation
- Identifies putative regulatory targets
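As a rough illustration of step 1, the sketch below merges peaks on the same chromosome whose gap is at most the stitching distance. The function name `stitch_peaks` and the `(chrom, start, end)` tuple layout are hypothetical and chosen only for this example; ROSE2's actual implementation may differ (see TECHNICAL_NOTES.md).

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end); hypothetical layout


def stitch_peaks(peaks: List[Peak], max_gap: int = 12500) -> List[Peak]:
    """Merge peaks on the same chromosome separated by at most max_gap bp."""
    stitched: List[Peak] = []
    # Sorting by (chrom, start, end) lets us compare each peak with the previous one.
    for chrom, start, end in sorted(peaks):
        if stitched and stitched[-1][0] == chrom and start - stitched[-1][2] <= max_gap:
            prev_chrom, prev_start, prev_end = stitched[-1]
            # Extend the current stitched region to cover this peak.
            stitched[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            stitched.append((chrom, start, end))
    return stitched


peaks = [("chr1", 1000, 2000), ("chr1", 5000, 6000), ("chr1", 40000, 41000)]
print(stitch_peaks(peaks))
```

With the default 12.5kb distance, the first two peaks (3kb apart) become one stitched region, while the third (34kb away) stays separate.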
Input: ChIP-seq data (BAM files) + peak calls (BED/GFF) for enhancer-associated factors
- Commonly used: H3K27ac (active enhancer mark), BRD4, MED1, p300
- Works with any ChIP-seq data for enhancer-binding factors
Output: Ranked enhancer list, super-enhancer calls, gene assignments, plots
📖 For algorithmic details, see TECHNICAL_NOTES.md
Installation
Quick Install (Recommended)
pip install pyrose2
System Requirements
Required:
Optional but recommended:
- bedtools (≥ 2) - for format conversions
Install dependencies:
Ubuntu/Debian:
sudo apt-get install samtools bedtools r-base
macOS (Homebrew):
brew install samtools bedtools r
Conda:
conda install -c bioconda samtools bedtools r-base
Quick Start
Example: Identify H3K27ac Super-Enhancers
# Basic usage
rose2 main \
-i H3K27ac_peaks.bed \
-r H3K27ac.bam \
-g HG38 \
-o results/
# With input control
rose2 main \
-i H3K27ac_peaks.bed \
-r H3K27ac.bam \
-c Input.bam \
-g HG38 \
-o results/
# Custom stitching distance (e.g., 20kb)
rose2 main \
-i BRD4_peaks.bed \
-r BRD4.bam \
-g MM10 \
-s 20000 \
-o results/
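The `-c` option in the second command supplies an input control. The sketch below shows one plausible way a control-corrected signal could be computed for a single stitched region, assuming read counts per region are already available. The function name, the reads-per-million normalization, and the floor at zero are assumptions for illustration, not a description of ROSE2's internal code.

```python
def region_signal(chip_reads: int, chip_total: int,
                  ctrl_reads: int = 0, ctrl_total: int = 0) -> float:
    """Reads per million (RPM) in a region, minus control RPM, floored at 0."""
    chip_rpm = chip_reads * 1e6 / chip_total
    # Without a control library, no background is subtracted.
    ctrl_rpm = ctrl_reads * 1e6 / ctrl_total if ctrl_total else 0.0
    return max(chip_rpm - ctrl_rpm, 0.0)


# Made-up counts: 500 ChIP reads out of 20M, 100 control reads out of 10M.
print(region_signal(500, 20_000_000))
print(region_signal(500, 20_000_000, 100, 10_000_000))
```

Normalizing each library by its own depth before subtracting keeps a deeply sequenced control from dominating the comparison.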
Input Files
1. Peak calls (BED, narrowPeak, or GFF format):
chr1 1000 2000 peak1 100 .
chr1 5000 6000 peak2 150 .
2. Aligned reads (sorted, indexed BAM file):
samtools sort input.bam -o sorted.bam
samtools index sorted.bam
3. Genome annotation: Built-in support for HG18, HG19, HG38, MM8, MM9, MM10
Output Files
ROSE2 generates several files in your output directory:
Core Results
1. Super-enhancer table (*_SuperStitched.table.txt)
- List of identified super-enhancers
- Genomic coordinates, signal strength, nearby genes
- Key for downstream analysis
2. All stitched enhancers (*_AllStitched.table.txt)
- Complete ranked list of all stitched regions
- Allows custom thresholding
3. Hockey stick plot (*_Plot_points.png)
- Visual identification of super-enhancers
- Shows signal distribution and threshold
Gene Mapping Results
4. Enhancer-to-gene mapping (*_REGION_TO_GENE.txt)
- Each row = one enhancer
- Shows overlapping, proximal, and closest genes
5. Gene-to-enhancer mapping (*_GENE_TO_REGION.txt)
- Each row = one gene
- Lists all associated enhancers
6. Signal with genes (*.table_withGENES.txt)
- Combines enhancer signal with gene assignments
- Ready for gene set enrichment analysis
Typical Workflows
1. Basic Super-Enhancer Discovery
Goal: Identify super-enhancers in your cell type
# Run ROSE2
rose2 main -i peaks.bed -r H3K27ac.bam -g HG38 -o results/
# Analyze results
# - Check hockey stick plot for super-enhancer threshold
# - Review super-enhancer gene list for cell identity genes
# - Compare to other cell types or conditions
2. Compare Conditions (e.g., Cancer vs Normal)
# Run on both conditions
rose2 main -i cancer_peaks.bed -r cancer.bam -g HG38 -o cancer_SE/
rose2 main -i normal_peaks.bed -r normal.bam -g HG38 -o normal_SE/
# Compare outputs
# - Identify cancer-specific super-enhancers
# - Check for oncogene associations (MYC, NOTCH1, etc.)
# - Analyze gained/lost super-enhancers
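One simple way to call gained and lost super-enhancers is to ask which regions in one condition overlap nothing in the other. The sketch below assumes the super-enhancer coordinates have already been loaded as `(chrom, start, end)` tuples; the helper names and the coordinates are made up for illustration, and loading the tables is left out because their exact column layout is not described here.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end)


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def specific_regions(query: List[Region], reference: List[Region]) -> List[Region]:
    """Regions in `query` that overlap nothing in `reference`."""
    return [q for q in query if not any(overlaps(q, r) for r in reference)]


# Made-up coordinates for illustration only.
cancer = [("chr8", 127_000_000, 127_100_000), ("chr1", 10_000, 60_000)]
normal = [("chr1", 50_000, 90_000)]

print("gained in cancer:", specific_regions(cancer, normal))
print("lost in cancer:", specific_regions(normal, cancer))
```

This all-against-all comparison is fine for a few hundred super-enhancers per sample; for much larger sets a sorted sweep or bedtools would scale better.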
3. Time Course (e.g., Differentiation)
# Run on each timepoint
for day in day0 day2 day4 day7; do
rose2 main -i ${day}_peaks.bed -r ${day}.bam -g MM10 -o ${day}_SE/
done
# Track super-enhancer dynamics
# - Identify stage-specific super-enhancers
# - Map to developmental regulators
# - Build regulatory trajectories
4. Multi-Factor Analysis (e.g., BRD4 + H3K27ac)
# Run with different ChIP targets
rose2 main -i peaks.bed -r H3K27ac.bam -g HG38 -o H3K27ac_SE/
rose2 main -i peaks.bed -r BRD4.bam -g HG38 -o BRD4_SE/
rose2 main -i peaks.bed -r MED1.bam -g HG38 -o MED1_SE/
# Compare factor occupancy
# - Identify co-occupied super-enhancers
# - Assess factor dependency
# - Predict drug sensitivity
Interpreting Results
Understanding the Hockey Stick Plot
The plot shows the ChIP-seq signal of each stitched enhancer against its rank:
- X-axis: Enhancers ranked by signal (1 = strongest)
- Y-axis: ChIP-seq signal of each stitched enhancer
- Inflection point: Where the curve "bends" sharply upward
- Super-enhancers: Regions above the inflection point (typically top 5-10%)
What to look for:
- Clear separation between typical and super-enhancers
- Super-enhancers should be 5-10× stronger than typical enhancers
- Too many super-enhancers? Increase stitching distance
- Too few? Decrease stitching distance or check data quality
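To make the idea of an inflection point concrete, the sketch below follows the approach described in the original ROSE publication: scale both axes to the range 0 to 1 and find where a line of slope 1 is tangent to the curve. The function name is hypothetical, the signals are made up, and whether ROSE2 computes the cutoff exactly this way is an assumption. The sketch sorts regions from weakest to strongest, so the bend sits at the right-hand end.

```python
from typing import List


def super_enhancer_cutoff(signals: List[float]) -> float:
    """Signal value where a slope-1 tangent touches the scaled rank/signal curve."""
    if len(signals) < 2:
        raise ValueError("need at least two regions to find a cutoff")
    ys = sorted(signals)                    # ascending: weakest first
    n = len(ys)
    # Slope 1 on axes scaled to [0, 1] equals this slope in raw units per rank step.
    slope = (ys[-1] - ys[0]) / (n - 1)
    # A line with this slope touches the curve where y - slope * x is smallest.
    tangent_idx = min(range(n), key=lambda i: ys[i] - slope * i)
    return ys[tangent_idx]


signals = [1, 2, 3, 4, 5, 6, 7, 8, 40, 100]  # made-up values
cutoff = super_enhancer_cutoff(signals)
supers = [s for s in signals if s > cutoff]
print(cutoff, supers)
```

In this toy example the two strongest regions fall above the cutoff, which mirrors the sharp upturn at the end of a real hockey stick plot.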
Gene Assignment Strategy
ROSE2 assigns genes using a proximity-based hierarchy:
- Overlapping genes: Gene body overlaps the enhancer (distance = 0)
- Proximal genes: TSS within 50kb of enhancer boundary
- Closest gene: Nearest TSS if no overlapping/proximal genes found
Biological interpretation:
- Overlapping = intragenic enhancer (common for large genes)
- Proximal = typical enhancer-promoter distance
- Closest (distal) = possible long-range regulation (validate with Hi-C/3C data)
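The hierarchy above can be sketched in a few lines. The tuple layout, function names, and gene names below are hypothetical, and the 50kb window is taken from the description above; ROSE2's real gene mapper may handle details such as strand, ties, or multiple transcripts differently.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, str, int, int, str]   # (name, chrom, start, end, strand); hypothetical
Region = Tuple[str, int, int]           # (chrom, start, end)


def tss(gene: Gene) -> int:
    """Transcription start site: gene start on '+', gene end on '-'."""
    return gene[2] if gene[4] == "+" else gene[3]


def distance_to_region(pos: int, start: int, end: int) -> int:
    """Distance from a position to the nearest region boundary (0 if inside)."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def assign_genes(region: Region, genes: List[Gene],
                 window: int = 50_000) -> Dict[str, List[str]]:
    chrom, start, end = region
    same_chrom = [g for g in genes if g[1] == chrom]
    # Overlapping: the gene body intersects the enhancer.
    overlapping = [g[0] for g in same_chrom if g[2] <= end and start <= g[3]]
    # Proximal: TSS within `window` bp of the enhancer boundary.
    proximal = [g[0] for g in same_chrom
                if g[0] not in overlapping
                and distance_to_region(tss(g), start, end) <= window]
    # Closest: fallback only when nothing overlaps or is proximal.
    closest: List[str] = []
    if not overlapping and not proximal and same_chrom:
        nearest = min(same_chrom, key=lambda g: distance_to_region(tss(g), start, end))
        closest = [nearest[0]]
    return {"overlapping": overlapping, "proximal": proximal, "closest": closest}


genes = [("GENE_A", "chr1", 110_000, 130_000, "+"),
         ("GENE_B", "chr1", 150_000, 160_000, "+"),
         ("GENE_C", "chr1", 400_000, 420_000, "-")]
print(assign_genes(("chr1", 100_000, 120_000), genes))
print(assign_genes(("chr1", 250_000, 260_000), genes))
```

The first region overlaps GENE_A and has GENE_B within 50kb; the second has neither, so only the closest gene is reported.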
Validating Super-Enhancers
Computational validation:
- Check for cell identity genes (expected master regulators)
- Compare to published super-enhancers in similar cell types
- Verify enrichment at known regulatory loci
Experimental validation:
- CRISPR deletion of super-enhancer regions
- BET inhibitor treatment (should reduce super-enhancer activity)
- 3C/Hi-C to confirm enhancer-promoter interactions
- RNA-seq after enhancer perturbation
Advanced Usage
Custom Parameters
# -s: stitching distance in bp (default: 12500, i.e. 12.5kb)
# -t: TSS exclusion zone in bp (default: 2500, i.e. 2.5kb)
# --mask: BED file of problematic regions to exclude
rose2 main \
-i peaks.bed \
-r sample.bam \
-c input.bam \
-g HG38 \
-s 12500 \
-t 2500 \
-o output/ \
--mask blacklist.bed
Parameter guidelines:
- Stitching distance (-s): Larger = fewer, bigger super-enhancers
  - 12.5kb (default): Standard for most analyses
  - 20kb: For very dense enhancer regions
  - 5kb: For sparse enhancer landscapes
- TSS exclusion (-t): Exclude promoter-proximal regions
  - 2.5kb (default): Removes promoters while keeping enhancers
  - 0: Include all regions (not recommended)
  - 5kb: More stringent enhancer-only analysis
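To illustrate what the TSS exclusion zone does, the sketch below drops peaks that lie entirely within the window around a TSS. The function name and data layout are hypothetical, and the rule "fully contained in TSS ± window" is an assumption borrowed from how the original ROSE is commonly described; ROSE2 may apply a different criterion.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]     # (chrom, start, end)
TssSite = Tuple[str, int]       # (chrom, position)


def exclude_promoter_peaks(peaks: List[Peak], tss_sites: List[TssSite],
                           window: int = 2500) -> List[Peak]:
    """Drop peaks lying entirely within +/- window bp of a TSS on the same chromosome."""
    kept = []
    for chrom, start, end in peaks:
        in_promoter = any(
            tss_chrom == chrom and pos - window <= start and end <= pos + window
            for tss_chrom, pos in tss_sites
        )
        if not in_promoter:
            kept.append((chrom, start, end))
    return kept


peaks = [("chr1", 9_000, 10_500), ("chr1", 50_000, 51_000)]
tss_sites = [("chr1", 10_000)]
print(exclude_promoter_peaks(peaks, tss_sites))             # default 2.5kb window
print(exclude_promoter_peaks(peaks, tss_sites, window=0))   # keeps every peak here
```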
Custom Genome Annotation
# Use your own gene annotation
rose2 main \
-i peaks.bed \
-r sample.bam \
--custom my_annotation.ucsc \
-o output/
Annotation format (UCSC refGene format):
585 NR_046018 chr1 + 11873 14409 14409 14409 3 11873,12612,13220, 12227,12721,14409, 0 DDX11L1 unk unk -1,-1,-1,
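For checking a custom annotation before passing it to `--custom`, the sketch below parses one refGene row and derives the TSS from the strand. The column order follows the standard 16-column UCSC refGene table; the `Transcript` class and `parse_refgene_line` function are hypothetical helpers for this example, not part of ROSE2.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    gene: str

    @property
    def tss(self) -> int:
        """Transcription start site: txStart on '+', txEnd on '-'."""
        return self.tx_start if self.strand == "+" else self.tx_end


def parse_refgene_line(line: str) -> Transcript:
    """Parse one refGene row (tab- or space-separated, 16 columns)."""
    fields = line.split()
    if len(fields) != 16:
        raise ValueError(f"expected 16 columns, found {len(fields)}")
    return Transcript(name=fields[1], chrom=fields[2], strand=fields[3],
                      tx_start=int(fields[4]), tx_end=int(fields[5]),
                      gene=fields[12])


line = ("585 NR_046018 chr1 + 11873 14409 14409 14409 3 "
        "11873,12612,13220, 12227,12721,14409, 0 DDX11L1 unk unk -1,-1,-1,")
tx = parse_refgene_line(line)
print(tx.gene, tx.chrom, tx.strand, tx.tss)
```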
Gene List Filtering
# Map only to specific genes of interest
# -l: text file with one gene per line
rose2-geneMapper \
-i SuperStitched.table.txt \
-g HG38 \
-l my_genes.txt \
-o filtered_mapping/
Use cases:
- Focus on known oncogenes/tumor suppressors
- Analyze specific pathways (e.g., immune genes)
- Validate predictions for candidate genes
Performance
ROSE2 v2.0 is dramatically faster than previous versions:
| Benchmark | Previous Version | ROSE2 v2.0 | Improvement |
|---|---|---|---|
| 22K enhancers, HG38 | ~9 hours | ~12 minutes | 45× |
| Gene mapping | 6 hours | 12 seconds | 1,772× |
| Memory usage | 5 GB | 500 MB | 90% reduction |
Optimizations:
- Interval-based coverage calculation (500× faster)
- Smart gene search algorithm (eliminates billions of redundant checks)
- Streaming data processing (90% memory reduction)
- See CHANGELOG.md for details
Troubleshooting
Common Issues
1. "samtools not found"
# Install samtools
conda install -c bioconda samtools
# OR
brew install samtools # macOS
sudo apt-get install samtools # Ubuntu
2. "BAM file not indexed"
samtools index your_file.bam
# Creates your_file.bam.bai
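When running many samples, it can help to check for the index before launching ROSE2. The sketch below looks for the two common index names, `your_file.bam.bai` (written by `samtools index`) and `your_file.bai` (written by some other tools). The function name is hypothetical, and whether ROSE2 accepts the second naming is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_bam_index(bam_path: str) -> Optional[Path]:
    """Return the path of an existing index for bam_path, or None if missing."""
    bam = Path(bam_path)
    candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


index = find_bam_index("your_file.bam")
print(index if index else "No index found: run `samtools index your_file.bam`")
```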
3. "No super-enhancers found"
- Check data quality (coverage, signal-to-noise)
- Verify peak calls are reasonable
- Try adjusting stitching distance
- Ensure you're using enhancer marks (H3K27ac, not H3K4me3)
4. "Wrong chromosome naming (chr1 vs 1)"
- ROSE2 automatically detects and handles this
- If issues persist, check BAM header:
samtools view -H your.bam
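If you want to confirm by hand that the peak file and the BAM header use the same naming, a small helper like the one below can classify a list of chromosome names and convert Ensembl-style names to UCSC style. Both function names are hypothetical, and this is only a convenience for manual checks, not ROSE2's own detection logic.

```python
from typing import Iterable


def naming_style(chroms: Iterable[str]) -> str:
    """'ucsc' if all names start with 'chr', 'ensembl' if none do, else 'mixed'."""
    flags = [c.startswith("chr") for c in chroms]
    if all(flags):
        return "ucsc"
    if not any(flags):
        return "ensembl"
    return "mixed"


def to_ucsc_name(chrom: str) -> str:
    """Convert Ensembl-style names ('1', 'X', 'MT') to UCSC style ('chr1', 'chrX', 'chrM')."""
    if chrom.startswith("chr"):
        return chrom
    return "chrM" if chrom == "MT" else "chr" + chrom


peak_chroms = ["chr1", "chr2", "chrX"]   # e.g. first column of the peak file
bam_chroms = ["1", "2", "X", "MT"]       # e.g. SN fields from `samtools view -H`
print(naming_style(peak_chroms), naming_style(bam_chroms))
print([to_ucsc_name(c) for c in bam_chroms])
```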
5. "Out of memory"
- ROSE2 v2.0 uses 90% less memory than before
- If problems persist, process smaller chromosome regions separately
Citation
If you use ROSE2 in your research, please cite:
Original ROSE algorithm:
Whyte WA, Orlando DA, Hnisz D, et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013;153(2):307-319. doi:10.1016/j.cell.2013.03.035
ROSE2 modernization and optimization:
Tang M (2025). ROSE2: High-performance super-enhancer identification for Python 3. https://github.com/stjude/ROSE2
Credits
- Original Algorithm: Richard Young Lab, Whitehead Institute
- Python 3 Port: St. Jude Children's Research Hospital, Abra Lab
- Modernization & Optimization: Ming (Tommy) Tang
  - Modern Python packaging and PyPI distribution
  - 1,700× performance improvements
  - 90% memory reduction
  - Comprehensive documentation and testing
Related Tools
- HOMER - Motif analysis and peak annotation
- GREAT - Functional enrichment of regulatory regions
- deepTools - ChIP-seq quality control and visualization
- ChromHMM - Chromatin state discovery and characterization
Getting Help
- Documentation: Full guide
- Issues: GitHub Issues
- Email: tangming2005@gmail.com
License
Apache License 2.0 - See LICENSE.txt for details.
Changelog
See CHANGELOG.md for version history and detailed changes.