A Snakemake pipeline for alternative promoter analysis

🧬 SnakeAltPromoter Facilitates Differential Alternative Promoter Analysis

SnakeAltPromoter is a Snakemake-based pipeline for streamlined, reproducible, and scalable analysis of alternative promoter usage from RNA-seq or CAGE data. It integrates all major steps—from raw read preprocessing to promoter-level quantification and differential analysis—using state-of-the-art tools.

🧭 Workflow Overview

Below is the schematic overview of the SnakeAltPromoter pipeline, as shown in our paper:

Figure 1. Overview of the SnakeAltPromoter pipeline, showing genome setup, RNA-seq/CAGE processing, promoter quantification, classification, and differential promoter usage analysis.


✳️ Key Features

  • End-to-end automation of alternative promoter analysis from FASTQ to results.

  • Integrated QC and preprocessing via FastQC, TrimGalore, STAR, and MultiQC.

  • Supports multiple promoter quantification tools, including ProActiv, Salmon, and DEXSeq.

  • Built-in differential analysis using DESeq2 and ProActiv modules.

  • Fully modular, reproducible, and scalable—ideal for large multi-sample RNA-seq datasets.


🧩 Installation

1. Install via Conda (Recommended)

# Create and activate environment
conda create -n SnakeAltPromoter -c bioconda SnakeAltPromoter
conda activate SnakeAltPromoter

2. Install via Pip

pip install --user --no-cache-dir SnakeAltPromoter

3. Manual Installation (Build from Source)

git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
conda create -n SnakeAltPromoter -c conda-forge python
conda activate SnakeAltPromoter
pip install .

4. Verify Installation

Snakealtpromoter --help

If successful, usage instructions and command-line options will be displayed.
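
Beyond the help text, it can be useful to confirm that every entry point mentioned in this README is on your PATH. The following is a minimal sketch, assuming the three command names used in this document (Snakealtpromoter, Genomesetup, sap-ui) are what the package installs; the check_tools function is a hypothetical helper, not part of SnakeAltPromoter.

```shell
# Report whether each SnakeAltPromoter entry point is on PATH.
# The names come from this README; none of the tools is executed.
check_tools() {
  for tool in Snakealtpromoter Genomesetup sap-ui; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
    fi
  done
}

check_tools
```

If a tool is reported as missing, check that the SnakeAltPromoter conda environment is activated in the current shell.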


🖥️ Optional: Launch GUI

For a graphical interface:

sap-ui

🧪 Minimal Test Case

Example human chr22 genome FASTA and GTF files, together with example RNA-seq data, are available in the snakealtpromoter/data/ directory.

1. Genome Setup

git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
Genomesetup \
  --organism hg38 \
  --organism_fasta "$(pwd)/snakealtpromoter/data/hg38.fa" \
  --genes_gtf "$(pwd)/snakealtpromoter/data/hg38.gtf" \
  -o ./genome \
  --threads 30
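
The examples in this README hard-code --threads 30, which may exceed the cores available on a laptop or small server. As a rough sketch, the thread count can be capped at what the operating system reports. The pick_threads helper and the THREADS variable are hypothetical names introduced here, and whether the pipeline caps threads on its own is not stated in this document.

```shell
# Print the smaller of the requested thread count and the cores reported by the OS.
pick_threads() {
  requested="$1"
  # getconf _NPROCESSORS_ONLN works on Linux and macOS; fall back to 1 if it fails.
  available="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}

THREADS="$(pick_threads 30)"
echo "Using --threads $THREADS"
```

The resulting value can then be passed as --threads "$THREADS" in the commands of this README.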

2. Run Test Analysis

# Run from the SnakeAltPromoter repository root, the same directory used in step 1
Snakealtpromoter \
  -i "$(pwd)/snakealtpromoter/data" \
  --genome_dir "$(pwd)/genome" \
  -o test_output \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet "$(pwd)/snakealtpromoter/data/samplesheet/samplesheet.tsv"

Output directory structure is described in the documentation.


📊 Reproduce Results from the Paper

To reproduce analyses in the SnakeAltPromoter manuscript:

  1. Download the GENCODE v46 genome FASTA and GTF from GENCODE.
  2. Retrieve heart RNA-seq and CAGE data from GEO accession GSE147236 using fastq-dump.
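
The download step can be sketched as a loop over run accessions. This is an illustration under stated assumptions: the accession values below are placeholders (the actual SRR IDs must be looked up under GSE147236), the heart_fastq output directory is a hypothetical name, and the file naming that the pipeline expects is not specified here. The script only prints the commands unless DRY_RUN is set to 0.

```shell
# Placeholder run accessions: replace with the SRR IDs listed under GSE147236.
ACCESSIONS="SRR_PLACEHOLDER_1 SRR_PLACEHOLDER_2"
OUTDIR="heart_fastq"        # hypothetical output directory
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands, 0 = run them

download_runs() {
  for acc in $ACCESSIONS; do
    # --split-files writes one file per read; --gzip compresses the output.
    cmd="fastq-dump --split-files --gzip --outdir $OUTDIR $acc"
    if [ "$DRY_RUN" = "1" ]; then
      echo "$cmd"
    else
      $cmd
    fi
  done
}

download_runs
```

Running with DRY_RUN=0 requires the SRA Toolkit, which provides fastq-dump, to be installed.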

1. Genome Setup

Genomesetup \
  --organism hg38 \
  --organism_fasta /Absolute/path/to/genome.fa \
  --genes_gtf /Absolute/path/to/genes.gtf \
  -o /Absolute/path/to/genome \
  --threads 30

2. RNA-seq Processing

git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
Snakealtpromoter \
  -i /Absolute/path/to/heart_RNAseq/ \
  --genome_dir /Absolute/path/to/genome \
  -o heart_RNAseq_output \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet "$(pwd)/snakealtpromoter/data/samplesheet/Heart_RNAseq.tsv"

3. CAGE Processing

# Run from the SnakeAltPromoter repository root cloned in step 2
Snakealtpromoter \
  -i /Absolute/path/to/heart_CAGE/ \
  --genome_dir /Absolute/path/to/genome \
  -o heart_CAGE_output \
  --threads 30 \
  --organism hg38 \
  --sample_sheet "$(pwd)/snakealtpromoter/data/samplesheet/Heart_CAGE.tsv" \
  --method cage --reads single

4. Compare with Published Results

Supplementary Table    Description
Table 1                Comprehensive promoter coordinates
Table 2                Promoter classifications (major/minor) by ProActiv, Salmon, DEXSeq, and CAGE
Table 3                Promoter counts across samples
Table 4                Differential promoter activity (healthy vs. failure) across tools

🚀 Quick Start for Your Own Genome and Sequencing Data

Step 1. Genome Setup

Prepare genome indices and promoter annotations:

Genomesetup \
  --organism hg38 \
  --organism_fasta /Absolute/path/to/genome.fa \
  --genes_gtf /Absolute/path/to/genes.gtf \
  -o /Absolute/path/to/genome \
  --threads 30
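
The /Absolute/path/to/ placeholders suggest that absolute paths are expected, although this README does not state it as a requirement. Under that assumption, a relative path can be resolved before it is passed to Genomesetup. The abs_path helper below is a hypothetical convenience function, not part of SnakeAltPromoter.

```shell
# Turn a possibly relative file path into an absolute one.
# The parent directory must exist; the file itself need not.
abs_path() {
  dir="$(cd "$(dirname "$1")" && pwd)" || return 1
  # Avoid a double slash when the parent directory is the filesystem root.
  case "$dir" in
    /) dir="" ;;
  esac
  echo "$dir/$(basename "$1")"
}

abs_path "./genes.gtf"
```

For example, --genes_gtf "$(abs_path ./genes.gtf)" passes the resolved path regardless of the directory the command is started from.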

Step 2. Process RNA-seq Data

Run alternative promoter analysis:

Snakealtpromoter \
  -i /Absolute/path/to/input_fastqs/ \
  --genome_dir /Absolute/path/to/genome \
  -o ./output/ \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet /Absolute/path/to/samplesheet.tsv \
  --method cage --reads single   # Add these only for CAGE data
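
Because the command above mixes RNA-seq and CAGE options, a small wrapper can make the choice explicit. The sketch below only prints the command for each data type and uses just the flags that appear in this README. It assumes, following the paper-reproduction examples, that --trim is used for RNA-seq and omitted for CAGE; that is a reading of the examples, not a documented rule. The build_sap_cmd function and its rnaseq/cage labels are hypothetical.

```shell
# Print a Snakealtpromoter command for "rnaseq" or "cage" data (dry run only).
build_sap_cmd() {
  data_type="$1"
  # Options shared by both data types, copied from the Quick Start command.
  set -- Snakealtpromoter \
    -i /Absolute/path/to/input_fastqs/ \
    --genome_dir /Absolute/path/to/genome \
    -o ./output/ \
    --threads 30 \
    --organism hg38 \
    --sample_sheet /Absolute/path/to/samplesheet.tsv
  case "$data_type" in
    rnaseq) set -- "$@" --trim ;;
    cage)   set -- "$@" --method cage --reads single ;;
    *)      echo "unknown data type: $data_type" >&2; return 1 ;;
  esac
  echo "$*"
}

build_sap_cmd rnaseq
build_sap_cmd cage
```

Review the printed command, replace the placeholder paths, and then run it in the activated environment.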

For detailed documentation, see the GitHub repository: https://github.com/YidanSunResearchLab/SnakeAltPromoter


🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request via GitHub.


🧾 Citation

If you use SnakeAltPromoter, please cite:

Tan J. et al. (2025). SnakeAltPromoter Facilitates Differential Alternative Promoter Analysis. bioRxiv. https://doi.org/10.1101/2025.08.16.669128


⚖️ License

See the LICENSE file for details.
