A Snakemake pipeline for alternative promoter analysis

🧬 SnakeAltPromoter: End-to-End Differential Alternative Promoter Analysis

SnakeAltPromoter is a Snakemake-based pipeline for streamlined, reproducible, and scalable analysis of alternative promoter usage from RNA-seq or CAGE data. It covers every major step, from raw read preprocessing through promoter-level quantification to differential analysis, using established tools such as FastQC, TrimGalore, STAR, DESeq2, and ProActiv.

🧭 Workflow Overview

Below is the schematic overview of the SnakeAltPromoter pipeline, as shown in our paper:

Figure 1. Overview of the SnakeAltPromoter pipeline, showing genome setup, RNA-seq/CAGE processing, promoter quantification, classification, and differential promoter usage analysis.


✳️ Key Features

  • End-to-end automation of alternative promoter analysis from FASTQ to results.

  • Integrated QC and preprocessing via FastQC, TrimGalore, STAR, and MultiQC.

  • Supports multiple promoter quantification tools (ProActiv, Salmon, and DEXSeq are the ones compared in the supplementary tables).

  • Built-in differential analysis using DESeq2 and ProActiv modules.

  • Fully modular, reproducible, and scalable—ideal for large multi-sample RNA-seq datasets.


🧩 Installation

1. Install via Conda (Recommended)

# Create and activate environment
conda create -n SnakeAltPromoter -c bioconda SnakeAltPromoter
conda activate SnakeAltPromoter

2. Install via Pip

pip install SnakeAltPromoter

3. Manual Installation (Build from Source)

git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
# Quote the version spec so the shell does not treat ">" as a redirect
conda create -n SnakeAltPromoter -c conda-forge "python>=3.10"
conda activate SnakeAltPromoter
pip install .

4. Verify Installation

Snakealtpromoter --help

If successful, usage instructions and command-line options will be displayed.
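
If the command is not found, the environment is probably not activated. The sketch below checks which commands are on PATH. It assumes the three entry points named in this README (Snakealtpromoter, Genomesetup, sap-ui) are what the package installs; the helper name check_commands is illustrative and not part of SnakeAltPromoter.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 if all are found, 1 otherwise.
check_commands() {
  missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Example (entry-point names taken from this README):
# check_commands Snakealtpromoter Genomesetup sap-ui
```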


🖥️ Optional: Launch GUI

For a graphical interface:

sap-ui

🚀 Quick Start

Step 1. Genome Setup

Prepare genome indices and promoter annotations:

Genomesetup \
  --organism hg38 \
  --organism_fasta /full/path/to/genome.fa \
  --genes_gtf /full/path/to/genes.gtf \
  -o ./genome \
  --threads 30
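
The /full/path/to/ placeholders suggest that absolute paths are expected, although this README does not say so explicitly. Under that assumption, a small pre-flight check can catch a mistyped path before a long indexing job starts. The helper name require_abs_file is hypothetical.

```shell
# Succeed only if $1 is an absolute path to a readable, non-empty file.
require_abs_file() {
  case "$1" in
    /*) ;;
    *) echo "not an absolute path: $1" >&2; return 1 ;;
  esac
  if [ ! -r "$1" ] || [ ! -s "$1" ]; then
    echo "missing, unreadable, or empty: $1" >&2
    return 1
  fi
}

# Example with the placeholder paths from the command above;
# run Genomesetup only if both checks pass:
# require_abs_file /full/path/to/genome.fa &&
#   require_abs_file /full/path/to/genes.gtf
```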

Step 2. Process RNA-seq Data

Run alternative promoter analysis:

Snakealtpromoter \
  -i /full/path/to/input_fastqs/ \
  --genome_dir ./genome \
  -o ./output/ \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet snakealtpromoter/data/samplesheet/Heart_RNAseq.tsv
# For CAGE data only, also append: --method cage --reads single

For detailed documentation, see:


🧪 Minimal Test Case

Example data are available in the snakealtpromoter/data/ directory. Download the GENCODE v46 genome FASTA and GTF from GENCODE.
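
One way to fetch the reference files is sketched below. The URL layout of the EBI FTP mirror and the two file names are assumptions (this README does not say which GENCODE v46 files were used), so check them against the GENCODE download page first. The helper gencode_url is illustrative.

```shell
# Base URL of the GENCODE human releases on the EBI FTP mirror (assumed layout).
GENCODE_BASE="https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human"

# Print the download URL for one file of a given human release.
gencode_url() {
  release="$1"; file="$2"
  echo "${GENCODE_BASE}/release_${release}/${file}"
}

# Example (not run here; the file choice is an assumption):
# curl -L -O "$(gencode_url 46 GRCh38.primary_assembly.genome.fa.gz)"
# curl -L -O "$(gencode_url 46 gencode.v46.annotation.gtf.gz)"
# gunzip GRCh38.primary_assembly.genome.fa.gz gencode.v46.annotation.gtf.gz
```

The decompressed files can then be passed to --organism_fasta and --genes_gtf by their absolute paths.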

1. Genome Setup

Genomesetup \
  --organism hg38 \
  --organism_fasta /full/path/to/genome.fa \
  --genes_gtf /full/path/to/genes.gtf \
  -o ./genome \
  --threads 30

2. Run Test Analysis

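git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
# --genome_dir below is resolved relative to this checkout; adjust it if Genomesetup wrote ./genome elsewhere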
Snakealtpromoter \
  -i snakealtpromoter/data/ \
  --genome_dir ./genome \
  -o test_output \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet snakealtpromoter/data/samplesheet/samplesheet.tsv

Output directory structure is described in the documentation:


📊 Reproduce Results from the Paper

To reproduce analyses in the SnakeAltPromoter manuscript:

  1. Download GENCODE v46 genome FASTA and GTF.
  2. Retrieve heart RNA-seq and CAGE data from GSE147236 using fastq-dump.
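
The download step can be scripted roughly as follows. This is a sketch: the run accessions belonging to GSE147236 are not listed here, so the function reads them from a file you prepare yourself (the file name in the example is hypothetical). The fastq-dump options --gzip, --split-files, and --outdir are standard SRA Toolkit flags, but the file naming that the sample sheets expect is not described in this README and may require renaming.

```shell
# Download every run accession listed in a file (one per line; blank lines
# and lines starting with '#' are skipped). DUMP_CMD can be overridden,
# e.g. DUMP_CMD=echo for a dry run that only prints the commands.
fetch_runs() {
  list="$1"; outdir="$2"
  mkdir -p "$outdir"
  while IFS= read -r acc || [ -n "$acc" ]; do
    case "$acc" in ''|'#'*) continue ;; esac
    # </dev/null keeps the download tool from consuming the accession list.
    ${DUMP_CMD:-fastq-dump} --gzip --split-files --outdir "$outdir" "$acc" \
      </dev/null || return 1
  done < "$list"
}

# Example (hypothetical accession list file):
# fetch_runs heart_rnaseq_runs.txt /full/path/to/heart_RNAseq/
```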

1. Genome Setup

Genomesetup \
  --organism hg38 \
  --organism_fasta /full/path/to/genome.fa \
  --genes_gtf /full/path/to/genes.gtf \
  -o ./genome \
  --threads 30

2. RNA-seq Processing

git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
Snakealtpromoter \
  -i /full/path/to/heart_RNAseq/ \
  --genome_dir ./genome \
  -o heart_RNAseq_output \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet snakealtpromoter/data/samplesheet/Heart_RNAseq.tsv

3. CAGE Processing

# Run from the SnakeAltPromoter checkout cloned for the RNA-seq step (no second clone needed)
Snakealtpromoter \
  -i /full/path/to/heart_CAGE/ \
  --genome_dir ./genome \
  -o heart_CAGE_output \
  --threads 30 \
  --organism hg38 \
  --sample_sheet snakealtpromoter/data/samplesheet/Heart_CAGE.tsv \
  --method cage --reads single
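
The RNA-seq and CAGE invocations above differ only in a few flags: RNA-seq adds --trim, and CAGE adds --method cage --reads single. A small wrapper can make that difference explicit. This is a sketch built from the flags shown in this README; extra_flags, run_dataset, and DRY_RUN are illustrative names, not part of SnakeAltPromoter.

```shell
# Print the data-type-specific flags used in the commands above.
extra_flags() {
  case "$1" in
    rnaseq) echo "--trim" ;;
    cage)   echo "--method cage --reads single" ;;
    *)      echo "unknown data type: $1" >&2; return 1 ;;
  esac
}

# Run (or, with DRY_RUN=1, just print) the pipeline for one dataset.
run_dataset() {
  kind="$1"; input="$2"; sheet="$3"; outdir="$4"
  flags=$(extra_flags "$kind") || return 1
  # $flags is intentionally unquoted so it splits into separate arguments.
  ${DRY_RUN:+echo} Snakealtpromoter -i "$input" --genome_dir ./genome \
    -o "$outdir" --threads 30 --organism hg38 \
    --sample_sheet "$sheet" $flags
}

# Example: print the CAGE command without running it.
# DRY_RUN=1 run_dataset cage /full/path/to/heart_CAGE/ \
#   snakealtpromoter/data/samplesheet/Heart_CAGE.tsv heart_CAGE_output
```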

4. Compare with Published Results

Supplementary Table   Description
Table 1               Comprehensive promoter coordinates
Table 2               Promoter classifications (major/minor) by ProActiv, Salmon, DEXSeq, and CAGE
Table 3               Promoter counts across samples
Table 4               Differential promoter activity (healthy vs. failure) across tools

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request via GitHub.


🧾 Citation

If you use SnakeAltPromoter, please cite:

Tan J. et al. (2025). SnakeAltPromoter Facilitates Differential Alternative Promoter Analysis. bioRxiv. https://doi.org/10.1101/2025.08.16.669128


⚖️ License

See the LICENSE file for details.
