A Snakemake pipeline for alternative promoter analysis

🧬 SnakeAltPromoter Facilitates Differential Alternative Promoter Analysis

SnakeAltPromoter is a Snakemake-based pipeline for streamlined, reproducible, and scalable analysis of alternative promoter usage from RNA-seq or CAGE data. It integrates all major steps, from raw read preprocessing to promoter-level quantification and differential analysis, using established tools such as FastQC, TrimGalore, STAR, ProActiv, and DESeq2.

🧭 Workflow Overview

Below is the schematic overview of the SnakeAltPromoter pipeline, as shown in our paper:

Figure 1. Overview of the SnakeAltPromoter pipeline, showing genome setup, RNA-seq/CAGE processing, promoter quantification, classification, and differential promoter usage analysis.


✳️ Key Features

  • End-to-end automation of alternative promoter analysis from FASTQ to results.

  • Integrated QC and preprocessing via FastQC, TrimGalore, STAR, and MultiQC.

  • Supports multiple promoter quantification tools, including ProActiv, Salmon, and DEXSeq.

  • Built-in differential analysis using DESeq2 and ProActiv modules.

  • Fully modular, reproducible, and scalable—ideal for large multi-sample RNA-seq datasets.


🧩 Installation

1. Install via Conda (Recommended)

# Create and activate environment
conda create -n SnakeAltPromoter -c bioconda SnakeAltPromoter
conda activate SnakeAltPromoter

2. Install via Pip

pip install --user --no-cache-dir SnakeAltPromoter

3. Manual Installation (Build from Source)

git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
conda create -n SnakeAltPromoter -c conda-forge python
conda activate SnakeAltPromoter
pip install .

4. Verify Installation

Snakealtpromoter --help

If successful, usage instructions and command-line options will be displayed.
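
If the command is not found, a quick PATH check can help narrow down the problem. The following is a minimal sketch, assuming the three entry points named in this README (Snakealtpromoter, Genomesetup, and sap-ui) are installed as ordinary executables; check_commands is a hypothetical helper, not part of the package.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 if all are found, 1 otherwise.
check_commands() {
  missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Entry points named in this README; "|| true" keeps a script going if some are absent.
check_commands Snakealtpromoter Genomesetup sap-ui || true
```

A "missing" line usually means the Conda environment is not activated or, for a pip --user install, that the user-level bin directory is not on PATH.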


🖥️ Optional: Launch GUI

For a graphical interface:

sap-ui

🧪 Minimal Test Case

Example human chr22 genome FASTA and GTF files, along with example RNA-seq data, are available in the snakealtpromoter/data/ directory.

1. Genome Setup

git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
Genomesetup \
  --organism hg38 \
  --organism_fasta "$(pwd)/snakealtpromoter/data/hg38.fa" \
  --genes_gtf "$(pwd)/snakealtpromoter/data/hg38.gtf" \
  -o ./genome \
  --threads 30

2. Run Test Analysis

# Run from the SnakeAltPromoter directory cloned in step 1 (do not clone again,
# or $(pwd)/genome will no longer point at the genome built above)
Snakealtpromoter \
  -i "$(pwd)/snakealtpromoter/data" \
  --genome_dir "$(pwd)/genome" \
  -o test_output \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet "$(pwd)/snakealtpromoter/data/samplesheet/samplesheet.tsv"
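
Because the run depends on three paths (the input directory, the genome directory from step 1, and the sample sheet), it can be worth confirming they exist before starting. This is a minimal sketch; preflight is a hypothetical helper, and it only checks that the paths are present, not that their contents are valid.

```shell
# Hypothetical helper: confirm the inputs exist before starting a long run.
preflight() {
  input_dir=$1 genome_dir=$2 sample_sheet=$3
  status=0
  [ -d "$input_dir" ]    || { echo "missing input directory: $input_dir" >&2; status=1; }
  [ -d "$genome_dir" ]   || { echo "missing genome directory: $genome_dir (run Genomesetup first)" >&2; status=1; }
  [ -s "$sample_sheet" ] || { echo "missing or empty sample sheet: $sample_sheet" >&2; status=1; }
  return "$status"
}

# Paths from the test case above, relative to the cloned repository.
if preflight "$(pwd)/snakealtpromoter/data" "$(pwd)/genome" \
             "$(pwd)/snakealtpromoter/data/samplesheet/samplesheet.tsv"; then
  echo "inputs look complete"
else
  echo "fix the paths above before running Snakealtpromoter"
fi
```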

Output directory structure is described in the documentation:


📊 Reproduce Results from the Paper

To reproduce analyses in the SnakeAltPromoter manuscript:

  1. Download the GENCODE v46 genome FASTA and GTF from GENCODE.
  2. Retrieve heart RNA-seq and CAGE data from GSE147236 using fastq-dump.
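
The download in step 2 can be scripted as a loop over run accessions. The sketch below assumes you have already collected the SRA run accessions for GSE147236 into a text file, one per line (the file name is hypothetical, and no accessions are listed here). The fastq-dump options used are standard SRA Toolkit flags; whether the resulting file names match what the pipeline expects is not stated in this README, so check them against the sample sheet.

```shell
# Hypothetical helper: download each SRA run listed (one accession per line) in a text file.
# Set DRY_RUN=1 to print the commands instead of running them.
fetch_runs() {
  accession_file=$1
  outdir=$2
  while IFS= read -r acc || [ -n "$acc" ]; do
    # Skip blank lines and comment lines.
    case $acc in ''|'#'*) continue ;; esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "fastq-dump --gzip --split-files --outdir $outdir $acc"
    else
      # --split-files writes paired reads to separate _1/_2 files.
      fastq-dump --gzip --split-files --outdir "$outdir" "$acc" </dev/null
    fi
  done < "$accession_file"
}

# Example (hypothetical file name):
#   DRY_RUN=1 fetch_runs heart_runs.txt /Absolute/path/to/heart_RNAseq
```

Running once with DRY_RUN=1 makes it easy to review the exact commands before committing to a large download.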

1. Genome Setup

Genomesetup \
  --organism hg38 \
  --organism_fasta /Absolute/path/to/genome.fa \
  --genes_gtf /Absolute/path/to/genes.gtf \
  -o /Absolute/path/to/genome \
  --threads 30

2. RNA-seq Processing

git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
Snakealtpromoter \
  -i /Absolute/path/to/heart_RNAseq/ \
  --genome_dir /Absolute/path/to/genome \
  -o heart_RNAseq_output \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet "$(pwd)/snakealtpromoter/data/samplesheet/Heart_RNAseq.tsv"

3. CAGE Processing

# Run from the SnakeAltPromoter directory cloned for RNA-seq processing (no need to clone again)
Snakealtpromoter \
  -i /Absolute/path/to/heart_CAGE/ \
  --genome_dir /Absolute/path/to/genome \
  -o heart_CAGE_output \
  --threads 30 \
  --organism hg38 \
  --sample_sheet "$(pwd)/snakealtpromoter/data/samplesheet/Heart_CAGE.tsv" \
  --method cage --reads single

4. Compare with Published Results

Supplementary Table    Description
Table 1                Comprehensive promoter coordinates
Table 2                Promoter classifications (major/minor) by ProActiv, Salmon, DEXSeq, and CAGE
Table 3                Promoter counts across samples
Table 4                Differential promoter activity (healthy vs. failure) across tools

🚀 Quick Start for Your Own Genome and Sequencing Data

Step 1. Genome Setup

Prepare genome indices and promoter annotations:

Genomesetup \
  --organism hg38 \
  --organism_fasta /Absolute/path/to/genome.fa \
  --genes_gtf /Absolute/path/to/genes.gtf \
  -o /Absolute/path/to/genome \
  --threads 30
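
The /Absolute/path/to placeholders suggest that absolute paths are expected, although this README does not state it outright. If your files are referenced by relative paths, a small helper can convert them. This is a sketch only; abs_path is a hypothetical function, and it requires that the parent directory already exists.

```shell
# Hypothetical helper: print the absolute form of a path whose parent directory exists.
abs_path() {
  dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
  # "${dir%/}" avoids a doubled slash when the parent is the root directory.
  printf '%s/%s\n' "${dir%/}" "$(basename "$1")"
}

# Example with hypothetical relative paths:
#   Genomesetup \
#     --organism hg38 \
#     --organism_fasta "$(abs_path ref/genome.fa)" \
#     --genes_gtf "$(abs_path ref/genes.gtf)" \
#     -o "$(abs_path ./genome)" \
#     --threads 30
```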

Step 2. Process RNA-seq or CAGE Data

Run alternative promoter analysis:

Snakealtpromoter \
  -i /Absolute/path/to/input_fastqs/ \
  --genome_dir /Absolute/path/to/genome \
  -o ./output/ \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet /Absolute/path/to/samplesheet.tsv \
  --method cage --reads single   # Add these only for CAGE data
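
Since the flags differ by data type, one way to avoid editing the command by hand is a small wrapper that assembles it. The sketch below mirrors the paper-reproduction commands in this README: --trim for RNA-seq, and --method cage --reads single for CAGE. The function name build_sap_command and the rnaseq/cage labels are hypothetical, and the thread count and organism are fixed to the values used above. It prints the command rather than running it.

```shell
# Hypothetical wrapper: assemble the Snakealtpromoter command for one data type.
# Prints the command; it does not execute it.
build_sap_command() {
  data_type=$1 input_dir=$2 genome_dir=$3 sample_sheet=$4 out_dir=$5
  set -- Snakealtpromoter -i "$input_dir" --genome_dir "$genome_dir" \
         -o "$out_dir" --threads 30 --organism hg38 --sample_sheet "$sample_sheet"
  case $data_type in
    rnaseq) set -- "$@" --trim ;;
    cage)   set -- "$@" --method cage --reads single ;;
    *)      echo "unknown data type: $data_type" >&2; return 1 ;;
  esac
  echo "$*"
}

build_sap_command rnaseq /Absolute/path/to/input_fastqs/ /Absolute/path/to/genome \
  /Absolute/path/to/samplesheet.tsv ./output/
```

Review the printed command before running it. Paths containing spaces would need quoting when the command is pasted into a shell.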

For detailed documentation, see:


🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request via GitHub.


🧾 Citation

If you use SnakeAltPromoter, please cite:

Tan J. et al. (2025). SnakeAltPromoter Facilitates Differential Alternative Promoter Analysis. bioRxiv. https://doi.org/10.1101/2025.08.16.669128


⚖️ License

See the LICENSE file for details.
