
A Snakemake pipeline for alternative promoter analysis


🧬 SnakeAltPromoter Facilitates Differential Alternative Promoter Analysis

SnakeAltPromoter is a Snakemake-based pipeline for streamlined, reproducible, and scalable analysis of alternative promoter usage from RNA-seq or CAGE data. It integrates all major steps, from raw read preprocessing to promoter-level quantification and differential analysis, using established tools such as FastQC, Trim Galore, STAR, Salmon, DEXSeq, ProActiv, and DESeq2.

🧭 Workflow Overview

Below is the schematic overview of the SnakeAltPromoter pipeline, as shown in our paper:


Figure 1. Overview of the SnakeAltPromoter pipeline, showing genome setup, RNA-seq/CAGE processing, promoter quantification, classification, and differential promoter usage analysis.


✳️ Key Features

  • End-to-end automation of alternative promoter analysis from FASTQ to results.

  • Integrated QC, preprocessing, and alignment via FastQC, Trim Galore, STAR, and MultiQC.

  • Supports multiple promoter quantification tools, including ProActiv, Salmon, and DEXSeq, as well as CAGE-based quantification.

  • Built-in differential analysis using DESeq2 and ProActiv modules.

  • Fully modular, reproducible, and scalable; well suited to large multi-sample RNA-seq datasets.


🧩 Installation

1. Install via Conda (Recommended)

# Create and activate environment
conda create -n SnakeAltPromoter -c bioconda SnakeAltPromoter
conda activate SnakeAltPromoter

2. Install via Pip

pip install --user --no-cache-dir SnakeAltPromoter

3. Manual Installation (Build from Source)

git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
conda create -n SnakeAltPromoter -c conda-forge python
conda activate SnakeAltPromoter
pip install .

4. Verify Installation

Snakealtpromoter --help

If successful, usage instructions and command-line options will be displayed.
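If you want to script this check (for example in a CI job), the minimal Python sketch below confirms that the command names used in this README are on your PATH. It relies only on the standard library's `shutil.which`; the function name `check_entry_points` is hypothetical and is not part of SnakeAltPromoter.

```python
import shutil

# Command names taken from this README; sap-ui is the optional GUI launcher.
ENTRY_POINTS = ("Snakealtpromoter", "Genomesetup", "sap-ui")


def check_entry_points(names=ENTRY_POINTS):
    """Map each command name to its resolved executable path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in check_entry_points().items():
        status = path if path else "NOT FOUND (is the SnakeAltPromoter environment active?)"
        print(f"{name}: {status}")
```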


🖥️ Optional: Launch GUI

For a graphical interface:

sap-ui

🧪 Minimal Test Case

Example human chr22 genome FASTA and GTF files, along with example RNA-seq data, are available in the snakealtpromoter/data/ directory.

1. Genome Setup

git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
Genomesetup \
  --organism hg38 \
  --organism_fasta "$(pwd)/snakealtpromoter/data/hg38.fa" \
  --genes_gtf "$(pwd)/snakealtpromoter/data/hg38.gtf" \
  -o ./genome \
  --threads 30

2. Run Test Analysis

# Run from the SnakeAltPromoter directory cloned in step 1 (a second clone is not needed)
Snakealtpromoter \
  -i "$(pwd)/snakealtpromoter/data" \
  --genome_dir "$(pwd)/genome" \
  -o test_output \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet "$(pwd)/snakealtpromoter/data/samplesheet/samplesheet.tsv"

The output directory structure is described in the project documentation.


📊 Reproduce Results from the Paper

To reproduce analyses in the SnakeAltPromoter manuscript:

  1. Download the GENCODE v46 genome FASTA and GTF from GENCODE.
  2. Retrieve heart RNA-seq and CAGE data from GSE147236 using fastq-dump.
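The download step can be scripted. The sketch below assumes you have already looked up the SRA run accessions that belong to GSE147236 (for example with NCBI's SRA Run Selector); it simply builds one fastq-dump command per run. The function name and the placeholder accessions are hypothetical, and because this README does not state whether each dataset is paired-end, `--split-files` is left as an explicit choice.

```python
from pathlib import Path


def fastq_dump_commands(accessions, outdir, paired=False):
    """Build one fastq-dump command (as an argument list) per SRA run accession.

    --gzip compresses the FASTQ output, --outdir sets the destination folder,
    and --split-files writes separate _1/_2 files for paired-end runs.
    """
    base = ["fastq-dump", "--gzip", "--outdir", str(Path(outdir).resolve())]
    if paired:
        base.append("--split-files")
    # base + [acc] creates a fresh list for each accession.
    return [base + [acc] for acc in accessions]


if __name__ == "__main__":
    # Placeholder accessions: replace with the real runs listed for GSE147236.
    for cmd in fastq_dump_commands(["SRR_PLACEHOLDER_1", "SRR_PLACEHOLDER_2"], "heart_RNAseq"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Newer SRA Toolkit releases also ship `fasterq-dump`, but this sketch sticks to the `fastq-dump` tool named above.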

1. Genome Setup

Genomesetup \
  --organism hg38 \
  --organism_fasta /Absolute/path/to/genome.fa \
  --genes_gtf /Absolute/path/to/genes.gtf \
  -o /Absolute/path/to/genome \
  --threads 30

2. RNA-seq Processing

git clone https://github.com/YidanSunResearchLab/SnakeAltPromoter.git
cd SnakeAltPromoter
Snakealtpromoter \
  -i /Absolute/path/to/heart_RNAseq/ \
  --genome_dir /Absolute/path/to/genome \
  -o heart_RNAseq_output \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet "$(pwd)/snakealtpromoter/data/samplesheet/Heart_RNAseq.tsv"

3. CAGE Processing

# Run from the SnakeAltPromoter directory cloned in step 2
Snakealtpromoter \
  -i /Absolute/path/to/heart_CAGE/ \
  --genome_dir /Absolute/path/to/genome \
  -o heart_CAGE_output \
  --threads 30 \
  --organism hg38 \
  --sample_sheet "$(pwd)/snakealtpromoter/data/samplesheet/Heart_CAGE.tsv" \
  --method cage --reads single

4. Compare with Published Results

Compare your outputs with the manuscript's supplementary tables:

  • Supplementary Table 1: Comprehensive promoter coordinates
  • Supplementary Table 2: Promoter classifications (major/minor) by ProActiv, Salmon, DEXSeq, and CAGE
  • Supplementary Table 3: Promoter counts across samples
  • Supplementary Table 4: Differential promoter activity (healthy vs. heart failure) across tools

🚀 Quick Start for Your Own Genome and Sequencing Data

Step 1. Genome Setup

Prepare genome indices and promoter annotations:

Genomesetup \
  --organism hg38 \
  --organism_fasta /Absolute/path/to/genome.fa \
  --genes_gtf /Absolute/path/to/genes.gtf \
  -o /Absolute/path/to/genome \
  --threads 30
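Because the examples pass absolute paths, a small wrapper that resolves paths before calling the CLI can avoid mistakes when scripting genome setup. The Python sketch below only rebuilds the Genomesetup command shown above using the same flags; the function name `genomesetup_command` is hypothetical, and the default of 30 threads simply mirrors the README example.

```python
import shlex
from pathlib import Path


def genomesetup_command(organism, fasta, gtf, outdir, threads=30):
    """Build the Genomesetup argument list used in this README.

    Every path is resolved to an absolute one, matching the README's
    /Absolute/path/to/... convention.
    """
    return [
        "Genomesetup",
        "--organism", organism,
        "--organism_fasta", str(Path(fasta).resolve()),
        "--genes_gtf", str(Path(gtf).resolve()),
        "-o", str(Path(outdir).resolve()),
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    # Dry run: print the command built for the bundled chr22 test files.
    cmd = genomesetup_command(
        "hg38",
        "snakealtpromoter/data/hg38.fa",
        "snakealtpromoter/data/hg38.gtf",
        "genome",
    )
    print(shlex.join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`, which raises `CalledProcessError` if Genomesetup exits with a non-zero status.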

Step 2. Process RNA-seq or CAGE Data

Run alternative promoter analysis:

Snakealtpromoter \
  -i /Absolute/path/to/input_fastqs/ \
  --genome_dir /Absolute/path/to/genome \
  -o ./output/ \
  --threads 30 \
  --organism hg38 --trim \
  --sample_sheet /Absolute/path/to/samplesheet.tsv

# For CAGE data, append: --method cage --reads single
# (the CAGE example in "Reproduce Results from the Paper" runs without --trim)
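When processing many datasets from a script, it can help to build the Snakealtpromoter command for each mode in one place. The Python sketch below is based on the flags in this README's examples: RNA-seq runs add `--trim`, while CAGE runs add `--method cage --reads single` and, like the CAGE reproduction example, omit `--trim`. The function name is hypothetical, and whether trimming suits your own CAGE data is your call.

```python
from pathlib import Path


def snakealtpromoter_command(input_dir, genome_dir, outdir, sample_sheet,
                             organism="hg38", threads=30, cage=False):
    """Build a Snakealtpromoter argument list mirroring this README's examples."""
    cmd = [
        "Snakealtpromoter",
        "-i", str(Path(input_dir).resolve()),
        "--genome_dir", str(Path(genome_dir).resolve()),
        "-o", str(outdir),
        "--threads", str(threads),
        "--organism", organism,
        "--sample_sheet", str(Path(sample_sheet).resolve()),
    ]
    if cage:
        # CAGE mode as in the heart CAGE example: single-end reads, no --trim.
        cmd += ["--method", "cage", "--reads", "single"]
    else:
        # RNA-seq mode as in the RNA-seq examples: adapter/quality trimming on.
        cmd.append("--trim")
    return cmd


if __name__ == "__main__":
    print(" ".join(snakealtpromoter_command(
        "input_fastqs", "genome", "output", "samplesheet.tsv")))
```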

For detailed usage, see the project documentation.


🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request via GitHub.


🧾 Citation

If you use SnakeAltPromoter, please cite:

Tan J. et al. (2025). SnakeAltPromoter Facilitates Differential Alternative Promoter Analysis. bioRxiv. https://doi.org/10.1101/2025.08.16.669128


⚖️ License

See the LICENSE file for details.



Download files


Source Distribution

snakealtpromoter-1.0.5.tar.gz (19.7 MB)

Uploaded Source

Built Distribution


snakealtpromoter-1.0.5-py3-none-any.whl (20.4 MB)

Uploaded Python 3

File details

Details for the file snakealtpromoter-1.0.5.tar.gz.

File metadata

  • Download URL: snakealtpromoter-1.0.5.tar.gz
  • Size: 19.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for snakealtpromoter-1.0.5.tar.gz
Algorithm Hash digest
SHA256 b3f8f436490607c86678b7dc04288782e729fd775a2057bb106fe9961cf93832
MD5 4fb60cb269207b53c0e02e7423409579
BLAKE2b-256 09467f01d78990eb519e2e53f56004b0a2d55425b8f50031b00548e0413e86f3


File details

Details for the file snakealtpromoter-1.0.5-py3-none-any.whl.


File hashes

Hashes for snakealtpromoter-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 9e177a156609c3ddf4c9d2efdb21a29ebd30a5eadac0277f7e21d2d0c9f88226
MD5 44ad2845d91749a4950a10d669601ab5
BLAKE2b-256 7b7322837ed4ab2a1fb413d63e85640b89e1f137f3b2dfdaf6bf21e6750478e4
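To check a downloaded release file against the SHA256 digests listed above, a minimal Python sketch using the standard library's `hashlib` is shown below. The digests are copied from the hash tables for release 1.0.5; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the hash tables above (release 1.0.5).
EXPECTED_SHA256 = {
    "snakealtpromoter-1.0.5.tar.gz":
        "b3f8f436490607c86678b7dc04288782e729fd775a2057bb106fe9961cf93832",
    "snakealtpromoter-1.0.5-py3-none-any.whl":
        "9e177a156609c3ddf4c9d2efdb21a29ebd30a5eadac0277f7e21d2d0c9f88226",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA256 hex digest, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's SHA256 matches the published digest for its file name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"No published digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]


if __name__ == "__main__":
    for name in EXPECTED_SHA256:
        if Path(name).exists():
            print(name, "OK" if verify(name) else "MISMATCH")
```

pip can enforce the same check at install time through its hash-checking mode, using a requirements line such as `snakealtpromoter==1.0.5 --hash=sha256:<digest>`.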

