A ChIP-seq pipeline from raw reads to peaks
This is the chipseq pipeline from the Sequana project.
- Overview:
ChIP-seq pipeline from raw reads to peaks, IDR statistics, and functional annotation
- Input:
Paired or single-end FastQ files and a CSV experimental design file
- Output:
HTML summary report, narrow/broad peak files, IDR statistics, bigwig tracks, annotation tables, and IGV session file
- Status:
Production
- Citation:
Cokelaer et al. (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI https://doi.org/10.21105/joss.00352
Installation
pip install sequana_chipseq --upgrade
You will also need the third-party tools listed under Requirements below.
Quick Start
1. Prepare a design file design.csv:
type,condition,replicat,sample_name
IP,EXP1,1,IP_EXP1_rep1
IP,EXP1,2,IP_EXP1_rep2
Input,EXP1,1,Input_EXP1
type must be IP (immunoprecipitated) or Input (control).
sample_name must match the prefix of the corresponding FastQ file (e.g. IP_EXP1_rep1 matches IP_EXP1_rep1_R1_.fastq.gz).
At least two IP replicates per condition are required for IDR analysis.
2. Prepare a genome directory named after the genome, containing:
<name>.fa — reference genome FASTA
<name>.gff or <name>.gff3 — gene annotation
Example:
ecoli_MG1655/
├── ecoli_MG1655.fa
└── ecoli_MG1655.gff
3. Set up the pipeline:
sequana_chipseq \
--input-directory DATAPATH \
--genome-directory /path/to/ecoli_MG1655 \
--design-file design.csv
4. Run the pipeline:
cd chipseq
sh chipseq.sh
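The design-file rules from step 1 can be checked before launching the pipeline. The following is a minimal sketch, not part of sequana_chipseq: the helper name `check_design` is hypothetical, and it assumes the four-column header shown in the example design file.

```python
import csv
import io
from collections import Counter

ALLOWED_TYPES = {"IP", "Input"}
EXPECTED_COLUMNS = ["type", "condition", "replicat", "sample_name"]


def check_design(handle):
    """Return a list of problems found in a design file (hypothetical helper)."""
    reader = csv.DictReader(handle)
    problems = []
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {reader.fieldnames}")
        return problems
    ip_per_condition = Counter()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if row["type"] not in ALLOWED_TYPES:
            problems.append(
                f"line {line_no}: type must be IP or Input, got {row['type']!r}"
            )
        elif row["type"] == "IP":
            ip_per_condition[row["condition"]] += 1
    # IDR needs at least two IP replicates per condition.
    for condition, count in sorted(ip_per_condition.items()):
        if count < 2:
            problems.append(
                f"condition {condition}: {count} IP replicate(s), IDR needs at least 2"
            )
    return problems


example = """type,condition,replicat,sample_name
IP,EXP1,1,IP_EXP1_rep1
IP,EXP1,2,IP_EXP1_rep2
Input,EXP1,1,Input_EXP1
"""
print(check_design(io.StringIO(example)))  # prints []
```

An empty list means the file satisfies the two rules checked here. The sketch does not verify that each sample_name matches a FastQ file prefix.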
Usage
sequana_chipseq --help
Key pipeline-specific options:
- --genome-directory
Path to the genome directory (must contain <name>.fa and <name>.gff).
- --design-file
CSV experimental design file (see Quick Start above).
- --aligner-choice
Aligner to use. Currently only bowtie2 is supported.
- --blacklist-file
BED3 file of genomic regions to exclude from analysis (tab-separated: chromosome, start, end).
- --genome-size
Effective genome size for macs3 peak calling. Automatically computed from the FASTA file if not provided; override with a plain integer.
- --do-fingerprints
Enable plotFingerprint QC to assess ChIP enrichment quality.
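To override --genome-size with a plain integer, you need a base count for the reference. The sketch below shows one plausible way to get such a number from a FASTA file by counting bases other than N. It is an illustration only: the function name `count_bases` is hypothetical, and this README does not state how the pipeline computes its own default (in particular, whether N bases are excluded).

```python
import io


def count_bases(handle, skip_n=True):
    """Sum sequence lengths in a FASTA stream, optionally ignoring N bases."""
    total = 0
    for line in handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and record headers
        if skip_n:
            total += sum(1 for base in line if base not in "Nn")
        else:
            total += len(line)
    return total


# Tiny in-memory FASTA standing in for <name>.fa
fasta = io.StringIO(">chr1\nACGTNNAC\nGGTT\n>chr2\nacgtn\n")
print(count_bases(fasta))  # prints 14
```

For a real genome, open the file instead (for example `with open("ecoli_MG1655.fa") as fh: count_bases(fh)`) and pass the result to --genome-size.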
Run on a SLURM cluster:
cd chipseq
sbatch chipseq.sh
Or drive Snakemake directly:
snakemake -s chipseq.rules --cores 4 --stats stats.txt
Usage with Apptainer
Run every tool inside pre-built containers — no local tool installation needed:
sequana_chipseq \
--input-directory DATAPATH \
--genome-directory /path/to/genome \
--design-file design.csv \
--use-apptainer
Store images in a shared location to avoid re-downloading:
sequana_chipseq ... --use-apptainer --apptainer-prefix ~/.sequana/apptainers
Then run as usual:
cd chipseq
sh chipseq.sh
Requirements
The following tools must be available (install via conda/bioconda):
mamba env create -f environment.yml
bowtie2 — read alignment
fastp — adapter trimming and quality filtering
fastqc — per-read quality control
samtools — BAM sorting, indexing, and flagstat
bedtools — bedGraph generation from BAM files (genomeCoverageBed)
ucsc-bedgraphtobigwig — bedGraph to bigWig conversion (bedGraphToBigWig)
deeptools — fingerprint QC (plotFingerprint) and multi-sample bigwig summary (multiBigwigSummary)
macs3 — narrow and broad peak calling
homer — peak annotation (annotatePeaks.pl)
idr — Irreproducible Discovery Rate between replicates (installed from sequana/idr fork via pip; the upstream bioconda package is Python 3.10-only)
multiqc — aggregated QC report
Pipeline overview
Trimming — fastp removes low-quality reads and adapters.
QC — FastQC on raw and cleaned reads.
Alignment — bowtie2 maps reads to the reference genome.
[Optional] Mark duplicates — Picard marks PCR duplicates.
[Optional] Blacklist removal — bedtools removes artefact-prone regions.
bigwig — per-sample coverage tracks for genome browsers (bedtools genomeCoverageBed → UCSC bedGraphToBigWig); an IGV session file (igv.xml) is generated to preload all tracks.
[Optional] Fingerprints — plotFingerprint QC to assess ChIP enrichment.
Phantom peak — strand cross-correlation analysis (NSC, RSC, Qtag scores).
Peak calling — macs3 detects narrow and broad peaks for each IP vs Input pair.
FRiP — Fraction of Reads in Peaks per sample and comparison.
IDR — Irreproducible Discovery Rate on true replicates, pseudo-replicates, and self-pseudo-replicates.
Annotation — homer annotates peaks relative to genomic features.
MultiQC — aggregated QC across all samples.
HTML report — summary with phantom peaks, FRiP plots, IDR tables, and annotation plots.
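The FRiP step reports the fraction of reads that fall inside called peaks. A minimal sketch of that ratio follows, assuming half-open (chrom, start, end) intervals and counting a read as soon as it overlaps any peak by at least one base. The function name `frip` and the toy coordinates are hypothetical, and the pipeline's own counting rule may differ.

```python
def frip(read_intervals, peaks):
    """Fraction of reads overlapping at least one peak.

    Both arguments are lists of (chrom, start, end) with half-open coordinates.
    """
    if not read_intervals:
        return 0.0
    # Group peaks by chromosome so each read is only compared to local peaks.
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))
    in_peaks = 0
    for chrom, start, end in read_intervals:
        local_peaks = peaks_by_chrom.get(chrom, [])
        if any(start < p_end and p_start < end for p_start, p_end in local_peaks):
            in_peaks += 1
    return in_peaks / len(read_intervals)


reads = [
    ("chr1", 100, 150),
    ("chr1", 480, 530),
    ("chr1", 900, 950),
    ("chr2", 10, 60),
]
peaks = [("chr1", 120, 200), ("chr1", 500, 600)]
print(frip(reads, peaks))  # prints 0.5
```

The linear scan over peaks keeps the example short. On real data an interval index or a tool such as bedtools would be the practical choice.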
Configuration
See the latest documented configuration file for every option. Key sections:
general — aligner choice and genome directory path
fastp — trimming options (length, quality, adapters)
fastqc — FastQC options and threads
bowtie2_mapping / bowtie2_index — mapping options, threads, memory
macs3 — peak calling parameters (genome size, bandwidth, q-value, broad cutoff)
idr — IDR thresholds, rank metric, number of pseudo-replicates
fingerprints — enable/disable and number of bins
mark_duplicates — enable/disable PCR duplicate marking
remove_blacklist — enable/disable and path to BED blacklist
trimming — enable/disable read trimming and choice of trimming tool
phantom — use SPP (use_spp: true) instead of the built-in sequana phantom-peak detection
igv — enable/disable generation of the IGV session file (igv.xml)
multiqc — MultiQC options
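The idr section refers to pseudo-replicates. These are conventionally built by randomly partitioning the reads of one sample (or of pooled replicates) into parts of equal size, so that IDR can be run between the parts. The sketch below illustrates that idea only: `split_pseudo_replicates` is a hypothetical name, reads are represented by plain strings, and this README does not describe how the pipeline itself performs the split.

```python
import random


def split_pseudo_replicates(reads, n_parts=2, seed=0):
    """Shuffle reads reproducibly and deal them into n_parts groups."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    # Striding deals the shuffled reads out like cards, so sizes differ by at most one.
    return [shuffled[i::n_parts] for i in range(n_parts)]


reads = [f"read{i}" for i in range(10)]
parts = split_pseudo_replicates(reads)
print([len(part) for part in parts])  # prints [5, 5]
```

A fixed seed makes the split reproducible between runs, which helps when comparing IDR results.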
Changelog
| Version | Description |
|---|---|
| 0.12.0 | |
| 0.11.0 | |
| 0.10.0 | |
| 0.9.1 | |
| 0.9.0 | |
| 0.8.0 | First release. |
Contribute & Code of Conduct
To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.