sequana-chipseq

A ChIP-seq pipeline from raw reads to peaks


This is the chipseq pipeline from the Sequana project.

Overview:: ChIP-seq pipeline from raw reads to peaks, IDR statistics, and functional annotation
Input:: Paired or single-end FastQ files and a CSV experimental design file
Output:: HTML summary report, narrow/broad peak files, IDR statistics, bigwig tracks, annotation tables, and IGV session file
Status:: Production
Citation:: Cokelaer et al. (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI https://doi.org/10.21105/joss.00352

Figure: complete pipeline DAG (sequana_pipelines/chipseq/dag_complete.png)

Installation

pip install sequana_chipseq --upgrade

You will also need the third-party tools listed under Requirements below.

Quick Start

1. Prepare a design file design.csv:

type,condition,replicat,sample_name
IP,EXP1,1,IP_EXP1_rep1
IP,EXP1,2,IP_EXP1_rep2
Input,EXP1,1,Input_EXP1

type must be IP (immunoprecipitated) or Input (control).
sample_name must match the prefix of the corresponding FastQ file (e.g. IP_EXP1_rep1 matches IP_EXP1_rep1_R1_.fastq.gz).
At least two IP replicates per condition are required for IDR analysis.
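These rules can be checked before launching the pipeline. The sketch below is a hypothetical helper (validate_design is not part of sequana_chipseq). It assumes a FastQ file matches when its name starts with sample_name followed by an underscore, which keeps IP_EXP1_rep1 from also matching a file for IP_EXP1_rep10.

```python
import csv
import io
from collections import Counter


# Hypothetical helper, not part of sequana_chipseq: checks the design rules stated above.
def validate_design(text, fastq_names):
    """Return a list of problems found in a design CSV given as a string."""
    rows = list(csv.DictReader(io.StringIO(text)))
    problems = []
    ip_per_condition = Counter()
    for row in rows:
        if row["type"] not in ("IP", "Input"):
            problems.append(f"{row['sample_name']}: type must be IP or Input")
        if row["type"] == "IP":
            ip_per_condition[row["condition"]] += 1
        # Assumption: sample_name followed by "_" must prefix at least one FastQ file name
        prefix = row["sample_name"] + "_"
        if not any(name.startswith(prefix) for name in fastq_names):
            problems.append(f"{row['sample_name']}: no matching FastQ file")
    # IDR compares replicates, so each condition needs two or more IP samples
    for condition, n in ip_per_condition.items():
        if n < 2:
            problems.append(f"{condition}: IDR needs at least two IP replicates")
    return problems


design = """type,condition,replicat,sample_name
IP,EXP1,1,IP_EXP1_rep1
IP,EXP1,2,IP_EXP1_rep2
Input,EXP1,1,Input_EXP1
"""
fastqs = ["IP_EXP1_rep1_R1_.fastq.gz", "IP_EXP1_rep2_R1_.fastq.gz", "Input_EXP1_R1_.fastq.gz"]
print(validate_design(design, fastqs))  # prints []
```

An empty list means the design passes these checks; the pipeline itself may enforce additional rules.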

2. Prepare a genome directory named after the genome, containing:

<name>.fa — reference genome FASTA
<name>.gff or <name>.gff3 — gene annotation

Example:

ecoli_MG1655/
├── ecoli_MG1655.fa
└── ecoli_MG1655.gff
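A minimal sketch of how such a directory could be resolved into its FASTA and annotation files, assuming the directory name is the genome name as described above. find_genome_files is a hypothetical helper, not the pipeline's own code.

```python
from pathlib import Path


# Hypothetical helper, not part of sequana_chipseq.
def find_genome_files(genome_dir):
    """Return (fasta, gff) paths for a layout <name>/<name>.fa + <name>.gff or <name>.gff3."""
    genome_dir = Path(genome_dir)
    name = genome_dir.name  # the directory name doubles as the genome name
    fasta = genome_dir / f"{name}.fa"
    if not fasta.is_file():
        raise FileNotFoundError(f"missing {fasta}")
    # Accept either annotation extension, preferring .gff when both exist
    for ext in (".gff", ".gff3"):
        gff = genome_dir / f"{name}{ext}"
        if gff.is_file():
            return fasta, gff
    raise FileNotFoundError(f"missing {name}.gff or {name}.gff3 in {genome_dir}")
```

For the example above, find_genome_files("/path/to/ecoli_MG1655") would return the paths to ecoli_MG1655.fa and ecoli_MG1655.gff.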

3. Set up the pipeline:

sequana_chipseq \
    --input-directory DATAPATH \
    --genome-directory /path/to/ecoli_MG1655 \
    --design-file design.csv

4. Run the pipeline:

cd chipseq
sh chipseq.sh

Usage

sequana_chipseq --help

Key pipeline-specific options:

--genome-directory: Path to the genome directory (must contain <name>.fa and <name>.gff or <name>.gff3).
--design-file: CSV experimental design file (see Quick Start above).
--aligner-choice: Aligner to use. Currently only bowtie2 is supported.
--blacklist-file: BED3 file of genomic regions to exclude from analysis (tab-separated: chromosome, start, end).
--genome-size: Effective genome size for macs3 peak calling. Automatically computed from the FASTA file if not provided; override with a plain integer.
--do-fingerprints: Enable plotFingerprint QC to assess ChIP enrichment quality.
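When --genome-size is omitted, the size is derived from the FASTA file. The exact rule the pipeline uses is not documented here; a minimal sketch, assuming the effective size is approximated by counting A/C/G/T bases and ignoring N's, looks like this (effective_genome_size is a hypothetical helper):

```python
import io


# Hypothetical helper: approximates the effective genome size as the number of non-N bases.
def effective_genome_size(handle):
    """Count A/C/G/T bases (any case) in a FASTA stream, skipping header lines and N's."""
    size = 0
    for line in handle:
        if line.startswith(">"):
            continue  # header line, not sequence
        seq = line.strip().upper()
        size += sum(seq.count(base) for base in "ACGT")
    return size


fasta = io.StringIO(">chr1\nACGTNN\nacgt\n>chr2\nNNNA\n")
print(effective_genome_size(fasta))  # prints 9
```

On a real genome, call it with open("ecoli_MG1655.fa"); the resulting integer can be passed to --genome-size to override the automatic value.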

Run on a SLURM cluster:

cd chipseq
sbatch chipseq.sh

Or drive Snakemake directly:

snakemake -s chipseq.rules --cores 4 --stats stats.txt

Usage with Apptainer

Run every tool inside pre-built containers — no local tool installation needed:

sequana_chipseq \
    --input-directory DATAPATH \
    --genome-directory /path/to/genome \
    --design-file design.csv \
    --use-apptainer

Store images in a shared location to avoid re-downloading:

sequana_chipseq ... --use-apptainer --apptainer-prefix ~/.sequana/apptainers

Then run as usual:

cd chipseq
sh chipseq.sh

Requirements

The following tools must be available. They can be installed from conda/bioconda, for example with:

mamba env create -f environment.yml

bowtie2 — read alignment
fastp — adapter trimming and quality filtering
fastqc — per-read quality control
samtools — BAM sorting, indexing, and flagstat
bedtools — bedGraph generation from BAM files (genomeCoverageBed)
ucsc-bedgraphtobigwig — bedGraph to bigWig conversion (bedGraphToBigWig)
deeptools — fingerprint QC (plotFingerprint) and multi-sample bigwig summary (multiBigwigSummary)
macs3 — narrow and broad peak calling
homer — peak annotation (annotatePeaks.pl)
idr — Irreproducibility Discovery Rate between replicates (installed from sequana/idr fork via pip; the upstream bioconda package is Python 3.10-only)
multiqc — aggregated QC report
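Before running without containers, a quick check that these tools are on PATH can save a failed run. This is a hypothetical helper, not shipped with sequana_chipseq; the executable names are assumptions taken from the list above (for example bedGraphToBigWig, plotFingerprint, multiBigwigSummary, and annotatePeaks.pl).

```python
import shutil

# Assumed executable names for the tools listed above.
REQUIRED = ("bowtie2", "fastp", "fastqc", "samtools", "bedtools",
            "bedGraphToBigWig", "plotFingerprint", "multiBigwigSummary",
            "macs3", "annotatePeaks.pl", "idr", "multiqc")


def missing_tools(names=REQUIRED):
    """Return the executables that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

With --use-apptainer this check is unnecessary, since the tools run inside the containers.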

Pipeline overview

Trimming — fastp removes low-quality reads and adapters.
QC — FastQC on raw and cleaned reads.
Alignment — bowtie2 maps reads to the reference genome.
[Optional] Mark duplicates — Picard marks PCR duplicates.
[Optional] Blacklist removal — bedtools removes artefact-prone regions.
bigwig — per-sample coverage tracks for genome browsers (bedtools genomeCoverageBed → UCSC bedGraphToBigWig); an IGV session file (igv.xml) is generated to preload all tracks.
[Optional] Fingerprints — plotFingerprint QC to assess ChIP enrichment.
Phantom peak — strand cross-correlation analysis (NSC, RSC, Qtag scores).
Peak calling — macs3 detects narrow and broad peaks for each IP vs Input pair.
FRiP — Fraction of Reads in Peaks per sample and comparison.
IDR — Irreproducibility Discovery Rate on true replicates, pseudo-replicates, and self-pseudo-replicates.
Annotation — homer annotates peaks relative to genomic features.
MultiQC — aggregated QC across all samples.
HTML report — summary with phantom peaks, FRiP plots, IDR tables, and annotation plots.
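Two of the QC metrics above reduce to short formulas. The sketch below assumes reads and peaks are already available as (chromosome, start, end) tuples in half-open BED coordinates, and uses the ENCODE definitions of NSC and RSC computed from strand cross-correlation values. frip and nsc_rsc are hypothetical helpers; the pipeline's own implementation may differ.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_peaks(peaks):
    """Merge overlapping BED intervals per chromosome into {chrom: [(start, end), ...]}."""
    by_chrom = defaultdict(list)
    for chrom, start, end in sorted(peaks):
        ivs = by_chrom[chrom]
        if ivs and start <= ivs[-1][1]:
            ivs[-1] = (ivs[-1][0], max(ivs[-1][1], end))  # extend the previous peak
        else:
            ivs.append((start, end))
    return by_chrom


def frip(reads, peaks):
    """Fraction of reads overlapping at least one peak."""
    merged = merge_peaks(peaks)
    starts = {c: [s for s, _ in ivs] for c, ivs in merged.items()}
    in_peaks = 0
    for chrom, start, end in reads:
        ivs = merged.get(chrom)
        if not ivs:
            continue
        # Last merged peak that starts before the read ends; merged peaks do not
        # overlap, so it is the only candidate that can overlap the read.
        i = bisect_left(starts[chrom], end) - 1
        if i >= 0 and ivs[i][1] > start:
            in_peaks += 1
    return in_peaks / len(reads) if reads else 0.0


def nsc_rsc(cc_fragment, cc_read_length, cc_min):
    """NSC = CC(fragment) / min CC; RSC = (CC(fragment) - min CC) / (CC(read length) - min CC)."""
    nsc = cc_fragment / cc_min
    rsc = (cc_fragment - cc_min) / (cc_read_length - cc_min)
    return nsc, rsc


peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
reads = [("chr1", 90, 110), ("chr1", 250, 260), ("chr1", 300, 310), ("chr2", 60, 70), ("chr3", 0, 10)]
print(frip(reads, peaks))         # prints 0.4
print(nsc_rsc(0.75, 0.625, 0.5))  # prints (1.5, 2.0)
```

Merging peaks first is what makes the single binary-search lookup per read correct, since macs3 output can contain overlapping regions.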

Configuration

The pipeline is configured through a documented configuration file. Its key sections are:

general — aligner choice and genome directory path
fastp — trimming options (length, quality, adapters)
fastqc — FastQC options and threads
bowtie2_mapping / bowtie2_index — mapping options, threads, memory
macs3 — peak calling parameters (genome size, bandwidth, q-value, broad cutoff)
idr — IDR thresholds, rank metric, number of pseudo-replicates
fingerprints — enable/disable and number of bins
mark_duplicates — enable/disable PCR duplicate marking
remove_blacklist — enable/disable and path to BED blacklist
trimming — enable/disable read trimming and choice of trimming tool
phantom — use SPP (use_spp: true) instead of the built-in sequana phantom-peak detection
igv — enable/disable generation of the IGV session file (igv.xml)
multiqc — MultiQC options
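The idr section sets the number of pseudo-replicates, and the pipeline runs IDR on pseudo-replicates and self-pseudo-replicates. Pseudo-replicates are conventionally built by randomly splitting a sample's reads (or a pool of replicates) into two halves of equal size. A minimal sketch of that split, assuming reads are identified by name so that both mates of a pair land in the same half; pseudo_replicates is a hypothetical helper, and how sequana performs the split is not stated here.

```python
import random


# Hypothetical helper: random 50/50 split of read names into two pseudo-replicates.
def pseudo_replicates(read_names, seed=0):
    """Shuffle read names with a fixed seed and split them into two halves."""
    names = list(read_names)
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(names)
    half = len(names) // 2
    return names[:half], names[half:]


rep1, rep2 = pseudo_replicates(["r1", "r2", "r3", "r4", "r5", "r6"], seed=42)
print(len(rep1), len(rep2))  # prints 3 3
```

Splitting on read names rather than on individual alignments keeps paired-end mates together, and the fixed seed makes the split reproducible between runs.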

Changelog

Version	Description
0.12.0	Fix the macs3, self_pseudo_replicate_peaks, and pseudo_replicate_peaks rules: macs3 exits non-zero on sparse CI data, so || true plus a conditional touch were added; the pipeline continues and downstream rules handle empty peak files gracefully
	Add container: sequana_tools to all macs3 rules so that peak calling runs consistently inside the Apptainer container
	Replace the bioconda idr package with a pip install from the sequana/idr fork; this fixes CI failures on Python 3.11/3.12 (the upstream package is Python 3.10-only due to a Cython 3.x incompatibility)
	Fix plot_FRiP: it iterated over all comparisons in each rule invocation, causing FileNotFoundError in parallel runs; it now processes only its own wildcard
	Fix the IDR rules (idr_NT, self_pseudo_replicate_idr, pseudo_replicate_idr): IDR exits non-zero on sparse data, so || true plus a conditional mv were added; the pipeline continues and downstream Python rules handle empty results gracefully
	… peaks and Homer returns an empty DataFrame
	Fix the fastp rule: use input.fastq / output.r1 / output.r2 to match the sequana-wrappers fastp shell interface; split into paired-end and single-end branches
	Add log: directives and stderr redirection to the rules that lacked them: phantom_align, chrom_sizes, fingerprints, bam_to_bed, bed_to_bigwig, pseudo_replicate_idr
	Update the sequana_tools container to 26.1.14
	Update CI: Python 3.10/3.11/3.12; actions/checkout@v4
0.11.0	Switch to click and the new sequana_pipetools
0.10.0	Fix the design handling when sample names start with the same prefix
	Include final IDR plots and tables
	Fix containers and wrappers in the config file
	Better HTML report
0.9.1	Fix requirements and setup.py (remove the wrong idr package)
0.9.0	Use the latest wrappers and Apptainer (for rulegraph)
0.8.0	First release.

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

Project details

This version: 0.12.0, released Apr 3, 2026

Download files

Source distribution: sequana_chipseq-0.12.0.tar.gz (282.5 kB), uploaded Apr 3, 2026
Built distribution: sequana_chipseq-0.12.0-py3-none-any.whl (281.1 kB), uploaded Apr 3, 2026, tag Python 3

Both files were uploaded via poetry/2.0.1 CPython/3.10.14 Linux/6.14.5-100.fc40.x86_64, without Trusted Publishing.

File hashes

Hashes for sequana_chipseq-0.12.0.tar.gz
Algorithm	Hash digest
SHA256	`1952dd7214d2536c534d2d242cc1ee50453d25b2745abb1069d017cb8a6727b7`
MD5	`1780a21145717f8cf6a4d8220a9f1659`
BLAKE2b-256	`d258a228f18cfa45c5f0ac16dfcdf70bf3a8d30f06e5a1ca0f824323c73d6c2d`

Hashes for sequana_chipseq-0.12.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4674856ebac2ff024f4e5896a594a8804c9348e8413c65ef7d5f4dd073153e6d`
MD5	`1bf08fc2cba2054c857c8ab77f53dae5`
BLAKE2b-256	`78c6332546455477f8d4ef9fc6fd2d9c480e104aa56d157432849563a92eba93`
