
Hi-C analysis, snakemake, sequana, container, reproducibility


Python 3.11 | 3.12

This is the Hi-C pipeline from the Sequana project.

Overview: Hi-C pipeline to analyse 3D chromatin interactions in a genome

Input: Paired FastQ files and a reference genome in FASTA format

Output: Cooler contact matrices, Hi-C QC reports, and a MultiQC summary

Status: Production

Citation: Cokelaer et al. (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, https://doi.org/10.21105/joss.00352

Installation

If you already have all requirements, install the package with pip:

pip install sequana_hic --upgrade

The pipeline also needs third-party tools (see Requirements below). Alternatively, use Apptainer images to avoid installing them locally.

Usage

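Set up the pipeline directory with your input data and reference:

sequana_hic --input-directory DATAPATH --reference-file genome.fa

To use the bwa_split aligner option instead (it requires seqkit to split the FastQ files), add --aligner-choice:

sequana_hic --input-directory DATAPATH --reference-file genome.fa --aligner-choice bwa_split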

This creates a hic/ directory containing the pipeline and configuration file. Execute the pipeline locally:

cd hic
sh hic.sh

See .sequana/profile/config.yaml to tune Snakemake behaviour (cores, cluster settings, etc.).

Usage with apptainer

With Apptainer, initiate the working directory as follows:

sequana_hic --input-directory DATAPATH --reference-file genome.fa --use-apptainer

Images can be stored in a shared location:

sequana_hic --input-directory DATAPATH --reference-file genome.fa --use-apptainer --apptainer-prefix ~/.sequana/apptainers

then:

cd hic
sh hic.sh

If you run Snakemake manually, add the Apptainer options:

snakemake -s hic.rules --cores 4 --use-apptainer --apptainer-prefix ~/.sequana/apptainers --apptainer-args "-B /home:/home"

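By default, the home directory is already bound, so the -B /home:/home option above may not be needed. To bind additional paths, list them (comma-separated) in the APPTAINER_BINDPATH environment variable:

export APPTAINER_BINDPATH="/pasteur"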
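If you prefer to drive Snakemake from a Python script, the sketch below prepares an environment in which APPTAINER_BINDPATH holds a comma-separated list of paths, which is the format Apptainer documents for this variable, appending to any value already set. The helpers `apptainer_env` and `run_snakemake` are hypothetical and not part of sequana_hic; the Snakemake flags are the ones from the command in this section.

```python
import os
import subprocess


def apptainer_env(extra_paths, base_env=None):
    """Return a copy of an environment with `extra_paths` added to APPTAINER_BINDPATH.

    Apptainer reads this variable as a comma-separated list of
    src[:dest[:opts]] bind specifications (no "-B" prefix).
    """
    env = dict(os.environ if base_env is None else base_env)
    # Keep any paths that are already configured, dropping empty entries.
    current = [p for p in env.get("APPTAINER_BINDPATH", "").split(",") if p]
    env["APPTAINER_BINDPATH"] = ",".join(current + list(extra_paths))
    return env


def run_snakemake(cores=4, extra_paths=()):
    """Hypothetical wrapper: run the pipeline from inside hic/ (needs snakemake on PATH)."""
    cmd = [
        "snakemake", "-s", "hic.rules", "--cores", str(cores),
        "--use-apptainer",
        "--apptainer-prefix", os.path.expanduser("~/.sequana/apptainers"),
    ]
    subprocess.run(cmd, env=apptainer_env(extra_paths), check=True)
```

For example, `run_snakemake(extra_paths=["/pasteur"])` launches the pipeline with /pasteur bound in addition to any paths already listed in your environment.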

Requirements

This pipeline requires the following executables (install via bioconda/conda):

  • bwa — short-read aligner (default mapper)

  • samtools — BAM/SAM manipulation

  • pairtools — processing of Hi-C read pairs

  • cooler — storage and analysis of Hi-C contact matrices

  • qc3c — Hi-C quality control

  • fastqc — raw read quality control

  • multiqc — aggregate QC reports
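Before running the pipeline without containers, you can check that the executables listed above are on your PATH. The short sketch below does this with `shutil.which`; the helper `missing_tools` is hypothetical, and the command names are assumed to match the tool names (the qc3C command is assumed to be `qc3C`).

```python
import shutil

# Required executables from the list above. Command names are assumptions
# (e.g. qc3C is assumed to install a command called "qc3C").
REQUIRED_TOOLS = ["bwa", "samtools", "pairtools", "cooler", "qc3C", "fastqc", "multiqc"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables found.")
```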

Optional:

  • chromap — fast Hi-C aligner (experimental, use --aligner-choice chromap)

  • seqkit — split FastQ files (required for --aligner-choice bwa_split)
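The bwa_split option relies on seqkit to split the FastQ files so that the chunks can be aligned in parallel. To illustrate the idea only (this is neither the pipeline's nor seqkit's code), the sketch below groups 4-line FastQ records into consecutive chunks of a fixed size; `read_fastq` and `split_records` are hypothetical names.

```python
from itertools import islice


def read_fastq(lines):
    """Yield FastQ records as 4-line tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return  # end of input
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FastQ record")
        yield record


def split_records(records, chunk_size):
    """Group records into consecutive chunks of at most `chunk_size` records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # last, possibly smaller, chunk


if __name__ == "__main__":
    r1 = [
        "@read1/1", "ACGT", "+", "IIII",
        "@read2/1", "TTGA", "+", "IIII",
        "@read3/1", "GGCC", "+", "IIII",
    ]
    for i, chunk in enumerate(split_records(read_fastq(r1), chunk_size=2), start=1):
        print(f"chunk {i}: {[record[0] for record in chunk]}")
```

Splitting by record count keeps paired data consistent: if R1 and R2 list the reads in the same order, applying the same chunk size to both files puts mates in chunks with the same index.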

Workflow diagram (DAG): https://raw.githubusercontent.com/sequana/hic/main/sequana_pipelines/hic/dag.svg

Pipeline description

  1. FastQC — quality control on raw reads

  2. Reference indexing — BWA index build from the provided FASTA reference

  3. Alignment — BWA-MEM alignment with Hi-C-specific options (-5SP), producing sorted BAM files

  4. Pairtools — parse alignments into Hi-C contact pairs, sort, deduplicate, and split

  5. Cooler — load pairs into a contact matrix and generate a multi-resolution .mcool file

  6. qc3C — Hi-C library quality assessment (ligation efficiency, distance distribution)

  7. Visualisation — contact matrix PNG at 5 kb resolution

  8. MultiQC — aggregated QC report
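Steps 4 and 5 can be illustrated with a toy model in pure Python. The sketch below is not the pairtools or cooler implementation. It reads the first seven columns of a 4DN-style .pairs file and removes exact duplicates (pairtools dedup can also tolerate small positional mismatches). It then counts contacts per pair of bins at a given resolution and sums neighbouring bins to derive a coarser resolution, which is conceptually how the extra levels of a multi-resolution .mcool are obtained. All function names are hypothetical.

```python
from collections import Counter


def parse_pairs(lines):
    """Yield (chrom1, pos1, chrom2, pos2, strand1, strand2) from .pairs lines.

    Assumes the 4DN column order: readID chrom1 pos1 chrom2 pos2 strand1 strand2
    (1-based positions); any further columns are ignored.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        chrom1, pos1, chrom2, pos2, strand1, strand2 = line.split()[1:7]
        yield chrom1, int(pos1), chrom2, int(pos2), strand1, strand2


def dedup_exact(pairs):
    """Drop pairs whose coordinates and strands were already seen (exact match only)."""
    seen = set()
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            yield pair


def bin_contacts(pairs, resolution):
    """Count contacts per (chrom1, bin1, chrom2, bin2), storing each contact once.

    Bins are ordered by (chromosome name, bin index); a real cooler uses the
    chromosome order of the reference instead of lexicographic order.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2, _s1, _s2 in pairs:
        a = (chrom1, (pos1 - 1) // resolution)
        b = (chrom2, (pos2 - 1) // resolution)
        if b < a:
            a, b = b, a  # upper triangle only, whatever the mate order
        counts[a + b] += 1
    return counts


def coarsen(counts, factor):
    """Aggregate a binned matrix to a resolution `factor` times coarser."""
    coarse = Counter()
    for (c1, b1, c2, b2), n in counts.items():
        coarse[(c1, b1 // factor, c2, b2 // factor)] += n
    return coarse


EXAMPLE = [
    "## pairs format v1.0",
    "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2",
    "r1 chr1 1200 chr1 7300 + -",
    "r2 chr1 1200 chr1 7300 + -",  # exact duplicate of r1
    "r3 chr1 7400 chr1 1100 - +",  # same bins as r1, mates swapped
    "r4 chr1 2500 chr2 900 + +",   # inter-chromosomal contact
]

if __name__ == "__main__":
    fine = bin_contacts(dedup_exact(parse_pairs(EXAMPLE)), resolution=5000)
    print("5 kb:", dict(fine))
    print("10 kb:", dict(coarsen(fine, 2)))
```

At 5 kb, r1 and r3 fall into the same pair of bins (chr1 bin 0, chr1 bin 1) and are counted twice, while r2 is removed as a duplicate. Coarsening by a factor of 2 merges both chr1 bins into a single 10 kb bin.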

Changelog

Version   Description
0.2.0     Production release.
0.1.0     Migration to the modern sequana_pipetools framework (get_shell/get_run, schema validation, apptainer support, Python 3.10+).
0.0.1     First release.

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.


