Convert genomic coordinates of contact pairs from one assembly to another.

Project description

HiCLift

With the continuous effort to improve the quality of the human reference genome and the generation of more and more personal genomes, converting genomic coordinates between genome assemblies is critical in many integrative and comparative studies. While tools have been developed for this task for linear genome signals such as ChIP-Seq, no tool exists to convert chromatin interaction data between assemblies, despite the importance of three-dimensional (3D) genome organization in gene regulation and disease. Here, we present HiCLift (previously known as pairLiftOver), a fast and efficient tool that can convert the genomic coordinates of chromatin contacts from assays such as Hi-C and Micro-C from one assembly to another, including the latest T2T genome. Compared with the strategy of directly re-mapping raw reads to a different genome, HiCLift runs on average 42 times faster (hours vs. days) while producing nearly identical contact matrices. More importantly, because HiCLift does not need to re-map the raw reads, it can directly convert human patient sample data, for which the raw sequencing reads are sometimes hard to acquire or not available.

Installation

HiCLift and all of its dependencies can be installed with mamba and pip:

$ conda config --append channels defaults
$ conda config --append channels bioconda
$ conda config --append channels conda-forge
$ mamba create -n HiCLift cooler pairtools kerneltree cxx-compiler
$ mamba activate HiCLift
$ pip install HiCLift hic-straw

Overview

The input to HiCLift has two parts. The first part is a file containing the chromatin contact information. This file can be either a pairs file (4DN pairs or HiC-Pro allValidPairs), in which each row represents a pair of interacting genomic loci at base-pair resolution, or a matrix file (.cool or .hic), which stores interaction frequencies between genomic intervals of fixed size. The second part is a UCSC chain file, which describes a pairwise alignment that allows gaps in both assemblies simultaneously. Internally, HiCLift represents a chain file as IntervalTrees, with one tree per chromosome, so that it can efficiently search for a specific genomic position in the chain file and locate the matched position in the target genome. The converted chromatin contacts are reported either in a sorted 4DN pairs file, which can be used directly to generate contact matrices in various formats, or in a matrix file in .cool or .hic format.
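The per-chromosome lookup described above can be illustrated with a minimal sketch. This is not HiCLift's implementation: it swaps the IntervalTree for a sorted list searched with `bisect`, assumes the aligned blocks on each chromosome do not overlap, uses 0-based half-open coordinates, and ignores strand. The names `ChainIndex`, `add_block`, `lift`, and `lift_pair`, and the block coordinates in the example, are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


class ChainIndex:
    """Toy per-chromosome index of ungapped alignment blocks (hypothetical)."""

    def __init__(self):
        self._blocks = defaultdict(list)  # chrom -> [(start, end, dst_chrom, dst_start)]
        self._starts = {}                 # chrom -> sorted block starts

    def add_block(self, chrom, start, end, dst_chrom, dst_start):
        # Source interval [start, end) maps to dst_chrom beginning at dst_start.
        self._blocks[chrom].append((start, end, dst_chrom, dst_start))

    def build(self):
        # Sort the blocks once so that each lookup is a binary search.
        for chrom, blocks in self._blocks.items():
            blocks.sort()
            self._starts[chrom] = [block[0] for block in blocks]

    def lift(self, chrom, pos):
        """Return (dst_chrom, dst_pos), or None if pos falls in an alignment gap."""
        starts = self._starts.get(chrom)
        if not starts:
            return None
        i = bisect_right(starts, pos) - 1  # last block starting at or before pos
        if i < 0:
            return None
        start, end, dst_chrom, dst_start = self._blocks[chrom][i]
        if pos >= end:
            return None
        return dst_chrom, dst_start + (pos - start)


def lift_pair(index, chrom1, pos1, chrom2, pos2):
    """Lift both ends of a contact; drop the contact if either end is unmapped."""
    end1 = index.lift(chrom1, pos1)
    end2 = index.lift(chrom2, pos2)
    if end1 is None or end2 is None:
        return None
    return end1 + end2


index = ChainIndex()
index.add_block("chr1", 1000, 2000, "chr1", 1500)  # made-up blocks
index.add_block("chr1", 2100, 3000, "chr1", 2700)
index.add_block("chr2", 0, 500, "chr2", 100)
index.build()

print(lift_pair(index, "chr1", 1200, "chr2", 50))  # ('chr1', 1700, 'chr2', 150)
print(lift_pair(index, "chr1", 2050, "chr2", 50))  # None: chr1:2050 lies in a gap
```

An interval tree supports the same point query when blocks overlap, which a sorted list of starts does not handle.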

./images/fig1.svg.png

Usage

Open a terminal and type HiCLift -h for help information.

Here is an example command which uses a 4DN pairs file in hg19 coordinates as input, and outputs an mcool file with chromatin contacts in hg38 coordinates:

$ HiCLift --input test.hg19.pairs.gz --input-format pairs --out-pre test-hg38 \
--output-format cool --out-chromsizes hg38.chrom.sizes --in-assembly hg19 --out-assembly hg38 \
--logFile HiCLift.log

HiCLift can also perform a pure data format conversion. For example, the following command converts a contact matrix from the .cool format to the .hic format without any coordinate liftover. Note that --in-assembly and --out-assembly must be set to the same value to enable this mode:

$ HiCLift --input Rao2014-K562-MboI-allreps-filtered.5kb.cool --input-format cooler \
--out-pre K562-format-conversion-test --output-format hic --out-chromsizes hg19.chrom.sizes \
--in-assembly hg19 --out-assembly hg19 --memory 40G

Performance

Using large Hi-C datasets from different species as a benchmark, we show that, compared with the strategy of directly re-mapping raw reads to a different genome, HiCLift runs on average 42 times faster while producing nearly identical contact matrices.

./images/accuracy.png

Project details



This version

1.0

Download files


Source Distribution

HiCLift-1.0.tar.gz (27.7 MB)

Uploaded Source

Built Distribution

HiCLift-1.0-py3-none-any.whl (27.8 MB)

Uploaded Python 3

File details

Details for the file HiCLift-1.0.tar.gz.

File metadata

  • Download URL: HiCLift-1.0.tar.gz
  • Upload date:
  • Size: 27.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for HiCLift-1.0.tar.gz
Algorithm Hash digest
SHA256 3573dc51894b1c9a44fcc448896912864fb7288d37ec5a5a861a756d10be1b0c
MD5 05a68fabe820c0ee32741f9cd0cef569
BLAKE2b-256 7ef87aff3c8913e3e127c23aaca6609caa079f15bd131b842c61ff69ab2eeb8a
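A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of` and `verify` are hypothetical, and the file path is assumed to point at a local copy of HiCLift-1.0.tar.gz.

```python
import hashlib

# SHA256 digest listed above for HiCLift-1.0.tar.gz.
EXPECTED_SHA256 = "3573dc51894b1c9a44fcc448896912864fb7288d37ec5a5a861a756d10be1b0c"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()


# Example (assumes the archive is in the current directory):
# print(verify("HiCLift-1.0.tar.gz"))
```

The same helper can be used for the wheel by passing its SHA256 digest as `expected`.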


File details

Details for the file HiCLift-1.0-py3-none-any.whl.

File metadata

  • Download URL: HiCLift-1.0-py3-none-any.whl
  • Upload date:
  • Size: 27.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for HiCLift-1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f9b72937565cac586b3baed7504135a193ce57cfa76830997e8464e32848662a
MD5 aec31e90ecb04a058770614bd8d3dfad
BLAKE2b-256 420907c3158339644b06d0fcf269477a8e1b5f19af1e7742c9f90fa97c344325

