Convert genomic coordinates of contact pairs from one assembly to another.

pairLiftOver

pairLiftOver is a Python package that converts the two-dimensional genomic coordinates of chromatin contact pairs between assemblies.

pairLiftOver relies on UCSC chain files. It takes a pairs file or matrix file as input, performs coordinate conversion for each contact pair, and outputs a sorted pairs file or contact matrix with coordinates in another assembly.
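
To illustrate what converting a contact pair involves, here is a minimal sketch in plain Python. It is not pairLiftOver's implementation: the Block type, the BLOCKS table, and the functions lift_position and lift_pair are hypothetical names, the toy blocks are made up rather than read from a real chain file, strand is ignored, and dropping a pair when either end fails to map is an assumption made for the example.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Block(NamedTuple):
    """One ungapped aligned block: [src_start, src_end) maps to dst_start onward."""
    src_chrom: str
    src_start: int
    src_end: int
    dst_chrom: str
    dst_start: int


# Hypothetical toy blocks for illustration only (NOT from a real chain file).
BLOCKS = [
    Block("chr1", 0, 1000, "chr1", 500),
    Block("chr1", 2000, 3000, "chr1", 1500),
    Block("chr2", 0, 5000, "chr2", 0),
]


def lift_position(chrom: str, pos: int,
                  blocks: Sequence[Block] = BLOCKS) -> Optional[Tuple[str, int]]:
    """Map one position to the target assembly, or return None if unmapped."""
    for b in blocks:
        if b.src_chrom == chrom and b.src_start <= pos < b.src_end:
            # Keep the offset of the position within the aligned block.
            return b.dst_chrom, b.dst_start + (pos - b.src_start)
    return None


def lift_pair(chrom1: str, pos1: int, chrom2: str, pos2: int,
              blocks: Sequence[Block] = BLOCKS):
    """Lift both ends of a contact pair; assume the pair is dropped if either end is unmapped."""
    end1 = lift_position(chrom1, pos1, blocks)
    end2 = lift_position(chrom2, pos2, blocks)
    if end1 is None or end2 is None:
        return None
    return end1 + end2  # (chrom1, pos1, chrom2, pos2) in the target assembly
```

The point of the sketch is that a contact is two-dimensional: both ends have to be mapped, and each end may land on a different block or chromosome.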

Installation

pairLiftOver is developed and tested on UNIX-like operating systems, and the following packages are required:

  • python 3.7+

  • pairtools 0.3.0

  • cooler

  • pyliftover

  • hic-straw 0.0.6

We recommend using conda to manage these packages. After you have installed conda on your system, execute the commands below:

$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda create -n pairliftover python pairtools cooler pyliftover cxx-compiler
$ conda activate pairliftover
$ pip install pairLiftOver hic-straw

Data Format

Currently, pairLiftOver supports four input data formats: 4DN pairs, allValidPairs, cool, and hic. For the most accurate results, provide a pairs file (4DN pairs or allValidPairs). When no such file is available, pairLiftOver can also operate on contact matrices binned at kilobase resolutions (in cool or hic format). Because a hic file stores multiple matrices at various resolutions, pairLiftOver automatically detects and reads data from the one at the highest resolution.
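
As a small illustration of "highest resolution": for binned matrices, the highest resolution is the one with the smallest bin size. The helper below is a hypothetical sketch of that selection rule only; it does not read hic files and is not how pairLiftOver is implemented.

```python
from typing import Iterable


def highest_resolution(bin_sizes: Iterable[int]) -> int:
    """Return the smallest bin size (in bp), i.e. the highest resolution."""
    sizes = list(bin_sizes)
    if not sizes:
        raise ValueError("no resolutions available")
    return min(sizes)
```

For example, given bin sizes of 2500000, 10000, and 5000 bp, the 5000 bp matrix would be the one selected.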

The default output of pairLiftOver is a sorted pairs file in the standard 4DN pairs format, containing seven columns: “readID”, “chr1”, “pos1”, “chr2”, “pos2”, “strand1”, and “strand2”. However, you can also choose to output a matrix file in cool or hic format by setting the parameter --output-format.
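
The seven columns listed above can be read with a few lines of Python. This is a minimal sketch, assuming whitespace-separated fields and header lines that start with "#"; the Pair type and parse_pairs_line function are hypothetical names used for illustration and are not part of pairLiftOver.

```python
from typing import NamedTuple, Optional


class Pair(NamedTuple):
    readID: str
    chr1: str
    pos1: int
    chr2: str
    pos2: int
    strand1: str
    strand2: str


def parse_pairs_line(line: str) -> Optional[Pair]:
    """Parse one body line of a pairs file; return None for header or blank lines."""
    if line.startswith("#") or not line.strip():
        return None
    fields = line.split()
    # Only the first seven columns are used; any extra columns are ignored.
    read_id, chr1, pos1, chr2, pos2, strand1, strand2 = fields[:7]
    return Pair(read_id, chr1, int(pos1), chr2, int(pos2), strand1, strand2)
```

Positions are converted to integers so that records can be compared and sorted numerically.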

Usage

Open a terminal and type pairLiftOver -h for help information.

Here is an example command which uses a 4DN pairs file in hg19 coordinates as input, and outputs an mcool file with chromatin contacts in hg38 coordinates:

$ pairLiftOver --input test.hg19.pairs.gz --input-format pairs --out-pre test-hg38 \
--output-format cool --out-chromsizes hg38.chrom.sizes --in-assembly hg19 --out-assembly hg38 \
--logFile pairLiftOver.log

Since version 0.1.3, pairLiftOver can also perform a pure format conversion. For example, the following command transforms a contact matrix from the .cool format to the .hic format, without coordinate liftover. Note that --in-assembly and --out-assembly must be set to the same value to turn on this function:

$ pairLiftOver --input Rao2014-K562-MboI-allreps-filtered.5kb.cool --input-format cooler \
--out-pre K562-format-conversion-test --output-format hic --out-chromsizes hg19.chrom.sizes \
--in-assembly hg19 --out-assembly hg19 --memory 40G
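
When pairLiftOver is called from a script, the command line can be assembled programmatically. The sketch below only builds the argument list from the flags shown in this document; build_command and is_format_conversion_only are hypothetical helper names, and the code does not validate option values against the real command-line interface.

```python
import shlex
from typing import List, Optional


def build_command(input_path: str, input_format: str, out_prefix: str,
                  output_format: str, chromsizes: str,
                  in_assembly: str, out_assembly: str,
                  memory: Optional[str] = None,
                  nproc: Optional[int] = None,
                  log_file: Optional[str] = None) -> List[str]:
    """Assemble a pairLiftOver argument list using flags from the examples above."""
    argv = [
        "pairLiftOver",
        "--input", input_path,
        "--input-format", input_format,
        "--out-pre", out_prefix,
        "--output-format", output_format,
        "--out-chromsizes", chromsizes,
        "--in-assembly", in_assembly,
        "--out-assembly", out_assembly,
    ]
    # Optional flags are appended only when a value is given.
    if memory is not None:
        argv += ["--memory", memory]
    if nproc is not None:
        argv += ["--nproc", str(nproc)]
    if log_file is not None:
        argv += ["--logFile", log_file]
    return argv


def is_format_conversion_only(in_assembly: str, out_assembly: str) -> bool:
    """Identical assemblies request a pure format conversion (no liftover)."""
    return in_assembly == out_assembly


def to_shell(argv: List[str]) -> str:
    """Render the argument list as a single shell-quoted string."""
    return " ".join(shlex.quote(a) for a in argv)
```

The resulting list can be passed to subprocess.run(argv, check=True) in an environment where pairLiftOver is installed.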

Running time and memory usage

The running time of pairLiftOver grows linearly with the number of contact pairs, and the memory usage can be roughly controlled with the --memory parameter. In the figure below, pairLiftOver was tested on downsampled GM12878 Hi-C datasets (Rao 2014) ranging from 100 million to 1 billion valid pairs. For each run, the memory and the number of processes allocated to pairLiftOver were set to 8 GB (--memory 8G) and 8 (--nproc 8), respectively.

./images/running-time-and-memory.png

Accuracy

So far, pairLiftOver has been tested on datasets from human (Rao 2014, GM12878 and K562), mouse (Rao 2014, CH12-LX), and zebrafish (Yang 2020, brain tissue). At various resolutions, the matrices obtained by pairLiftOver are nearly identical to those obtained by re-mapping.

./images/accuracy.png

Citation

Wang, X., Luan, Y., Yue, F. EagleC: A deep-learning framework for detecting a full range of structural variations from bulk and single-cell contact maps. Sci Adv. 2022.
