Convert genomic coordinates of contact pairs from one assembly to another.
pairLiftOver
pairLiftOver is a Python package that converts the two-dimensional genomic coordinates of chromatin contact pairs between assemblies.
pairLiftOver relies on UCSC chain files. It takes a pairs file or a matrix file as input, converts the coordinates of each contact pair, and outputs a sorted pairs file or a contact matrix with coordinates in the target assembly.
Installation
pairLiftOver is developed and tested on UNIX-like operating systems, and the following packages are required:
python 3.6+
cooler 0.8.6
pairtools 0.3.0
pyliftover
hic-straw
We recommend using conda to manage these packages. After you have installed conda on your system, execute the commands below:
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda create -n pairliftover python=3.7 pairtools=0.3.0 cooler=0.8.6 pyliftover
$ conda activate pairliftover
$ pip install pairLiftOver hic-straw
Data Format
Currently, pairLiftOver supports 4 input data formats: 4DN pairs, allValidPairs, cool, and hic. Providing a pairs file (4DN pairs or allValidPairs) gives the most accurate results. When no such file is available, pairLiftOver can also operate on contact matrices binned at kilobase resolutions (in cool or hic format). In this case, pairLiftOver iterates over each bin pair (pixel) and converts the midpoint coordinate of each bin to the target assembly. Because a hic file stores multiple matrices at various resolutions, pairLiftOver automatically detects and reads the matrix at the highest resolution.
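The midpoint conversion used for matrix input can be illustrated with a minimal sketch. This is not pairLiftOver's actual code: the function lift_pixels, the pixel tuple layout, and the toy_lift stand-in are all hypothetical, and carrying the contact count over unchanged and dropping unmappable pixels are assumptions. In practice, the lift callable could wrap pyliftover's LiftOver.convert_coordinate, which needs a chain file.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Locus = Tuple[str, int]                  # (chromosome, position)
Pixel = Tuple[str, int, str, int, int]   # (chrom1, start1, chrom2, start2, count)


def lift_pixels(
    pixels: Iterable[Pixel],
    binsize: int,
    lift: Callable[[str, int], Optional[Locus]],
) -> Iterator[Pixel]:
    """Convert each bin pair by lifting the midpoint of both bins.

    Pixels with an unmappable end are skipped (an assumption made for
    this sketch, not documented pairLiftOver behavior).
    """
    half = binsize // 2
    for chrom1, start1, chrom2, start2, count in pixels:
        a = lift(chrom1, start1 + half)
        b = lift(chrom2, start2 + half)
        if a is None or b is None:
            continue
        yield a[0], a[1], b[0], b[1], count


def toy_lift(chrom: str, pos: int) -> Optional[Locus]:
    """Hypothetical stand-in for a chain-file lookup.

    Shifts chr1 positions by 1000 bp and treats every other
    chromosome as unmappable.
    """
    if chrom == "chr1":
        return chrom, pos + 1000
    return None


# Two pixels from a 5 kb matrix; the second has one end on chr2.
pixels = [("chr1", 0, "chr1", 10000, 3), ("chr1", 5000, "chr2", 0, 1)]
lifted = list(lift_pixels(pixels, 5000, toy_lift))
print(lifted)
```

Because only one point per bin is converted, the result is coarser than lifting individual read pairs, which is consistent with the recommendation to supply a pairs file when one is available.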
The default output of pairLiftOver is a sorted pairs file in the standard 4DN pairs format, containing seven columns: “readID”, “chr1”, “pos1”, “chr2”, “pos2”, “strand1”, and “strand2”. However, you can also choose to output a matrix file in cool or hic format by setting the parameter --output-format.
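As a rough illustration of the seven-column layout, the sketch below formats a few records as sorted, tab-separated pairs lines. The function to_pairs_lines and the sample records are hypothetical. The header lines follow one reading of the 4DN pairs specification (real files also carry additional headers such as chromosome sizes), and flipping each pair to the upper triangle before sorting is an assumption; the source does not state how pairLiftOver orders its output internally.

```python
from typing import Iterable, List, Sequence, Tuple

# readID, chr1, pos1, chr2, pos2, strand1, strand2
Pair = Tuple[str, str, int, str, int, str, str]

COLUMNS = ["readID", "chr1", "pos1", "chr2", "pos2", "strand1", "strand2"]


def to_pairs_lines(pairs: Iterable[Pair], chrom_order: Sequence[str]) -> List[str]:
    """Return header and body lines for a sorted pairs file (sketch only)."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    records = []
    for read_id, c1, p1, c2, p2, s1, s2 in pairs:
        # Flip so that the first end never sorts after the second end.
        if (rank[c1], p1) > (rank[c2], p2):
            c1, p1, s1, c2, p2, s2 = c2, p2, s2, c1, p1, s1
        records.append((read_id, c1, p1, c2, p2, s1, s2))
    # Sort by chr1, chr2, pos1, pos2.
    records.sort(key=lambda r: (rank[r[1]], rank[r[3]], r[2], r[4]))
    lines = [
        "## pairs format v1.0",
        "#sorted: chr1-chr2-pos1-pos2",
        "#columns: " + " ".join(COLUMNS),
    ]
    lines.extend("\t".join(str(field) for field in r) for r in records)
    return lines


# Made-up records for illustration only.
example = [
    ("r1", "chr2", 500, "chr1", 100, "-", "+"),
    ("r2", "chr1", 300, "chr1", 50, "+", "-"),
    ("r3", "chr1", 10, "chr2", 20, "+", "+"),
]
pairs_lines = to_pairs_lines(example, ["chr1", "chr2"])
print("\n".join(pairs_lines))
```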
Usage
Open a terminal and type pairLiftOver -h for help information.
Running time and memory usage
The running time of pairLiftOver grows linearly with the number of contact pairs. The memory usage can be roughly controlled by the parameter --memory. In the figure below, pairLiftOver was tested on downsampled GM12878 Hi-C datasets (Rao 2014), ranging from 100 million to 1 billion valid pairs. For each run, the memory and the number of processes allocated to pairLiftOver were set to 8 GB (--memory 8G) and 8 (--nproc 8), respectively.
Accuracy
So far, pairLiftOver has been tested on datasets from human (Rao 2014, GM12878 and K562), mouse (Rao 2014, CH12-LX), and zebrafish (Yang 2020, brain tissue). At various resolutions, the matrices obtained by pairLiftOver are nearly identical to those obtained by re-mapping.