HiNT -- Hi-C for copy number variations and translocations detection
HiNT
A computational method for detecting copy number variations and translocations from Hi-C data
Summary
HiNT (Hi-C for copy Number variation and Translocation detection) is a computational method to detect CNVs and translocations from Hi-C data. HiNT has three main components: HiNT-PRE, HiNT-CNV, and HiNT-TL. HiNT-PRE preprocesses Hi-C data and computes the contact matrix, which stores contact frequencies between any two genomic loci. HiNT-CNV and HiNT-TL both start from the Hi-C contact matrix and predict copy number segments and inter-chromosomal translocations, respectively.
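To make the idea of a contact matrix concrete, here is a minimal toy sketch in Python. It is not HiNT's implementation: the function name, the single-chromosome setup, and the read-pair positions are all made up for illustration. It only shows how pairs of positions can be binned into a symmetric matrix of contact counts.

```python
def contact_matrix(pairs, bin_size, n_bins):
    """Count contacts between equally sized bins (toy example, one chromosome)."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            # Mirror off-diagonal contacts so the matrix stays symmetric.
            matrix[j][i] += 1
    return matrix


# Hypothetical read pairs: (position of mate 1, position of mate 2).
pairs = [(10, 20), (120, 260), (130, 270), (250, 280)]
matrix = contact_matrix(pairs, bin_size=100, n_bins=3)
for row in matrix:
    print(row)
```

In real data the bins span whole genomes at resolutions such as 50 kb, 100 kb, or 1 Mb, and the matrices are stored in cool or hic files rather than nested lists.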
Overview of HiNT workflow:
Installation
Dependencies
R and R packages
Python and Python packages
- python >= 3.5
- pypairix >= 0.3.0, cooler >= 0.7.4, pairtools >= 0.2.2, numpy, scipy, pandas, sklearn, multiprocessing
Java and related tools (optional: required only if you want to process Hi-C data with Juicer tools)
Perl
Other dependencies
- samtools (1.3.1+)
- BIC-seq2 (0.7.3). Optional: only needed if you want to run HiNT-CNV. Download BICseq2, unzip it, and note the path of BICseq2-seg_v0.7.3 (/path/to/BICseq2-seg_v0.7.3).
- bwa (0.7.16+). Optional: required only when your input is in fastq format.
- tabix (0.2.6)
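Before installing HiNT, it can help to check which of the command-line dependencies are already on your PATH. The sketch below uses only the Python standard library. The executable names in TOOLS are assumptions (they are taken to match the package names listed above), and the check does not verify version numbers.

```python
import shutil

# Assumed executable names for the command-line dependencies listed above.
TOOLS = ["samtools", "bwa", "tabix", "pairtools", "cooler", "pairix"]


def find_tools(names):
    """Map each tool name to its absolute path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools(TOOLS).items():
    print(name.ljust(10), path or "NOT FOUND")
```

Any tool reported as NOT FOUND either needs to be installed or is one of the optional dependencies for a workflow you do not plan to run.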
Install HiNT
- Method 1: Install using conda (highly recommended)
$ conda install -c su hint
or
$ conda install hint
- Method 2: Install from PyPI using pip
$ pip install HiNT-Packages
- Method 3: Install manually
- Install HiNT dependencies
- Download HiNT
$ git clone https://github.com/parklab/HiNT.git
- Go to the HiNT directory and install it with
$ python setup.py install
Type
$ hint
to test whether HiNT was installed successfully.
- Method 4: Run HiNT in a Docker container (highly recommended)
$ docker pull suwangbio/hint
$ docker run suwangbio/hint hint
See usage details on the HiNT page at Docker Hub.
Download reference files used in HiNT HERE
- Download HiNT references HERE. Only hg19, hg38 and mm10 are available currently. Unzip it
$ unzip hg19.zip
- Download HiNT background matrices HERE. Only hg19, hg38 and mm10 are available currently. Unzip it
$ unzip hg19.zip
- Download BWA index files HERE. Only hg19, hg38 and mm10 are available currently. Unzip it
$ unzip hg19.zip
Quick Start
- Download the test datasets from HERE
HiNT-PRE
HiNT pre: Preprocessing Hi-C data. HiNT pre does alignment, contact matrix creation and normalization in one command line.
$ hint pre -d /path/to/hic_1.fastq.gz,/path/to/hic_2.fastq.gz -i /path/to/bwaIndex/hg19/hg19.fa --refdir /path/to/refData/hg19 --informat fastq --outformat cooler -g hg19 -n test -o /path/to/outputdir --pairtoolspath /path/to/pairtools --samtoolspath /path/to/samtools --coolerpath /path/to/cooler
$ hint pre -d /path/to/test.bam --refdir /path/to/refData/hg19 --informat bam --outformat juicer -g hg19 -n test -o /path/to/outputdir --pairtoolspath /path/to/pairtools --samtoolspath /path/to/samtools --juicerpath /path/to/juicer_tools.1.8.9_jcuda.0.8.jar
Use
$ which samtools
$ which pairtools
$ which cooler
to get the absolute paths of these tools. /path/to/juicer_tools.1.8.9_jcuda.0.8.jar should be the path where you stored this file.
See details and more options:
$ hint pre -h
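Because the hint pre command line is long, it can be convenient to assemble it in a script. The following is a minimal sketch, not part of HiNT: the helper name hint_pre_command is hypothetical, and it uses only the options that appear in the fastq-to-cooler example above. The resulting list can be passed to subprocess.run.

```python
import shlex


def hint_pre_command(fastq1, fastq2, bwa_index, refdir, genome, name, outdir,
                     pairtools, samtools, cooler):
    """Build the argument list for the fastq -> cooler example shown above."""
    return [
        "hint", "pre",
        "-d", fastq1 + "," + fastq2,   # the two mate files, comma-separated
        "-i", bwa_index,
        "--refdir", refdir,
        "--informat", "fastq",
        "--outformat", "cooler",
        "-g", genome,
        "-n", name,
        "-o", outdir,
        "--pairtoolspath", pairtools,
        "--samtoolspath", samtools,
        "--coolerpath", cooler,
    ]


cmd = hint_pre_command(
    "/path/to/hic_1.fastq.gz", "/path/to/hic_2.fastq.gz",
    "/path/to/bwaIndex/hg19/hg19.fa", "/path/to/refData/hg19",
    "hg19", "test", "/path/to/outputdir",
    "/path/to/pairtools", "/path/to/samtools", "/path/to/cooler",
)
# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The tool paths could be filled in with shutil.which("samtools") and so on, which is the Python equivalent of the which commands above.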
HiNT-CNV
HiNT cnv: prediction of copy number information, as well as segmentation from Hi-C.
$ hint cnv -m contactMatrix.cool -f cooler --refdir /path/to/refDir/hg19 -r 50 -g hg19 -n test -o /path/to/outputDir --bicseq /path/to/BICseq2-seg_v0.7.3 -e MboI
$ hint cnv -m /path/to/4DNFIS6HAUPP.mcool::/resolutions/50000 -f cooler --refdir /path/to/refDir/hg38 -r 50 -g hg38 -n HepG2 --bicseq /path/to/BICseq2-seg_v0.7.3 -e DpnII
$ hint cnv -m /path/to/4DNFICSTCJQZ.hic -f juicer --refdir /path/to/refDir/hg38 -r 50 -g hg38 -n HepG2 --bicseq /path/to/BICseq2-seg_v0.7.3 -e DpnII
$ hint cnv -m /path/to/4DNFICSTCJQZ.hic -f juicer --refdir /path/to/refDir/hg38 -r 50 -g hg38 -n HepG2 --bicseq /path/to/BICseq2-seg_v0.7.3 -e DpnII --doiter
/path/to/BICseq2-seg_v0.7.3 should be the path where you stored this package.
See details and more options:
$ hint cnv -h
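The .mcool examples above select one resolution inside a multi-resolution file with a `path::/resolutions/N` suffix. A small hypothetical helper can build such a string. It assumes that the value given to -r is in kilobases, which is inferred only from the examples pairing `-r 50` with `/resolutions/50000`; check `hint cnv -h` before relying on that.

```python
def mcool_uri(path, resolution_kb):
    """Return 'path::/resolutions/N' for a resolution given in kb (assumed unit)."""
    if resolution_kb <= 0:
        raise ValueError("resolution must be positive")
    return "{}::/resolutions/{}".format(path, resolution_kb * 1000)


# Single matrix at 50 kb, as in the hint cnv example.
cnv_uri = mcool_uri("/path/to/4DNFIS6HAUPP.mcool", 50)

# Two matrices (1 Mb and 100 kb), comma-separated as in the hint tl examples.
tl_uris = ",".join(mcool_uri("/path/to/4DNFIS6HAUPP.mcool", r) for r in (1000, 100))

print(cnv_uri)
print(tl_uris)
```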
HiNT-TL
HiNT tl: interchromosomal translocations and breakpoints detection from Hi-C inter-chromosomal interaction matrices.
$ hint tl -m /path/to/data_1Mb.cool,/path/to/data_100kb.cool --chimeric /path/to/test_chimeric.sorted.pairsam.gz --refdir /path/to/refDir/hg19 --backdir /path/to/backgroundMatrices/hg19 --ppath /path/to/pairix -f cooler -g hg19 -n test -o /path/to/outputDir
$ hint tl -m /path/to/4DNFIS6HAUPP.mcool::/resolutions/1000000,/path/to/4DNFIS6HAUPP.mcool::/resolutions/100000 -f cooler --refdir /path/to/refDir/hg38 --backdir /path/to/backgroundMatrices/hg38 -g hg38 -n 4DNFICSTCJQZ -c 0.05 --ppath /path/to/pairix -p 12
$ hint tl -m /path/to/4DNFICSTCJQZ.hic -f juicer --refdir /path/to/refData/hg38 --backdir /path/to/backgroundMatrices/hg38 -g hg38 -n 4DNFICSTCJQZ -c 0.05 --ppath /path/to/pairix -p 12 -o HiNTtransl_juicerOUTPUT
Use
$ which pairix
to get the absolute path of pairix.
See details and more options:
$ hint tl -h
Output of HiNT
HiNT-PRE output
In the HiNT-PRE output directory, you will find
- jobname.bam: aligned lossless file in bam format
- jobname_merged_valid.pairs.gz: read pairs in pairs format
- jobname_chimeric.sorted.pairsam.gz: ambiguous chimeric read pairs used for breakpoint detection, in pairsam format
- jobname_valid.sorted.deduped.pairsam.gz: valid read pairs used for Hi-C contact matrix creation, in pairsam format
- jobname.mcool: Hi-C contact matrix in cool format
- jobname.hic: Hi-C contact matrix in hic format
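After a run, a short script can list which of the files named above are present. This is a sketch rather than a HiNT feature: the helper names are hypothetical, and it assumes each file name is the job name followed by the suffix shown in the list. Presumably only one of the .mcool and .hic files is written, depending on --outformat, so one "missing" matrix file may be expected.

```python
from pathlib import Path

# Suffixes taken from the HiNT-PRE output list above.
PRE_SUFFIXES = [
    ".bam",
    "_merged_valid.pairs.gz",
    "_chimeric.sorted.pairsam.gz",
    "_valid.sorted.deduped.pairsam.gz",
    ".mcool",
    ".hic",
]


def expected_pre_outputs(outdir, jobname):
    """Return the paths HiNT-PRE is described as writing for this job name."""
    return [Path(outdir) / (jobname + suffix) for suffix in PRE_SUFFIXES]


def missing_outputs(outdir, jobname):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_pre_outputs(outdir, jobname) if not p.exists()]


for path in missing_outputs("/path/to/outputdir", "test"):
    print("missing:", path)
```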
HiNT-CNV output
In the HiNT-CNV output directory, you will find
- jobname_GAMPoisson.pdf: the GAM regression result
- segmentation/jobname_bicsq_allchroms.txt: CNV segments with log2 copy ratio and p-values, in a txt file
- segmentation/jobname_resolution_CNV_segments.png: figure to visualize CNV segments
- segmentation/jobname_bicseq_allchroms.l2r.pdf: figure to visualize the log2 copy ratio in each bin (bin size = the resolution you set)
- segmentation/other_files: intermediate files used to run BIC-seq
- jobname_dataForRegression/*: data used for regression, as well as residuals after removing Hi-C biases
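The segment file reports log2 copy ratios, which can be converted to a linear scale with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the ratio is expressed relative to a baseline of two copies (a diploid region), which may not hold for a given sample.

```python
def copy_ratio(log2_ratio):
    """Convert a log2 copy ratio to a linear ratio."""
    return 2.0 ** log2_ratio


def estimated_copies(log2_ratio, baseline_copies=2):
    """Estimate the copy number, assuming the baseline has `baseline_copies` copies."""
    return baseline_copies * copy_ratio(log2_ratio)


# Under the two-copy assumption: 0 means no change, -1 a one-copy loss,
# and about 0.585 a one-copy gain.
for l2r in (0.0, -1.0, 0.585):
    print(l2r, round(estimated_copies(l2r), 2))
```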
HiNT-TL output
In the HiNT-TL output directory, you will find
- jobname_Translocation_IntegratedBP.txt: the final integrated translocation breakpoints
- jobname_chrompairs_rankProduct.txt: potential translocated chromosome pairs predicted by rank product
- otherFolders: intermediate files used to identify the translocation breakpoints