HiNT -- Hi-C for copy number variation and translocation detection
HiNT
A computational method for detecting copy number variations and translocations from Hi-C data
Summary
HiNT (Hi-C for copy Number variation and Translocation detection) is a computational method for detecting CNVs and translocations from Hi-C data. HiNT has three main components: HiNT-PRE, HiNT-CNV, and HiNT-TL. HiNT-PRE preprocesses Hi-C data and computes the contact matrix, which stores contact frequencies between any two genomic loci. Both HiNT-CNV and HiNT-TL start from the Hi-C contact matrix: HiNT-CNV predicts copy number segments, and HiNT-TL predicts inter-chromosomal translocations.
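To make the idea of a contact matrix concrete, here is a minimal sketch in plain Python. It is not HiNT's own code: it only bins the two ends of each read pair on a single chromosome at a fixed resolution and counts contacts between bins. The function name `build_contact_matrix`, the toy positions, and the bin size are illustrative assumptions.

```python
def build_contact_matrix(pairs, chrom_length, bin_size):
    """Count contacts between fixed-size bins on one chromosome.

    pairs: iterable of (pos1, pos2), 0-based coordinates of the two read ends.
    Returns a symmetric list-of-lists matrix of contact counts.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror the count to keep the matrix symmetric
    return matrix


# Toy data (hypothetical): three read pairs on a 300 bp "chromosome", 100 bp bins.
toy_pairs = [(10, 250), (120, 130), (260, 20)]
toy_matrix = build_contact_matrix(toy_pairs, chrom_length=300, bin_size=100)
for row in toy_matrix:
    print(row)
```

Real Hi-C matrices span all chromosomes and are stored in dedicated formats such as cool or hic, but the counting principle is the same.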
Overview of HiNT workflow:
Installation
Dependencies
R and R packages
Python and Python packages
- python >= 3.5
- pypairix >= 0.3.0, cooler >= 0.7.4, pairtools >= 0.2.2, numpy, scipy, pandas, sklearn, multiprocessing
Java and related tools (optional: required only if you want to process Hi-C data with Juicer tools)
Perl
Other dependencies
- samtools (1.3.1+)
- BIC-seq2 (0.7.3). Optional: only needed if you want to run HiNT-CNV. No installation is required; just download BICseq2, unzip it, and give HiNT the path where you stored it.
- bwa (0.7.16+). Optional: required only when your input is in FASTQ format.
- tabix (0.2.6)
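Before installing HiNT, it can help to confirm that the external tools are reachable. The following is a minimal sketch that looks them up on `PATH` with the standard library; the executable names (`samtools`, `tabix`, `perl`, `Rscript`, `bwa`, `java`) are assumptions derived from the tool names listed above, and the sketch does not check versions.

```python
import shutil

# Executable names are assumptions based on the dependency list above;
# adjust them if your installation uses different names.
REQUIRED_TOOLS = ["samtools", "tabix", "perl", "Rscript"]
OPTIONAL_TOOLS = ["bwa", "java"]


def find_missing(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing_required = find_missing(REQUIRED_TOOLS)
missing_optional = find_missing(OPTIONAL_TOOLS)
print("Missing required tools:", ", ".join(missing_required) or "none")
print("Missing optional tools:", ", ".join(missing_optional) or "none")
```

BIC-seq2 is not included because it is only unzipped and referenced by path rather than installed.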
Install HiNT
- Method 1: Install from PyPI using pip.
$ pip install HiNT-Package
- Method 2: Install using conda (highly recommended)
$ conda install hint
- Method 3: Install manually
- Install HiNT dependencies
- Download HiNT
$ git clone https://github.com/parklab/HiNT.git
- Go to the HiNT directory and install it with
$ python setup.py install
To test whether HiNT was installed successfully, type
$ hint
Download reference files used in HiNT
- Download the HiNT references HERE. Only hg19, hg38, and mm10 are currently available. Unzip the archive
$ unzip hg19.zip
- Put reference files into the HiNT directory
$ mv hg19/* where_you_put_HiNT/HiNT/HiNT/references/
Quick Start
- Download the test datasets from HERE
HiNT-PRE
HiNT pre: preprocessing of Hi-C data. HiNT pre performs alignment, contact matrix creation, and normalization in a single command.
$ hint pre -d /path/to/hic_1.fastq.gz,/path/to/hic_2.fastq.gz -i /path/to/bwaIndex --informat fastq --outformat cooler -g hg19 -n test -o /path/to/outputdir --pairsampath /path/to/pairsamtools
For details and more options, see
$ hint pre -h
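When HiNT-PRE is run for many samples, it can be convenient to assemble the command from a script. The sketch below builds the argument list for the `hint pre` call shown above, using only the flags that appear in that example; the helper name `build_hint_pre_command` and its parameter names are hypothetical and not part of HiNT.

```python
import shlex


def build_hint_pre_command(fastq1, fastq2, bwa_index, genome, name, outdir,
                           pairsam_path, informat="fastq", outformat="cooler"):
    """Assemble a `hint pre` argument list using only the flags shown above."""
    return [
        "hint", "pre",
        "-d", "{},{}".format(fastq1, fastq2),  # paired FASTQ files, comma-separated
        "-i", bwa_index,
        "--informat", informat,
        "--outformat", outformat,
        "-g", genome,
        "-n", name,
        "-o", outdir,
        "--pairsampath", pairsam_path,
    ]


cmd = build_hint_pre_command(
    "/path/to/hic_1.fastq.gz", "/path/to/hic_2.fastq.gz",
    bwa_index="/path/to/bwaIndex", genome="hg19", name="test",
    outdir="/path/to/outputdir", pairsam_path="/path/to/pairsamtools",
)
command_line = " ".join(shlex.quote(arg) for arg in cmd)
print(command_line)
```

Passing the list to `subprocess.run(cmd, check=True)` would then launch HiNT-PRE without going through a shell, which avoids quoting problems in paths.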
HiNT-CNV
HiNT cnv: prediction of copy number information and segmentation from Hi-C data.
$ hint cnv -m contactMatrix.mcool -f cooler -r 50 -g hg19 -n test -o /path/to/outputDir
For details and more options, see
$ hint cnv -h
HiNT-TL
HiNT tl: detection of inter-chromosomal translocations and breakpoints from Hi-C inter-chromosomal interaction matrices.
$ hint tl -m /path/to/data_1Mb.cool,/path/to/data_100kb.cool -c chimericReads.pairsam -f cooler -g hg19 -n test -o /path/to/outputDir
For details and more options, see
$ hint tl -h
Output of HiNT
HiNT-PRE output
In the HiNT-PRE output directory, you will find
- jobname.bam: lossless alignment file in BAM format
- jobname_merged_valid.pairs.gz: read pairs in pairs format
- jobname_chimeric.sorted.pairsam.gz: ambiguous chimeric read pairs used for breakpoint detection, in pairsam format
- jobname_valid.sorted.deduped.pairsam.gz: valid read pairs used for Hi-C contact matrix creation, in pairsam format
- jobname.mcool: Hi-C contact matrix in cool format
- jobname.hic: Hi-C contact matrix in hic format
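After a run, a quick way to see which of these files were produced is to test for each expected name. This is a minimal sketch under two assumptions that the text above does not confirm: that `jobname` is the value passed with `-n`, and that all files sit directly in the output directory. The helper name `check_pre_outputs` is hypothetical.

```python
import os
import tempfile

# Suffixes taken from the HiNT-PRE file list above.
PRE_OUTPUT_SUFFIXES = [
    ".bam",
    "_merged_valid.pairs.gz",
    "_chimeric.sorted.pairsam.gz",
    "_valid.sorted.deduped.pairsam.gz",
    ".mcool",
    ".hic",
]


def check_pre_outputs(outdir, jobname):
    """Map each expected HiNT-PRE file name to whether it exists in outdir."""
    status = {}
    for suffix in PRE_OUTPUT_SUFFIXES:
        filename = jobname + suffix
        status[filename] = os.path.isfile(os.path.join(outdir, filename))
    return status


# Demonstration on a throwaway directory holding two empty placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("test.bam", "test.mcool"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_status = check_pre_outputs(demo_dir, "test")

for filename, found in demo_status.items():
    print(("found    " if found else "missing  ") + filename)
```

A single run will probably not produce every file; for instance, whether the .mcool or the .hic matrix appears presumably depends on the chosen `--outformat`.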
HiNT-CNV output
In the HiNT-CNV output directory, you will find
- jobname_GAMPoisson.pdf: the GAM regression result
- segmentation/jobname_bicsq_allchroms.txt: CNV segments with log2 copy ratios and p-values, as a text file
- segmentation/jobname_resolution_CNV_segments.png: figure visualizing the CNV segments
- segmentation/jobname_bicseq_allchroms.l2r.pdf: figure visualizing the log2 copy ratio in each bin (bin size = the resolution you set)
- segmentation/other_files: intermediate files used to run BIC-seq
- jobname_dataForRegression/*: data used for regression, as well as residuals after removing Hi-C biases
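For readers less familiar with log2 copy ratios, the sketch below shows the usual arithmetic for turning a log2 ratio into an approximate copy number. It assumes the ratio is measured against a diploid baseline (2 copies), which is a common convention but is not stated in the text above; the function name and the example values are illustrative, not HiNT output.

```python
def log2_ratio_to_copy_number(log2_ratio, baseline_copies=2.0):
    """Convert a log2 copy ratio to an approximate absolute copy number.

    Assumes the ratio is relative to `baseline_copies` (diploid by default).
    """
    return baseline_copies * 2.0 ** log2_ratio


# Illustrative values only: neutral, single-copy gain-like, and loss-like ratios.
for ratio in (0.0, 1.0, -1.0):
    print(ratio, "->", log2_ratio_to_copy_number(ratio))
```

Under this assumption a log2 ratio of 0 corresponds to 2 copies, +1 to 4 copies, and -1 to 1 copy; tumor purity and ploidy shift these values in real samples.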
HiNT-TL output
In the HiNT-TL output directory, you will find
- jobname_Translocation_IntegratedBP.txt: the final integrated translocation breakpoints
- jobname_chrompairs_rankProduct.txt: potential translocated chromosome pairs predicted by rank product
- otherFolders: intermediate files used to identify the translocation breakpoints