HiNT -- Hi-C for copy number variation and translocation detection

HiNT

A computational method for detecting copy number variations and translocations from Hi-C data

Summary

HiNT (Hi-C for copy Number variation and Translocation detection) is a computational method to detect CNVs and translocations from Hi-C data. HiNT has three main components: HiNT-PRE, HiNT-CNV, and HiNT-TL. HiNT-PRE preprocesses Hi-C data and computes the contact matrix, which stores contact frequencies between any two genomic loci. HiNT-CNV and HiNT-TL both start from the Hi-C contact matrix and predict copy number segments and inter-chromosomal translocations, respectively.

Overview of HiNT workflow:

Installation

Dependencies

R and R packages

  1. R >= 3.4
  2. mgcv, strucchange, doParallel, Cairo, foreach

Python and Python packages

  1. Python >= 3.5
  2. pypairix >= 0.3.0, cooler >= 0.7.4, pairtools >= 0.2.2, numpy, scipy, pandas, sklearn, multiprocessing
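
Before installing HiNT, it can help to confirm that the Python side of this list is satisfied. The following is a minimal sketch and not part of HiNT: the helper name `missing_modules` is hypothetical, and the import names are assumed to match the package names (`pypairix` for the pairix Python binding, `sklearn` as provided by scikit-learn). It checks only that the modules can be found, not their versions.

```python
import importlib.util
import sys

# Import names assumed to correspond to the packages in the list above.
# multiprocessing is omitted because it ships with the standard library.
REQUIRED_MODULES = [
    "pypairix", "cooler", "pairtools",
    "numpy", "scipy", "pandas", "sklearn",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 5):
        print("Python >= 3.5 is required, found", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```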

Java and related tools (optional: required only if you want to process Hi-C data with Juicer tools)

  1. Java (version >= 1.7)
  2. Juicer tools (1.8.9 is recommended)

Perl

  1. Perl (version >= 5)

Other dependencies

  1. samtools (1.3.1+)
  2. BIC-seq2 (0.7.3). Optional: needed only if you want to run HiNT-CNV. No installation is required; download BICseq2, unzip it, and give HiNT the path where you stored it.
  3. bwa (0.7.16+). Optional: required only when your input is in fastq format.
  4. tabix (0.2.6)
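
The command-line dependencies above can be checked in a similar spirit. This is a minimal sketch, not part of HiNT: the helper name `missing_tools` is hypothetical, and the executable names (`Rscript`, `perl`, `samtools`, `tabix`, `java`, `bwa`) are assumed to be the ones these packages install on your `PATH`. BIC-seq2 is left out because HiNT is given its path directly rather than finding it on `PATH`.

```python
import shutil

# Executable names are assumptions based on the dependency lists above.
REQUIRED_TOOLS = ["Rscript", "perl", "samtools", "tabix"]
OPTIONAL_TOOLS = ["java", "bwa"]  # Juicer tools / fastq input only


def missing_tools(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("Missing required tools:", missing_tools(REQUIRED_TOOLS))
    print("Missing optional tools:", missing_tools(OPTIONAL_TOOLS))
```

This checks only presence, not the minimum versions listed above.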

Install HiNT

  • Method1: Install from PyPI using pip.

    $ pip install HiNT-Packages

  • Method2: Install using conda (highly recommended)

    $ conda install hint

  • Method3: Install manually

    1. Install HiNT dependencies
    2. Download HiNT: $ git clone https://github.com/parklab/HiNT.git
    3. Go to the HiNT directory and install it: $ python setup.py install

*** Type $ hint to test whether HiNT was installed successfully

Download reference files used in HiNT

  1. Download HiNT references HERE. Only hg19, hg38 and mm10 are currently available. Unzip the archive: $ unzip hg19.zip
  2. Put the reference files into the HiNT directory: $ mv hg19/* where_you_put_HiNT/HiNT/HiNT/references/

Quick Start

HiNT-PRE

HiNT pre: preprocessing of Hi-C data. HiNT pre performs alignment, contact matrix creation, and normalization in a single command.

$ hint pre -d /path/to/hic_1.fastq.gz,/path/to/hic_2.fastq.gz -i /path/to/bwaIndex --informat fastq --outformat cooler -g hg19 -n test -o /path/to/outputdir --pairsampath /path/to/pairsamtools
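
When the command is launched from a script, assembling it as an argument list avoids shell-quoting problems with paths. The sketch below is an illustration under stated assumptions: `build_pre_command` is a hypothetical helper, not part of HiNT, and it uses only the flags shown in the command above with the same fixed `--informat fastq --outformat cooler` choices.

```python
import shlex


def build_pre_command(fastq1, fastq2, bwa_index, genome, name, outdir, pairsam_path):
    """Assemble the `hint pre` argument list for paired fastq input."""
    return [
        "hint", "pre",
        "-d", ",".join([fastq1, fastq2]),  # both mates, comma-separated
        "-i", bwa_index,
        "--informat", "fastq",
        "--outformat", "cooler",
        "-g", genome,
        "-n", name,
        "-o", outdir,
        "--pairsampath", pairsam_path,
    ]


if __name__ == "__main__":
    cmd = build_pre_command(
        "/path/to/hic_1.fastq.gz", "/path/to/hic_2.fastq.gz",
        "/path/to/bwaIndex", "hg19", "test",
        "/path/to/outputdir", "/path/to/pairsamtools",
    )
    # Print a shell-safe rendering; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```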

See details and more options:

$ hint pre -h

HiNT-CNV

HiNT cnv: prediction of copy number information and segmentation from Hi-C data.

$ hint cnv -m contactMatrix.mcool -f cooler -r 50 -g hg19 -n test -o /path/to/outputDir

See details and more options:

$ hint cnv -h

HiNT-TL

HiNT transl: detection of inter-chromosomal translocations and breakpoints from Hi-C inter-chromosomal interaction matrices.

$ hint transl -m /path/to/data_1Mb.cool,/path/to/data_100kb.cool -c chimericReads.pairsam -f cooler -g hg19 -n test -o /path/to/outputDir

See details and more options:

$ hint transl -h

Output of HiNT

HiNT-PRE output

In the HiNT-PRE output directory, you will find

  1. jobname.bam lossless alignment file in bam format
  2. jobname_merged_valid.pairs.gz read pairs in pairs format
  3. jobname_chimeric.sorted.pairsam.gz ambiguous chimeric read pairs used for breakpoint detection in pairsam format
  4. jobname_valid.sorted.deduped.pairsam.gz valid read pairs used for Hi-C contact matrix creation in pairsam format
  5. jobname.mcool Hi-C contact matrix in cool format
  6. jobname.hic Hi-C contact matrix in hic format
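
After a run, the list above can be turned into a quick completeness check. This is a minimal sketch with hypothetical helper names (`expected_pre_outputs`, `missing_pre_outputs`). It assumes that `jobname` is the value passed with `-n` and that the files sit directly in the output directory. Whether both matrix files (`.mcool` and `.hic`) are written in a single run may depend on the chosen `--outformat`; that is an assumption to verify against your own output.

```python
import os

# File name suffixes taken from the HiNT-PRE output list above.
PRE_OUTPUT_SUFFIXES = [
    ".bam",
    "_merged_valid.pairs.gz",
    "_chimeric.sorted.pairsam.gz",
    "_valid.sorted.deduped.pairsam.gz",
    ".mcool",
    ".hic",
]


def expected_pre_outputs(outdir, jobname):
    """Return the paths HiNT-PRE is expected to write for `jobname`."""
    return [os.path.join(outdir, jobname + suffix) for suffix in PRE_OUTPUT_SUFFIXES]


def missing_pre_outputs(outdir, jobname):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_pre_outputs(outdir, jobname) if not os.path.exists(p)]


if __name__ == "__main__":
    for path in missing_pre_outputs("/path/to/outputdir", "test"):
        print("not found:", path)
```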

HiNT-CNV output

In the HiNT-CNV output directory, you will find

  1. jobname_GAMPoisson.pdf the GAM regression result
  2. segmentation/jobname_bicsq_allchroms.txt CNV segments with log2 copy ratio and p-values in txt file
  3. segmentation/jobname_resolution_CNV_segments.png figure to visualize CNV segments
  4. segmentation/jobname_bicseq_allchroms.l2r.pdf figure to visualize log2 copy ratio in each bin (bin size = resolution you set)
  5. segmentation/other_files intermediate files used to run BIC-seq
  6. jobname_dataForRegression/* data used for regression as well as residuals after removing Hi-C biases

HiNT-TL output

In the HiNT-TL output directory, you will find

  1. jobname_Translocation_IntegratedBP.txt the final integrated translocation breakpoint
  2. jobname_chrompairs_rankProduct.txt rank product predicted potential translocated chromosome pairs
  3. otherFolders intermediate files used to identify the translocation breakpoints

Download files

Source distribution: HiNT-Package-2.0.7.tar.gz (47.5 kB)

Built distribution: HiNT_Package-2.0.7-py3-none-any.whl (59.3 kB, Python 3)
