Comprehensive Network Analysis for HiC
Project description
Latest updated on 07/09/2022,
Comprehensive Network Analysis for HiC
Overview
This module is a Python package containing tool for network analysis of differential interactions of Hic data.
Hardware Requirements
This package requires only a standard computer with enough RAM to support the in-memory operations.
Software Requirements
We strong recommand you to install Anaconda before install HicHub. HicHub mainly depends on the Python scientific stack.
python >=3
pandas = 1.4.3
numpy = 1.23.0
bedtool >= 2.70.1
pybedtools = 0.9.0
python-igraph = 0.9.11
scipy = 1.8.1
hic-straw = 1.3.1
statsmodels = 0.13.2
pycairo >= 1.11.0
Installation Guide
Before install HicHub, make sure you install following packages:
pip install hic-straw
sudo apt-get install bedtools
pip install pybedtools
pip install pycairo
pip install scipy
Quick installation. Please type the following command in your Linux shell to install HicHub.
python -m pip install git+https://github.com/WeiqunPengLab/HiCHub
If installing successfully, type
hichub
in your Linux shell, then you will see the following interface:
welcome
The python env is: [your python version] (main, [time you type this command])
[GCC 7.5.0]
usage: hichub [-h] {convert, diff, asso, plot} ...
hichub -- A toolset to detect and analyze differential Hubs.
positional arguments:
{convert, diff, asso, plot} sub-command help
convert Convert multi .hic to .txt format.
diff Parser for call hubs
asso Parser for associate clusters with promoter
plot Parser for plot clusters in igraph package.
optional arguments:
-h, --help show this help message and exit
Example of Runing
In order to call HicHubs, you first need to prepare two (.hic) files and put them in the same directory.
In this Github, under the directory '~/test' there are two (.hic) files named 'H1ESC.hic' and 'HFFc6.hic'.
Please download them for test.
ATTENTION: You need to run all these processes in the same directory, once you finished one step, please don't change the name of output files!
'convert'
First, you need to use 'convert' command to convert them to (.txt) format that we will use,
hichub convert -i [run path] -f [file names, seperate with ','] -l [label of output, seperate with ','] -r [resolution of bin]
-i : Your input path, the directory that you strore your two (.hic) files, the directory that you run the HicHub program.
-f : Your input file names, seperate them with comma. For example '-f H1ESC.hic,HFFc6.hic'.
-l : Your output files' labels, name your two input files with the same order, seperate them with comma. For example '-l H1ESC,HFFc6'.
-r : The length for one bin on genome , the unit is 'bp'. For example '-r 10000'.
For example: (The test data may takes you around 2 mins to finish.)
hichub convert -i /mnt/d/test -f H1ESC.hic,HFFc6.hic -l H1ESC,HFFc6 -r 10000
The output is a (.txt) format files which contains the contact matrics of two (.hic) files in the format:
#chr bin1 bin2 label1 label2
(The empty between two labels is tab.)
For example (the output of test data):
#chr bin1 bin2 H1ESC HFFc6
Where, '#chr', 'bin1', 'bin2' represent the chromosome, the location of the left anchor and the location of the right anchor respectivly.
'diff'
When you converted two (.hic) files into the (.txt) format by command 'convert', you can use command 'diff' to call hubs.
hichub diff -i [yout (.txt) file's name] -l [label you have used before] -r [resolution of bin] -c [optional, cut-off threshold] -d [optional, folde change threshold] -p [optional, p-value threshold]
-i : Your converted (.txt) file's name from the 'convert' step. For example: '-i Summary_H1ESC_HFFc6_Dense_Matrix.txt'
-l : The labe you have used in the 'convert' step. For example: '-l H1ESC,HFFc6'
-r : The resolution you have used in the 'convert' step. For example: '-r 10000'
-c : Optional default = 10, remove the sum of two values of contact matric samller than a threshold. For example: '-c 10'.
-d : Optional default = 1.0, the threshold to determine whether we keep an edge in cluster analysis, the details could be seen in the paper.
-p : Optional default = 0.00001, The threshold to pick hubs smaller than a certain p-value.
For example: (The test data may takes you around 4 mins to finish.)
hichub diff -i Summary_H1ESC_HFFc6_Dense_Matrix.txt -l H1ESC,HFFc6 -r 10000
hichub diff -i Summary_H1ESC_HFFc6_Dense_Matrix.txt -l H1ESC,HFFc6 -r 10000 -c 10 -d 1 -p 0.00001
There are four output files.
(1) --- 'H1ESC_specific_hubs.bed'
(2) --- 'HFFc6_specific_hubs.bed'
(3) --- 'cluster_H1ESC.txt'
(4) --- 'cluster_HFFc6.txt'
(1) and (2) contain the cell-type-specific hubs we found. The format of output file (hubs) is:
left_hub_anchor right_hub_anchor -log10(pvalue)
(The empty between two labels is tab.)
(3) and (4) record the cluster information we used to call hubs, they are use for drawing the network plot in the following functions.
'asso'
This function associate genome annotation (gene, open chromatin) with network clusters.
In the '~/test' folder, there are also two test files named 'promoter.bed' and 'DNase.bed' containing the information of gene promoter and DNase for the test data.
hichub asso -i [run path] -l [label you have used before] -p [the files contain gene promoter] -f [Optional, file name for DNase, CTCF, ...]
-i : Your input path,the directory that you run the HicHub program. For example: '-i /mnt/d/test'
-l : The labe you have used in the 'convert' and 'diff' step. For example: '-l H1ESC,HFFc6'
-p : The name of the file that contains gene promoter's information. It contains the coordinate and name of the genes you want to analize and plot in the future. The input format should be :
#chr start end gene_name
(The empty between two labels is tab.)
-f : Optional. The name of the file that contains information about another factor, such as DNase, CTCF .... The input format shows as following, where signal represents if your factor's count is increasing or decreasing between two situation, if in situation label1 larger than label2, please mark them as 'up', if in situation label2 larger than label1, please mark them as 'down', if not change, please mark them as number 0. The label1 and label2 represent the counts of your factor in the given regions.
#chr start end label1 label2 signal
(The empty between two labels is tab.)
For example: (The test data may takes you around 1 mins to finish.)
hichub asso -i /mnt/d/test -l H1ESC,HFFc6 -p promoter.bed
hichub asso -i /mnt/d/test -l H1ESC,HFFc6 -p promoter.bed -f DNase.bed
The ouput is the files with name "cluster_annotated_H1ESC.txt" and "cluster_annotated_H1ESC.txt". They associate the gene with the cluster nodes.
'plot'
This function plots the networks associated with one or more particular gene(s) along with the annotation information.
hichub plot -i [yout (.txt) file's name] -l [label you have used before] -p [the files contain gene promoter] -n [gene names, seperate with ','] -c [optional, cut-off threshold you have used in 'diff'] -d [optional, folde change threshold you have used in 'diff']
-i : Your converted (.txt) file's name from the 'convert' step. For example: '-i Summary_H1ESC_HFFc6_Dense_Matrix.txt'
-l : The labe you have used in the 'convert', 'diff', 'asso' steps. For example: '-l H1ESC,HFFc6'
-p : The file name that contain gene promoter's information.
The input format should be : #chr----start----end----gene_name
-n : The gene names you want to plot. For example: '-n CPTC,DFFB'
-c : Optional default = 10. The -c value you have used in 'diff'.
-d : Optional default = 1.0. The -d value you have used in 'diff'.
The result will contain the network plot for gene you entered.
If this gene is not in any hub, you will received the hint that : 'The 'gene_name' does not exist in any hubs.'
For example: (The test data may takes you around 3 mins to finish.)
hichub plot -i Summary_H1ESC_HFFc6_Dense_Matrix.txt -l H1ESC,HFFc6 -p promoter.bed -n CPTP
It will plot the network around gene CPTP.
## Versioning1.0.0
Authors
- Xiang Li, Shuang Yuan, Shaoqi Zhu
License
#This project is licensed under the MIT License - see the LICENSE.md file for details
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file hichub-1.2.0.tar.gz
.
File metadata
- Download URL: hichub-1.2.0.tar.gz
- Upload date:
- Size: 31.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.12
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3ce3be5849dd306f112300080fd2082fe6c8a43a55e9f3571af49671f7f829f7 |
|
MD5 | bf433a60dd864e3b0281d353c35e71f0 |
|
BLAKE2b-256 | 81d3ce86de71cde1238272dced07d20629f84b0f6e043ecafae4b23498c0128c |
File details
Details for the file hichub-1.2.0-py3-none-any.whl
.
File metadata
- Download URL: hichub-1.2.0-py3-none-any.whl
- Upload date:
- Size: 30.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.12
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 700eaaae53eaa8af9960f1d8c123b62633c3adf9ecdd49e63a45d6e99f6cc8f4 |
|
MD5 | f03588563c17cdade20978a80755448c |
|
BLAKE2b-256 | 84132d2e4d2e7e2666e7e29f2babf050195f8ac9b8778fc4848b4d686796cc29 |