Spectral translocation detection of HiC matrices.
HiSTra
Installation
Dependency
conda create -n HiSTra python=3.8
conda activate HiSTra
conda install numpy scipy pandas=1.3.5 matplotlib seaborn h5py
conda install -c conda-forge -c bioconda cooler=0.8.11
pip install matplotlib-venn
Linux OS
pip install HiSTra
Preparation
Download juicer_tools and deDoc. Because both tools have changed across releases, we recommend downloading them from this repo. The relevant jar files are in HiSTra/juice and HiSTra/deDoc, respectively.
Make sure the chromosome.sizes file is exactly the one used to generate the test (and control) sample (.hic or .mcool); otherwise an error will occur at an early step. Make sure that no chromosome name contains an underscore ('_').
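The underscore rule above can be checked before running anything. The following is a minimal sketch, not part of HiSTra: it assumes the sizes file uses the common two-column "name size" layout, and the function name check_chrom_sizes and the example entries are made up for illustration.

```python
import io


def check_chrom_sizes(lines):
    """Parse 'name size' lines; return (sizes, problems).

    Hypothetical helper, not part of HiSTra. Assumes a whitespace-separated
    two-column layout for the sizes file.
    """
    sizes, problems = {}, []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            problems.append(f"line {lineno}: expected 'name size', got {line!r}")
            continue
        name, size = fields[0], int(fields[1])
        if "_" in name:
            problems.append(f"line {lineno}: underscore in chromosome name {name!r}")
        sizes[name] = size
    return sizes, problems


# Illustrative input; with a real file, pass open("sizes/chrom_hg19.sizes") instead.
example = io.StringIO("chr1\t249250621\nchr2\t243199373\nchr6_apd_hap1\t4622290\n")
sizes, problems = check_chrom_sizes(example)
for problem in problems:
    print(problem)
```

Entries flagged here would need to be removed or renamed consistently in both the sizes file and the input data; how to do that depends on how the samples were generated.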
Directory tree
For bulk HiC data, a recommended work directory looks like:
mkdir work_dir
cd work_dir
mkdir hic_input
# Then move corresponding hic file here.
mkdir TL_output
ln -s deDoc_dir_path .
ln -s juice_dir_path .
The directory tree is:
├── deDoc
│ ├── deDoc.jar
├── hic_input
│ ├── Control_GSE63525_IMR90_combined_30.hic
│ ├── Test_GSE63525_K562_combined_30.hic
│ ├── Control_GSE63525_IMR90_combined_30.mcool
│ └── Test_GSE63525_K562_combined_30.mcool
├── juice
│ ├── juicer_tools_2.09.00.jar
└── TL_output
For scHiC data, a recommended work directory looks like:
├── deDoc
│ ├── deDoc.jar
├── hic_input
│ ├── Control_cells_dir
│ │ ├── cell_1
... │ │ ├── raw
│ │ │ ├── 100000
│ │ │ │ └── *.matrix
│ │ │ └── 500000
│ │ │ └── *.matrix
│ │ └── iced
... ... ├── ...
│ ├── Test_cells_dir
... ...
└── TL_output
Note: For scHiC, the subdirectory layout of hic_input MUST be cells_dir/normalization/resolution/*.matrix. Here, normalization can be "raw", "iced", or any other string you used in the path (similar to the HiC-Pro output format); the default is "raw". The resolution depends on genome size, e.g. for hg19 the resolutions should be 100000 and 500000.
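A quick way to confirm the scHiC layout before a run is to walk the cell directories and report any resolution folder without a matrix file. This is a sketch under assumptions: it follows the directory tree shown above (one sub-directory per cell, then normalization, then resolution), and find_cells_missing_matrices is a hypothetical helper, not a HiSTra function.

```python
import tempfile
from pathlib import Path


def find_cells_missing_matrices(cells_dir, normalization="raw",
                                resolutions=(100000, 500000)):
    """List <cell>/<normalization>/<resolution> folders that hold no *.matrix file.

    Hypothetical helper; the layout is assumed from the directory tree above.
    """
    cells_dir = Path(cells_dir)
    missing = []
    for cell in sorted(p for p in cells_dir.iterdir() if p.is_dir()):
        for res in resolutions:
            res_dir = cell / normalization / str(res)
            has_matrix = res_dir.is_dir() and any(res_dir.glob("*.matrix"))
            if not has_matrix:
                missing.append(res_dir.relative_to(cells_dir).as_posix())
    return missing


# Demo on a throwaway directory: cell_2 lacks the 500000 folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "Control_cells_dir"
    for cell, cell_resolutions in [("cell_1", (100000, 500000)), ("cell_2", (100000,))]:
        for res in cell_resolutions:
            folder = root / cell / "raw" / str(res)
            folder.mkdir(parents=True)
            (folder / "demo.matrix").touch()
    missing = find_cells_missing_matrices(root)

print(missing)
```

For real data, call the function on hic_input/Control_cells_dir and hic_input/Test_cells_dir; an empty list means every cell has matrix files at both resolutions.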
Example
Samples
You can download a test case from GSE63525. The test sample hic file is K562 and the control sample hic file is IMR90.
You can also choose your own test and control samples.
Resolution
For human samples, the hic file should contain matrix data at 100k and 500k resolution. In general, the appropriate resolution can be calculated as follows, where len is the number of decimal digits:
res_{unit} = 10^{len(max(chromosome_{size}))-4}.
For example, in hg.sizes the largest chromosome is chr1 (249250621, 9 digits), so the suggested resolution unit is 10^{9-4} = 100k, and the lower resolution is defined as:
5 \times res_{unit}.
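The resolution rule can be worked through in a few lines. This sketch only restates the formula above; the function name suggested_resolutions is hypothetical and the dictionary holds two hg19 entries for illustration.

```python
def suggested_resolutions(chrom_sizes):
    """Return (res_unit, 5 * res_unit) from a {chromosome: size} mapping.

    res_unit = 10 ** (number of decimal digits of the largest size - 4),
    following the formula given above. Hypothetical helper, not part of HiSTra.
    """
    largest = max(chrom_sizes.values())
    res_unit = 10 ** (len(str(largest)) - 4)
    return res_unit, 5 * res_unit


high, low = suggested_resolutions({"chr1": 249250621, "chr2": 243199373})
print(high, low)  # 100000 500000
```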
Command
For bulk HiC data,
# Assuming you are in work_dir, a standard command for hic format files is
HiST -t hic_input/Test_GSE63525_K562_combined_30.hic \
-c hic_input/Control_GSE63525_IMR90_combined_30.hic \
-o TL_output/ \
-d deDoc/deDoc.jar \
-j juice/juicer_tools_2.09.00.jar \
-s sizes/chrom_hg19.sizes
# or mcool format file
HiST -t hic_input/Test_GSE63525_K562_combined_30.mcool \
-c hic_input/Control_GSE63525_IMR90_combined_30.mcool \
-o TL_output/ \
-d deDoc/deDoc.jar \
-s sizes/chrom_hg19.sizes
# or mixed format file
HiST -t hic_input/Test_GSE63525_K562_combined_30.mcool \
-c hic_input/Control_GSE63525_IMR90_combined_30.hic \
-o TL_output/ \
-d deDoc/deDoc.jar \
-j juice/juicer_tools_2.09.00.jar \
-s sizes/chrom_hg19.sizes
# Then you can find the result in folder TL_output/SV_result.
For scHiC data,
HiST -t hic_input/Test_cells_dir/ \
-c hic_input/Control_cells_dir \
-s sizes/chrom_hg19.sizes \
-d deDoc/deDoc.jar \
-o TL_output/
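When launching runs from a script, the commands above can be assembled programmatically. The sketch below uses only the flags shown in the examples (-t, -c, -o, -d, -j, -s). The rule that -j is added whenever an input ends in .hic is inferred from those examples and is an assumption, as is the helper name build_hist_command.

```python
import shlex


def build_hist_command(test, control, out_dir, dedoc_jar, sizes, juicer_jar=None):
    """Build a HiST argument list using the flags from the examples above.

    Hypothetical helper. Assumption: -j (juicer_tools jar) is needed only when
    at least one input is a .hic file, as the example commands suggest.
    """
    cmd = ["HiST", "-t", str(test), "-c", str(control),
           "-o", str(out_dir), "-d", str(dedoc_jar)]
    if any(str(path).endswith(".hic") for path in (test, control)):
        if juicer_jar is None:
            raise ValueError("a .hic input was given but no juicer_tools jar path")
        cmd += ["-j", str(juicer_jar)]
    cmd += ["-s", str(sizes)]
    return cmd


mixed_cmd = build_hist_command(
    "hic_input/Test_GSE63525_K562_combined_30.mcool",
    "hic_input/Control_GSE63525_IMR90_combined_30.hic",
    "TL_output/",
    "deDoc/deDoc.jar",
    "sizes/chrom_hg19.sizes",
    juicer_jar="juice/juicer_tools_2.09.00.jar",
)
print(shlex.join(mixed_cmd))
# To execute: subprocess.run(mixed_cmd, check=True)
```

Passing the list to subprocess.run avoids shell quoting problems and the missing line-continuation mistakes that are easy to make in multi-line shell commands.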
Figure
An example of TL result with
FAQ
If you meet "Resource temporarily unavailable", "error: too many open files", "ValueError: cannot convert float NaN to integer", or "EmptyDataError: No columns to parse from file":
If your workstation has more than 128 GB of memory and more than 48 threads, you can try the following:
- Run the command ulimit -u 381152. Here 381152 can be replaced by any large number.
- Rerun the HiST command a few more times.
These errors usually occur when the input data is mcool. We will fix them in the next version.
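Before changing limits, it can help to see what the current ones are. The sketch below reads them with Python's standard resource module (Unix only). It is a diagnostic aid rather than part of HiSTra, and describe_limits is a hypothetical name. The open-files limit is included because of the "too many open files" message, although the advice above only mentions ulimit -u.

```python
import resource


def describe_limits():
    """Return current (soft, hard) limits for processes and open files."""
    wanted = (
        ("max user processes (ulimit -u)", resource.RLIMIT_NPROC),
        ("max open files (ulimit -n)", resource.RLIMIT_NOFILE),
    )
    return {label: resource.getrlimit(key) for label, key in wanted}


limits = describe_limits()
for label, (soft, hard) in limits.items():
    # resource.RLIM_INFINITY means "unlimited"
    print(f"{label}: soft={soft} hard={hard}")
```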
If you meet "No such file or directory"...
- Check the matrix_from_hic directory. If its sub-directories contain no files, check juicer_log for errors or check the cooler package.
- If juicer_log reports something like "invalid chromosome chr12", check the sizes file; a common problem is a mismatch between "chr1" and "1" naming.
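The "chr1" versus "1" mismatch can be spotted by comparing the naming style of the sizes file with that of the input data. This is a minimal sketch: naming_style is a hypothetical helper, and both name lists are illustrative. How the names are read from a .hic or .mcool file is left out here.

```python
def naming_style(names):
    """Classify chromosome names as 'chr-prefixed', 'bare', or 'mixed'."""
    names = list(names)
    prefixed = sum(1 for name in names if name.startswith("chr"))
    if prefixed == len(names):
        return "chr-prefixed"
    if prefixed == 0:
        return "bare"
    return "mixed"


sizes_names = ["chr1", "chr2", "chrX"]  # e.g. first column of the sizes file
data_names = ["1", "2", "X"]            # e.g. chromosome names stored in the input file
styles = (naming_style(sizes_names), naming_style(data_names))
if styles[0] != styles[1]:
    print(f"naming mismatch: sizes file is {styles[0]}, data is {styles[1]}")
```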