
Spectral translocation detection of HiC matrices.


HiSTra

Installation

Dependency

conda create -n HiSTra python=3.8 
conda activate HiSTra
conda install numpy scipy pandas=1.3.5 matplotlib seaborn h5py
conda install -c conda-forge -c bioconda cooler=0.8.11
pip install matplotlib-venn 
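Only pandas (1.3.5) and cooler (0.8.11) are pinned in the commands above. The sketch below is a quick way to confirm that the active environment actually has those versions. It uses only the standard-library `importlib.metadata` module (Python 3.8+). The helper name `check_pinned_versions` is hypothetical and not part of HiSTra.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the conda commands above; the other packages are unpinned.
PINNED = {"pandas": "1.3.5", "cooler": "0.8.11"}


def check_pinned_versions(pins=PINNED):
    """Map each package name to (wanted, installed); installed is None if missing."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed in the active environment
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in check_pinned_versions().items():
        status = "ok" if installed == wanted else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {installed} -> {status}")
```

Run it inside the activated HiSTra environment (`conda activate HiSTra`). A `None` means the package is missing.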

Linux OS

pip install HiSTra

Preparation

Download juicer_tools and deDoc. Because both tools have since been updated, we recommend downloading them from this repo: the corresponding jar files are in HiSTra/juice and HiSTra/deDoc, respectively.

Make sure the chromosome.sizes file is exactly the one used to generate the test (and control) samples (.hic or .mcool); otherwise an error will occur at an early step. Also make sure that no chromosome name contains an underscore ('_').
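The underscore rule is easy to check before a long run. The sketch below assumes the usual sizes-file layout of one `<name> <length>` pair per line, separated by whitespace. It flags underscores, non-integer lengths and duplicate names. The function `check_chrom_sizes` is a hypothetical helper and is not part of HiSTra. It cannot verify that the file matches the one used to build your .hic/.mcool.

```python
def check_chrom_sizes(lines):
    """Return a list of problems found in the lines of a chromosome sizes file.

    Assumes each non-empty line starts with "<name> <length>" (whitespace-separated).
    """
    problems = []
    seen = set()
    for lineno, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) < 2:
            problems.append(f"line {lineno}: expected '<name> <length>'")
            continue
        name, length = fields[0], fields[1]
        if "_" in name:
            problems.append(f"line {lineno}: {name!r} contains an underscore")
        if not length.isdigit():
            problems.append(f"line {lineno}: length {length!r} is not an integer")
        if name in seen:
            problems.append(f"line {lineno}: duplicate chromosome {name!r}")
        seen.add(name)
    return problems
```

For example, `with open("sizes/chrom_hg19.sizes") as fh: print(check_chrom_sizes(fh))` prints an empty list for a clean file. A line such as `chr1_gl000191_random 106433` would be reported.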

Directory tree

For bulk HiC data, a recommended work directory looks like:

mkdir work_dir
cd work_dir
mkdir hic_input
# Then move the corresponding .hic (or .mcool) files here.
mkdir TL_output
ln -s deDoc_dir_path .
ln -s juice_dir_path .

The directory tree is:

├── deDoc
│   └── deDoc.jar
├── hic_input
│   ├── Control_GSE63525_IMR90_combined_30.hic
│   ├── Test_GSE63525_K562_combined_30.hic
│   ├── Control_GSE63525_IMR90_combined_30.mcool
│   └── Test_GSE63525_K562_combined_30.mcool
├── juice
│   └── juicer_tools_2.09.00.jar
└── TL_output

For scHiC data, a recommended work directory looks like:

├── deDoc
│   └── deDoc.jar
├── hic_input
│   ├── Control_cells_dir
│   │   ├── cell_1
│   │   │   ├── raw
│   │   │   │   ├── 100000
│   │   │   │   │   └── *.matrix
│   │   │   │   └── 500000
│   │   │   │       └── *.matrix
│   │   │   └── iced
│   │   │       └── ...
│   │   └── ...
│   └── Test_cells_dir
│       └── ...
└── TL_output

Note: For scHiC, the sub-directories of hic_input MUST follow cells_dir/cell/normalization/resolution/*.matrix, as in the tree above. Here, normalization can be "raw", "iced", or any other directory name you used (mirroring the HiC-Pro output layout); the default is "raw". The resolution should be adapted to the genome size; e.g., for hg19 the resolutions should be 100000 and 500000.
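Because a misplaced folder in a per-cell layout is easy to miss, a pre-flight check can help. The sketch below assumes the layout shown in the tree above (cells_dir/cell/normalization/resolution/*.matrix), with "raw" as the normalization and the hg19 resolutions. It reports every cell whose resolution folder contains no `*.matrix` file. `find_layout_problems` is a hypothetical helper, not a HiSTra function.

```python
from pathlib import Path


def find_layout_problems(cells_dir, normalization="raw", resolutions=(100000, 500000)):
    """List problems with an scHiC input folder.

    Assumed layout: cells_dir/<cell>/<normalization>/<resolution>/*.matrix
    """
    root = Path(cells_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    cells = sorted(p for p in root.iterdir() if p.is_dir())
    if not cells:
        problems.append(f"{root} contains no cell sub-directories")
    for cell in cells:
        for res in resolutions:
            res_dir = cell / normalization / str(res)
            # glob() on a missing directory simply yields nothing
            if not any(res_dir.glob("*.matrix")):
                problems.append(f"no *.matrix files in {res_dir}")
    return problems
```

For example, `find_layout_problems("hic_input/Test_cells_dir")` should return an empty list before you launch HiST on scHiC data. Pass `normalization="iced"` if you use iced matrices.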

Example

Samples

You can download the test case from GSE63525: the test sample .hic file is K562 and the control sample .hic file is IMR90.

You can also choose your own test and control samples.

Resolution

For human samples, the .hic file should contain matrices at 100k and 500k resolution. In general, the appropriate resolution unit can be calculated as follows, where len(·) is the number of decimal digits:

$$\mathrm{res}_{\mathrm{unit}} = 10^{\,\mathrm{len}(\max(\mathrm{chromosome}_{\mathrm{size}})) - 4}$$

For example, in hg.sizes the largest chromosome is chr1 (249250621, 9 digits), so the suggested resolution unit is 10^(9-4) = 100k. The lower resolution is defined as:

$$5 \times \mathrm{res}_{\mathrm{unit}}$$

which gives 500k for human.
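The rule can be written as a short Python function. It reads len(max(chromosome_size)) as the digit count of the largest chromosome length. The mapping-based signature and the name `suggested_resolutions` are illustrative assumptions. The guard for very small genomes is an addition not stated above.

```python
def suggested_resolutions(chrom_sizes):
    """Return (res_unit, 5 * res_unit) for a {chromosome: length_bp} mapping."""
    largest = max(chrom_sizes.values())
    digits = len(str(largest))  # len(max(chromosome_size)) = number of decimal digits
    if digits < 4:
        # 10 ** (digits - 4) would fall below 1 bp; the rule does not cover such tiny genomes
        raise ValueError(f"largest chromosome ({largest} bp) is too small for this rule")
    unit = 10 ** (digits - 4)
    return unit, 5 * unit
```

For hg19, `suggested_resolutions({"chr1": 249250621, "chr2": 243199373})` returns `(100000, 500000)`, matching the 100k/500k requirement above. A genome whose largest chromosome has 8 digits would get `(10000, 50000)`.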

Command

For bulk HiC data,

# Assuming you are in work_dir, a standard command for .hic input files is:
HiST -t hic_input/Test_GSE63525_K562_combined_30.hic \
-c hic_input/Control_GSE63525_IMR90_combined_30.hic \
-o TL_output/ \
-d deDoc/deDoc.jar \
-j juice/juicer_tools_2.09.00.jar \
-s sizes/chrom_hg19.sizes 
# or, for .mcool input files:
HiST -t hic_input/Test_GSE63525_K562_combined_30.mcool \
-c hic_input/Control_GSE63525_IMR90_combined_30.mcool \
-o TL_output/ \
-d deDoc/deDoc.jar \
-s sizes/chrom_hg19.sizes
# or, for mixed input formats:
HiST -t hic_input/Test_GSE63525_K562_combined_30.mcool \
-c hic_input/Control_GSE63525_IMR90_combined_30.hic \
-o TL_output/ \
-d deDoc/deDoc.jar \
-j juice/juicer_tools_2.09.00.jar \
-s sizes/chrom_hg19.sizes
# The results are written to TL_output/SV_result.
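In the three examples, `-j` (the juicer_tools jar) is passed whenever a .hic file is involved and is omitted for .mcool-only input. The sketch below assembles the argument list under that assumption. The assumption is inferred from the examples, not from HiSTra's documentation. It uses the flags exactly as they appear above. `build_hist_command` is a hypothetical helper.

```python
def build_hist_command(test, control, sizes, dedoc, out_dir, juicer=None):
    """Assemble a HiST argument list in the flag order used in the examples above.

    Assumption: -j is only needed when the test or control input is a .hic file.
    """
    cmd = ["HiST", "-t", test, "-c", control, "-o", out_dir, "-d", dedoc]
    if any(path.lower().endswith(".hic") for path in (test, control)):
        if juicer is None:
            raise ValueError("a .hic input was given, so a juicer_tools jar (-j) is needed")
        cmd += ["-j", juicer]
    cmd += ["-s", sizes]
    return cmd
```

You can pass the returned list to `subprocess.run(cmd, check=True)`. Alternatively, print `shlex.join(cmd)` to get a copy-pasteable shell line. Building a list instead of a string avoids quoting problems with unusual file names.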

For scHiC data,

HiST -t hic_input/Test_cells_dir/ \
-c hic_input/Control_cells_dir  \
-s sizes/chrom_hg19.sizes \
-d deDoc/deDoc.jar \
-o TL_output/

Figure

An example of a TL result shown as a heatmap.

FAQ

What if you encounter "Resource temporarily unavailable", "error: too many open files", "ValueError: cannot convert float NaN to integer", or "EmptyDataError: No columns to parse from file"?

If your workstation has more than 128 GB of memory and more than 48 threads, try the following:

  1. Run ulimit -u 381152 to raise the maximum number of user processes. 381152 can be replaced by any large number.
  2. Simply re-run the HiST command a few more times.

These errors usually occur when the input data is in .mcool format. We will fix them in the next version.

What if you encounter "No such file or directory"?
  1. Check the matrix_from_hic directory. If its sub-directories are empty, look in juicer_log for the error, or check your cooler installation.
  2. If juicer_log reports something like "invalid chromosome chr12", check the sizes file. A common problem is a mismatch between "chr1"-style and "1"-style chromosome names.
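The "chr1" versus "1" mismatch can be detected by comparing the names in the sizes file with the names stored in the Hi-C data. The sketch below does this comparison on two plain lists of names. It reports the names that are missing and whether toggling the "chr" prefix would reconcile them. `chrom_name_mismatch` is a hypothetical helper.

```python
def chrom_name_mismatch(sizes_names, data_names):
    """Return (missing, hint) for chromosome names absent from the Hi-C data.

    hint is set when adding or removing a "chr" prefix would fix every missing name.
    """
    data_set = set(data_names)
    missing = [name for name in sizes_names if name not in data_set]
    hint = None
    if missing:
        toggled = [n[3:] if n.startswith("chr") else "chr" + n for n in missing]
        if all(t in data_set for t in toggled):
            hint = "the sizes file and the Hi-C data disagree on the 'chr' prefix"
    return missing, hint
```

For .mcool input, the cooler package can list the data's names, for example `cooler.Cooler("hic_input/Control_GSE63525_IMR90_combined_30.mcool::resolutions/100000").chromnames`. Read the sizes-file names from its first column.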


Download files


Source Distribution

histra-1.4.0.tar.gz (25.4 kB)

Uploaded Source

Built Distribution


HiSTra-1.4.0-py3-none-any.whl (32.8 MB)

Uploaded Python 3

File details

Details for the file histra-1.4.0.tar.gz.

File metadata

  • Download URL: histra-1.4.0.tar.gz
  • Size: 25.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.7

File hashes

Hashes for histra-1.4.0.tar.gz
Algorithm Hash digest
SHA256 3a5e7ac8b85b40321a17397c4d9e24f2e6119d2b5c3c1c8a3d35b2b9d8c0395e
MD5 3947b84a4bc4f98e49b91982e2b230f9
BLAKE2b-256 84ae8867ab6c5ec8dfef490143cbcf45af11fca490d458bc87e62f4ceb5d4939


File details

Details for the file HiSTra-1.4.0-py3-none-any.whl.

File metadata

  • Download URL: HiSTra-1.4.0-py3-none-any.whl
  • Size: 32.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.7

File hashes

Hashes for HiSTra-1.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b825ac290ee2eba31e2595b86395184d035db7bae7b82b6835f8ab3ead34777b
MD5 cf122d33cd14440083d09c45d6dad7c6
BLAKE2b-256 594ab440a09cad581f4f36e6d9d1d28f75efeda3aea75280a79a4b14e568b22d

