Spectral translocation detection of HiC matrices.

HiSTra

Installation

Dependency

conda create -n HiSTra python=3.8 
conda activate HiSTra
conda install numpy scipy pandas=1.3.5 matplotlib seaborn h5py
conda install -c conda-forge -c bioconda cooler=0.8.11
pip install matplotlib-venn 

Linux OS

pip install HiSTra

Preparation

Download juicer_tools and deDoc. Because both tools have been updated over time, we recommend downloading them from this repo. The relevant jar files are in HiSTra/juice and HiSTra/deDoc, respectively.

Make sure the chromosome.sizes file is exactly the one used to generate the test and control samples (.hic or .mcool); otherwise an error will occur at an early step. Also make sure that no chromosome name contains an underscore ('_').
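
The underscore requirement can be checked before running the pipeline. The following is a minimal sketch, not part of HiSTra: the function name find_underscore_chromosomes is hypothetical, and it assumes the sizes file uses the common whitespace-separated "name size" layout, one chromosome per line.

```python
from pathlib import Path


def find_underscore_chromosomes(sizes_path):
    """Return chromosome names from a sizes file that contain an underscore.

    Assumes each non-blank line is "<name> <size>" separated by whitespace.
    """
    offenders = []
    for line in Path(sizes_path).read_text().splitlines():
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        name = fields[0]
        if "_" in name:
            offenders.append(name)
    return offenders
```

Calling find_underscore_chromosomes on your sizes file should return an empty list if every chromosome name is acceptable; any names it returns would need to be removed or renamed consistently with the input samples.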

Directory tree

A recommended work directory looks like:

mkdir work_dir
cd work_dir
mkdir hic_input
# Then move corresponding hic file here.
mkdir TL_output
ln -s deDoc_dir_path .
ln -s juice_dir_path .

Finally, the directory tree is:

├── deDoc
│   ├── deDoc.jar
├── hic_input
│   ├── Control_GSE63525_IMR90_combined_30.hic
│   ├── Test_GSE63525_K562_combined_30.hic
│   ├── Control_GSE63525_IMR90_combined_30.mcool
│   └── Test_GSE63525_K562_combined_30.mcool
├── juice
│   ├── juicer_tools_2.09.00.jar
└── TL_output

Example

Samples

You can download a test case from GSE63525. In this example, the test sample hic file is K562 and the control sample hic file is IMR90.

You can also choose your own test and control samples.

Resolution

For human samples, the hic file should contain matrix data at 100k and 500k resolution. In general, an appropriate resolution can be calculated as follows:

res_{unit} = 10^{len(max(chromosome_{size}))-4},

where len(·) is the number of decimal digits of the largest chromosome size.

For example, in hg.sizes the largest chromosome is chr1 (249250621, nine digits), so the suggested resolution unit is 10^{9-4} = 100k, and the lower resolution is defined as:

5 \times res_{unit} = 500k.
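
The rule above can be written as a short helper. This is only a sketch: the name suggested_resolutions is hypothetical, and it reads len(...) as the number of decimal digits of the largest chromosome size, which is the reading that matches the chr1 example.

```python
def suggested_resolutions(max_chrom_size):
    """Return (res_unit, 5 * res_unit) in base pairs.

    res_unit = 10 ** (number of decimal digits of the largest chromosome - 4)
    """
    n_digits = len(str(int(max_chrom_size)))
    res_unit = 10 ** (n_digits - 4)
    return res_unit, 5 * res_unit


# Largest chromosome in the example from the text (chr1, 249250621 bp).
print(suggested_resolutions(249250621))  # (100000, 500000)
```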

Command

# Assuming you are in work_dir, a standard command for hic format files is:
HiST -t hic_input/Test_GSE63525_K562_combined_30.hic -c hic_input/Control_GSE63525_IMR90_combined_30.hic -o TL_output/ -d deDoc/deDoc.jar -j juice/juicer_tools_2.09.00.jar -s sizes/chrom_hg19.sizes
# or, for mcool format files:
HiST -t hic_input/Test_GSE63525_K562_combined_30.mcool -c hic_input/Control_GSE63525_IMR90_combined_30.mcool -o TL_output/ -d deDoc/deDoc.jar -s sizes/chrom_hg19.sizes
# or, for mixed format files:
HiST -t hic_input/Test_GSE63525_K562_combined_30.mcool -c hic_input/Control_GSE63525_IMR90_combined_30.hic -o TL_output/ -d deDoc/deDoc.jar -j juice/juicer_tools_2.09.00.jar -s sizes/chrom_hg19.sizes
# The results are then in the folder TL_output/SV_result.
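
If you launch HiST from a script, the three variants above can be assembled by one helper. The sketch below is not part of HiSTra: build_hist_command is a hypothetical name, it uses only the flags that appear in the commands above, and it assumes (from those examples alone) that -j is passed whenever at least one input is a .hic file and omitted when both are .mcool.

```python
import shlex


def build_hist_command(test, control, out_dir, dedoc_jar, sizes, juicer_jar=None):
    """Build the HiST argument list from the flags shown in the examples.

    Assumption: the juicer_tools jar (-j) is supplied only when at least
    one input file ends in ".hic", mirroring the example commands.
    """
    cmd = ["HiST", "-t", test, "-c", control, "-o", out_dir, "-d", dedoc_jar]
    if any(path.endswith(".hic") for path in (test, control)):
        if juicer_jar is None:
            raise ValueError("a juicer_tools jar is expected for .hic input")
        cmd += ["-j", juicer_jar]
    cmd += ["-s", sizes]
    return cmd


# Print the mcool-only variant as a shell-ready string.
example = build_hist_command(
    "hic_input/Test_GSE63525_K562_combined_30.mcool",
    "hic_input/Control_GSE63525_IMR90_combined_30.mcool",
    "TL_output/",
    "deDoc/deDoc.jar",
    "sizes/chrom_hg19.sizes",
)
print(shlex.join(example))
```

The returned list can be passed to subprocess.run(cmd, check=True) from inside work_dir.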

Figure

An example of TL result with heatmap

FAQ

What if you see "Resource temporarily unavailable", "error: too many open files", "ValueError: cannot convert float NaN to integer", or "EmptyDataError: No columns to parse from file"?

If your workstation has more than 128 GB of memory and more than 48 threads, you can try the following:

  1. Run the command ulimit -u 381152. Here 381152 can be replaced by any sufficiently large number.
  2. Rerun the HiST command a few more times.

These errors usually occur when the input data is in mcool format. We will fix them in the next version.
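
Before changing any limit, it can help to see what the current values are. The sketch below uses Python's standard resource module (available on Linux) to read the per-user process limit, which is what ulimit -u reports, and the open-files limit, which is what ulimit -n reports and which is usually the one behind "too many open files". The function name current_limits is hypothetical, and whether raising either limit resolves the HiST errors is not confirmed here.

```python
import resource


def current_limits():
    """Return (soft, hard) limits for user processes and open files."""
    return {
        # Same quantity that `ulimit -u` reports.
        "max_user_processes": resource.getrlimit(resource.RLIMIT_NPROC),
        # Same quantity that `ulimit -n` reports.
        "max_open_files": resource.getrlimit(resource.RLIMIT_NOFILE),
    }


for name, (soft, hard) in current_limits().items():
    print(f"{name}: soft={soft} hard={hard}")
```

A value of -1 (resource.RLIM_INFINITY) means the limit is unlimited.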

What if you see "No such file or directory"?
  1. Check the matrix_from_hic directory. If its sub-directories contain no files, look in juicer_log for the error, or check the cooler package.
  2. If juicer_log reports something like "invalid chromosome chr12", check the sizes file. A common problem is a mismatch between "chr1" and "1" style chromosome names.
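
The "chr1" versus "1" mismatch from item 2 can be spotted by classifying the names in the sizes file. This is a minimal sketch with a hypothetical helper, naming_style; it only inspects a list of chromosome names and does not read any HiSTra or juicer file format.

```python
def naming_style(names):
    """Classify chromosome names as 'chr-prefixed', 'bare', or 'mixed'."""
    names = list(names)
    if not names:
        raise ValueError("no chromosome names given")
    prefixed = [n for n in names if n.startswith("chr")]
    if len(prefixed) == len(names):
        return "chr-prefixed"
    if not prefixed:
        return "bare"
    return "mixed"


print(naming_style(["chr1", "chr2", "chrX"]))  # chr-prefixed
print(naming_style(["1", "2", "X"]))           # bare
```

If the style found in the sizes file differs from the chromosome name quoted in juicer_log, the sizes file likely does not match the one used to generate the input samples.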
