Compute similarity between genomic contact matrices with "Entropy 3C"
Project description
ENT3C is a method for quantifying the similarity of micro-C/Hi-C derived chromosomal contact matrices. It is based on the von Neumann entropy [1] and recent work on entropy quantification of Pearson correlation matrices [2]. For a contact matrix, ENT3C records the change in local pattern complexity of smaller Pearson-transformed submatrices along the matrix diagonal to generate a characteristic entropy signal. Similarity is defined as the Pearson correlation between the entropy signals of two contact matrices.
https://github.com/X3N1A/ENT3C
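The idea described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the ENT3C implementation: the function names, the toy random matrices, and the choice to normalise the Pearson matrix by its dimension so that it has unit trace are assumptions made here, and the package may apply additional preprocessing to real contact matrices.

```python
import numpy as np


def von_neumann_entropy(sub: np.ndarray) -> float:
    """Entropy of a Pearson-transformed submatrix treated as a density matrix."""
    n = sub.shape[0]
    pearson = np.corrcoef(sub)           # n x n correlation matrix with unit diagonal
    rho = pearson / n                    # assumption: scale to unit trace
    eigvals = np.linalg.eigvalsh(rho)    # real eigenvalues of a symmetric matrix
    eigvals = eigvals[eigvals > 1e-12]   # treat 0 * log(0) as 0
    return float(-np.sum(eigvals * np.log(eigvals)))


def entropy_signal(matrix: np.ndarray, sub_size: int, shift: int) -> np.ndarray:
    """Entropy of square submatrices taken along the diagonal."""
    n_windows = 1 + (matrix.shape[0] - sub_size) // shift
    starts = shift * np.arange(n_windows)
    return np.array(
        [von_neumann_entropy(matrix[s:s + sub_size, s:s + sub_size]) for s in starts]
    )


def similarity(signal_a: np.ndarray, signal_b: np.ndarray) -> float:
    """Pearson correlation between two entropy signals."""
    return float(np.corrcoef(signal_a, signal_b)[0, 1])


# Toy symmetric matrices standing in for contact matrices (hypothetical data).
rng = np.random.default_rng(0)
base = rng.random((60, 60))
contacts_a = base + base.T
noise = rng.random((60, 60))
contacts_b = contacts_a + 0.05 * (noise + noise.T)

signal_a = entropy_signal(contacts_a, sub_size=20, shift=5)
signal_b = entropy_signal(contacts_b, sub_size=20, shift=5)
print(signal_a.shape)  # (9,)
print(similarity(signal_a, signal_b))
```

Each entropy value lies between 0 and log(n) for an n x n submatrix, so signals computed with the same submatrix size are directly comparable.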
Installation
- Generate and activate a Python environment:

      python3.11 -m venv .ent3c_venv
      source .ent3c_venv/bin/activate

- Install ENT3C:

      pip install ENT3C
Usage
- CLI (Python) usage:

      Usage:
        ENT3C <command> --config=<path/to/config.json> [options]

      Commands:
        get_entropy        Generates entropy output file <entropy_out_FN>.
        get_similarity     Generates similarity output file <similarity_out_FN> from <entropy_out_FN>.
        run_all            Generates <entropy_out_FN> and <similarity_out_FN>.
        compare_groups     Compare signal groups (requires --group1 and --group2 options)

      Global Options:
        --config=<path>    Path to config JSON file (required for all commands)

      <compare_groups> Options:
        --group1=<GROUP>   First group name, must correspond to what comes before _BR* in config file.
        --group2=<GROUP>   Second group name, must correspond to what comes before _BR* in config file.

      Examples:
        ENT3C run_all --config=configs/myconfig.json
        ENT3C get_entropy --config=configs/myconfig.json
        ENT3C get_similarity --config=configs/myconfig.json
        ENT3C compare_groups --config=configs/myconfig.json --group1=H1-hESC --group2=K562

- Alternatively, run ENT3C in Python as:

      import ENT3C

      ENT3C_OUT = ENT3C.run_get_entropy("config/myconfig.json")
      Similarity = ENT3C.run_get_similarity("config/myconfig.json")
      ENT3C_OUT, Similarity = ENT3C.run_all("config/myconfig.json")
      EUCLIDEAN = ENT3C.run_compare_groups("config/myconfig.json", group1, group2)

- All ENT3C parameters are defined in `.json` files such as `config/config.json`. Examples can be found in the `config` directory.
Parameters defined in `<config_file>`:

- The main ENT3C parameter affecting the final entropy signal $S$ is the dimension of the submatrices, `SUB_M_SIZE_FIX`.
  - `"SUB_M_SIZE_FIX": <integer>` $\dots$ fixed submatrix dimension. `SUB_M_SIZE_FIX` can either be set directly or, alternatively, one can specify `CHRSPLIT`; in this case `SUB_M_SIZE_FIX` is computed internally to fit the desired number of partitions of the contact matrix.

    `PHI = 1 + floor((N - SUB_M_SIZE) / phi)`

    where `N` is the size of the input contact matrix, `phi` is the window shift, and `PHI` is the number of evaluated submatrices (and consequently the number of data points in $S$).
  - `"CHRSPLIT": <integer>` $\dots$ number of submatrices into which the contact matrix is partitioned. If specified, set `"SUB_M_SIZE_FIX": null`; otherwise set `"CHRSPLIT": null`.
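As a worked example of the formula for `PHI`, the sketch below counts the sliding windows and lists their bin ranges. The helper names are hypothetical and not part of the ENT3C API. The matrix size `N = 6005` is an assumed value chosen so that a submatrix size of 500 and a shift of 6 reproduce the 918 windows shown in the example output table of this README; the inclusive end bin and the conversion to genomic coordinates are likewise inferred from that table.

```python
def number_of_windows(n_bins: int, sub_m_size: int, phi: int) -> int:
    """PHI = 1 + floor((N - SUB_M_SIZE) / phi)."""
    if sub_m_size > n_bins:
        raise ValueError("submatrix is larger than the contact matrix")
    return 1 + (n_bins - sub_m_size) // phi


def window_bins(n_bins: int, sub_m_size: int, phi: int) -> list[tuple[int, int]]:
    """(start_bin, end_bin) of every submatrix, with the end bin inclusive."""
    count = number_of_windows(n_bins, sub_m_size, phi)
    return [(k * phi, k * phi + sub_m_size - 1) for k in range(count)]


def window_coordinates(start_bin: int, end_bin: int, resolution: int) -> tuple[int, int]:
    """Genomic START/END of a window (convention inferred from the example table)."""
    return start_bin * resolution, (end_bin + 1) * resolution


bins = window_bins(n_bins=6005, sub_m_size=500, phi=6)  # N = 6005 is hypothetical
print(len(bins))   # 918
print(bins[:3])    # [(0, 499), (6, 505), (12, 511)]
print(window_coordinates(*bins[1], resolution=40000))  # (240000, 20240000)
```

A larger `phi` gives fewer, more widely spaced data points in $S$; a larger submatrix size reduces `PHI` as well, because fewer windows fit along the diagonal.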
- `"DATA_PATH": </path/to/data>` $\dots$ input data path.
- Input files in the format `[<COOL_FILENAME>, <SHORT_NAME>]`:

      "FILES": [
          "ENCSR079VIJ.BioRep1.40kb.cool", "G401_BR1",
          "ENCSR079VIJ.BioRep2.40kb.cool", "G401_BR2"
      ]

  - Any biological replicates must be indicated in `<SHORT_NAME>` using the suffix `_BR%d`.
  - Note: ENT3C also takes `mcool` files as input.
- `"OUT_DIR": "<desired_output_directory_name>"` $\dots$ output directory. `OUT_DIR` will be concatenated with `OUTPUT/JULIA/` or `OUTPUT/MATLAB/`.
- `"OUT_PREFIX": "<desired_output_prefix_>"` $\dots$ prefix for output files.
- `"Resolution": "<integer,integer,...>"`, e.g. `"40e3,100e3"` $\dots$ resolutions to be evaluated.
- `"ChrNr": "<integer,integer,...>"`, e.g. `"15,16,17,18,19,20,21,22,X"` $\dots$ chromosome numbers to be evaluated.
- `"NormM": <0|1>` $\dots$ input contact matrices can be balanced. If `NormM: 1`, the balancing weights stored in the cooler are applied; ENT3C then expects the weights in the dataset `resolutions/<resolution>/bins/<WEIGHTS_NAME>`.
- `"WEIGHTS_NAME": "<name_of_weights>"` $\dots$ name of the dataset in the cooler containing the normalization weights.
- `"phi": <integer>` $\dots$ window shift, i.e. the number of bins from the start of one submatrix to the start of the next.
- `"PHI_MAX": <integer>` $\dots$ number of submatrices, i.e. number of data points in the entropy signal $S$. If set, $\varphi$ is increased until $\Phi \approx \Phi_{\max}$.
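A configuration file with the keys listed above can be written from Python. The sketch below is illustrative only: the key names come from this README, while the values (the `DATA` path, the output names, `"weight"` as the weights dataset, and the numeric settings) are assumptions, and `check_config` is a hypothetical helper that enforces only the rules stated here. The example configs shipped with the package remain the authoritative reference.

```python
import json
import tempfile
from pathlib import Path

config = {
    "DATA_PATH": "DATA",              # hypothetical input directory
    "FILES": [
        "ENCSR079VIJ.BioRep1.40kb.cool", "G401_BR1",
        "ENCSR079VIJ.BioRep2.40kb.cool", "G401_BR2",
    ],
    "OUT_DIR": "G401_example",        # hypothetical output directory name
    "OUT_PREFIX": "G401_40kb",        # hypothetical output prefix
    "Resolution": "40e3",
    "ChrNr": "15,16,17,18,19,20,21,22,X",
    "NormM": 0,                       # 0: use unbalanced matrices
    "WEIGHTS_NAME": "weight",         # assumed name; only relevant when NormM is 1
    "SUB_M_SIZE_FIX": 500,
    "CHRSPLIT": None,                 # written as null because SUB_M_SIZE_FIX is set
    "phi": 1,
    "PHI_MAX": 1000,                  # illustrative value
}


def check_config(cfg: dict) -> None:
    """Enforce the two rules stated in this README (hypothetical helper)."""
    if (cfg["SUB_M_SIZE_FIX"] is None) == (cfg["CHRSPLIT"] is None):
        raise ValueError("set exactly one of SUB_M_SIZE_FIX and CHRSPLIT; the other must be null")
    if len(cfg["FILES"]) % 2:
        raise ValueError("FILES must alternate <COOL_FILENAME>, <SHORT_NAME>")


check_config(config)

# A temporary directory keeps the sketch side-effect free; a real project
# would typically write to something like config/myconfig.json instead.
config_path = Path(tempfile.mkdtemp()) / "myconfig.json"
config_path.write_text(json.dumps(config, indent=4))
print(config_path.read_text())
```

`json.dumps` serialises Python's `None` as `null`, which is the form the parameter descriptions ask for.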
Output files:

- `<OUT_DIR>/<OUT_PREFIX>_ENT3C_similarity.csv` $\dots$ contains all pairwise comparisons. The columns `Sample1` and `Sample2` contain the short names specified in `FILES`, and the column `Q` contains the corresponding similarity score.

  | Resolution | ChrNr | Sample1 | Sample2 | Q |
  |---|---|---|---|---|
  | 40000 | 2 | HFFc6_BR3 | A549_BR2 | 0.6132789056404898 |
  | 40000 | 2 | HFFc6_BR3 | LNCap_BR2 | 0.3126805134567409 |
  | 40000 | 2 | HFFc6_BR3 | LNCap_BR1 | 0.4221187669214683 |
  | 40000 | 2 | HFFc6_BR3 | HFFc6_BR2 | 0.9632461160758761 |
  | ... | ... | ... | ... | ... |

- `<OUT_DIR>/<OUT_PREFIX>_ENT3C_OUT.csv` $\dots$ ENT3C output table.

  | Name | ChrNr | Resolution | n | PHI | phi | binNrStart | binNrEND | START | END | S |
  |---|---|---|---|---|---|---|---|---|---|---|
  | G401_BR1 | 2 | 40000 | 500 | 918 | 6 | 0 | 499 | 0 | 20000000 | 3.7896426915562462 |
  | G401_BR1 | 2 | 40000 | 500 | 918 | 6 | 6 | 505 | 240000 | 20240000 | 3.789044181663418 |
  | G401_BR1 | 2 | 40000 | 500 | 918 | 6 | 12 | 511 | 480000 | 20480000 | 3.7918253959272032 |
  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |

  Each row corresponds to an evaluated submatrix with the fields `Name` (the short name specified in `FILES`), `ChrNr`, `Resolution`, the submatrix dimension `n`, `PHI = 1 + floor((N - SUB_M_SIZE) / phi)`, and the window shift `phi`. `binNrStart` and `binNrEND` are the start and end bins of the submatrix, `START` and `END` are the corresponding genomic coordinates, and `S` is the computed von Neumann entropy.

  - Example of output generated for `ENT3C get_entropy --config=config/myconfig.json`: `EvenChromosomes_NoWeights_40kb_ENT3C_signals.pdf`, showing the entropy signals of unbalanced 40 kb contact matrices for even chromosomes across 5 cell lines; `SUB_MATRIX_SIZE` was 500.

- `<OUT_DIR>/<OUT_PREFIX>_Eucl_<group1>vs<group2>.csv` $\dots$ Euclidean distance between the average z-scores of $S$ over `<group1>` and `<group2>` (here `group1=HFFc6`, `group2=G401`).

  | Resolution | ChrNr | START | END | meanS_Euclidean |
  |---|---|---|---|---|
  | 40000 | 6 | 62360000 | 82360000 | 3.3625023926723685 |
  | 40000 | 6 | 62120000 | 82120000 | 3.3546076641065095 |
  | 40000 | 6 | 61880000 | 81880000 | 3.3441925121710026 |

  - Example of the first page of output generated for `ENT3C compare_groups --config=config/myconfig.json --group1=HFFc6 --group2=G401`: `EvenChromosomes_NoWeights_Eucl_40kb_HFFc6vsG401.pdf`.
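One way to read the similarity table is to compare `Q` between biological replicates of the same cell line with `Q` between different cell lines. The sketch below does this for the four example rows quoted in this README; the rows are typed in by hand rather than read from the CSV, because the file's exact delimiter is not specified here, and `group_of` is a hypothetical helper.

```python
from statistics import mean

# (Sample1, Sample2, Q) rows copied from the example similarity table.
rows = [
    ("HFFc6_BR3", "A549_BR2", 0.6132789056404898),
    ("HFFc6_BR3", "LNCap_BR2", 0.3126805134567409),
    ("HFFc6_BR3", "LNCap_BR1", 0.4221187669214683),
    ("HFFc6_BR3", "HFFc6_BR2", 0.9632461160758761),
]


def group_of(short_name: str) -> str:
    """Part of the short name before the _BR<number> replicate suffix."""
    return short_name.rsplit("_BR", 1)[0]


replicate_q = [q for a, b, q in rows if group_of(a) == group_of(b)]
other_q = [q for a, b, q in rows if group_of(a) != group_of(b)]

print(mean(replicate_q))        # 0.9632461160758761
print(round(mean(other_q), 4))  # 0.4494
```

In these example rows the replicate pair scores far higher than the pairs of different cell lines; four rows are of course too few to draw conclusions from.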
Download files
File details
Details for the file ent3c-2.2.2.tar.gz.
File metadata
- Download URL: ent3c-2.2.2.tar.gz
- Upload date:
- Size: 32.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a0d597d79e89d7c6d8dd473b204dd2ed5e9e77a724bfdc9cdaf3397f1f16d95a |
| MD5 | 6ca819988b1bed838ff3fad3f55ee573 |
| BLAKE2b-256 | feabc1b6cd61f78241eb7d71efd3c45c983de8a155828479dced9426ec73fc0a |
File details
Details for the file ent3c-2.2.2-py3-none-any.whl.
File metadata
- Download URL: ent3c-2.2.2-py3-none-any.whl
- Upload date:
- Size: 30.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 260dbc84edd0eb5cb3e16872eb594ae493f04bbc1a1e01ceb108e5621737aca1 |
| MD5 | 998ae1bc83f4c9543c49e333401b6e7a |
| BLAKE2b-256 | 23f6fdccfeb2885a67d528f9609c49777aa3c95bf9abda8f90a80f9c54a70777 |