
Polymer dynamics simulation from Hi-C data


PHi-C2

PHi-C2 allows for a physical interpretation of a Hi-C contact matrix. The phic package includes a suite of command line tools.

Installation (with Conda environment)

You can install phic in a clean environment as follows:

conda create -n phic python=3.12
conda activate phic
pip install phic


Without preparing a Python environment, PHi-C2 (version 2.0.13 or earlier) can also be run on Google Colab.

Requirements

  • PHi-C2 requires Python 3.
  • Required Python packages: numpy, matplotlib, scipy, click, pandas, hic-straw, cooler, h5py, MDAnalysis, tqdm, psutil, hictkpy.

To visualize the simulated polymer dynamics and conformations, VMD can be used. Alternatively, the output PSF and DCD files can be viewed directly in a web browser using Mol* without any local installation.

Citation

If you use PHi-C2, please cite:

Soya Shinkai, Hiroya Itoga, Koji Kyoda, and Shuichi Onami. (2022). PHi-C2: interpreting Hi-C data as the dynamic 3D genome state. Bioinformatics 38(21) 4984–4986.

Quick Start

After installing phic and downloading the demo directory, move into it:

demo/
  run.sh

Then, execute the following script:

./run.sh

This process may take a few minutes.

The demo uses Hi-C data from mouse embryonic stem cells (chr2: 40–65 Mb, 25-kb resolution, KR normalization) published by Bonev et al.


Usage

phic requires a subcommand on the command line:

phic SUBCOMMAND [OPTIONS]

Subcommands:
fetch-fileinfo
      ↓
preprocessing
      ↓
optimization
      ├──> plot-optimization
      ├──> dynamics
      ├──> sampling
      ├──> msd
      │    └──> plot-msd
      └──> losstangent
           └──> plot-losstangent
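
The dependency order above can be written down as a small driver. The following is a minimal sketch, not part of phic: the helper name build_pipeline is hypothetical, it uses only options documented in this README, and the directory NAME must be supplied by the caller because it is created by preprocessing.

```python
import shlex


def build_pipeline(hic_file, name, chrom, res, norm="KR",
                   tolerance=0.4, plt_max_c=0.05, plt_max_k=0.01):
    """Return phic commands (as argv lists) in the order shown in the diagram.

    `name` is the directory created by `phic preprocessing`; it is passed in
    explicitly because this sketch does not try to predict it.
    """
    return [
        ["phic", "fetch-fileinfo", "--input", hic_file],
        ["phic", "preprocessing", "--input", hic_file, "--res", str(res),
         "--plt-max-c", str(plt_max_c), "--chr", chrom,
         "--norm", norm, "--tolerance", str(tolerance)],
        ["phic", "optimization", "--name", name],
        ["phic", "plot-optimization", "--name", name,
         "--plt-max-c", str(plt_max_c), "--plt-max-k", str(plt_max_k)],
        ["phic", "msd", "--name", name],
        ["phic", "losstangent", "--name", name],
    ]


# Print the commands; to execute them, pass each argv list to
# subprocess.run(argv, check=True) instead.
for argv in build_pipeline("FILENAME.hic", "NAME", "2", 25000):
    print(shlex.join(argv))
```

Building argv lists rather than shell strings avoids quoting problems when file names contain spaces.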

As of version 2.2.1, most subcommands accept experimental --json / --json-path / --run-uuid options that append a structured analysis log to phic.json in the workspace. These options are provided for internal pipeline integration; the schema and detailed usage will be documented in a future release.

0. fetch-fileinfo

phic fetch-fileinfo [OPTIONS]

Options:
  --input               TEXT     Input Hi-C file (.hic or .mcool format)  [required]

The fetch-fileinfo subcommand is used to inspect the basic metadata of a Hi-C data file. As of version 2.1.1, phic supports both .hic and .mcool formats as input.

Use this command to check available chromosomes, resolution levels, and indexing details in the input file before proceeding with further analysis. This ensures that downstream subcommands reference the correct chromosome names and binning resolutions.

This is a recommended first step when working with new input files.

Example:

phic fetch-fileinfo --input FILENAME.hic

1. preprocessing

phic preprocessing [OPTIONS]

Options:
  --input               TEXT     Input Hi-C file (.hic or .mcool format)  [required]
  --res                 INTEGER  Resolution of the bin size  [required]
  --plt-max-c           FLOAT    Maximum value of contact map  [required]
  --for-high-resolution FLAG     Normalization of contact map for high-resolution case (ex. 1-kb, 500-bp, 200-bp)  [default=False]
  --chr                 TEXT     Target chromosome  [required]
  --grs                 INTEGER  Start position of the target genomic region
  --gre                 INTEGER  End position of the target genomic region
  --norm                TEXT     Type of normalization to apply
  --tolerance           FLOAT    Threshold used to remove segments containing NaN values  [required]
  --help                         Show this message and exit.

As of version 2.1.1, the input must be in .hic or .mcool format. In addition, rows and columns containing NaN values can be excluded from the analysis by specifying their allowed proportion (from 0 to 1) with the --tolerance option.
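
To build intuition for the tolerance value, the sketch below flags bins by their NaN proportion. It is an illustration only: the function nan_fraction_mask is hypothetical, and the rule "exclude a bin when its NaN fraction exceeds the tolerance" is an assumption about how the threshold is applied, not a description of the phic implementation.

```python
import numpy as np


def nan_fraction_mask(contact, tolerance):
    """Return True for bins whose fraction of NaN entries exceeds `tolerance`.

    Assumption: the comparison is a strict "greater than" on the per-row
    NaN fraction of a symmetric contact matrix.
    """
    nan_fraction = np.isnan(contact).mean(axis=1)
    return nan_fraction > tolerance


# Toy 3x3 matrix: bin 2 has no valid contacts at all.
demo = np.array([
    [1.0, 0.5, np.nan],
    [0.5, 1.0, np.nan],
    [np.nan, np.nan, np.nan],
])
print(nan_fraction_mask(demo, tolerance=0.4))
```

With a tolerance of 0.4 only the fully empty bin is flagged; lowering the tolerance to 0.2 would also flag the two bins that each miss one of three entries.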

When using the preprocessing subcommand, a directory will be automatically created based on the input Hi-C file name, chromosome number, genomic region of interest (optional), resolution, and normalization method. All subsequent analysis results will be stored in this directory. In the following explanations, we refer to this directory as NAME.

The outputs are as follows:

NAME/
├── C_normalized.npz
├── C_normalized.svg
├── P_normalized.npz
├── P_normalized.svg
└── _meta_data/

The two .npz files can be loaded with numpy.load using the keys below:

File              Key           Shape   Description
C_normalized.npz  C_normalized  (N, N)  Normalized contact matrix (diagonal = 1; NaN-marked rows/columns are preserved)
P_normalized.npz  P_normalized  (N, 2)  Column 0: genomic distance [bp]; column 1: averaged normalized contact probability
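
As a minimal sketch of how these files might be used, the code below loads both arrays by the keys listed above and fits the log-log slope of the contact probability against genomic distance. The helper names and the synthetic stand-in data are assumptions made for illustration; point load_preprocessed at a real NAME directory to use actual output.

```python
import os
import tempfile

import numpy as np


def load_preprocessed(name_dir):
    """Load C_normalized and P_normalized from a preprocessing directory."""
    with np.load(os.path.join(name_dir, "C_normalized.npz")) as f:
        C = f["C_normalized"]
    with np.load(os.path.join(name_dir, "P_normalized.npz")) as f:
        P = f["P_normalized"]
    return C, P


def decay_exponent(P):
    """Fit the slope of log10 P(s) versus log10 s, skipping s = 0 and invalid values."""
    s, p = P[:, 0], P[:, 1]
    ok = (s > 0) & np.isfinite(p) & (p > 0)
    slope, _intercept = np.polyfit(np.log10(s[ok]), np.log10(p[ok]), 1)
    return float(slope)


# Synthetic stand-in for NAME/: 8 bins at 25-kb resolution, P(s) ~ 1/s,
# with bin 3 marked as NaN.
demo_dir = tempfile.mkdtemp()
n_bins, res = 8, 25000
sep = np.arange(n_bins)
prob = np.where(sep == 0, 1.0, 1.0 / np.maximum(sep, 1))
C_demo = prob[np.abs(sep[:, None] - sep[None, :])]
C_demo[3, :] = np.nan
C_demo[:, 3] = np.nan
np.savez(os.path.join(demo_dir, "C_normalized.npz"), C_normalized=C_demo)
np.savez(os.path.join(demo_dir, "P_normalized.npz"),
         P_normalized=np.column_stack([sep * res, prob]))

C, P = load_preprocessed(demo_dir)
print(C.shape, P.shape)
print("NaN bins:", np.flatnonzero(np.isnan(C).all(axis=1)))
print("decay exponent:", decay_exponent(P))
```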

Example:

phic preprocessing --input FILENAME.hic --res 25000 --plt-max-c 0.05 --chr 2 --grs 40000000 --gre 65000000 --norm KR --tolerance 0.4
phic preprocessing --input FILENAME.hic --res 100000 --plt-max-c 0.05 --chr 2 --norm KR --tolerance 0.8

2. optimization

phic optimization [OPTIONS]

Options:
  --name                      TEXT   Target directory name  [required]
  --init-k-backbone           FLOAT  Initial parameter of K_i,i+1  [default=0.5]
  --stop-condition-parameter  FLOAT  Parameter for the stop condition  [default=1e-7]
  --backtracking-factor       FLOAT  Backtracking factor  [default=0.7]
  --gradient-degree           INT    Gradient used for optimizing of K  [default=2]
  --help                             Show this message and exit.

The outputs are as follows:

NAME/
└── data_optimization/
    ├── K_optimized.npz
    ├── C_optimized.npz
    ├── P_optimized.npz
    └── optimization.log

As of version 2.2.1, optimization also produces C_optimized.npz and P_optimized.npz (previously generated by plot-optimization), so the downstream analyses no longer require running plot-optimization first. The three .npz files can be loaded with numpy.load using the keys below:

File             Key          Shape   Description
K_optimized.npz  K_optimized  (N, N)  Optimized polymer network interaction matrix
C_optimized.npz  C_optimized  (N, N)  Contact matrix reconstructed from the optimized K (NaN positions of C_normalized are preserved)
P_optimized.npz  P_optimized  (N, 2)  Column 0: genomic distance [bp]; column 1: averaged contact probability from C_optimized
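
One way to check the fit is to compare C_optimized with C_normalized directly. The sketch below computes the Frobenius norm of their difference divided by N. The function name frobenius_cost is hypothetical, and skipping NaN entries in the sum is an assumption; the value reported by phic may treat them differently.

```python
import numpy as np


def frobenius_cost(C_opt, C_norm):
    """Frobenius norm of (C_opt - C_norm) divided by N, ignoring NaN entries."""
    diff = np.asarray(C_opt, dtype=float) - np.asarray(C_norm, dtype=float)
    n = diff.shape[0]
    return float(np.sqrt(np.nansum(diff ** 2)) / n)


# With real output, load the matrices first, for example:
#   C_norm = np.load("NAME/C_normalized.npz")["C_normalized"]
#   C_opt = np.load("NAME/data_optimization/C_optimized.npz")["C_optimized"]
C_norm = np.array([[1.0, 0.5], [0.5, 1.0]])
C_opt = np.array([[1.0, 0.4], [0.4, 1.0]])
print(frobenius_cost(C_opt, C_norm))
```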

optimization.log is a tab-separated text file that records the per-iteration trajectory of the optimization. It has a header row followed by one row per accepted step:

Column  Description
step    Iteration index (0 = initial state before any update)
cost    Frobenius norm of the difference between the reconstructed contact matrix and C_normalized divided by N; equivalent to the root-mean-square deviation (RMSD) per matrix element
eta     Learning rate accepted at this step (adjusted by backtracking)

It is consumed by plot-optimization to render Cost.svg and Eta.svg, and can also be loaded directly with numpy.loadtxt(path, delimiter="\t", skiprows=1) for custom analysis.
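
The numpy.loadtxt call above can be wrapped as follows. This is a sketch under stated assumptions: the columns are taken to appear in the order step, cost, eta, the helper load_optimization_log is hypothetical, and the log written here is synthetic with made-up values so that the example runs without phic output.

```python
import os
import tempfile

import numpy as np


def load_optimization_log(path):
    """Read optimization.log into a dict of arrays (assumed order: step, cost, eta)."""
    data = np.loadtxt(path, delimiter="\t", skiprows=1, ndmin=2)
    return {
        "step": data[:, 0].astype(int),
        "cost": data[:, 1],
        "eta": data[:, 2],
    }


# Synthetic stand-in for NAME/data_optimization/optimization.log.
log_path = os.path.join(tempfile.mkdtemp(), "optimization.log")
with open(log_path, "w") as fh:
    fh.write("step\tcost\teta\n")
    fh.write("0\t0.05\t0.0\n")
    fh.write("1\t0.02\t0.7\n")
    fh.write("2\t0.01\t0.49\n")

log = load_optimization_log(log_path)
print("steps recorded:", len(log["step"]))
print("final cost:", log["cost"][-1])
```

Passing ndmin=2 keeps the result two-dimensional even when the log contains a single row.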

Example:

phic optimization --name NAME

3-1. plot-optimization

phic plot-optimization [OPTIONS]

Options:
  --name        TEXT      Target directory name  [required]
  --plt-max-c   FLOAT     Maximum value of contact map  [required]
  --plt-max-k   FLOAT     Maximum and minimum values of optimized K map  [required]
  --help                  Show this message and exit.

As of version 2.2.1, plot-optimization reads the arrays pre-computed by optimization and only renders figures; it no longer outputs any .npz files. The --res option has been removed (the resolution is taken from P_normalized.npz).

The outputs are as follows:

NAME/
└── data_optimization/
    ├── C.svg
    ├── Correlation.png
    ├── Correlation_distance_corrected.png
    ├── Cost.svg
    ├── Eta.svg
    ├── K.svg
    └── P.svg

Example:

phic plot-optimization --name NAME --plt-max-c 0.05 --plt-max-k 0.01

3-2. dynamics

phic dynamics [OPTIONS]

Options:
  --name      TEXT      Target directory name  [required]
  --eps       FLOAT     Stepsize in the Langevin dynamics  [default=1e-3]
  --interval  INTEGER   The number of steps between output frames  [required]
  --frame     INTEGER   The number of output frames  [required]
  --sample    INTEGER   The number of output dynamics  [default=1]
  --seed      INTEGER   Seed of the random numbers  [default=12345678]
  --help                Show this message and exit.

The outputs are as follows:

NAME/
└── data_dynamics/
    ├── polymer_N{NUMBER-OF-BEADS}.psf
    ├── sample{SAMPLE-NUMBER}.dcd
    └── sample{SAMPLE-NUMBER}.xyz

Example:

phic dynamics --name NAME --interval 10 --frame 100
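
The relationship between --eps, --interval, and --frame can be worked out with simple arithmetic. The sketch below assumes that elapsed normalized time equals the step size multiplied by the number of steps and that every frame is separated by exactly --interval steps; the helper trajectory_summary is hypothetical.

```python
def trajectory_summary(interval, frame, eps=1e-3):
    """Summarize the length of a trajectory written by `phic dynamics`.

    Assumes time advances by `eps` per Langevin step.
    """
    total_steps = interval * frame
    return {
        "total_steps": total_steps,
        "time_per_frame": eps * interval,
        "total_time": eps * total_steps,
    }


# Values from the example command above, with the default --eps.
print(trajectory_summary(interval=10, frame=100))
```

Under these assumptions, the example command integrates 1000 steps in total.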

3-3. sampling

phic sampling [OPTIONS]

Options:
  --name    TEXT      Target directory name  [required]
  --sample  INTEGER   The number of output conformations  [required]
  --seed    INTEGER   Seed of the random numbers  [default=12345678]
  --help              Show this message and exit.

The outputs are as follows:

NAME/
└── data_sampling/
    ├── polymer_N{NUMBER-OF-BEADS}.psf
    ├── conformations.dcd
    └── conformations.xyz

Example:

phic sampling --name NAME --sample 100

3-4-1. msd

phic msd [OPTIONS]

Options:
  --name  TEXT     Target directory name  [required]
  --help           Show this message and exit.

As of version 2.2.1, the exponent range of the normalized time is automatically determined from the eigenvalues of the Laplacian matrix induced from the optimized polymer network interaction matrix, so the --upper and --lower options have been removed.
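
The idea of deriving a time range from Laplacian eigenvalues can be sketched as follows. This is not the phic implementation: the Laplacian is taken as D - K with D the diagonal matrix of row sums of K, relaxation times are assumed to scale as 1/lambda (the actual prefactor may differ), and only positive eigenvalues are used. The function names are hypothetical.

```python
import numpy as np


def laplacian_from_k(K):
    """Graph Laplacian L = D - K built from the off-diagonal part of K."""
    K = np.asarray(K, dtype=float)
    K_off = K - np.diag(np.diag(K))
    return np.diag(K_off.sum(axis=1)) - K_off


def exponent_range(K, zero_tol=1e-10):
    """Return (lower, upper) integer exponents of 10 that bracket 1/lambda."""
    eigenvalues = np.linalg.eigvalsh(laplacian_from_k(K))
    positive = eigenvalues[eigenvalues > zero_tol]  # drops the zero (center-of-mass) mode
    tau = 1.0 / positive
    lower = int(np.floor(np.log10(tau.min())))
    upper = int(np.ceil(np.log10(tau.max())))
    return lower, upper


# Toy example: a 3-bead chain with spring constant 0.5 between neighbors.
K_chain = np.array([
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.0],
])
print(exponent_range(K_chain))
```

For this chain the nonzero eigenvalues are 0.5 and 1.5, so the assumed relaxation times lie between 10^-1 and 10^1.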

The output is the following:

NAME/
└── data_MSD/
    └── MSD_matrix.npz

MSD_matrix.npz contains three arrays and can be loaded with numpy.load using the keys below:

File            Key  Shape     Description
MSD_matrix.npz  MSD  (M+1, N)  MSD of each bead n at each normalized time t[m]
                t    (M+1,)    Normalized time points (log-spaced)
                tau  (N,)      Relaxation times of the normal modes; tau[0] is NaN (center-of-mass mode)
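
As a sketch of a custom analysis on these arrays, the code below estimates the local scaling exponent of the MSD, i.e. the slope of log10 MSD against log10 t, for every bead. The helper local_exponent is hypothetical and the file written here is synthetic (MSD proportional to t^0.5) so that the example runs on its own.

```python
import os
import tempfile

import numpy as np


def local_exponent(t, msd):
    """Slope d(log10 MSD)/d(log10 t) along the time axis, per bead."""
    return np.gradient(np.log10(msd), np.log10(t), axis=0)


# Synthetic stand-in for NAME/data_MSD/MSD_matrix.npz with the documented keys.
msd_path = os.path.join(tempfile.mkdtemp(), "MSD_matrix.npz")
t_demo = np.logspace(-2, 2, 41)
msd_demo = np.sqrt(t_demo)[:, None] * np.ones((1, 4))
tau_demo = np.array([np.nan, 3.0, 1.0, 0.5])
np.savez(msd_path, MSD=msd_demo, t=t_demo, tau=tau_demo)

with np.load(msd_path) as data:
    msd, t, tau = data["MSD"], data["t"], data["tau"]

alpha = local_exponent(t, msd)
print(msd.shape, t.shape, tau.shape)
print("mean exponent per bead:", alpha.mean(axis=0))
print("finite relaxation times:", tau[np.isfinite(tau)])
```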

Example:

phic msd --name NAME

3-4-2. plot-msd

phic plot-msd [OPTIONS]

Options:
  --name        TEXT     Target directory name  [required]
  --plt-upper   INTEGER  Upper value of the exponent of the normalized time in the spectrum  [required]
  --plt-lower   INTEGER  Lower value of the exponent of the normalized time in the spectrum  [required]
  --plt-max-log FLOAT    Maximum value of log10 MSD  [required]
  --plt-min-log FLOAT    Minimum value of log10 MSD  [required]
  --aspect      FLOAT    Aspect ratio of the spectrum  [default=0.8]
  --help                 Show this message and exit.

The outputs are as follows:

NAME/
└── data_MSD/
    ├── fig_MSD_curves.png
    └── fig_MSD_spectrum.svg

Example:

phic plot-msd --name NAME --plt-upper 3 --plt-lower 0 --plt-max-log 2.0 --plt-min-log 0.5 --aspect 1.5

3-5-1. losstangent

phic losstangent [OPTIONS]

Options:
  --name    TEXT      Target directory name  [required]
  --help              Show this message and exit.

As of version 2.2.1, the exponent range of the angular frequency is automatically determined from the eigenvalues of the Laplacian matrix induced from the optimized polymer network interaction matrix, so the --upper and --lower options have been removed.

The output is the following:

NAME/
└── data_losstangent/
    └── losstangent_matrix.npz

losstangent_matrix.npz contains three arrays and can be loaded with numpy.load using the keys below:

File                    Key          Shape     Description
losstangent_matrix.npz  losstangent  (M+1, N)  Loss tangent tan δ of each bead n at each angular frequency omega[m]
                        omega        (M+1,)    Normalized angular frequency points (log-spaced)
                        tau          (N,)      Relaxation times of the normal modes; tau[0] is NaN (center-of-mass mode)
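
Because tan δ is the ratio of the loss modulus to the storage modulus, values above 1 indicate a viscous-dominated response and values below 1 an elastic-dominated one. The sketch below summarizes this per bead; the helper viscous_fraction and the small in-memory arrays are illustrative assumptions, not phic output.

```python
import numpy as np


def viscous_fraction(losstangent):
    """Fraction of sampled frequencies at which each bead has tan(delta) > 1."""
    return (np.asarray(losstangent) > 1.0).mean(axis=0)


# With real output, load the arrays first, for example:
#   data = np.load("NAME/data_losstangent/losstangent_matrix.npz")
#   losstangent, omega = data["losstangent"], data["omega"]
omega = np.logspace(-3, 0, 4)
losstangent = np.array([
    [0.5, 0.2],
    [0.8, 0.3],
    [1.2, 0.4],
    [2.0, 0.5],
])
print(viscous_fraction(losstangent))
```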

Example:

phic losstangent --name NAME

3-5-2. plot-losstangent

phic plot-losstangent [OPTIONS]

Options:
  --name          TEXT      Target directory name  [required]
  --plt-upper     INTEGER   Upper value of the exponent of the angular frequency in the spectrum  [required]
  --plt-lower     INTEGER   Lower value of the exponent of the angular frequency in the spectrum  [required]
  --plt-max-log   FLOAT     Maximum value of log10 tanδ  [required]
  --aspect        FLOAT     Aspect ratio of the spectrum  [default=0.8]
  --help                    Show this message and exit.

The output is the following:

NAME/
└── data_losstangent/
    └── fig_losstangent_spectrum.svg

Example:

phic plot-losstangent --name NAME --plt-upper 0 --plt-lower -3 --plt-max-log 0.3 --aspect 1.5
