Constava calculates conformational state propensities and conformational state variability from a protein structure ensemble.
Table of contents
- Table of contents
- Description
- Installation
- Running constava from a container (Docker)
- Usage
- License
- Citation
- Authors
- Acknowledgements
- Contact
Description
Constava analyzes conformational ensembles, calculating conformational state propensities and conformational state variability. The conformational state propensities indicate the likelihood of a residue residing in a given conformational state, while the conformational state variability is a measure of the residue's ability to transition between conformational states.
Each conformational state is a statistical model based on the backbone dihedrals (phi, psi). The default models were derived from an analysis of NMR ensembles and chemical shifts. To analyze a conformational ensemble, the phi- and psi-angles for each residue in the ensemble need to be provided.
As input data Constava needs the backbone dihedral angles extracted from the
conformational ensemble. These dihedrals can be obtained using GROMACS'
gmx chi module (set --input-format=xvg) or using the constava dihedrals
submodule, which supports a wide range of MD and structure formats.
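A minimal end-to-end run with the Python interface (described in detail under Usage below) could look like the following sketch; the file names 2mkx.pdb and 2mkx.xtc are placeholders for your own structure and trajectory files.

from constava import Constava
from constava.utils.dihedrals import calculate_dihedrals

# Extract the backbone dihedrals (written in radians, the format expected by the analysis) ...
dihedrals = calculate_dihedrals(structure="2mkx.pdb", trajectory="2mkx.xtc")
dihedrals.to_csv("2mkx_dihedrals.csv", index=False)

# ... then infer conformational state propensities and variability per residue
c = Constava(input_files=["2mkx_dihedrals.csv"],
             output_file="2mkx_constava.csv",
             window=[1, 3, 5])
c.run()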
Installation
Prerequisites
- Python 3.8 or higher (up to 3.14 inclusive)
- pip
Installation through PyPI
We recommend this installation for most users.
# Create a virtual environment (optional but recommended):
python3 -m venv constava
source constava/bin/activate
# Install the python module:
pip install constava
# Run tests to ensure the successful installation (optional but recommended):
constava test
To uninstall the package, run pip uninstall constava.
Installation through conda
To install Constava through conda please follow the instructions below (both Conda-Forge and Bioconda channels are required to install Constava dependencies).
# Create a conda environment (optional but recommended):
conda create -n constava python=3.12
conda activate constava
# Install constava
conda install -c bioconda -c conda-forge constava
# Run tests to ensure the successful installation (optional but recommended):
constava test
To uninstall the package, run conda remove constava.
Installation from the source
To download and install the latest version of the software from the source code,
follow the instructions below (you will have to install wheel, build, and setuptools):
# Clone the repository:
git clone https://bitbucket.org/bio2byte/constava/
cd constava
# Create a virtual environment (optional but recommended):
python3 -m venv constava
source constava/bin/activate
# Install the building dependencies:
python3 -m pip install wheel build setuptools
# Build and install the package from the packages root directory:
# ... build package from source
make build
# ... install it locally
make install
# ... test the installation
make test
To uninstall the package, run make uninstall in the terminal
from the package's root directory.
Troubleshooting
Libtiff issues
If you run constava and see an error related to the libtiff library, such as 'libtiff.5.dylib' (no such file), you can try to fix it by installing libtiff. For instance, using conda:
conda install libtiff
Running constava from a container (Docker)
Using constava as a command line tool inside a Docker container
To use constava's Docker image generated by the Biocontainers project based on the Bioconda package, follow the instructions below. You can find the container tags on the Biocontainers archive.
In this example, the latest tag is 1.1.0--pyhdfd78af_0:
# Pull the constava image from quay.io
docker pull quay.io/biocontainers/constava:1.1.0--pyhdfd78af_0
# Run a container with the constava image
docker run \
-it quay.io/biocontainers/constava:1.1.0--pyhdfd78af_0 \
constava <COMMAND-LINE-OPTIONS>
# Optionally, you can mount a local directory to the container for accessing your data
docker run \
-it -v /path/to/your/data:/data quay.io/biocontainers/constava:1.1.0--pyhdfd78af_0 \
constava <COMMAND-LINE-OPTIONS>
To stop and remove the container, use the following commands:
# List all running containers
docker ps
# Stop a running container (replace <container_id> with the actual container ID)
docker stop <container_id>
# Remove the stopped container (replace <container_id> with the actual container ID)
docker rm <container_id>
To remove the image, run docker rmi -f quay.io/biocontainers/constava:1.1.0--pyhdfd78af_0.
Using constava as a library inside a Docker container
To use constava as a library inside the Docker container, follow the instructions below. This allows you to interact with the constava library directly within a Python session inside the Docker container.
# Start an interactive Python session inside the constava container
docker run \
--rm -it quay.io/biocontainers/constava:1.1.0--pyhdfd78af_0 \
python
# This will start a Python shell where you can import constava
# >>> import constava
# >>>
# Alternatively, execute a python script inside the constava container
docker run \
--rm -it quay.io/biocontainers/constava:1.1.0--pyhdfd78af_0 \
python <python-script>
To remove the image, run docker rmi -f quay.io/biocontainers/constava:1.1.0--pyhdfd78af_0.
Usage
The software provides two modes of interaction. Shell users may use the software from the command line, while users skilled in Python can import it as a module. We provide a couple of usage examples in a Colab notebook.
Execution from the command line
The software is subdivided into three submodules:
The constava dihedrals submodule provides a simple way to extract backbone
dihedral angles from MD simulations or PDB ensembles. For more information
run: constava dihedrals -h. Alternatively, the backbone dihedrals may be
extracted with GROMACS' gmx chi module.
The constava analyze submodule analyzes the provided backbone dihedral angles
and infers the propensities for each residue to reside in a given
conformational state. For more information run: constava analyze -h.
The constava fit-model submodule can be used to train a custom probabilistic model of
conformational states. The default models were derived from an analysis of NMR
ensembles and chemical shifts; they cover six conformational states:
- Core Helix - Exclusively alpha-helical, low backbone dynamics
- Surrounding Helix - Mostly alpha-helical, high backbone dynamics
- Core Sheet - Exclusively beta-sheet, low backbone dynamics
- Surrounding Sheet - Mostly extended conformation, high backbone dynamics
- Turn - Mostly turn, high backbone dynamics
- Other - Mostly coil, high backbone dynamics
Extracting backbone dihedrals from a trajectory
To extract dihedral angles from a trajectory the constava dihedrals submodule
is used.
usage: constava dihedrals [-h] [-s <file.pdb>] [-f <file.xtc> [<file.xtc> ...]]
[-o OUTPUT] [--selection SELECTION] [--precision PRECISION]
[--degrees] [-O]
The `constava dihedrals` submodule is used to extract the backbone dihedrals
needed for the analysis from conformational ensembles. By default the results
are written out in radians as this is the preferred format for
`constava analyze`.
Note: For the first and last residue in a protein only one backbone dihedral
can be extracted. Thus, those residues are omitted by default.
optional arguments:
-h, --help Show this help message and exit
Input & output options:
-s <file.pdb>, --structure <file.pdb>
Structure file with atomic information: [pdb, gro, tpr]
-f <file.xtc> [<file.xtc> ...], --trajectory <file.xtc> [<file.xtc> ...]
Trajectory file with coordinates: [pdb, gro, trr, xtc, crd, nc]
-o OUTPUT, --output OUTPUT
CSV file to write dihedral information to. (default: dihedrals.csv)
Input & output options:
--selection SELECTION
Selection for the dihedral calculation. (default: 'protein')
--precision PRECISION
Defines the number of decimals written for the dihedrals. (default: 5)
--degrees If set results are written in degrees instead of radians.
-O, --overwrite If set any previously generated output will be overwritten.
An example:
# Obtain backbone dihedrals (overwriting any existing files)
constava dihedrals -O -s "2mkx.gro" -f "2mkx.xtc" -o "2mkx_dihedrals.csv"
Analyzing a conformational ensemble
To analyze the backbone dihedral angles extracted from a conformational ensemble,
the constava analyze submodule is used.
usage: constava analyze [-h] [-i <file.csv> [<file.csv> ...]]
[--input-format {auto,xvg,csv}] [-o <file.csv>]
[--output-format {auto,csv,json,tsv}] [-m <file.pkl>]
[--window <int> [<int> ...]] [--window-series <int> [<int> ...]]
[--bootstrap <int> [<int> ...]]
[--bootstrap-series <int> [<int> ...]]
[--bootstrap-samples <int>]
[--degrees] [--precision <int>]
[--indent_size <int>] [--seed <int>]
[-v]
The `constava analyze` submodule analyzes the provided backbone dihedral angles
and infers the propensities for each residue to reside in a given
conformational state.
Each conformational state is a statistical model based on the backbone
dihedrals (phi, psi). The default models were derived from an analysis of NMR
ensembles and chemical shifts. To analyze a conformational ensemble, the phi-
and psi-angles for each residue in the ensemble need to be
provided.
As input data the backbone dihedral angles extracted from the conformational
ensemble need to be provided. Those can be generated using the
`constava dihedrals` submodule (`--input-format csv`) or GROMACS'
`gmx chi` module (`--input-format xvg`).
optional arguments:
-h, --help Show this help message and exit
Input & output options:
-i <file.csv> [<file.csv> ...], --input <file.csv> [<file.csv> ...]
Input file(s) that contain the dihedral angles.
--input-format {auto,xvg,csv}
Format of the input file: {'auto', 'csv', 'xvg'}
-o <file.csv>, --output <file.csv>
The file to write the results to.
--output-format {auto,csv,json,tsv}
Format of output file: {'csv', 'json', 'tsv'}. (default: 'auto')
Conformational state model options:
-m <file.pkl>, --load-model <file.pkl>
Load a conformational state model from the given pickled
file. If not provided, the default model will be used.
Sub-sampling options:
--window <int> [<int> ...]
Do inference using a moving reading-frame. Each reading
frame consists of <int> consecutive samples. Multiple
values can be provided.
--window-series <int> [<int> ...]
Do inference using a moving reading-frame. Each reading
frame consists of <int> consecutive samples. Return the
results for every window rather than the average. This can
result in very large output files. Multiple values can be
provided.
--bootstrap <int> [<int> ...]
Do inference using <Int> samples obtained through
bootstrapping. Multiple values can be provided.
--bootstrap-series <int> [<int> ...]
Do inference using <Int> samples obtained through
bootstrapping. Return the results for every subsample
rather than the average. This can result in very
large output files. Multiple values can be provided.
--bootstrap-samples <int>
When bootstrapping, sample <Int> times from the input data.
(default: 500)
Miscellaneous options:
--degrees Set this flag, if dihedrals in the input files are in
degrees.
--precision <int> Sets the number of decimals in the output files.
--indent_size <int> Sets the number of spaces used to indent
the output document (default: 0)
--seed <int> Set random seed for bootstrap sampling
-v, --verbose Set verbosity level of screen output. Flag can be given
multiple times (up to 2) to gradually increase output to
debugging mode.
An example with debug-level output:
# Run constava with debug-level output
constava analyze \
-i "2mkx_dihedrals.csv" \
-o "2mkx_constava.json" \
--output-format json \
--indent_size 4 \
--window 3 5 25 \
-vv
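Conceptually, the two sub-sampling schemes differ in how the ensemble is split into subsets before inference: --window slides a reading-frame over consecutive samples, while --bootstrap draws random subsamples with replacement. The sketch below is purely illustrative (it is not constava's internal implementation) and uses a generic NumPy array in place of per-residue dihedral data.

import numpy as np

rng = np.random.default_rng(42)
frames = rng.random(100)   # stand-in for 100 samples (frames) of one residue

# --window 5: moving reading-frame of 5 consecutive samples
windows = [frames[i:i + 5] for i in range(len(frames) - 5 + 1)]

# --bootstrap 5 --bootstrap-samples 500: 500 random subsamples of 5 samples each
bootstraps = [rng.choice(frames, size=5, replace=True) for _ in range(500)]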
Generating custom conformational state models
To train a custom probabilistic model of conformational states, the constava fit-model
submodule is used.
usage: constava fit-model [-h] [-i <file.json>] -o <file.pkl>
[--model-type {kde,grid}] [--kde-bandwidth <float>]
[--grid-points <int>] [--degrees]
[-v]
The `constava fit-model` submodule is used to generate the probabilistic
conformational state models used in the analysis. By default, when running
`constava analyze` these models are generated on-the-fly. In selected cases
generating a model beforehand and loading it can be useful, though.
We provide two model types. kde-Models are the default. They are fast to fit
but may be slow in inference on large conformational ensembles (e.g.,
long-timescale MD simulations). The idea of grid-Models is to replace
the continuous probability density function of the kde-Model with a fixed set
of grid points. The PDF for any sample is then estimated by linear
interpolation between the nearest grid points. This is slightly less
accurate than the kde-Model but speeds up inference significantly.
optional arguments:
-h, --help Show this help message and exit
Input and output options:
-i <file.json>, --input <file.json>
The data to which the new conformational state models will
be fitted. It should be provided as a JSON file. The
top-most key should indicate the names of the
conformational states. On the level below, lists of phi-/
psi pairs for each state should be provided. If not provided,
the default data from the publication will be used.
-o <file.pkl>, --output <file.pkl>
Write the generated model to a pickled file that can be
loaded again using `constava analyze --load-model`
Conformational state model options:
--model-type {kde,grid}
The probabilistic conformational state model used. The
default is `kde`. The alternative `grid` runs significantly
faster while slightly sacrificing accuracy: {'kde', 'grid'}
(default: 'kde')
--kde-bandwidth <float>
This flag controls the bandwidth of the Gaussian kernel
density estimator. (default: 0.13)
--grid-points <int> This flag controls how many grid points are used to
describe the probability density function. Only applies if
`--model-type` is set to `grid`. (default: 10000)
Miscellaneous options:
--degrees Set this flag, if dihedrals in `model-data` are in degrees
instead of radians.
-v, --verbose Set verbosity level of screen output. Flag can be given
multiple times (up to 2) to gradually increase output to
debugging mode.
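The training data passed to -i/--input follows the layout described above: a JSON object whose top-level keys are the conformational state names, each holding a list of phi/psi pairs. The snippet below writes a minimal, hypothetical file of this shape; the state names and angle values are placeholders, not the published training data.

import json

# Hypothetical training data: state name -> list of [phi, psi] pairs (radians)
model_data = {
    "State A": [[-1.10, -0.70], [-1.05, -0.75], [-1.15, -0.68]],
    "State B": [[-2.40, 2.30], [-2.35, 2.40], [-2.45, 2.25]],
}
with open("my_states.json", "w") as fh:
    json.dump(model_data, fh, indent=2)

# The file can then be used to fit a custom model:
#   constava fit-model -i my_states.json -o my_model.pkl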
An example using the default dataset:
# Generates a faster 'grid-interpolation model' using the default dataset
constava fit-model -v \
-o default_grid.pkl \
--model-type grid \
--kde-bandwidth 0.13 \
--grid-points 6400
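To illustrate the idea behind the grid model: the continuous kernel density estimate is evaluated once on a fixed grid of (phi, psi) points, and the density of new samples is then obtained by interpolating between the nearest grid points. The sketch below is a conceptual illustration using SciPy, not constava's implementation, and the bandwidth handling is purely illustrative.

import numpy as np
from scipy.stats import gaussian_kde
from scipy.interpolate import RegularGridInterpolator

# Toy (phi, psi) samples for a single conformational state (radians)
rng = np.random.default_rng(0)
samples = rng.normal(loc=(-1.1, -0.7), scale=0.2, size=(500, 2))

# Continuous model: kernel density estimate of the (phi, psi) distribution
kde = gaussian_kde(samples.T, bw_method=0.13)

# Grid model: evaluate the KDE once on a 101 x 101 grid (10,201 grid points) ...
phi = np.linspace(-np.pi, np.pi, 101)
psi = np.linspace(-np.pi, np.pi, 101)
grid = np.stack(np.meshgrid(phi, psi, indexing="ij")).reshape(2, -1)
pdf_on_grid = kde(grid).reshape(101, 101)

# ... and estimate the density of new samples by interpolating between grid points
grid_model = RegularGridInterpolator((phi, psi), pdf_on_grid)
print(kde([[-1.0], [-0.8]]), grid_model([(-1.0, -0.8)]))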
Execution as a python library
The module provides the Constava class, a general interface to the software's
features. The only notable exception is the extraction of dihedrals,
which is done through a separate function.
Extracting backbone dihedrals as a DataFrame
import pandas as pd
from constava.utils.dihedrals import calculate_dihedrals
# Calculate dihedrals as a DataFrame
dihedrals = calculate_dihedrals(structure="./2mkx.pdb", trajectory="2mkx.xtc")
# Write dihedrals out as a csv
dihedrals.to_csv("2mkx_dihedrals.csv", index=False, float_format="%.4f")
Setting parameters and analyzing a conformational ensemble
This example code will generate an output for a protein:
# Import the required modules
import glob
from constava import Constava
# Define input and output files
PDBID = "2mkx"
input_files = glob.glob(f"./{PDBID}/ramaPhiPsi*.xvg")
output_file = f"./{PDBID}_constava.csv"
# Initialize Constava Python interface with parameters
c = Constava(
input_files = input_files,
output_file = output_file,
bootstrap = [3,5,10,25],
input_degrees = True,
verbose = 2)
# Alter parameters after initialization
c.set_param("window", [1,3,5])
# Run the calculation and write results
c.run()
This protein, with 48 residues and 100 frames per residue, runs in about 1 minute.
The original MD ensembles from the manuscript can be found at https://doi.org/10.5281/zenodo.8160755.
Generating and loading conformational state models
Conformational state models are usually fitted at runtime. This is the
safest option to retain compatibility. For kde models, refitting usually
takes less than a second and is almost negligible. However, grid interpolation
models take longer to generate. Thus, it makes sense to store them when
running multiple predictions with the same model.
Note: Conformational state model-pickles are intended for quickly rerunning analyses. They are not for storing or sharing your conformational state models. When you need to store or share a custom conformational state model, provide the training data and model-fitting parameters.
import glob
from constava import Constava
# Fit the grid-interpolation model
c = Constava(verbose = 1)
csmodel = c.fit_csmodel(model_type = "grid",
kde_bandwidth = .13,
grid_points = 10_201)
# Write the fitted model out as a pickle
csmodel.dump_pickle("grid_model.pkl")
# Use the new model to analyze a conformational ensemble
PDBID = "2mkx"
input_files = glob.glob(f"./{PDBID}_dihedrals.csv")
output_file = f"./{PDBID}_constava.csv"
c = Constava(
input_files = input_files,
output_file = output_file,
model_load = "grid_model.pkl",
input_degrees=True,
window = [1, 5, 10, 25],
verbose = 1)
c.run()
Constava-class parameters vs. command line arguments
In the following table, all available parameters of the Python interface (Constava
class) and their corresponding command line arguments are listed. The defaults for
parameters in Python and command line are the same.
| Python parameter | Command line argument | Description |
|---|---|---|
| `input_files : List[str] or str` | `constava analyze --input <file> [<file> ...]` | Input file(s) that contain the dihedral angles. |
| `input_format : str` | `constava analyze --input-format <enum>` | Format of the input file: {'auto', 'csv', 'xvg'} |
| `output_file : str` | `constava analyze --output <file>` | The file to write the output to. |
| `output_format : str` | `constava analyze --output-format <enum>` | Format of the output file: {'auto', 'csv', 'json', 'tsv'} |
| `model_type : str` | `constava fit-model --model-type <enum>` | The probabilistic conformational state model used. Default is `kde`. The alternative `grid` runs significantly faster while slightly sacrificing accuracy: {'kde', 'grid'} |
| `model_load : str` | `constava analyze --load-model <file>` | Load a conformational state model from the given pickled file. |
| `model_data : str` | `constava fit-model --input <file>` | Fit conformational state models to the data provided in the given file. |
| `model_dump : str` | `constava fit-model --output <file>` | Write the generated model to a pickled file that can be loaded again using `model_load`. |
| `window : List[int] or int` | `constava analyze --window <int> [<int> ...]` | Do inference using a moving reading-frame of `<int>` consecutive samples. |
| `window_series : List[int] or int` | `constava analyze --window-series <int> [<int> ...]` | Do inference using a moving reading-frame of `<int>` consecutive samples, returning the results for every window rather than the average. |
| `bootstrap : List[int] or int` | `constava analyze --bootstrap <int> [<int> ...]` | Do inference using `<int>` samples obtained through bootstrapping. |
| `bootstrap_series : List[int] or int` | `constava analyze --bootstrap-series <int> [<int> ...]` | Do inference using `<int>` samples obtained through bootstrapping, returning the results for every subsample rather than the average. |
| `bootstrap_samples : int` | `constava analyze --bootstrap-samples <int>` | When bootstrapping, sample `<int>` times from the input data. |
| `input_degrees : bool` | `constava analyze --degrees` | Set True if the input files are in degrees. |
| `model_data_degrees : bool` | `constava fit-model --degrees` | Set True if the data given under `model_data` is given in degrees. |
| `precision : int` | `constava analyze --precision <int>` | Sets the number of decimals in the output files. By default, 4 decimals. |
| `kde_bandwidth : float` | `constava fit-model --kde-bandwidth <float>` | Controls the bandwidth of the Gaussian kernel density estimator. |
| `grid_points : int` | `constava fit-model --grid-points <int>` | When `model_type` equals 'grid', this controls how many grid points are used to describe the probability density function. |
| `seed : int` | `constava analyze --seed <int>` | Set the random seed, especially for bootstrapping. |
| `verbose : int` | `constava <...> -v [-v]` | Set the verbosity level of the screen output. |
License
Distributed under the GNU General Public License v3 (GPLv3).
Citation
Gavalda-Garcia, J., Bickel, D., Roca-Martinez, J., Raimondi, D., Orlando, G., & Vranken, W. (2024). Data-driven probabilistic definition of the low energy conformational states of protein residues. NAR Genomics and Bioinformatics, 6(3), lqae082. https://doi.org/10.1093/nargab/lqae082
Authors
- Jose Gavalda-Garcia♠ - jose.gavalda.garcia@vub.be
- David Bickel♠ - david.bickel@vub.be
- Joel Roca-Martinez - joel.roca.martinez@vub.be
- Daniele Raimondi - daniele.raimondi@kuleuven.be
- Gabriele Orlando - gabriele.orlando@kuleuven.be
- Wim Vranken - Personal page - wim.vranken@vub.be
♠ Authors contributed equally to this work.
Acknowledgments
We thank Adrián Díaz for the invaluable help in the distribution of this software.
Contact
- Maintainers - bio2byte@vub.be
- Wim Vranken - wim.vranken@vub.be
- Bio2Byte website: https://bio2byte.be/