Fast multiple protein structure alignment
Project description
Caretta-shape – A multiple protein structure alignment and feature extraction suite
Caretta is a software-suite to perform multiple protein structure alignment and structure feature extraction.
Visit the demo server to see caretta's capabilities. The server only allows alignment of up to 50 proteins at once. (This is currently down, will be back up soon!) The command-line tool and self-hosted web application do not have this restriction.
The older, slower version of Caretta as described in https://doi.org/10.1016/j.csbj.2020.03.011 can be found at https://git.wur.nl/durai001/caretta
Installation
Requirements
Operating system support
- Linux and Mac
- All capabilities are supported
- Windows
- The external tool msms is not available in Windows. Due to this:
- Feature extraction is not available.
features
argument in caretta-cli must always be run with--only-dssp
.caretta-app
is not available.
Software
Caretta works with Python 3.7+ Run the following commands to install required external dependencies (Mac and Linux only):
conda install -c salilab dssp
conda install -c bioconda msms
Install both the command-line interface and the web-application (Mac and Linux only):
pip install "caretta[GUI] @ git+https://github.com/TurtleTools/caretta.git"
Install only the command-line interface:
pip install git+https://github.com/TurtleTools/caretta.git
Environment variables:
export OMP_NUM_THREADS=1 # this should always be 1
export NUMBA_NUM_THREADS=20 # change to required number of threads
Usage
Command-line Usage
caretta-cli input_pdb_folder
# e.g. caretta-cli test_data
Options:
Usage: caretta-cli [OPTIONS] INPUT_PDB
Align protein structures using Caretta.
Writes the resulting sequence alignment and superposed PDB files to
"caretta_results". Optionally also outputs a set of aligned feature
matrices, or the python class with intermediate structures made during
progressive alignment.
Arguments:
INPUT_PDB A folder with input protein files [required]
Options:
-p FLOAT gap open penalty [default: 1.0]
-e FLOAT gap extend penalty [default: 0.01]
-c, --consensus-weight FLOAT weight well-aligned segments to reduce gaps
in these areas [default: 1.0]
-f, --full Use all vs. all pairwise alignment for
distance matrix calculation (much slower)
[default: False]
-o, --output PATH folder to store output files [default:
caretta_results]
--fasta / --no-fasta write alignment in FASTA file format
[default: True]
--pdb / --no-pdb write PDB files superposed according to
alignment [default: True]
-t, --threads INTEGER number of threads to use for feature
extraction [default: 4]
--features extract and write aligned features as a
dictionary of NumPy arrays into a pickle
file [default: False]
--only-dssp extract only DSSP features [default: False]
--class write StructureMultiple class with
intermediate structures and tree to pickle
file [default: False]
--matrix write distance matrix to file [default:
False]
-v, --verbose Control verbosity [default: True]
--install-completion [bash|zsh|fish|powershell|pwsh]
Install completion for the specified shell.
--show-completion [bash|zsh|fish|powershell|pwsh]
Show completion for the specified shell, to
copy it or customize the installation.
--help Show this message and exit.
Web-application Usage (Mac and Linux only)
caretta-app <host-ip> <port>
# e.g. caretta-app localhost 8091
Then go to localhost:8091/caretta in a browser window.
Features
dssp_NH_O_1_index
,dssp_NH_O_1_energy
,dssp_NH_O_2_index
,dssp_NH_O_2_energy
,dssp_O_NH_1_index
,dssp_O_NH_1_energy
,dssp_O_NH_2_index
,dssp_O_NH_2_energy
: hydrogen bonds; e.g. -3,-1.4 means: if this residue is residue i then N-H of I is h-bonded to C=O of I-3 with an electrostatic H-bond energy of -1.4 kcal/mol. There are two columns for each type of H-bond, to allow for bifurcated H-bonds.dssp_acc
: number of water molecules in contact with this residue *10. or residue water exposed surface in Angstrom^2.dssp_alpha
: virtual torsion angle (dihedral angle) defined by the four Cα atoms of residues I-1,I,I+1,I+2. Used to define chirality.dssp_kappa
: virtual bond angle (bend angle) defined by the three Cα atoms of residues I-2,I,I+2. Used to define bend (structure code ‘S’).dssp_phi
: IUPAC peptide backbone torsion angles.dssp_psi
: IUPAC peptide backbone torsion angles.dssp_tco
: cosine of angle between C=O of residue I and C=O of residue I-1. For α-helices, TCO is near +1, for β-sheets TCO is near -1.anm_ca
: Fluctuations of alpha carbon atoms based on an Anisotropic network modelanm_cb
: Fluctuations of beta carbon atoms based on an Anisotropic network modelgnm_ca
: Fluctuations of alpha carbon atoms based on a Gaussian network modelgnm_cb
: Fluctuations of beta carbon atoms based on a Gaussian network modeldepth_ca
: Depths of alpha carbon atomsdepth_cb
: Depths of beta carbon atomsdepth_mean
: Mean depth of residues
Publications
Janani Durairaj, Mehmet Akdel, Dick de Ridder, Aalt DJ can Dijk. "Fast and adaptive protein structure representations for machine learning." Machine Learning for Structural Biology Workshop, NeurIPS 2020 (https://doi.org/10.1101/2021.04.07.438777)
Poster:
Akdel, Mehmet, Janani Durairaj, Dick de Ridder, and Aalt DJ van Dijk. "Caretta-A Multiple Protein Structure Alignment and Feature Extraction Suite." Computational and Structural Biotechnology Journal (2020). (https://doi.org/10.1016/j.csbj.2020.03.011)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file caretta-0.1.2.tar.gz
.
File metadata
- Download URL: caretta-0.1.2.tar.gz
- Upload date:
- Size: 35.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | dc686cfb1b721e8558b0786dfe820be3b81e5d7b112e3af7cd709aaa7467c4b1 |
|
MD5 | 002632945ef26a2881ebb66d79a82bbc |
|
BLAKE2b-256 | be560f979e250e0754e5e581010e230a570fe366f4117a8bc7bb0d35deade4a8 |
File details
Details for the file caretta-0.1.2-py2.py3-none-any.whl
.
File metadata
- Download URL: caretta-0.1.2-py2.py3-none-any.whl
- Upload date:
- Size: 38.7 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8b620dd28b72109a304c11099b2dd09d6428a1ad738135395a4b7b072ee4df5e |
|
MD5 | e43dd3a3c971ba7a68e56823d926bef0 |
|
BLAKE2b-256 | d94cbd36724aa93ea10dac5ec0cbe586249176014e6c213280ed4c8f240de111 |