Fast multiple protein structure alignment

Caretta-shape – A multiple protein structure alignment and feature extraction suite

Caretta is a software suite for multiple protein structure alignment and structure feature extraction.

Visit the demo server to see Caretta's capabilities. (The server is currently down and will be back up soon.) The demo server only allows alignment of up to 50 proteins at once; the command-line tool and the self-hosted web application do not have this restriction.

The older, slower version of Caretta as described in https://doi.org/10.1016/j.csbj.2020.03.011 can be found at https://git.wur.nl/durai001/caretta

Installation

Requirements

Operating system support

  1. Linux and Mac
  • All capabilities are supported
  2. Windows
  • The external tool msms is not available on Windows. As a result:
    • Full feature extraction is not available.
    • The --features option of caretta-cli must always be combined with --only-dssp.
    • caretta-app is not available.

Software

Caretta works with Python 3.7+. Run the following commands to install the required external dependencies (Mac and Linux only):

conda install -c salilab dssp
conda install -c bioconda msms

Install both the command-line interface and the web-application (Mac and Linux only):

pip install "caretta[GUI] @ git+https://github.com/TurtleTools/caretta.git"

Install only the command-line interface:

pip install git+https://github.com/TurtleTools/caretta.git

Environment variables:

export OMP_NUM_THREADS=1 # this should always be 1
export NUMBA_NUM_THREADS=20 # change to required number of threads
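
When Caretta is driven from a Python script rather than a shell, the same two variables can be set through os.environ. The following is a minimal sketch, assuming the values need to be in place before libraries that read them (such as Numba) are imported. The helper name configure_threads is hypothetical and not part of Caretta.

```python
import os


def configure_threads(numba_threads: int = 20) -> dict:
    """Set the thread-related environment variables described above.

    Hypothetical helper, not part of Caretta. Call it before importing
    caretta or numba so the values are already set when they load.
    """
    if numba_threads < 1:
        raise ValueError("numba_threads must be at least 1")
    os.environ["OMP_NUM_THREADS"] = "1"  # should always be 1
    os.environ["NUMBA_NUM_THREADS"] = str(numba_threads)  # required number of threads
    names = ("OMP_NUM_THREADS", "NUMBA_NUM_THREADS")
    return {name: os.environ[name] for name in names}


if __name__ == "__main__":
    # Show the values that any later import would see.
    print(configure_threads(numba_threads=4))
```

Setting the variables at the very top of a script keeps the configuration next to the code that depends on it, at the cost of overriding whatever the calling shell had exported.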

Usage

Command-line Usage

caretta-cli input_pdb_folder
# e.g. caretta-cli test_data  

Options:

Usage: caretta-cli [OPTIONS] INPUT_PDB

  Align protein structures using Caretta.

  Writes the resulting sequence alignment and superposed PDB files to
  "caretta_results". Optionally also outputs a set of aligned feature
  matrices, or the python class with intermediate structures made during
  progressive alignment.

Arguments:
  INPUT_PDB  A folder with input protein files  [required]

Options:
  -p FLOAT                        gap open penalty  [default: 1.0]
  -e FLOAT                        gap extend penalty  [default: 0.01]
  -c, --consensus-weight FLOAT    weight well-aligned segments to reduce gaps
                                  in these areas  [default: 1.0]

  -f, --full                      Use all vs. all pairwise alignment for
                                  distance matrix calculation (much slower)
                                  [default: False]

  -o, --output PATH               folder to store output files  [default:
                                  caretta_results]

  --fasta / --no-fasta            write alignment in FASTA file format
                                  [default: True]

  --pdb / --no-pdb                write PDB files superposed according to
                                  alignment  [default: True]

  -t, --threads INTEGER           number of threads to use for feature
                                  extraction  [default: 4]

  --features                      extract and write aligned features as a
                                  dictionary of NumPy arrays into a pickle
                                  file  [default: False]

  --only-dssp                     extract only DSSP features  [default: False]
  --class                         write StructureMultiple class with
                                  intermediate structures and tree to pickle
                                  file  [default: False]

  --matrix                        write distance matrix to file  [default:
                                  False]

  -v, --verbose                   Control verbosity  [default: True]
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.
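
For batch runs it can be convenient to assemble the caretta-cli call from a script. The sketch below uses only flags listed in the help text above (-p, -e, -o, -t, --features, --only-dssp, --full); the function names build_caretta_command and run_caretta are hypothetical wrappers, not part of Caretta.

```python
import subprocess
from typing import List


def build_caretta_command(
    input_dir: str,
    output_dir: str = "caretta_results",
    gap_open: float = 1.0,
    gap_extend: float = 0.01,
    threads: int = 4,
    features: bool = False,
    only_dssp: bool = False,
    full: bool = False,
) -> List[str]:
    """Build an argument list in the order 'caretta-cli [OPTIONS] INPUT_PDB'."""
    command = [
        "caretta-cli",
        "-p", str(gap_open),    # gap open penalty
        "-e", str(gap_extend),  # gap extend penalty
        "-o", output_dir,       # folder to store output files
        "-t", str(threads),     # threads for feature extraction
    ]
    if features:
        command.append("--features")
    if only_dssp:
        command.append("--only-dssp")  # needed on Windows, where msms is missing
    if full:
        command.append("--full")       # all vs. all pairwise alignment (much slower)
    command.append(input_dir)
    return command


def run_caretta(command: List[str]) -> None:
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without Caretta installed.
    print(" ".join(build_caretta_command("test_data", features=True, only_dssp=True)))
```

Passing the arguments as a list avoids shell quoting problems with folder names that contain spaces.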

Web-application Usage (Mac and Linux only)

caretta-app <host-ip> <port> 
# e.g. caretta-app localhost 8091

Then go to localhost:8091/caretta in a browser window.

Features

  • dssp_NH_O_1_index, dssp_NH_O_1_energy, dssp_NH_O_2_index, dssp_NH_O_2_energy, dssp_O_NH_1_index, dssp_O_NH_1_energy, dssp_O_NH_2_index, dssp_O_NH_2_energy: hydrogen bonds; e.g. -3,-1.4 means: if this residue is residue I, then the N-H of I is H-bonded to the C=O of I-3 with an electrostatic H-bond energy of -1.4 kcal/mol. There are two columns for each type of H-bond, to allow for bifurcated H-bonds.
  • dssp_acc: number of water molecules in contact with this residue, multiplied by 10, or the water-exposed surface of the residue in Angstrom^2.
  • dssp_alpha: virtual torsion angle (dihedral angle) defined by the four Cα atoms of residues I-1,I,I+1,I+2. Used to define chirality.
  • dssp_kappa: virtual bond angle (bend angle) defined by the three Cα atoms of residues I-2,I,I+2. Used to define bend (structure code ‘S’).
  • dssp_phi: IUPAC peptide backbone torsion angle phi.
  • dssp_psi: IUPAC peptide backbone torsion angle psi.
  • dssp_tco: cosine of angle between C=O of residue I and C=O of residue I-1. For α-helices, TCO is near +1, for β-sheets TCO is near -1.
  • anm_ca: Fluctuations of alpha carbon atoms based on an Anisotropic network model
  • anm_cb: Fluctuations of beta carbon atoms based on an Anisotropic network model
  • gnm_ca: Fluctuations of alpha carbon atoms based on a Gaussian network model
  • gnm_cb: Fluctuations of beta carbon atoms based on a Gaussian network model
  • depth_ca: Depths of alpha carbon atoms
  • depth_cb: Depths of beta carbon atoms
  • depth_mean: Mean depth of residues
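
The relative H-bond notation in the first bullet can be made concrete with a small worked example. This is a sketch under stated assumptions: plain Python lists stand in for one *_index / *_energy pair of feature arrays, residue positions are 0-based within a single ungapped chain, and an offset of 0 is taken to mean "no H-bond". The function name decode_hbonds is hypothetical and not part of Caretta.

```python
from typing import List, Optional, Tuple


def decode_hbonds(
    offsets: List[int], energies: List[float]
) -> List[Optional[Tuple[int, float]]]:
    """Convert relative H-bond offsets into absolute partner positions.

    offsets[i] and energies[i] play the role of one index/energy pair
    (for example dssp_NH_O_1_index and dssp_NH_O_1_energy) for residue i.
    Assumption: an offset of 0 marks a residue without an H-bond.
    """
    if len(offsets) != len(energies):
        raise ValueError("offsets and energies must have the same length")
    partners: List[Optional[Tuple[int, float]]] = []
    for position, (offset, energy) in enumerate(zip(offsets, energies)):
        if offset == 0:
            partners.append(None)
        else:
            # Residue at `position` is bonded to the residue at `position + offset`.
            partners.append((position + offset, energy))
    return partners


if __name__ == "__main__":
    # Residue 3 carries the "-3,-1.4" entry from the example above,
    # so its N-H is bonded to the C=O of residue 0.
    print(decode_hbonds([0, 0, 0, -3], [0.0, 0.0, 0.0, -1.4]))
```

In an aligned feature matrix the columns also contain gap positions, so this index arithmetic only applies to positions counted along the original, unaligned chain.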

Publications

Janani Durairaj, Mehmet Akdel, Dick de Ridder, Aalt DJ van Dijk. "Fast and adaptive protein structure representations for machine learning." Machine Learning for Structural Biology Workshop, NeurIPS 2020 (https://doi.org/10.1101/2021.04.07.438777)

Poster: MLSB2020.png

Akdel, Mehmet, Janani Durairaj, Dick de Ridder, and Aalt DJ van Dijk. "Caretta-A Multiple Protein Structure Alignment and Feature Extraction Suite." Computational and Structural Biotechnology Journal (2020). (https://doi.org/10.1016/j.csbj.2020.03.011)
