Protein structure prediction with the DMPfold2 method
DMPfold2
DMPfold2 is a fast and accurate method for protein structure prediction. It uses learned representations of multiple sequence alignments and end-to-end model generation to quickly generate models from alignments.
If you use DMPfold2, please cite the paper: Deep learning-based prediction of protein structure using learned representations of multiple sequence alignments, S M Kandathil, J G Greener, A M Lau, D T Jones, bioRxiv (2021).
Installation
DMPfold2 is easier to install than DMPfold1, which had many more dependencies.
- Python 3.6 or later is required.
- Install PyTorch as appropriate for your system. A GPU is not required but gives some speedup to longer runs.
- Run pip install dmpfold, which adds the dmpfold executable to the path. The first time you run a prediction, the trained model files (~140 MB) will be downloaded to the package directory, which requires an internet connection.
Usage
Run dmpfold -h to see a help message.
To run DMPfold2 you will need a sequence alignment in aln format: one sequence per line, with the ungapped target sequence as the first line (example here). Lines starting with > are ignored.
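The format rules above can be checked before starting a run. The following is a minimal sketch and not part of DMPfold2: read_aln is a hypothetical helper name, and both the use of - as the gap character and the requirement that every row has the same length as the target are assumptions rather than documented behaviour.

```python
def read_aln(text):
    """Parse aln-format text into a list of sequences (hypothetical helper)."""
    sequences = []
    for line in text.splitlines():
        line = line.strip()
        # Lines starting with '>' are ignored, as described above.
        # Blank lines are also skipped here for convenience.
        if not line or line.startswith(">"):
            continue
        sequences.append(line)
    if not sequences:
        raise ValueError("no sequences found")

    target = sequences[0]
    # Assumption: '-' is the gap character.
    if "-" in target:
        raise ValueError("target sequence (first line) must be ungapped")

    # Assumption: every row is aligned to the target, so lengths match.
    for number, seq in enumerate(sequences, start=1):
        if len(seq) != len(target):
            raise ValueError(
                f"sequence {number} has length {len(seq)}, expected {len(target)}"
            )
    return sequences
```

Calling read_aln(open("input.aln").read()) either returns the list of sequences or raises a ValueError that names the problem.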
Sequence alignments can be obtained from a target sequence in a number of ways, for example by running hhblits on the Uniclust database.
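hhblits can write its alignment in A3M format, which is not the same as the aln format. A common conversion, sketched here under the assumption that the A3M file lists the target sequence first, is to drop the > header lines and delete lowercase letters, which mark insertions relative to the target in A3M. a3m_to_aln is a hypothetical helper and not part of DMPfold2.

```python
def a3m_to_aln(a3m_text):
    """Convert A3M text to aln text (hypothetical helper, not part of DMPfold2)."""
    records = []
    current = None
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header starts a new record; the header itself is dropped.
            current = []
            records.append(current)
        elif current is not None:
            # Sequence data may be wrapped over several lines.
            current.append(line)

    rows = []
    for parts in records:
        sequence = "".join(parts)
        # Lowercase letters are insertions relative to the target: remove them
        # so that every row keeps only the columns aligned to the target.
        rows.append("".join(ch for ch in sequence if not ch.islower()))
    return "".join(row + "\n" for row in rows)
```

The returned text can be written to a file such as input.aln and passed to dmpfold with -i.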
DMPfold2 prints a PDB format file to stdout, including the confidence as a remark.
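Because the confidence is written as a remark, it can be recovered from a saved output file by collecting the REMARK records. The exact wording of the DMPfold2 remark is not specified here, so this sketch returns all REMARK lines without parsing them. pdb_remarks is a hypothetical helper name.

```python
def pdb_remarks(pdb_text):
    """Return every REMARK record from PDB-format text, unparsed."""
    return [
        line.rstrip()
        for line in pdb_text.splitlines()
        if line.startswith("REMARK")
    ]
```

For a file produced with dmpfold -i input.aln > fold.pdb, pdb_remarks(open("fold.pdb").read()) gives the remark lines, one of which should carry the confidence.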
Default mode (10 iteration cycles + 100 steps geometry minimization on cpu device):
dmpfold -i input.aln > fold.pdb
Default mode on cuda device 0:
dmpfold -i input.aln -d cuda:0 > fold.pdb
Fastest mode (no iteration or refinement):
dmpfold -i input.aln -n 0 -m 0 > fold.pdb
30 iteration cycles + 200 steps geometry minimization:
dmpfold -i input.aln -n 30 -m 200 > fold.pdb
If you already have a model, e.g. from HHsearch/MODELLER, it can be supplied as a template seed structure; only its CA atoms are used (30 iteration cycles + 200 minimization steps + template seed structure):
dmpfold -i input.aln -n 30 -m 200 -t template.pdb > fold.pdb
Ridiculously long run taking hours (100000 iterations + 1000 minimization steps):
dmpfold -i input.aln -n 100000 -m 1000 > fold.pdb
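When the executable is driven from a script, the invocations above can be assembled programmatically. This is a sketch only: dmpfold_command and run_dmpfold are hypothetical helper names, and the only flags used are the ones shown in the examples above (-i, -d, -n, -m, -t). Options left as None are omitted so that the program's own defaults apply.

```python
import subprocess


def dmpfold_command(aln_path, device=None, iterations=None, minsteps=None,
                    template=None):
    """Build the dmpfold argument list from the flags shown above."""
    command = ["dmpfold", "-i", str(aln_path)]
    if device is not None:
        command += ["-d", str(device)]
    if iterations is not None:
        command += ["-n", str(iterations)]
    if minsteps is not None:
        command += ["-m", str(minsteps)]
    if template is not None:
        command += ["-t", str(template)]
    return command


def run_dmpfold(aln_path, out_path, **options):
    """Run dmpfold and write its stdout (the PDB file) to out_path."""
    with open(out_path, "w") as out_file:
        # check=True raises CalledProcessError if dmpfold exits with an error.
        subprocess.run(dmpfold_command(aln_path, **options),
                       stdout=out_file, check=True)
```

For example, run_dmpfold("input.aln", "fold.pdb", iterations=30, minsteps=200) corresponds to the 30 cycle + 200 step command above.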
Python module
DMPfold2 can also be used within Python, allowing you to use it as part of other Python scripts. For example:
from dmpfold import aln_to_coords
# Default options
coords, confs = aln_to_coords("input.aln")
# Change options
coords, confs = aln_to_coords("input.aln", device="cuda", template="template.pdb", iterations=30, minsteps=200)
coords is a PyTorch tensor with shape (nres, 5, 3), where the first axis is the residue index, the second is the atom (N, CA, C, O, CB) and the third is the coordinates in Angstrom.
confs is a PyTorch tensor corresponding to the predicted confidence for each residue.
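To illustrate the layout of the returned values, the sketch below works on plain nested lists, such as those obtained with coords.tolist() and confs.tolist(), so it does not need PyTorch itself. The helper names are hypothetical. It assumes that confs is one value per residue in a flat sequence, and the confidence threshold is left to the caller because the scale of the values is not stated here.

```python
import math

# Atom order along the second axis, as stated above.
ATOM_NAMES = ("N", "CA", "C", "O", "CB")


def atom_xyz(coords, res_index, atom_name):
    """Return the (x, y, z) of one atom from a (nres, 5, 3) nested list."""
    atom_index = ATOM_NAMES.index(atom_name)
    return tuple(coords[res_index][atom_index])


def ca_distance(coords, i, j):
    """Euclidean distance in Angstrom between the CA atoms of residues i and j."""
    p = atom_xyz(coords, i, "CA")
    q = atom_xyz(coords, j, "CA")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def low_confidence_residues(confs, threshold):
    """Indices (0-based) of residues whose confidence is below threshold."""
    return [index for index, conf in enumerate(confs) if conf < threshold]
```

Residue indices here are 0-based positions along the first axis, not PDB residue numbers.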
CASP14 version
If for some reason you need the version of DMPfold2 that was under development at the time of CASP14, run git checkout casp14 in this repository and follow the instructions in the readme file.
This version used three approaches to generate models from constraints - CNS, XPLOR-NIH and a PyTorch-based molecular dynamics approach - but is less accurate, slower and harder to install than the current end-to-end approach.