Direct coupling analysis (DCA) for protein and RNA sequences
About pydca
pydca is a Python implementation of direct coupling analysis (DCA) of residue coevolution for protein and RNA sequence families, using the mean-field and pseudolikelihood maximization algorithms. Given multiple sequence alignment (MSA) files in FASTA format, pydca computes the coevolutionary scores of pairs of sites in the alignment. In addition, when an optional file containing a reference sequence is supplied, the reference sequence is mapped to the MSA and scores are computed for pairs of sites of this reference sequence. The software provides command line utilities and can also be used as a library.
Prerequisites
pydca is implemented mainly in Python, with the pseudolikelihood maximization parameter inference implemented in a C++ backend for optimization. To install pydca and successfully carry out DCA computations, the following are required:
- Python 3, version 3.5 or later.
- A C++ compiler that supports C++11 (we recommend GCC).
- Optionally, OpenMP for multithreading support.
Installing
To install the current version of pydca from PyPI, run on the command line:
$ pip install pydca
Alternatively, use the install.sh bash script:
$ source install.sh
Using pydca as a Python Library
After installation, pydca can be imported into other Python source code and used. Here is an IPython Notebook example. If you encounter a problem opening the IPython Notebook example, copy and paste the URL here.
Running pydca From Command Line
When pydca is installed, it provides three main commands: pydca, plmdca, and mfdca. The command pydca is used for tasks such as trimming alignment data before DCA computation and visualizing contact maps or true positive rates. The other two commands carry out DCA computation with the pseudolikelihood maximization algorithm (plmDCA) or the mean-field algorithm (mfDCA), respectively. Below we show some usage examples of all three commands.
Trimming MSA data
Trim gaps by reference sequence:
$ pydca trim_by_refseq <biomolecule> <alignment.fa> <refseq_file.fa> --remove_all_gaps --verbose
Trim by percentage of gaps in MSA columns:
$ pydca trim_by_gap_size <alignment.fa> --max_gap 0.9 --verbose
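To illustrate what trimming by gap percentage means, here is a minimal pure-Python sketch that drops alignment columns whose fraction of gaps exceeds a threshold. It is not pydca's implementation: the function name `trim_by_gap_fraction`, the gap characters, and the choice to keep columns whose gap fraction equals the threshold are all assumptions made for this example.

```python
def trim_by_gap_fraction(sequences, max_gap=0.9, gap_chars="-."):
    """Drop alignment columns whose gap fraction exceeds max_gap.

    Hypothetical helper for illustration only; pydca's own trimming
    may treat gap symbols and the threshold boundary differently.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")

    num_seqs = len(sequences)
    keep = []
    for col in range(length):
        # Count gap symbols in this column across all sequences.
        gaps = sum(1 for seq in sequences if seq[col] in gap_chars)
        if gaps / num_seqs <= max_gap:
            keep.append(col)

    # Rebuild every sequence from the retained columns only.
    return ["".join(seq[c] for c in keep) for seq in sequences]


msa = ["AC-GU", "A--GU", "AU-G-", "A--GC"]
trimmed = trim_by_gap_fraction(msa, max_gap=0.5)
print(trimmed)
```

In this toy alignment only the third column consists entirely of gaps, so it is the only one removed at a threshold of 0.5.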
DCA Computation
Using pydca's Pseudolikelihood Maximization Algorithm
$ plmdca compute_fn <biomolecule> <alignment.fa> --max_iterations 500 --num_threads 6 --apc --verbose
We can also set the values of the regularization parameters:
$ plmdca compute_fn <biomolecule> <alignment.fa> --apc --lambda_h 1.0 --lambda_J 50.0 --verbose
The subcommand compute_fn computes DCA scores obtained from the Frobenius norm of the couplings. The --apc flag applies average product correction (APC). To obtain DCA scores from direct information (DI), replace the subcommand compute_fn with compute_di.
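The two scoring steps named above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not pydca's code: `frobenius_norm` and `average_product_correction` are hypothetical helper names, the coupling block is a plain nested list, and details that real implementations typically handle (gauge choice, treatment of the gap state) are left out. The APC formula used is the standard one, which subtracts the product of the two site means divided by the overall mean.

```python
import math


def frobenius_norm(coupling_block):
    """Frobenius norm of one q x q coupling block J_ij(a, b)."""
    return math.sqrt(sum(v * v for row in coupling_block for v in row))


def average_product_correction(scores):
    """Apply APC to a symmetric N x N score matrix (diagonal ignored).

    corrected[i][j] = scores[i][j] - mean_i * mean_j / overall_mean
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two sites")
    # Mean score of each site over all other sites.
    site_mean = [
        sum(scores[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    # Mean over all off-diagonal entries.
    overall_mean = sum(site_mean) / n
    corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                corrected[i][j] = (
                    scores[i][j] - site_mean[i] * site_mean[j] / overall_mean
                )
    return corrected


fn = frobenius_norm([[3.0, 0.0], [0.0, 4.0]])
raw = [[0.0, 2.0, 4.0], [2.0, 0.0, 6.0], [4.0, 6.0, 0.0]]
apc = average_product_correction(raw)
print(fn, apc[0][1], apc[0][2], apc[1][2])
```

The correction lowers scores of sites that score highly against everything, so that pairs stand out only when their score exceeds what the background of both sites would suggest.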
Using pydca's Mean-Field Algorithm
$ mfdca compute_fn <biomolecule> <alignment.fa> --apc --pseudocount 0.5 --verbose
Contact Map Visualization
When a protein/RNA sequence family has a resolved PDB structure, we can evaluate the performance of pydca by contact map visualization. Example:
$ pydca plot_contact_map <biomolecule> <PDB_chain_name> <PDB_id/PDB_file.PDB> <refseq.fa> <DCA_file.txt> --verbose
Plotting True Positive Rate
In addition to the contact map, we can evaluate the performance of pydca by plotting the true positive rate.
$ pydca plot_tp_rate <biomolecule> <PDB_chain_name> <PDB_id/PDB_file.PDB> <refseq.fa> <DCA_file.txt> --verbose
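As a rough illustration of the quantity being plotted, the sketch below computes, for each rank k, the fraction of the k top-scoring site pairs that are true contacts. It is a simplified stand-in rather than pydca's procedure: the helper name `true_positive_rates` is hypothetical, the site pairs and contacts are made-up toy values, and how contacts are defined from a PDB structure (distance cutoff, minimum sequence separation) is not covered here.

```python
def true_positive_rates(ranked_pairs, contacts):
    """Fraction of true contacts among the top-k pairs, for every k.

    ranked_pairs: site pairs sorted by decreasing DCA score.
    contacts: iterable of site pairs taken to be in contact.
    """
    # Normalize pair order so (4, 7) and (7, 4) are treated the same.
    contact_set = {tuple(sorted(pair)) for pair in contacts}
    rates = []
    hits = 0
    for rank, pair in enumerate(ranked_pairs, start=1):
        if tuple(sorted(pair)) in contact_set:
            hits += 1
        rates.append(hits / rank)
    return rates


ranked = [(1, 5), (2, 8), (3, 9), (4, 7)]  # toy ranking, best score first
known_contacts = [(1, 5), (3, 9), (7, 4)]  # toy contact list
rates = true_positive_rates(ranked, known_contacts)
print(rates)
```

A curve that stays close to 1.0 for the first ranks indicates that the highest-scoring pairs are mostly real contacts.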
To get a help message about a (sub)command, use, for example:
$ pydca --help
$ plmdca compute_fn --help
References
- Morcos, F., Pagnani, A., Lunt, B., Bertolino, A., Marks, DS., Sander, C., Zecchina, R., Onuchic, JN., Hwa, T., and Weigt, M. (2011). Direct-coupling analysis of residue coevolution captures native contacts across many protein families. PNAS, December 6, 2011, 108(49), E1293-E1301. doi:10.1073/pnas.1111471108
- Ekeberg, M., Lövkvist, C., Lan, Y., Weigt, M., and Aurell, E. (2013). Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models. Physical Review E, 87(1), 012707. doi:10.1103/PhysRevE.87.012707