Direct coupling analysis (DCA) for protein and RNA sequences

Development Status
- 4 - Beta
License
- OSI Approved :: MIT License
Operating System
- POSIX :: Linux
Programming Language
- C
- C++
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

About `pydca`

pydca is a Python implementation of direct coupling analysis (DCA) of residue coevolution for protein and RNA sequence families using the mean-field and pseudolikelihood maximization algorithms. Given multiple sequence alignment (MSA) files in FASTA format, pydca computes the coevolutionary scores of pairs of sites in the alignment. In addition, when an optional file containing a reference sequence is supplied, scores corresponding to pairs of sites of this reference sequence are computed by mapping the reference sequence to the MSA. The software provides command line utilities and can also be used as a library.

Prerequisites

pydca is implemented mainly in Python, with the parameter inference part of pseudolikelihood maximization implemented using a C++ backend for optimization. To install pydca and successfully carry out DCA computations, the following are required:

- Python 3, version 3.5 or later.
- A C++ compiler that supports C++11 (e.g. the GNU Compiler Collection).
- Optionally, OpenMP for multithreading support.

Installing

To install the current version of pydca from PyPI, run the following on the command line:

$ pip install pydca

Alternatively, use the install.sh bash script:

$ source install.sh

Using `pydca` as a Python Library

After installation, pydca can be imported into other Python source code and used. Here is an IPython Notebook example. If you encounter a problem opening the IPython Notebook example, copy and paste the URL here.

Running `pydca` From Command Line

When pydca is installed, it provides three main commands: pydca, plmdca, and mfdca. The pydca command is used for tasks such as trimming alignment data before DCA computation and visualizing contact maps or true positive rates. The other two commands run DCA computations with the pseudolikelihood maximization algorithm (plmDCA) and the mean-field algorithm (mfDCA), respectively. Below are some usage examples of all three commands.

Trimming MSA data

Trim gaps by reference sequence:

$ pydca trim_by_refseq <biomolecule>  <alignment.fa>  <refseq_file.fa> --remove_all_gaps --verbose
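
To make the idea of trimming by a reference sequence concrete, here is a minimal Python sketch. It is not pydca's implementation: the function name `trim_by_reference` and the toy alignment are hypothetical, and it assumes the reference is already one of the aligned rows, whereas pydca maps a separately supplied reference sequence to the MSA.

```python
GAP_SYMBOLS = {"-", "."}  # assumed gap characters for this sketch


def trim_by_reference(msa, ref_index=0):
    """Keep only the columns in which the reference row has a residue.

    msa is a list of equal-length aligned sequences (strings);
    ref_index selects the row treated as the reference.
    """
    reference = msa[ref_index]
    keep = [i for i, symbol in enumerate(reference) if symbol not in GAP_SYMBOLS]
    return ["".join(seq[i] for i in keep) for seq in msa]


# Hypothetical toy RNA alignment; the first row plays the reference.
msa = [
    "AC-GU",
    "A-CGU",
    "ACCG-",
]
print(trim_by_reference(msa))  # ['ACGU', 'A-GU', 'ACG-']
```

After trimming, every remaining column corresponds to exactly one position of the reference, which is what allows scores to be reported in reference-sequence numbering.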

Trim by percentage of gaps in MSA columns:

$ pydca trim_by_gap_size <alignment.fa> --max_gap 0.9 --verbose
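
The gap-based trimming above can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than pydca's code: the function name `trim_by_gap_fraction` is hypothetical, and whether pydca treats the `--max_gap` threshold as inclusive or exclusive is not stated here, so the sketch simply keeps columns whose gap fraction is at most `max_gap`.

```python
def trim_by_gap_fraction(msa, max_gap=0.9):
    """Drop alignment columns whose fraction of gaps exceeds max_gap.

    msa is a list of equal-length aligned sequences (strings).
    """
    num_seqs = len(msa)
    num_cols = len(msa[0])
    keep = []
    for col in range(num_cols):
        gaps = sum(1 for seq in msa if seq[col] in "-.")
        if gaps / num_seqs <= max_gap:  # inclusive threshold is an assumption
            keep.append(col)
    return ["".join(seq[c] for c in keep) for seq in msa]


# Hypothetical toy alignment: the middle column is 75% gaps.
msa = ["A-C", "A-G", "U-C", "AAC"]
print(trim_by_gap_fraction(msa, max_gap=0.5))  # ['AC', 'AG', 'UC', 'AC']
print(trim_by_gap_fraction(msa, max_gap=0.9))  # ['A-C', 'A-G', 'U-C', 'AAC']
```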

DCA Computation

Using `pydca`'s Pseudolikelihood Maximization Algorithm

$ plmdca compute_fn <biomolecule> <alignment.fa> --max_iterations 500 --num_threads 6 --apc --verbose

We can also set the values of the regularization parameters:

$ plmdca compute_fn <biomolecule> <alignment.fa> --apc --lambda_h 1.0 --lambda_J 50.0 --verbose

The subcommand compute_fn computes DCA scores from the Frobenius norm of the couplings, and --apc applies the average product correction (APC). To obtain DCA scores from direct information (DI) instead, replace the subcommand compute_fn with compute_di.
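
For readers unfamiliar with the two ingredients named above, the following Python sketch shows the Frobenius norm of one coupling block and the commonly used form of the average product correction, in which each pair score is reduced by the product of its row and column means divided by the overall mean. The function names and the toy matrix are hypothetical, and the exact averaging conventions used inside pydca (for example, how gap states or the diagonal are handled) are assumptions here.

```python
import math


def frobenius_norm(coupling_block):
    """Frobenius norm of one q x q coupling block J_ij (list of lists)."""
    return math.sqrt(sum(value * value for row in coupling_block for value in row))


def average_product_correction(scores):
    """Apply APC to a symmetric L x L score matrix with an ignored diagonal.

    corrected[i][j] = scores[i][j] - mean_i * mean_j / mean_all,
    where all means run over off-diagonal entries only (an assumption).
    """
    n = len(scores)
    row_mean = [
        sum(scores[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)
    ]
    total_mean = sum(
        scores[i][j] for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                corrected[i][j] = scores[i][j] - row_mean[i] * row_mean[j] / total_mean
    return corrected


# Hypothetical raw Frobenius-norm scores for three sites.
raw = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]
for row in average_product_correction(raw):
    print(row)
```

The correction removes the background signal that a site shares with all other sites, so pairs that score highly only because one site is generally "noisy" are pushed down the ranking. The sketch does not guard against an all-zero score matrix, where the overall mean would be zero.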

Using `pydca`'s Mean-Field Algorithm

$ mfdca compute_fn <biomolecule> <alignment.fa> --apc --pseudocount 0.5 --verbose
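
As a rough illustration of what a pseudocount does in mean-field DCA, the sketch below regularizes single-site frequencies by mixing the observed frequencies with a uniform distribution. It assumes that `--pseudocount` is a relative weight between 0 and 1, in the spirit of the Morcos et al. reference, and it ignores sequence reweighting; the function name and example data are hypothetical and this is not pydca's implementation.

```python
from collections import Counter


def regularized_site_frequencies(column, alphabet, pseudocount=0.5):
    """Single-site frequencies mixed with a uniform prior.

    f(a) = pseudocount / q + (1 - pseudocount) * count(a) / M,
    where q is the alphabet size and M the number of sequences.
    """
    q = len(alphabet)
    num_seqs = len(column)
    counts = Counter(column)
    return {
        state: pseudocount / q + (1.0 - pseudocount) * counts.get(state, 0) / num_seqs
        for state in alphabet
    }


# Hypothetical fully conserved RNA column; alphabet includes the gap state.
freqs = regularized_site_frequencies("AAAA", alphabet="ACGU-", pseudocount=0.5)
print(freqs)
```

Without the pseudocount, states that never occur in a column would have zero frequency, which makes the correlation matrix that mean-field DCA inverts singular. Mixing in a uniform component keeps every frequency strictly positive.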

Contact Map Visualization

When a protein/RNA sequence family has a resolved PDB structure, we can evaluate the performance of pydca by contact map visualization. Example:

$ pydca plot_contact_map <biomolecule> <PDB_chain_name> <PDB_id/PDB_file.PDB> <refseq.fa> <DCA_file.txt> --verbose

Plotting True Positive Rate

In addition to the contact map, we can evaluate the performance of pydca by plotting the true positive rate:

$ pydca plot_tp_rate <biomolecule> <PDB_chain_name> <PDB_id/PDB_file.PDB> <refseq.fa> <DCA_file.txt> --verbose
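
The quantity being plotted can be sketched in a few lines of Python: for each rank k, the true positive rate is the fraction of the top k DCA-ranked site pairs that are contacts in the structure. The function name, the ranked pairs, and the contact list below are all hypothetical; how contacts are derived from the PDB file (for example, the distance cutoff) is outside this sketch.

```python
def true_positive_rates(ranked_pairs, contacts):
    """Cumulative true positive rate for DCA-ranked site pairs.

    ranked_pairs: site pairs sorted by decreasing DCA score.
    contacts: site pairs that are in contact in the reference structure.
    Returns a list whose k-th entry (1-based) is hits_in_top_k / k.
    """
    contact_set = {tuple(sorted(pair)) for pair in contacts}
    rates = []
    hits = 0
    for rank, pair in enumerate(ranked_pairs, start=1):
        if tuple(sorted(pair)) in contact_set:
            hits += 1
        rates.append(hits / rank)
    return rates


# Hypothetical ranking and structural contacts.
ranked = [(1, 10), (2, 8), (3, 7), (4, 20)]
contacts = [(10, 1), (3, 7)]
print(true_positive_rates(ranked, contacts))
```

Pairs are compared in sorted order, so (10, 1) and (1, 10) count as the same contact.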

To get a help message about a (sub)command, use, for example,

$ pydca --help

$ plmdca compute_fn  --help

References

If you use pydca for your work, please cite the following references:

Zerihun, MB., Pucci, F., Peter, EK., and Schug, A.
pydca v1.0: a comprehensive software for direct coupling analysis of RNA and protein sequences
Bioinformatics, btz892, doi:10.1093/bioinformatics/btz892

Morcos, F., Pagnani, A., Lunt, B., Bertolino, A., Marks, DS., Sander, C., Zecchina, R., Onuchic, JN., Hwa, T., and Weigt, M.
Direct-coupling analysis of residue coevolution captures native contacts across many protein families
PNAS, December 6, 2011, 108 (49) E1293-E1301, doi:10.1073/pnas.1111471108

Ekeberg, M., Lövkvist, C., Lan, Y., Weigt, M., and Aurell, E.
Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models
Physical Review E, 87(1), 012707 (2013), doi:10.1103/PhysRevE.87.012707

Release history

- 1.23 (this version): May 27, 2021
- 1.22: Jan 8, 2020
- 1.21: Oct 31, 2019
- 1.20: Oct 29, 2019
- 1.0: Oct 16, 2019
- 0.2: May 28, 2019

Download files

Source Distribution

pydca-1.23.tar.gz (98.7 kB)

Uploaded May 27, 2021 Source

File details

Details for the file pydca-1.23.tar.gz.

File metadata

Download URL: pydca-1.23.tar.gz
Upload date: May 27, 2021
Size: 98.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.3.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.6.9

File hashes

Hashes for pydca-1.23.tar.gz
Algorithm	Hash digest
SHA256	`41338ca922ca073dd2c3e3a771adeb9b21fccc55e9aeebfdd3ca73dcd3c54fda`
MD5	`83a2cad222b2e923ddc20c3703c90a82`
BLAKE2b-256	`d7312cb1d642ff79700c5fb4932a7792edbf5f125376ac9f6eedb98f7e4c2710`
