Structure-Informed Direct Couplings Analysis.
Project description
StructureDCA
The structuredca Python package implements Structure-Informed Direct Coupling Analysis (StructureDCA) to predict the effects of missense mutations on proteins.
Standard DCA methods use Multiple Sequence Alignments (MSAs) to build a statistical evolutionary model of homologous protein families. They rely on single-site fields h and pairwise couplings J that capture co-evolution between residue positions.
StructureDCA extends this framework by incorporating the residue–residue contact map derived from the protein 3D structure to infer a sparse DCA model, in which couplings between spatially distant residue pairs are removed.
This approach leverages the observation that functionally relevant, co-evolving residues are most often structurally in contact.
The package includes a pseudolikelihood-maximization DCA solver capable of inferring sparse DCA models, where selected coupling coefficients Jij are constrained to zero.
StructureDCA combines a flexible, user-friendly Python interface with the high computational efficiency of its C++ backend.
This model was initially developed to improve classical DCA methods for predicting the effects of missense mutations in proteins. However, StructureDCA can be applied to any DCA-based analysis (except for contact predictions...) and provides the full functionality of both standard and sparse DCA models.
Please cite:
Installation and Usage
Colab Notebook
You can instantly try StructureDCA in this Colab Notebook. This notebook acts as a user-friendly web server / graphical interface, offering helpers to automatically fetch or generate the MSA and 3D structure for your target protein. You can then visualize your mutational landscape predictions as a DMS heatmap or mapped to the 3D structure.
Installation
Installation with pip:
pip install structuredca
CLI usage
Use StructureDCA with a Command Line Interface (CLI).
For example, from the directory ./test_data/, run:
structuredca ./6acv_A_29-94.fasta ./6acv_A_29-94.pdb A -o ./6acv_A_29-94_structuredca.csv
To show CLI usage and optional arguments, run:
structuredca --help
Python usage
Make sure the first sequence in your MSA file is the target sequence to mutate (otherwise have a look at tutorial 1).
From directory ./test_data/ execute the following Python code:
# Import
from structuredca import StructureDCA
# Log basic usage and arguments
StructureDCA.help()
# Initialize StructureDCA model
sdca = StructureDCA(
msa_path='./6acv_A_29-94.fasta',
pdb_path='./6acv_A_29-94.pdb', chains='A',
use_contacts_plddt_filter=False, # use only if 3D structure is an AlphaFold model (or similar) to remove low pLDDT regions from contacts
)
# Evaluate the evolutionary energy difference (ΔE) of mutations
# scores can be reweighted by Relative Solvent Accessibility-complement (RSAc) -> advised to predict stability changes (ΔΔG)
# * dE = 0 means neutral mutation
# * dE >> 0 means destabilizing / deleterious mutation
dE_mut1 = sdca.eval_mutation('K13H', reweight_by_rsa=True)
dE_mut2 = sdca.eval_mutation('K13H:K12G', reweight_by_rsa=True)
# Evaluate ΔE of all single mutations and save results to a file
dE_all = sdca.eval_mutations_table(
save_path='./6acv_A_29-94_structuredca.csv',
log_output_sample=True,
)
# Evaluate absolute evolutionary energy (E) of a sequence
seq_to_evaluate = 'A' * sdca.msa_length # arbitrary example: AAAAA...
E_only_alanine = sdca.eval_sequence(seq_to_evaluate, reweight_by_rsa=True)
# Evaluate relative probabilities for the 20 Amino Acids at this position given a background sequence
fasta_position = 10 # as in FASTA index system (starts at 1)
array_position = fasta_position - 1 # As in a Python array (starts at 0)
amino_acid_probabilities = sdca.position_probabilities(array_position) # P(a) = e^{-dE(wt→a)} / ∑_b e^{-dE(wt→b)}
Tutorials and Advanced Usage
In the ./tutorials/ directory, we provide a series of Jupyter notebooks that illustrate different ways to using StructureDCA:
-
Basics and arguments (
1_sdca-basics.ipynb): basics, evaluating effects of mutations with StructureDCA, using optional arguments (likedistance_cutofforlambda_h/lambda_J), evaluate mutations with an alternative background sequence. -
Access properties (
2_sdca-properties.ipynb): access StructureDCA coefficients and properties (like fieldsh, couplingsJ, Frobenius norms, residue-residue distance matrix, contact map, ...). -
Standard DCA (
3_sdca-standard-dca.ipynb): solve standard (fully connected) DCA models and run without protein 3D structure. -
Protein–Protein Interactions (
4_sdca-ppis.ipynb): working with protein–protein interactions (PPIs).
Compute RSA from the biologically relevant conformation, include inter-chain contacts arising from homomers, and build a StructureDCA model from a concatenated MSA of a heteromer PPI of highly coevoling proteins. -
Custom contacts (
5_sdca-custom-contacts.ipynb): build a StructureDCA model with custom contact map (instead of the default distance criteria) and custom weights for StructureDCA[RSA] (instead of default RSA-based weights) to derive any possible sparse DCA model.
Build and Installation Notes
Requirements
- Python 3.9 or later
- Python packages
numpyandbiopython(version 1.75 or later) - A C++ compiler that supports C++17 (such as GCC, LLVM or MSVC).
Credits
- For inferring the DCA coefficients, StructureDCA uses a gradient descent solver: L-BFGS by Naoaki Okazaki (which is included in this repo).
- The part of the code that makes the bridge between Python and C++ is inspired from the plmDCA implementation 'pycofitness' by Mehari B. Zerihun, Fabrizio Pucci.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file structuredca-1.0.1.tar.gz.
File metadata
- Download URL: structuredca-1.0.1.tar.gz
- Upload date:
- Size: 91.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2e163719fe754e355ab22c63862277ba39d9beb7b721372a581ddd10143e7cb6
|
|
| MD5 |
561ad627e9c46c972554864afc101ea8
|
|
| BLAKE2b-256 |
7eaa471e7689b77f00ffe5ec7f47977266704d21a547f3c3a870ed13ed4ecba4
|