Structure-Informed Direct Coupling Analysis.

These details have not been verified by PyPI

Project links

Homepage

Operating System
- OS Independent
Programming Language
- C
- C++
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

StructureDCA

The structuredca Python package implements Structure-Informed Direct Coupling Analysis (StructureDCA) to predict the effects of missense mutations on proteins.

Standard DCA methods use Multiple Sequence Alignments (MSAs) of homologous proteins to build a statistical evolutionary model of a protein family. They rely on single-site fields h, which capture position-specific amino acid preferences, and pairwise couplings J, which capture co-evolution between residue positions. StructureDCA extends this framework by using the residue–residue contact map derived from the protein 3D structure to infer a sparse DCA model, in which the couplings between spatially distant residue pairs are set to zero. This approach builds on the observation that functionally relevant co-evolving residues are most often in structural contact.
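To make the sparsity idea concrete, here is a minimal sketch of how a distance-based contact map decides which couplings J_ij are kept. It assumes one representative 3D coordinate per residue and an arbitrary 5 Å cutoff; the package itself reads the structure from a PDB file, and its atom selection and default cutoff (the distance_cutoff argument) may differ. The helpers contact_map and allowed_coupling_pairs are hypothetical and not part of the structuredca API.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], cutoff: float) -> List[List[bool]]:
    """Symmetric boolean matrix: True where residues i != j lie within `cutoff` (same units as coords)."""
    n = len(coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts[i][j] = contacts[j][i] = True
    return contacts


def allowed_coupling_pairs(contacts: List[List[bool]]) -> List[Tuple[int, int]]:
    """Pairs (i, j) with i < j whose coupling J_ij is kept; every other J_ij is fixed to zero."""
    n = len(contacts)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if contacts[i][j]]


# Toy example: four residues on a straight line, 4 Å apart (hypothetical coordinates).
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
pairs = allowed_coupling_pairs(contact_map(coords, cutoff=5.0))
n = len(coords)
print(f"{len(pairs)} of {n * (n - 1) // 2} couplings kept: {pairs}")
```

Every pair missing from pairs has its coupling fixed to zero, so the number of free coupling blocks follows the number of structural contacts rather than all L(L-1)/2 residue pairs.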

The package includes a pseudolikelihood-maximization DCA solver capable of inferring sparse DCA models, in which selected coupling coefficients J_ij are constrained to zero. StructureDCA combines a flexible, user-friendly Python interface with the computational efficiency of its C++ backend. The model was initially developed to improve classical DCA predictions of the effects of missense mutations in proteins. However, StructureDCA can be applied to any DCA-based analysis (except contact prediction, since the contacts are an input of the sparse model) and provides the full functionality of both standard and sparse DCA models.
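The pseudolikelihood objective for a sparse model can be sketched in a few lines. This is a minimal illustration, not the package's C++ solver. It assumes the usual Potts-model sign convention, P(s) ∝ exp(-E(s)) with E(s) = -Σ h_i(s_i) - Σ J_ij(s_i, s_j), amino acids encoded as integers 0..q-1, and couplings absent from the dictionary J treated as zero. The sequence weighting and regularization that plmDCA-style solvers usually add (cf. the lambda_h / lambda_J arguments mentioned in the tutorials) are omitted, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# J[(i, j)][a][b] = J_ij(a, b) for i < j; pairs not in the dict are constrained to zero.
Couplings = Dict[Tuple[int, int], List[List[float]]]


def coupling(J: Couplings, i: int, j: int, a: int, b: int) -> float:
    """Return J_ij(a, b), using the symmetry J_ji(b, a) = J_ij(a, b)."""
    if (i, j) in J:
        return J[(i, j)][a][b]
    if (j, i) in J:
        return J[(j, i)][b][a]
    return 0.0  # pair not in contact: coupling fixed to zero


def site_log_conditional(seq: Sequence[int], i: int, h: List[List[float]], J: Couplings, q: int) -> float:
    """log P(s_i | all other sites) for P(s) proportional to exp(sum h + sum J)."""
    scores = [
        h[i][a] + sum(coupling(J, i, j, a, seq[j]) for j in range(len(seq)) if j != i)
        for a in range(q)
    ]
    top = max(scores)  # log-sum-exp trick for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[seq[i]] - log_z


def neg_pseudo_log_likelihood(msa: Sequence[Sequence[int]], h: List[List[float]], J: Couplings, q: int) -> float:
    """Sum of -log P(s_i | s_rest) over all sequences and positions."""
    return -sum(site_log_conditional(seq, i, h, J, q) for seq in msa for i in range(len(seq)))


# Toy example: q = 2 states, 3 positions; only the pair (0, 1) is "in contact".
q, L = 2, 3
h = [[0.0] * q for _ in range(L)]
msa = [[0, 0, 1], [1, 1, 0]]
no_couplings = neg_pseudo_log_likelihood(msa, h, {}, q)
with_contact = neg_pseudo_log_likelihood(msa, h, {(0, 1): [[1.0, 0.0], [0.0, 1.0]]}, q)
print(round(no_couplings, 4), round(with_contact, 4))
```

Minimizing this quantity over h and the allowed J blocks yields the model parameters; because positions 0 and 1 co-vary in the toy MSA, a coupling that rewards matching states lowers the objective.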

Please cite:

Matsvei Tsishyn, Hugo Talibart, Marianne Rooman, Fabrizio Pucci. Structure-informed direct coupling analysis improves protein mutational landscape predictions. bioRxiv.

Installation and Usage

Colab Notebook

You can instantly try StructureDCA in this Colab Notebook. The notebook acts as a user-friendly web server / graphical interface, offering helpers to automatically fetch or generate the MSA and 3D structure of your target protein. You can then visualize your mutational landscape predictions as a deep mutational scanning (DMS) heatmap or mapped onto the 3D structure.

Installation

Installation with pip:

pip install structuredca

CLI usage

Use StructureDCA with a Command Line Interface (CLI). For example, from the directory ./test_data/, run:

structuredca ./6acv_A_29-94.fasta ./6acv_A_29-94.pdb A -o ./6acv_A_29-94_structuredca.csv

To show CLI usage and optional arguments, run:

structuredca --help
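To run the CLI on several targets, a small Python wrapper can assemble the same argument list as the example invocation above. This sketch assumes the positional order shown there (MSA, PDB, chain) followed by -o for the output file, and hypothetical input files named <name>.fasta and <name>.pdb; structuredca --help remains the authoritative description of the interface. The helper names are hypothetical, and by default the wrapper only prints the commands (dry run).

```python
import shlex
import subprocess
from typing import List, Sequence, Tuple


def structuredca_command(msa_path: str, pdb_path: str, chains: str, output_path: str) -> List[str]:
    """Argument list mirroring: structuredca MSA PDB CHAINS -o OUTPUT."""
    return ["structuredca", msa_path, pdb_path, chains, "-o", output_path]


def run_all(targets: Sequence[Tuple[str, str]], execute: bool = False) -> List[List[str]]:
    """Build (and optionally run) one command per (name, chains) target."""
    commands = []
    for name, chains in targets:
        cmd = structuredca_command(
            f"./{name}.fasta", f"./{name}.pdb", chains, f"./{name}_structuredca.csv"
        )
        commands.append(cmd)
        if execute:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit code
        else:
            print(shlex.join(cmd))  # dry run: show the shell command instead of running it
    return commands


# From ./test_data/, this reproduces the example invocation; pass execute=True to run it.
cmds = run_all([("6acv_A_29-94", "A")])
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with unusual file names.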

Python usage

Make sure the first sequence in your MSA file is the target sequence to mutate (otherwise, see tutorial 1).
From the ./test_data/ directory, run the following Python code:

# Import
from structuredca import StructureDCA

# Log basic usage and arguments
StructureDCA.help()

# Initialize StructureDCA model
sdca = StructureDCA(
    msa_path='./6acv_A_29-94.fasta',
    pdb_path='./6acv_A_29-94.pdb', chains='A',
    use_contacts_plddt_filter=False,  # set to True only if the 3D structure is an AlphaFold (or similar) model, to remove low-pLDDT regions from the contacts
)

# Evaluate the evolutionary energy difference (ΔE) of mutations
# Optionally reweight scores by the complement of the Relative Solvent Accessibility (RSAc); recommended for predicting stability changes (ΔΔG)
# * dE = 0 means a neutral mutation
# * dE >> 0 means a destabilizing / deleterious mutation
dE_mut1 = sdca.eval_mutation('K13H', reweight_by_rsa=True)
dE_mut2 = sdca.eval_mutation('K13H:K12G', reweight_by_rsa=True)

# Evaluate ΔE of all single mutations and save results to a file
dE_all = sdca.eval_mutations_table(
    save_path='./6acv_A_29-94_structuredca.csv',
    log_output_sample=True,
)

# Evaluate absolute evolutionary energy (E) of a sequence
seq_to_evaluate = 'A' * sdca.msa_length # arbitrary example: AAAAA...
E_only_alanine = sdca.eval_sequence(seq_to_evaluate, reweight_by_rsa=True)

# Evaluate the relative probabilities of the 20 amino acids at a given position, given the background sequence
fasta_position = 10 # as in FASTA index system (starts at 1)
array_position = fasta_position - 1 # As in a Python array (starts at 0)
amino_acid_probabilities = sdca.position_probabilities(array_position) # P(a) = e^{-dE(wt→a)} / ∑_b e^{-dE(wt→b)}
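The formula in the last comment is a softmax over negative ΔE values. The minimal sketch below, assuming you already have ΔE for each candidate amino acid at one position, shows how substitutions with ΔE >> 0 end up with near-zero probability. The ΔE values are made up rather than StructureDCA output, and probabilities_from_delta_e is a hypothetical helper, not the package's implementation.

```python
import math
from typing import Dict


def probabilities_from_delta_e(delta_e: Dict[str, float]) -> Dict[str, float]:
    """P(a) = exp(-dE(wt->a)) / sum_b exp(-dE(wt->b)), i.e. a softmax over -dE."""
    lowest = min(delta_e.values())  # shifting all dE by a constant leaves P unchanged
    weights = {aa: math.exp(-(de - lowest)) for aa, de in delta_e.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


# Made-up dE values for 3 of the 20 amino acids at one position (wild type K, so dE(K->K) = 0).
probs = probabilities_from_delta_e({"K": 0.0, "R": 0.5, "P": 4.0})
for aa, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{aa}: {p:.3f}")
```

The wild-type residue (ΔE = 0) is the most probable only when no substitution has a negative ΔE.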

Tutorials and Advanced Usage

In the ./tutorials/ directory, we provide a series of Jupyter notebooks that illustrate different ways of using StructureDCA:

- Basics and arguments (1_sdca-basics.ipynb): basics, evaluating the effects of mutations with StructureDCA, using optional arguments (such as distance_cutoff or lambda_h / lambda_J), and evaluating mutations with an alternative background sequence.
- Access properties (2_sdca-properties.ipynb): accessing StructureDCA coefficients and properties (such as fields h, couplings J, Frobenius norms, the residue–residue distance matrix, the contact map, ...).
- Standard DCA (3_sdca-standard-dca.ipynb): solving standard (fully connected) DCA models and running without a protein 3D structure.
- Protein–Protein Interactions (4_sdca-ppis.ipynb): working with protein–protein interactions (PPIs); computing RSA from the biologically relevant conformation, including inter-chain contacts arising from homomers, and building a StructureDCA model from a concatenated MSA of a heteromeric PPI between highly coevolving proteins.
- Custom contacts (5_sdca-custom-contacts.ipynb): building a StructureDCA model with a custom contact map (instead of the default distance criterion) and custom weights for StructureDCA[RSA] (instead of the default RSA-based weights), making it possible to derive any sparse DCA model.

Build and Installation Notes

Requirements

Python 3.9 or later
Python packages numpy and biopython (version 1.75 or later)
A C++ compiler that supports C++17 (such as GCC, LLVM or MSVC).
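Before installing, a short script can check the Python-side requirements. This is a minimal sketch that reads the 1.75 minimum as applying to biopython (its distribution name on PyPI); it uses only the standard library, and it does not check for a C++17 compiler. The helper names are hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple

# Minimum versions taken from the requirements list; None means "any version".
REQUIREMENTS: List[Tuple[str, Optional[Tuple[int, ...]]]] = [
    ("numpy", None),
    ("biopython", (1, 75)),
]


def version_tuple(text: str) -> Tuple[int, ...]:
    """Leading numeric release components, e.g. '1.81' -> (1, 81), '1.85rc1' -> (1, 85)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # a suffix such as 'rc1' ends the release part
            break
    return tuple(parts)


def check_requirements() -> List[str]:
    """Return human-readable problems (an empty list if everything looks fine)."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(f"Python 3.9 or later required, found {sys.version.split()[0]}")
    for dist, minimum in REQUIREMENTS:
        try:
            installed = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if minimum is not None and version_tuple(installed) < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{dist} {wanted} or later required, found {installed}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "Python-side requirements satisfied")
```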

Credits

For inferring the DCA coefficients, StructureDCA uses the L-BFGS quasi-Newton optimizer implemented by Naoaki Okazaki (included in this repository).
The code bridging Python and C++ is inspired by 'pycofitness', the plmDCA implementation by Mehari B. Zerihun and Fabrizio Pucci.


Release history

1.0.1 (this version): Mar 18, 2026

1.0.0: Mar 16, 2026

Download files


Source Distribution

structuredca-1.0.1.tar.gz (91.5 kB)

Uploaded Mar 18, 2026 Source

File details

Details for the file structuredca-1.0.1.tar.gz.

File metadata

Download URL: structuredca-1.0.1.tar.gz
Upload date: Mar 18, 2026
Size: 91.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for structuredca-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`2e163719fe754e355ab22c63862277ba39d9beb7b721372a581ddd10143e7cb6`
MD5	`561ad627e9c46c972554864afc101ea8`
BLAKE2b-256	`7eaa471e7689b77f00ffe5ec7f47977266704d21a547f3c3a870ed13ed4ecba4`

