Predicting protein-protein interactions and structures from multiple sequence alignments.


🍐 yunta


Predicting pairwise protein-protein interactions and structures from multiple sequence alignments. Now with interspecies (host-pathogen) interactions and automatic chunking of large sequences!

Installation
Credit
Command-line usage
Python API
Scaling up
Issues, problems, suggestions
Further help

yunta provides several methods for evaluating protein-protein interactions, listed here in order of increasing computational cost:

GPU-accelerated direct coupling analysis (DCA) in PyTorch
RoseTTAFold-2track via the rf2t-micro package
AlphaFold2 for protein-protein structure prediction

yunta offers streamlined installation, a command-line interface, a Python API, and resilience to GPU out-of-memory errors through chunking of long sequences and CPU fallback. It takes as input unpaired multiple-sequence alignments in A3M format (as generated by tools such as hhblits) and outputs a matrix of inter-residue contacts.

Rough CPU timings for a pair of ~200-residue proteins (S. cerevisiae DHFR and a WW domain-containing protein):

DCA: 5 seconds
RoseTTAFold-2track: 10 seconds
AlphaFold2: 1 hour

Note that times increase quadratically with total protein length.
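Because of this quadratic scaling, the timings above can be extrapolated to longer pairs. The sketch below is only a back-of-the-envelope estimate: it assumes runtime is purely proportional to the square of the combined length and ignores fixed overheads such as loading model weights. The helper `estimate_runtime` is hypothetical and not part of yunta.

```python
def estimate_runtime(reference_seconds: float, reference_length: int, new_length: int) -> float:
    """Scale a reference runtime by (new_length / reference_length) ** 2.

    Assumes purely quadratic scaling in combined protein length and
    ignores fixed costs such as loading model weights.
    """
    if reference_length <= 0 or new_length <= 0:
        raise ValueError("lengths must be positive")
    return reference_seconds * (new_length / reference_length) ** 2


# Reference point: DCA on two ~200-residue proteins (~400 residues combined) took ~5 s.
dca_800 = estimate_runtime(5.0, 400, 800)  # doubling the length gives ~4x the time
print(f"Estimated DCA time for 800 residues: {dca_800:.0f} s")
```

Treat the result as an order-of-magnitude guide only; real runtimes also depend on MSA depth and hardware.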

Installation

pip install yunta

To enable AlphaFold2 with CUDA 12 (recommended for GPU):

pip install "yunta[af_cuda12]"

For a local CUDA 12 installation:

pip install "yunta[af_cuda12_local]"

For CUDA 11:

pip install "yunta[af_cuda11]"

For AlphaFold2 without a specific CUDA version (CPU or custom JAX install):

pip install "yunta[af]"

To enable RosettaFold-2track:

pip install "yunta[rf2t]"

Using the embedded models requires the RoseTTAFold-2track and AlphaFold2 weights, which are downloaded automatically on first use. By using them, you agree that the trained RoseTTAFold weights are made available for non-commercial use only under the terms of the Rosetta-DL Software license, and that AlphaFold2's pretrained parameters are licensed under CC BY 4.0.

Environment variables

Variable	Default	Description
`YUNTA_CACHE`	`~/.cache/yunta`	Directory for the organism interaction lookup table cache.
`YUNTA_USE_CACHE`	`False`	Set to `True` to load a pre-built cache from disk rather than rebuilding.
`YUNTA_TEST`	`0`	Set to `1` to build and hold the interaction lookup table in memory only (no disk write).
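These variables can also be set from Python. A minimal sketch, assuming yunta reads them from the process environment when it first builds or loads the lookup table, so they are set before yunta is imported; the cache directory below is a placeholder path, not a yunta default.

```python
import os

# Placeholder path; adjust for your system. Assumed to be read by yunta
# when the lookup table is first built or loaded.
os.environ["YUNTA_CACHE"] = os.path.expanduser("~/scratch/yunta-cache")
os.environ["YUNTA_USE_CACHE"] = "True"  # load the pre-built cache rather than rebuilding
# os.environ["YUNTA_TEST"] = "1"        # alternative: in-memory table only, no disk write

# import yunta  # import only after the variables are set
```

Setting them in the shell (for example in a job script) works equally well and avoids depending on import order.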

Credit

yunta is a fork of SpeedPPI, which was itself inspired by FoldDock. That approach was used with AlphaFold2 to evaluate 65,484 protein-protein interactions from the human proteome in the paper "Towards a structurally resolved human protein interaction network".

The idea of using DCA, RoseTTAFold-2track, and AlphaFold2 in a cascade of increasingly expensive and specific PPI detection methods has been explored in a series of papers from David Baker's lab:

yunta puts these algorithms in one place with easy installation, a command-line interface, and a Python API. It also enables interspecies co-evolutionary analysis using a built-in host-pathogen interaction mapping.

Command-line usage

$ yunta --help
usage: yunta [-h] {dca-single,dca-many,rf2t-single,af2-single,af2-many} ...

Screening protein-protein interactions using DCA, RosettaFold-2track, and AlphaFold2.

options:
  -h, --help            show this help message and exit

Sub-commands:
  {dca-single,dca-many,rf2t-single,af2-single,af2-many}
                        Use these commands to specify the tool you want to use.
    dca-single          Calculate DCA for one protein-protein interaction.
    dca-many            Calculate DCA between two sets of proteins, or all pairs in one set of proteins.
    rf2t-single         Calculate RF-2track contacts between one protein and a series of others.
    af2-single          Model one protein-protein interaction.
    af2-many            Model all interactions between two sets of proteins, or all pairs in one set of proteins.

Generating multiple-sequence alignments

All algorithms depend on pre-computed multiple-sequence alignments (MSAs) between a protein of interest and as many homologs as possible. You can generate MSAs using hhblits with pre-clustered databases like UniClust:

hhblits -e 0.01 -v 3 -d /path/to/UniClust-database -i input.fasta -oa3m output-msa.a3m -o /dev/null -cov 60 -n 3 -realign -realign_max 10000

This typically takes 1–40 min depending on query complexity. See the hhsuite documentation for details.
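Since every tool in yunta consumes A3M, it can help to sanity-check an alignment before running anything expensive. The sketch below is a minimal standalone reader written for illustration only; `read_a3m` is hypothetical and is not yunta's parser (use `MSA.from_file` in practice). It relies on the standard A3M convention that uppercase letters and `-` are aligned to query columns, while lowercase letters and `.` are insertions relative to the query.

```python
from typing import List, Tuple


def read_a3m(path: str) -> List[Tuple[str, str]]:
    """Read an A3M file into (header, aligned_sequence) pairs.

    Insertion characters (lowercase letters and '.') are dropped, so every
    returned sequence has the same length as the query.
    """
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and optional comment lines
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                # keep match states and gaps; discard insertions
                chunks.append("".join(c for c in line if not (c.islower() or c == ".")))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

A quick check is that all aligned sequences share one length, e.g. `len({len(s) for _, s in read_a3m("output-msa.a3m")}) == 1`.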

Calculating contact maps

Given two MSAs, yunta calculates a contact map using DCA, RF2t, or AlphaFold2, and produces a summary table for each pair.

Using DCA or RF2t produces a table like this:

$ yunta dca-single test/inputs/DYR_YEAST.a3m -2 test/inputs/CAPZA_YEAST.a3m -o test/outputs/dca-single.tsv --apc

ID	uniprot_id_1	uniprot_id_2	seq_len	chain_a_len	chain_b_len	msa1_depth	msa2_depth	msa_depth	n_eff	DCA:apc	DCA:mean	DCA:median	DCA:maximum	DCA:minimum	DCA:var	DCA:sigma1	DCA:focality	DCA:top_A	DCA:top_B
O13297-D6VTK4	O13297	D6VTK4	980	549	431	14246	1546	670	2	False	0.0183	0.0147	0.0743	2.28e-06	...	...	...	...	...

Method-specific columns are prefixed with DCA: or RF2t:. The following metrics are reported for each method (e.g. DCA:sigma1):

sigma1: leading singular value of the inter-chain contact submatrix
focality: ratio of the first to the second singular value; higher values indicate a more concentrated interaction signal
top_A, top_B: indices of the top-scoring residues in each chain, taken from the leading left (chain A) and right (chain B) singular vectors
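These statistics can be illustrated with NumPy. This is a sketch under stated assumptions, not yunta's implementation: it assumes chain A occupies the first `len_a` rows and columns of the full contact matrix, and it ranks residues by the absolute value of the leading left (chain A) and right (chain B) singular vectors. The names `interchain_summary` and `n_top` are hypothetical.

```python
import numpy as np


def interchain_summary(full_matrix: np.ndarray, len_a: int, n_top: int = 5):
    """Return sigma1, focality and top residue indices for chains A and B.

    Assumes a square (len_a + len_b) contact matrix with chain A first; the
    inter-chain block is rows of A by columns of B.
    """
    inter = full_matrix[:len_a, len_a:]  # shape (len_a, len_b)
    u, s, vt = np.linalg.svd(inter, full_matrices=False)
    sigma1 = s[0]
    # Ratio of first to second singular value; infinite for a rank-1 block
    focality = s[0] / s[1] if len(s) > 1 and s[1] > 0 else np.inf
    # Largest-magnitude entries of the leading singular vectors
    top_a = np.argsort(-np.abs(u[:, 0]))[:n_top]
    top_b = np.argsort(-np.abs(vt[0, :]))[:n_top]
    return sigma1, focality, top_a, top_b
```

A single strong contact between residue 1 of chain A and residue 2 of chain B gives a rank-1 block, hence a very large focality, with those residues ranked first.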

If you also pass --plot, contact maps of the full complex and of the inter-chain contacts alone are saved as PNG files, alongside CSV files of the raw matrices:

$ yunta dca-single test/inputs/DYR_YEAST.a3m -2 test/inputs/CAPZA_YEAST.a3m -o test/outputs/dca-single.tsv --apc --plot test/outputs/DYR_YEAST-CAPZA_YEAST

Predicting protein complex structures

yunta can feed MSAs into AlphaFold2 to predict binary protein complex structures:

$ yunta af2-single test/inputs/DYR_YEAST.a3m -2 test/inputs/CAPZA_YEAST.a3m -o test/outputs/af2-single.tsv

This writes a summary TSV with AF2:-prefixed metrics (n_contacts, mean_interface_plddt, pdockq, and seed) in addition to the standard contact map statistics. PDB structure files are written to the current working directory, named by protein pair ID.
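For reference, pDockQ as published with FoldDock (which yunta builds on via SpeedPPI) combines the mean interface pLDDT and the number of interface contacts through a fitted sigmoid. The sketch below reproduces that published formula; it is an assumption, not confirmed here, that yunta's AF2:pdockq column uses the same constants, and the helper name `pdockq` is hypothetical.

```python
import math


def pdockq(mean_interface_plddt: float, n_contacts: int) -> float:
    """pDockQ following the published FoldDock sigmoid fit.

    x = mean interface pLDDT * log10(number of interface contacts)
    pDockQ = 0.724 / (1 + exp(-0.052 * (x - 152.611))) + 0.018
    """
    if n_contacts <= 0:
        return 0.0  # no interface contacts: score set to zero (assumed convention)
    x = mean_interface_plddt * math.log10(n_contacts)
    return 0.724 / (1 + math.exp(-0.052 * (x - 152.611))) + 0.018
```

The score is bounded between roughly 0.018 and 0.742, and it rises with both interface confidence and interface size.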

Using --plot generates contact map plots as with the other commands:

$ yunta af2-single test/inputs/DYR_YEAST.a3m -2 test/inputs/CAPZA_YEAST.a3m -o test/outputs/af2-single.tsv --plot test/outputs/af2-single-plot

Command-line tools

*-single commands run one protein against one or more others:

$ yunta dca-single --help
usage: yunta dca-single [-h] [--msa2 [MSA2 ...]] [--list-file] [--interspecies] [--strict-match]
                        [--output [OUTPUT]] [--plot PLOT] [--apc] [msa1]

positional arguments:
  msa1                  MSA file. Default: STDIN.

options:
  -h, --help            show this help message and exit
  --msa2 [MSA2 ...], -2 [MSA2 ...]
                        Second MSA file(s). Default: if not provided, all pairwise from msa1.
  --list-file, -l       Treat inputs as plain-text list of MSA files, rather than MSA filenames.
                        Default: treat as MSA filenames.
  --interspecies, -i    MSAs are from different species; enables built-in host-pathogen interaction
                        map. Default: assume same species.
  --strict-match, -S    For interspecies mode, require query MSA sequences to be from known
                        interacting species. Default: relax this constraint for query sequences.
  --output [OUTPUT], -o [OUTPUT]
                        Output filename. Default: STDOUT.
  --plot PLOT, -p PLOT  Directory for saving plots. Default: don't plot.
  --apc, -a             Apply average product correction (APC) to DCA scores. Default: off.

If one MSA is provided (no -2), homodimeric interactions are probed. Use --list-file to pass a single plain-text file containing one MSA path per line.
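When a run against many partners produces many rows, a quick way to triage is to sort the summary table by one metric. A minimal sketch using only the standard library; it assumes the output TSV has a header row with the column names shown earlier (such as `ID` and `DCA:focality`), and `rank_by_column` is a hypothetical helper, not part of yunta.

```python
import csv
from typing import List, Tuple


def rank_by_column(tsv_path: str, column: str = "DCA:focality") -> List[Tuple[str, float]]:
    """Return (ID, value) pairs from a yunta summary TSV, highest value first."""
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = [(row["ID"], float(row[column])) for row in reader]
    return sorted(rows, key=lambda item: item[1], reverse=True)


# Example: top ten pairs from an earlier run (path from the examples above)
# for pair_id, score in rank_by_column("test/outputs/dca-single.tsv")[:10]:
#     print(pair_id, score)
```

Focality is only one possible ranking; any numeric column (for example AF2:pdockq from af2-* runs) can be passed as `column`.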

*-many commands run all pairwise combinations across two sets:

$ yunta af2-many --help
usage: yunta af2-many [-h] [--msa2 [MSA2 ...]] [--list-file] [--interspecies] [--strict-match]
                      [--output [OUTPUT]] [--params PARAMS] [--recycles RECYCLES] [--plot PLOT]
                      [msa1 ...]

positional arguments:
  msa1                  MSA file(s).

options:
  -h, --help            show this help message and exit
  --msa2 [MSA2 ...], -2 [MSA2 ...]
                        Second MSA file(s). Default: if not provided, all pairwise from msa1.
  --list-file, -l       Treat inputs as plain-text list of MSA files, rather than MSA filenames.
  --interspecies, -i    MSAs are from different species; enables built-in host-pathogen interaction map.
  --strict-match, -S    For interspecies mode, require query MSA sequences to be from known
                        interacting species.
  --output [OUTPUT], -o [OUTPUT]
                        Output filename. Default: STDOUT.
  --params PARAMS, -w PARAMS
                        Path to AlphaFold2 params file (.npz). Downloaded automatically if absent.
  --recycles RECYCLES, -x RECYCLES
                        Maximum number of recycles through the model. Default: 10.
  --plot PLOT, -p PLOT  Directory for saving plots. Default: don't plot.

Interspecies (host-pathogen) analysis

Use --interspecies / -i when the two MSAs come from organisms that interact as host and pathogen. yunta uses a built-in host-pathogen interaction (HPI) map to pair aligned sequences across species rather than requiring exact species identity:

$ yunta dca-single test/inputs/crypto/Q5CPK5_CRYPI.a3m \
    -2 test/inputs/human/EZRI_HUMAN.a3m \
    --interspecies --apc \
    -o test/outputs/dca-single-interspecies.tsv

By default (without --strict-match), the HPI constraint is relaxed for the query sequences themselves — useful when screening an uncharacterised query against a known host or pathogen proteome. Add --strict-match to require that query sequences come from species in the HPI map.
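To make the pairing idea concrete, the sketch below pairs MSA rows by species ID, either requiring identical species (same-species mode) or listed interaction partners (interspecies mode), and optionally always pairs the two query rows to mimic the relaxed default. This is a simplified illustration, not yunta's pairing algorithm: the first-match rule, the treatment of row 0 as the query, and the helper name `pair_by_species` are all assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple


def pair_by_species(
    species_a: List[str],
    species_b: List[str],
    interaction_map: Optional[Dict[str, List[str]]] = None,
    strict: bool = False,
) -> List[Tuple[int, int]]:
    """Pair MSA row indices by species (simplified illustration).

    Row 0 of each list is taken to be the query. Without a map, rows pair
    only with the same species; with a map, rows pair with listed partner
    species. Each row of B is used at most once; the first match wins.
    """
    pairs: List[Tuple[int, int]] = []
    used_b: Set[int] = set()
    first = 0
    if not strict and species_a and species_b:
        # Relaxed default: always pair the two query rows (assumed behaviour)
        pairs.append((0, 0))
        used_b.add(0)
        first = 1
    for i in range(first, len(species_a)):
        sp_a = species_a[i]
        partners = {sp_a} if interaction_map is None else set(interaction_map.get(sp_a, []))
        for j, sp_b in enumerate(species_b):
            if j not in used_b and sp_b in partners:
                pairs.append((i, j))
                used_b.add(j)
                break
    return pairs
```

With `strict=True`, the query rows must satisfy the same species constraint as every other row, which is the behaviour described for --strict-match.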

Python API

Load and inspect an MSA:

from yunta.structs.msa import MSA

msa = MSA.from_file("my-msa-file.a3m")
print(msa)         # MSA(name=P07807) of sequence length 549, with 14246 sequences.
print(msa.neff())  # effective sequence count

Pair two MSAs and run DCA:

from yunta.structs.msa import MSA, PairedMSA
from yunta.interactions.dca.dca_torch import calculate_dca

msa1 = MSA.from_file("protein-a.a3m")
msa2 = MSA.from_file("protein-b.a3m")
paired = PairedMSA.from_msa(msa1, msa2)
contact_matrix = calculate_dca(paired, apc=True)
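The apc=True option (and --apc on the command line) refers to average product correction. The NumPy sketch below shows the standard APC formula, APC(i, j) = F(i, j) - F(i, ·) F(·, j) / F(·, ·), where the dotted terms are row, column, and overall means. Whether yunta excludes the diagonal or computes the means differently is not documented here, so treat this as illustrative; `apply_apc` is a hypothetical helper.

```python
import numpy as np


def apply_apc(scores: np.ndarray) -> np.ndarray:
    """Average product correction for a square coupling-score matrix."""
    row_mean = scores.mean(axis=1, keepdims=True)  # F(i, .)
    col_mean = scores.mean(axis=0, keepdims=True)  # F(., j)
    overall = scores.mean()                        # F(., .)
    if overall == 0:
        raise ValueError("APC is undefined when the mean score is zero")
    # Subtract the background expected from per-position average coupling
    return scores - row_mean * col_mean / overall
```

APC removes signal that is explained by some positions simply coupling strongly with everything, which otherwise dominates raw DCA scores.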

For interspecies pairing, pass interaction_map="builtin":

paired = PairedMSA.from_msa(msa1, msa2, interaction_map="builtin")

Or supply a custom dict mapping species IDs to lists of interacting species IDs:

paired = PairedMSA.from_msa(
    msa1, msa2,
    interaction_map={"NCBI:562": ["NCBI:10710"], "NCBI:10710": ["NCBI:562"]},
)
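The custom map above lists each relationship in both directions. If your host-pathogen pairs come from a table, a small helper can build that symmetric form. The helper `symmetric_interaction_map` is hypothetical, and it assumes yunta expects the `{species_id: [partner_ids]}` shape shown in the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def symmetric_interaction_map(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build a species interaction map containing both directions of every pair."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for species_1, species_2 in pairs:
        partners[species_1].add(species_2)
        partners[species_2].add(species_1)
    return {species: sorted(others) for species, others in partners.items()}


interaction_map = symmetric_interaction_map([("NCBI:562", "NCBI:10710")])
# -> {"NCBI:562": ["NCBI:10710"], "NCBI:10710": ["NCBI:562"]}
# paired = PairedMSA.from_msa(msa1, msa2, interaction_map=interaction_map)
```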

Run the full screening pipeline programmatically:

from yunta.screening import dca_one_vs_many, rf2track_one_vs_many

outputs = dca_one_vs_many(
    msa_file1="query.a3m",
    msa_file2=["target1.a3m", "target2.a3m"],
    apc=True,
    interaction_map="builtin",  # omit for same-species
)
for result_matrix, interaction_matrix, metrics in outputs:
    print(metrics.ID, metrics.focality)

Each element of outputs is a 3-tuple (full_contact_matrix, inter-chain_contact_matrix, metrics_dataclass). Metrics dataclasses (DCAMetrics, RF2TMetrics, AF2Metrics) can be written directly to TSV:

metrics.write("results.tsv")

(More documentation coming soon!)

... if you want to scale up

While the *-many commands handle batches of PPIs, for large-scale screening across an HPC cluster our nf-ggi Nextflow pipeline is more efficient and can also generate MSAs for you.
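Whichever route you take, the arithmetic of all-vs-all screening is worth keeping in mind: n proteins give n(n - 1)/2 distinct heterodimeric pairs, so 100 proteins already give 4,950 pairs. The sketch below splits those pairs into equal-sized shards, for example one per job in a cluster array. It is a hypothetical helper, not part of yunta or nf-ggi, and it deliberately leaves out self-pairs (homodimers).

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def shard_pairs(msa_files: Sequence[str], n_shards: int) -> List[List[Tuple[str, str]]]:
    """Split all unordered pairs of MSA files into round-robin shards.

    n files give n * (n - 1) / 2 pairs; self-pairs are not included.
    """
    pairs = list(combinations(msa_files, 2))
    return [pairs[k::n_shards] for k in range(n_shards)]


shards = shard_pairs([f"protein{i}.a3m" for i in range(100)], n_shards=50)
print(len(shards), sum(len(s) for s in shards))  # 50 4950
```

Each shard could then be run as its own dca-single or af2-single job; round-robin assignment keeps shard sizes within one pair of each other.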

Issues, problems, suggestions

Add to the issue tracker.

Further help


