MSA Pairformer model repository

MSA Pairformer

This repository contains the latest release of MSA Pairformer and Google Colab notebooks for relevant analyses. Here, you will find how to use MSA Pairformer to embed protein sequences, predict residue-residue interactions in monomers and at the interface of protein-protein interactions, and perform zero-shot variant effect prediction.

Installation

To get started with MSA Pairformer, install the Python library using pip:

pip install msa-pairformer

Alternatively, clone the GitHub repository and install it manually from inside the cloned directory:

git clone git@github.com:yoakiyama/MSA_Pairformer.git
cd MSA_Pairformer
pip install -e .
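
After either install route, it can help to confirm that the package is visible to the interpreter you plan to use. The sketch below relies only on the standard library; the distribution name `msa-pairformer` comes from the pip command above, and the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("msa-pairformer")
    if found:
        print(f"msa-pairformer version: {found}")
    else:
        print("msa-pairformer is not installed in this environment")
```

If the check reports that the package is missing, the most likely cause is that pip installed into a different environment than the one running the script.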

MSA Pairformer

MSA Pairformer extracts evolutionary signals most relevant to a query sequence from a set of aligned homologous sequences. Using only 111M parameters, it can easily run on consumer-grade hardware (e.g. NVIDIA RTX 4090) and achieve state-of-the-art performance. In this repository, we provide training code and Google Colab notebooks to reproduce the results in the pre-print. We are excited to deliver this tool to the research community and to see all of its applications to real-world biological challenges.
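
To put the 111M parameter count in context, the sketch below estimates the memory needed to hold the weights alone at two common precisions. It is a back-of-the-envelope calculation, not a measurement: the per-parameter byte sizes are the standard dtype widths, the function name `weight_memory_mb` is hypothetical, and activation memory (which grows with MSA depth and sequence length) is deliberately left out.

```python
# Bytes used to store one parameter at common precisions (standard dtype widths).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

N_PARAMS = 111_000_000  # "111M parameters", as stated above


def weight_memory_mb(n_params, dtype):
    """Memory for the weights alone, in decimal megabytes (1 MB = 1e6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: {weight_memory_mb(N_PARAMS, dtype):.0f} MB")
```

This gives 444 MB in float32 and 222 MB in bfloat16 for the weights, which is small next to the memory of a consumer GPU. Actual usage during inference will be higher because of activations.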

Getting started with MSA Pairformer

The model's weights can be downloaded from Hugging Face under HuggingFace/yakiyama/MSA-Pairformer.

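import numpy as np
import torch
from huggingface_hub import login
from MSAPairformer.model import MSAPairformer
# MSA, prepare_msa_masks, and aa2tok_d (used below) must also be imported; their import path is not shown here

# Use the GPU if available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device_name = torch.cuda.get_device_name(device) if device.type == "cuda" else "CPU"
print(f"Using device: {device_name}")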

# This function lets you log in to Hugging Face with an API key
login()

# Download model weights and load model
# As long as the cache doesn't get cleared, you won't need to re-download the weights whenever you re-run this
model = MSAPairformer.from_pretrained(device=device)

# You can also save the downloaded weights to a directory of your choice.
# This lets you load the model without re-downloading even if your cache gets cleared.
# After the first run, re-running this code loads the weights from that directory automatically.
save_model_dir = "model_weights"
model = MSAPairformer.from_pretrained(weights_dir=save_model_dir, device=device)

# Subsample MSA using hhfilter and greedy diversification
# msa_file, max_msa_depth, and total_length are assumed to be defined by you beforehand
np.random.seed(42)
msa_obj = MSA(
    msa_file_path=msa_file,
    max_seqs=max_msa_depth,
    max_length=total_length,
    max_tokens=1e12,
    diverse_select_method="hhfilter",
    hhfilter_kwargs={"binary": "hhfilter"}
)
msa_tokenized_t = msa_obj.diverse_tokenized_msa
msa_onehot_t = torch.nn.functional.one_hot(msa_tokenized_t, num_classes=len(aa2tok_d)).unsqueeze(0).float().to(device)
mask, msa_mask, full_mask, pairwise_mask = prepare_msa_masks(msa_obj.diverse_tokenized_msa.unsqueeze(0))
mask, msa_mask, full_mask, pairwise_mask = mask.to(device), msa_mask.to(device), full_mask.to(device), pairwise_mask.to(device)

# Predict contacts and embed query sequence
# `breaks` (the chain break indices of a complex) is assumed to be defined by you beforehand
with torch.no_grad():
    # Run in bfloat16 on whichever device was selected above
    with torch.amp.autocast(dtype=torch.bfloat16, device_type=device.type):
        res = model(  # Use the model loaded above
            msa=msa_onehot_t.to(torch.bfloat16),
            mask=mask,
            msa_mask=msa_mask,
            full_mask=full_mask,
            pairwise_mask=pairwise_mask,
            complex_chain_break_indices=[breaks],
            return_seq_weights=True,
            return_pairwise_repr_layer_idx=None,
            return_msa_repr_layer_idx=None
        )

res.keys()
# res is a dictionary with the following keys: final_msa_repr, final_pairwise_repr, msa_repr_d, pairwise_repr_d, seq_weights_list_d, logits, contacts, total_length, max_msa_depth, weight_scale

That's it -- you've generated embeddings and predicted contacts using MSA Pairformer!
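
A common next step is to list the strongest predicted contacts. The sketch below is a minimal, library-free illustration on a toy matrix. It assumes the contact scores are available as a square, symmetric L x L matrix (for example, the `contacts` entry of `res` after removing any batch dimension and converting to nested lists, which is an assumption about its shape). The function name `top_contacts` and the default `min_separation` value are illustrative choices, not part of MSA Pairformer.

```python
def top_contacts(contact_map, k=5, min_separation=6):
    """Return the k highest-scoring residue pairs as (i, j, score) tuples.

    contact_map: square list of lists of floats, assumed symmetric.
    Only pairs with j - i >= min_separation are considered, which skips
    near-diagonal pairs that are close in sequence.
    """
    n = len(contact_map)
    pairs = [
        (i, j, contact_map[i][j])
        for i in range(n)
        for j in range(i + min_separation, n)
    ]
    # Sort by score, highest first; ties keep their original (i, j) order.
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:k]


if __name__ == "__main__":
    # Toy 8x8 symmetric map: two long-range contacts and one short-range contact.
    n = 8
    toy = [[0.0] * n for _ in range(n)]
    for i, j, s in [(0, 7, 0.9), (1, 7, 0.8), (2, 3, 0.95)]:
        toy[i][j] = toy[j][i] = s
    print(top_contacts(toy, k=2))
```

On the toy map, the short-range pair (2, 3) is filtered out despite its high score, and the two long-range pairs are returned. Indices are zero-based positions along the query sequence.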

Licenses

MSA Pairformer code and model weights are released under a permissive, slightly modified ☕️ MIT license. It can be freely used for both academic and commercial purposes.

Citation

If you use MSA Pairformer in your work, please use the following citation:

@article {Akiyama2025.08.02.668173,
	author = {Akiyama, Yo and Zhang, Zhidian and Mirdita, Milot and Steinegger, Martin and Ovchinnikov, Sergey},
	title = {Scaling down protein language modeling with MSA Pairformer},
	elocation-id = {2025.08.02.668173},
	year = {2025},
	doi = {10.1101/2025.08.02.668173},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2025/08/03/2025.08.02.668173},
	eprint = {https://www.biorxiv.org/content/early/2025/08/03/2025.08.02.668173.full.pdf},
	journal = {bioRxiv}
}

Release history

1.0.1: Aug 27, 2025
1.0.0: Aug 4, 2025 (this version)

