
The pLM entropy Python package


pLM entropy

Code and data for: Lytras, S., Strange, A., Ito, J., and Sato, K. (2025). Inferring context-specific site variation with evotuned protein language models. bioRxiv.

pLM entropy is a protein language model (pLM)-based metric for assessing protein site conservation and variability. We test this metric using versions of two popular pLMs (ESM-2 and protT5) fine-tuned on the diversity of hemagglutinin (HA) proteins of different Influenza A virus serotypes.
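
The formula is not spelled out in this description. As a minimal sketch, assuming that pLM entropy at a site is the Shannon entropy of the model's predicted amino acid distribution (a softmax over that site's logits), the calculation could look like the following. The function names, the choice of nats as the unit, and the toy logits are illustrative assumptions and are not taken from the package; the manuscript remains the reference for the exact definition.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def site_entropy(logits):
    """Shannon entropy (in nats) of the distribution implied by one site's logits."""
    probs = softmax(logits)
    # p * log(1/p) is the contribution of each possible residue
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

# Toy example with a 4-letter alphabet (a real model scores 20 amino acids).
conserved_site = [10.0, 0.0, 0.0, 0.0]  # model strongly prefers one residue
variable_site = [1.0, 1.0, 1.0, 1.0]    # model has no preference

print(site_entropy(conserved_site) < site_entropy(variable_site))
```

Under this assumption, a low value marks a site where the model expects a single residue (conserved), and a high value marks a site where many residues are plausible (variable).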


The plm_entropy Python package

Installation

The Python package for embedding sequences through ESM-2- and protT5-based models and calculating pLM entropy, along with a number of helper functions, is available to install through pip. We also provide a conda environment yml file for easily installing all dependencies.

To install the conda environment with the python package:

git clone https://github.com/spyros-lytras/plm_entropy.git

cd plm_entropy

conda env create -f plm_entropy_env.yml

conda activate plm_entropy

pip install plm_entropy

Alternatively, the plm_entropy package can be installed on its own from PyPI with pip, without the conda environment.
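
After either installation route, it can help to confirm that the package is visible in the active environment. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `plm_entropy` is taken from the pip command above.

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("plm_entropy")
if version is None:
    print("plm_entropy is not installed in this environment")
else:
    print(f"plm_entropy {version} is installed")
```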


Usage example

Find a detailed usage example in the following Jupyter notebook.


Functions

embed_entropy_esm2(modnam, inputf, outf)

options:

  • modnam: path to the pLM weights (ESM-2 or fine-tuned version only)
  • inputf:
    • .csv file with at least one node column containing sequence identifiers and one seq column containing the amino acid sequences, or
    • fasta file with amino acid sequences (accepted extensions: .fa, .fas, .fasta)
  • outf: base name for the output files (without file extension)
  • (optional) save_pickle: (default = False)
  • (optional) save_logit: (default = False)
  • (optional) torch_device: (default = cuda:0)
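
The option list above can be illustrated with a small sketch that prepares a .csv input with the required node and seq columns and then hands it to the function. Several things here are assumptions: that `embed_entropy_esm2` is importable from the top level of `plm_entropy`, that `torch_device` is passed as a keyword argument and accepts `"cpu"`, and the placeholder weights path. The sequences are toy strings, not real HA proteins, and the helper names are hypothetical.

```python
import csv
import os
import tempfile

def write_input_csv(path, records):
    """Write (identifier, sequence) pairs to a CSV with 'node' and 'seq' columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["node", "seq"])
        writer.writerows(records)

def run_esm2_entropy(model_path, input_csv, out_prefix):
    """Call the package on a prepared input file (needs plm_entropy and model weights)."""
    # Assumption: the function is exposed at the package's top level.
    from plm_entropy import embed_entropy_esm2
    # Assumption: torch_device is a keyword argument; the documented default is cuda:0.
    embed_entropy_esm2(model_path, input_csv, out_prefix, torch_device="cpu")

# Toy sequences for illustration only.
records = [("seqA", "MKTIIALSYIF"), ("seqB", "MKTIIVLSYIF")]
csv_path = os.path.join(tempfile.mkdtemp(), "example_input.csv")
write_input_csv(csv_path, records)

# Uncomment once the package and model weights are available:
# run_esm2_entropy("path/to/weights", csv_path, "example_output")
```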

embed_entropy_protT5(modnam, inputf, outf)

options:

  • modnam: path to the pLM weights (protT5MLM or fine-tuned version only)
  • inputf:
    • .csv file with at least one node column containing sequence identifiers and one seq column containing the amino acid sequences, or
    • fasta file with amino acid sequences (accepted extensions: .fa, .fas, .fasta)
  • outf: base name for the output files (without file extension)
  • (optional) save_pickle: (default = False)
  • (optional) save_logit: (default = False)
  • (optional) torch_device: (default = cuda:0)
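
Both embedding functions accept either a .csv or a fasta input. As a hedged sketch of how an input file could be prepared and checked before a long model run, the helpers below write a fasta file and classify a path by the extensions listed above. The helper names are hypothetical, and the strict, case-sensitive extension match is an assumption about how the package checks file names.

```python
import os
import tempfile

ACCEPTED_FASTA_EXTENSIONS = (".fa", ".fas", ".fasta")

def input_kind(path):
    """Classify an input path as 'csv' or 'fasta' from its extension, else raise."""
    ext = os.path.splitext(path)[1]
    if ext == ".csv":
        return "csv"
    if ext in ACCEPTED_FASTA_EXTENSIONS:
        return "fasta"
    raise ValueError(f"unsupported input extension: {ext!r}")

def write_fasta(path, records, width=60):
    """Write (identifier, sequence) pairs as fasta, wrapping sequence lines at `width`."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n")
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")

# Toy sequences for illustration only.
records = [("seqA", "MKTIIALSYIF"), ("seqB", "MKTIIVLSYIF")]
fasta_path = os.path.join(tempfile.mkdtemp(), "example_input.fasta")
write_fasta(fasta_path, records)
print(input_kind(fasta_path))
```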

aln_plm_entropy(entrdf_f, asralf)

Transforms the csv file of pLM entropy values produced by the embed_entropy_esm2() or embed_entropy_protT5() functions so that the values correspond to the alignment positions of the aligned input amino acid sequences.

options:

  • entrdf_f: -site_entropy.csv output file containing per site pLM entropy values, calculated with the embed_entropy_esm2() or embed_entropy_protT5() functions
  • asralf: protein sequence alignment, in fasta format, of the sequences for which pLM entropy values were calculated
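
The idea of moving per-site values onto alignment coordinates can be sketched as follows. This is not the package's implementation: the function name, the use of `-` as the only gap character, and the choice to fill gap columns with `None` are assumptions made for illustration.

```python
def align_site_values(aligned_seq, site_values, gap_chars="-", fill=None):
    """Spread per-residue values over alignment columns; gap columns get `fill`."""
    ungapped_length = sum(1 for ch in aligned_seq if ch not in gap_chars)
    if ungapped_length != len(site_values):
        raise ValueError("number of values must equal the number of non-gap residues")
    values = iter(site_values)
    # Consume one value per residue, in order, and skip over gap columns.
    return [fill if ch in gap_chars else next(values) for ch in aligned_seq]

aligned = "MK-TI--A"                       # 5 residues spread over 8 columns
entropies = [0.1, 0.2, 0.3, 0.4, 0.5]      # one value per residue
print(align_site_values(aligned, entropies))
# → [0.1, 0.2, None, 0.3, 0.4, None, None, 0.5]
```

Once every sequence is expressed in alignment coordinates, values from different sequences can be compared column by column.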

calc_aln_entropy(alfile)

options:

  • alfile:
  • (optional) exclude_internal_nodes:
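
The options of calc_aln_entropy() are not described above. Going only by the function name, a plausible reading is a per-column Shannon entropy of the residues in an alignment. The sketch below shows that calculation under this assumption; it ignores gaps, uses nats, and does not model the undocumented exclude_internal_nodes option. The function name `column_entropies` is hypothetical.

```python
import math
from collections import Counter

def column_entropies(aligned_seqs, gap_chars="-"):
    """Shannon entropy (nats) of residue frequencies in each alignment column."""
    lengths = {len(seq) for seq in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for column in zip(*aligned_seqs):
        counts = Counter(ch for ch in column if ch not in gap_chars)
        total = sum(counts.values())
        if total == 0:
            entropies.append(None)  # column contains only gaps
            continue
        # Sum of p * log(1/p) over the residues observed in this column.
        entropies.append(sum((n / total) * math.log(total / n) for n in counts.values()))
    return entropies

# Toy alignment for illustration only.
alignment = ["MKA", "MRA", "M-A", "MKT"]
print(column_entropies(alignment))
```

A fully conserved column has entropy 0, and the value grows as residues become more evenly mixed.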

al_mod_entr_correl(alentropy, modentr_f)

options:

  • alfile:
  • modentr_f:
  • (optional) exclude_internal_nodes:
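
The options of al_mod_entr_correl() are likewise undocumented here. Judging from its name, it relates alignment entropy to model (pLM) entropy. As a rough sketch of such a comparison, the code below pairs the two measures per alignment column and computes a Pearson correlation. Which correlation coefficient the package actually reports is not stated in this description, so Pearson is an assumption, as are the helper names and the toy values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def paired_sites(alignment_entropy, model_entropy):
    """Keep only alignment columns where both measures have a value."""
    pairs = [(a, m) for a, m in zip(alignment_entropy, model_entropy)
             if a is not None and m is not None]
    return [a for a, _ in pairs], [m for _, m in pairs]

# Toy per-column values; None marks a column without a value (for example a gap).
aln = [0.0, 0.64, None, 0.56, 0.10]
mod = [0.05, 0.70, 0.30, None, 0.20]
xs, ys = paired_sites(aln, mod)
print(round(pearson(xs, ys), 3))
```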

pLM entropy applied to Influenza A virus HA evotuned pLMs

See Lytras, S., Strange, A., Ito, J., and Sato, K. (2025). Inferring context-specific site variation with evotuned protein language models. bioRxiv, for details.

Evotuned pLMs

The weights for the ESM-2- and protT5-based evotuned pLMs presented in the manuscript above can be downloaded from the Zenodo repository below:

DOI


Data & code available in this repository

  • Evotuning
  • Inference
    • Jupyter notebook for inferring pLM entropy values using the evotuned models with the plm_entropy Python package.
    • Example data for the inference.
  • Testing data
    • IAV HA sequences and phylogenies used in the manuscript.
    • Note that protein sequences retrieved from the GISAID database have been omitted from the repository.
  • Additional tests
    • Jupyter notebook for predicting which protein sites will mutate in a given sequence context using pLM entropy.
    • Code for testing the influence of ESM-2 pretraining by resetting the weights before evotuning.
    • Data and code for the H3N2 tree backbone aBSREL branch-specific selection analysis.

Download files

Source Distribution

  • File: plm_entropy-0.2.tar.gz (10.5 kB)
  • Tags: Source
  • Uploaded using Trusted Publishing: Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • Publisher: publish.yml on spyros-lytras/plm_entropy
  • SHA256: c7e75ad8ab4ec7a2febced3a06d7a60e55c3dca399e6bdcc56f34f6db006fd2c
  • MD5: 60ec65447c82361045579f58f61953bf
  • BLAKE2b-256: 62e0ab08a4dab0e48a6804625216e3c54c7faa98ca022c669ed25313ed079b88

Built Distribution

  • File: plm_entropy-0.2-py3-none-any.whl (9.1 kB)
  • Tags: Python 3
  • Uploaded using Trusted Publishing: Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • Publisher: publish.yml on spyros-lytras/plm_entropy
  • SHA256: 1e29640ab6734a0165838e5228578f859ac83797afe3be8cd674ddda8b7dcaf2
  • MD5: 3de49a5d0fb179e89d18f58ef7ce938b
  • BLAKE2b-256: ccb7b40fbabfb3c194c0976fcd377b6206ce1183c2d14a3e032257aa91e06c60
