The pLM entropy Python package

pLM entropy

Code and data for: Lytras, S., Strange, A., Ito, J., and Sato, K. (2025). Inferring context-specific site variation with evotuned protein language models. bioRxiv

pLM entropy is a protein language model (pLM)-based metric for assessing protein site conservation and variability. We test this metric using versions of two popular pLMs (ESM-2 and protT5) fine-tuned on the diversity of hemagglutinin (HA) proteins from different Influenza A virus serotypes.


The plm_entropy Python package

Installation

The Python package for embedding sequences through ESM-2- and protT5-based models and calculating pLM entropy, along with a number of helper functions, can be installed through pip. We also provide a conda environment yml file for installing all dependencies.

To create the conda environment and install the Python package:

git clone

cd plm_entropy

conda env create -f pLM_entropy.yml

pip install --upgrade plm_entropy

Alternatively, the plm_entropy package can also be installed via pip from PyPI.


Usage example

Find a detailed usage example in the following Jupyter notebook.


Functions

embed_entropy_esm2(modnam, inputf, outf)

options:

  • modnam: path to the pLM weights (ESM-2 or fine-tuned version only)
  • inputf:
    • .csv file with at least one node column containing sequence identifiers and one seq column containing the amino acid sequences, or
    • fasta file with amino acid sequences (accepted extensions: .fa, .fas, .fasta)
  • outf: name prefix for the output files (without file extension)
  • (optional) save_pickle: (default = False)
  • (optional) save_logit: (default = False)
  • (optional) torch_device: (default = cuda:0)
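
The .csv input layout described above (a node column with identifiers and a seq column with amino acid sequences) can be prepared with the standard library. The following is a minimal sketch: the helper names `write_input_csv` and `read_input_csv` are hypothetical and not part of plm_entropy, and the only thing taken from the package documentation is the pair of column names.

```python
import csv


def write_input_csv(records, path):
    """Write (identifier, sequence) pairs to a CSV with `node` and `seq` columns.

    Hypothetical helper, not part of plm_entropy; only the column names
    follow the input format described for `inputf`.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["node", "seq"])
        writer.writeheader()
        for node, seq in records:
            writer.writerow({"node": node, "seq": seq})


def read_input_csv(path):
    """Read the file back as a list of (identifier, sequence) tuples."""
    with open(path, newline="") as handle:
        return [(row["node"], row["seq"]) for row in csv.DictReader(handle)]
```

The path of a file written this way would then be passed as inputf, for example `embed_entropy_esm2(modnam, "input.csv", "output_prefix")`, where both file names are placeholders.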

embed_entropy_protT5(modnam, inputf, outf)

options:

  • modnam: path to the pLM weights (protT5MLM or fine-tuned version only)
  • inputf:
    • .csv file with at least one node column containing sequence identifiers and one seq column containing the amino acid sequences, or
    • fasta file with amino acid sequences (accepted extensions: .fa, .fas, .fasta)
  • outf: name prefix for the output files (without file extension)
  • (optional) save_pickle: (default = False)
  • (optional) save_logit: (default = False)
  • (optional) torch_device: (default = cuda:0)
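
For intuition about what a per-site entropy value represents, the sketch below computes the Shannon entropy of the probability distribution obtained by applying a softmax to one position's logits. This is an assumption about the general idea, not the package's implementation: the exact definition, the logarithm base, and the set of tokens included by embed_entropy_esm2() and embed_entropy_protT5() are not stated here, and `site_entropy` is a hypothetical name.

```python
import math


def site_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) for one sequence position.

    Hypothetical illustration; the package may use a different log base
    or restrict the tokens that enter the distribution.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

Under this definition, a position where the model is equally uncertain across 20 amino acids has entropy ln(20), and a position where one residue dominates has entropy close to zero.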

aln_plm_entropy(entrdf_f, asralf)

Transforms the csv file of pLM entropy values produced by the embed_entropy_esm2() or embed_entropy_protT5() functions so that the values correspond to the alignment positions of the aligned input amino acid sequences.

options:

  • entrdf_f: -site_entropy.csv output file containing per site pLM entropy values, calculated with the embed_entropy_esm2() or embed_entropy_protT5() functions
  • asralf: protein sequence alignment of the sequences for which pLM entropy values were calculated in fasta format
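
The kind of transformation described above can be sketched for a single sequence as follows: per-residue values are placed at the alignment columns that hold a residue, and gap columns receive no value. This is a minimal sketch under stated assumptions, not the code of aln_plm_entropy(): the name `map_to_alignment`, the use of `-` as the only gap character, and the use of `None` for gap columns are all hypothetical choices.

```python
def map_to_alignment(aligned_seq, site_values, gap_char="-"):
    """Spread per-residue values over the columns of one aligned sequence.

    Hypothetical helper: gap columns are filled with None, which may differ
    from how the package represents them.
    """
    n_residues = sum(1 for char in aligned_seq if char != gap_char)
    if n_residues != len(site_values):
        raise ValueError(
            f"expected {n_residues} values for the ungapped sequence, "
            f"got {len(site_values)}"
        )
    values = iter(site_values)
    # Consume one value per residue; gap columns do not consume a value.
    return [None if char == gap_char else next(values) for char in aligned_seq]
```

For the aligned sequence `MK-T` with values `[0.1, 0.2, 0.3]`, the result is `[0.1, 0.2, None, 0.3]`.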

calc_aln_entropy(alfile)

options:

  • alfile:
  • (optional) exclude_internal_nodes:
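
The options of this function are not described here. As a point of reference only, a common way to quantify per-column variability in an alignment is the Shannon entropy of the residue frequencies in each column. The sketch below assumes that definition, ignores gaps, and uses the hypothetical name `column_entropies`; it is not taken from the package and may differ from what calc_aln_entropy() computes.

```python
import math
from collections import Counter


def column_entropies(aligned_seqs, gap_char="-"):
    """Shannon entropy (in nats) of residue frequencies per alignment column.

    Hypothetical illustration: gaps are ignored, and a column that contains
    only gaps yields None.
    """
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for column in zip(*aligned_seqs):
        counts = Counter(char for char in column if char != gap_char)
        total = sum(counts.values())
        if total == 0:
            entropies.append(None)
            continue
        entropies.append(
            -sum((n / total) * math.log(n / total) for n in counts.values())
        )
    return entropies
```

A fully conserved column gives an entropy of zero, and the value grows as more residues share the column at similar frequencies.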

al_mod_entr_correl(alentropy, modentr_f)

options:

  • alfile:
  • modentr_f:
  • (optional) exclude_internal_nodes:

pLM entropy applied to Influenza A virus HA evotuned pLMs

See Lytras, S., Strange, A., Ito, J., and Sato, K. (2025). Inferring context-specific site variation with evotuned protein language models. bioRxiv, for details.

Evotuned pLMs

The weights for the ESM-2- and protT5-based evotuned pLMs presented in the manuscript above can be downloaded from the Zenodo repository below:

DOI


Data & code available in this repository

  • Evotuning

  • Inference

    • Jupyter notebook for inferring pLM entropy values using the evotuned models with the plm_entropy Python package.

    • Example data for the inference.

  • Testing data

    • IAV HA sequences and phylogenies used in the manuscript.
    • Note that protein sequences retrieved from the GISAID database have been omitted from the repository.
  • Additional tests

    • Jupyter notebook for predicting which protein sites will mutate in a given sequence context using pLM entropy.

    • Code for testing the influence of ESM-2 pretraining by resetting the weights before evotuning

    • Data and code for H3N2 tree backbone aBSREL branch-specific selection analysis
