A custom LPR model for lipid-binding protein prediction

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

license: mit
language: en
base_model:
- EvolutionaryScale/esmc-300m-2024-12
- google-bert/bert-base-uncased
new_version: Noora68/lpr-0.4B
tags:
- biology
- protein
- protein classification
- lipid binding
- lipid binding site
- recognition

Lipid-Protein Recognition (LPR)

We present Lipid-Protein Recognition (LPR), a robust tool for predicting which lipid categories interact with a protein, using the protein sequence as the only input. By fusing the ESM C and BERT models in a combined architecture, LPR enables accurate and interpretable prediction of lipid-binding signatures across the 8 major lipid categories defined by LIPID MAPS. LPR is intended to facilitate the exploration of lipid-binding specificity and rational protein design.

Paper: https://...
GitHub Repository: https://github.com/Noora68/Lipid-binding-Protein-Recognition-LPR
Online Demo: https://colab/

Model Details

Architecture: ESM Cambrian + BERT + classification head
Task: Multi-label protein-lipid binding prediction
Fine-tuned from: ESMC_300m + bert-base-uncased
Developed by: Noora68
Framework: PyTorch + HuggingFace Transformers
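The card does not describe how the ESM C and BERT outputs are combined before the classification head. As a minimal illustration only, assuming late fusion by concatenating one pooled embedding from each encoder and passing the result through a single linear layer with one sigmoid output per category, a multi-label head could look like the sketch below. The `fuse_and_classify` helper, the toy dimensions, and the weights are hypothetical and are not taken from LPR.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function: maps a logit to an independent probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_and_classify(
    esm_embedding: List[float],
    bert_embedding: List[float],
    weights: List[List[float]],
    bias: List[float],
) -> List[float]:
    """Hypothetical late-fusion head: concatenate two pooled embeddings,
    apply one linear layer, then a per-class sigmoid (multi-label)."""
    fused = esm_embedding + bert_embedding  # list concatenation = feature concatenation
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused embedding size")
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, bias)]
    # Sigmoid rather than softmax: each category is scored independently,
    # so several categories can be predicted for the same protein.
    return [sigmoid(z) for z in logits]


# Toy example: 2-dim "ESM C" vector, 2-dim "BERT" vector, 3 output categories.
probs = fuse_and_classify(
    esm_embedding=[0.5, -1.0],
    bert_embedding=[2.0, 0.0],
    weights=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    bias=[0.0, 0.0, 0.0],
)
print([round(p, 4) for p in probs])
```

Because the outputs are independent sigmoids, the probabilities do not have to sum to 1, which is consistent with the example output further down, where two categories exceed 0.6.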

Model usage workflow:

1. Load the model and tokenizer.
2. Process the input sequence (tokenize → batch → pad → mask).
3. Run inference to obtain logits, then convert them to probabilities with a sigmoid.
4. Output the results and mark high-confidence categories (probability above 0.6 in the example below).
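The pad and mask step can be illustrated without any deep-learning library. The sketch below right-pads token-ID lists to the longest sequence in the batch and builds the matching attention mask (1 for real tokens, 0 for padding), which is what `pad_sequence(..., batch_first=True)` and the `!= pad_token_id` comparison accomplish in the PyTorch usage code. The `pad_and_mask` helper is hypothetical, and the token IDs and the pad ID of 1 are placeholders, not values read from the ESM C tokenizer.

```python
from typing import List, Tuple


def pad_and_mask(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token-ID lists to equal length and build attention masks."""
    if not batch:
        return [], []
    max_len = max(len(ids) for ids in batch)
    padded, mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        padded.append(ids + [pad_id] * n_pad)       # padding on the right (batch_first layout)
        mask.append([1] * len(ids) + [0] * n_pad)   # 1 = real token, 0 = padding
    return padded, mask


# Placeholder token IDs for two sequences of different lengths (pad_id=1 is assumed).
input_ids, attention_mask = pad_and_mask([[0, 5, 9, 2], [0, 7, 2]], pad_id=1)
print(input_ids)
print(attention_mask)
```

Deriving the mask from each sequence's length, rather than comparing against the pad ID, stays correct even if a real token happened to share the pad ID.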

Usage

from modeling_lpr import LPR
import torch
from torch.nn.utils.rnn import pad_sequence
from esm.tokenization import EsmSequenceTokenizer

# Set device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = EsmSequenceTokenizer()

# Default lipid type dictionary
default_dict = {
    "0": "NotLipidType",
    "1": "Fatty Acyl (FA)",
    "2": "Prenol Lipid (PR)",
    "3": "Glycerophospholipid (GP)",
    "4": "Sterol Lipid (ST)",
    "5": "Polyketide (PK)",
    "6": "Glycerolipid (GL)",
    "7": "Sphingolipid (SP)",
    "8": "Saccharolipid (SL)"
}

# Load pretrained LPR model and switch to inference mode (disables dropout)
model = LPR.from_pretrained("Noora68/lpr-0.4B").to(device)
model.eval()

# Example protein sequence(s); more sequences can be added to form a batch
sequences = ["MDSNFLKYLSTAPVLFTVWLSFTASFIIEANRFFPDMLYFPM"]

# Tokenize each sequence -> list of 1-D tensors of input_ids
input_ids = [torch.tensor(tokenizer.encode(seq)) for seq in sequences]

# Pad to the longest sequence in the batch -> shape (batch_size, max_length)
input_ids_padded = pad_sequence(input_ids, batch_first=True, padding_value=tokenizer.pad_token_id)

# Build attention mask: 1 for real tokens, 0 for padding
attention_mask = (input_ids_padded != tokenizer.pad_token_id).long()

# Move tensors to the same device as model
input_ids_padded = input_ids_padded.to(device)
attention_mask = attention_mask.to(device)

# Forward pass (no gradient needed during inference)
with torch.no_grad():
    outputs = model(input_ids_padded, attention_mask)

# Convert logits to probabilities using sigmoid (one independent probability per category)
probs = torch.sigmoid(outputs['logits'])

# Take the first (and only) sequence in the batch, move to CPU, convert to NumPy
probs = probs[0].cpu().numpy()

# Print results: add a check mark if probability > 0.6
for i, p in enumerate(probs):
    mark = " √" if p > 0.6 else ""
    print(f"{default_dict[str(i)]:<25}: {p:.4f}{mark}")

The output of the above example is:

NotLipidType             : 0.0007
Fatty Acyl (FA)          : 0.1092
Prenol Lipid (PR)        : 0.9178 √
Glycerophospholipid (GP) : 0.6059 √
Sterol Lipid (ST)        : 0.0083
Polyketide (PK)          : 0.0026
Glycerolipid (GL)        : 0.0771
Sphingolipid (SP)        : 0.0002
Saccharolipid (SL)       : 0.0000
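Because each category receives its own sigmoid probability, more than one lipid category can pass the cutoff, as Prenol Lipid and Glycerophospholipid do above. A minimal post-processing sketch that turns the probability vector into a ranked list of predicted labels, reusing the label order of `default_dict`, the 0.6 cutoff from the example, and the probabilities printed above. The `predicted_categories` helper is hypothetical and is not part of the LPR package.

```python
from typing import List, Sequence, Tuple

# Label order copied from default_dict in the usage example.
LABELS: List[str] = [
    "NotLipidType",
    "Fatty Acyl (FA)",
    "Prenol Lipid (PR)",
    "Glycerophospholipid (GP)",
    "Sterol Lipid (ST)",
    "Polyketide (PK)",
    "Glycerolipid (GL)",
    "Sphingolipid (SP)",
    "Saccharolipid (SL)",
]


def predicted_categories(probs: Sequence[float], threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Return (label, probability) pairs strictly above the threshold, highest first."""
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    hits = [(label, p) for label, p in zip(LABELS, probs) if p > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Probabilities from the example output above.
example = [0.0007, 0.1092, 0.9178, 0.6059, 0.0083, 0.0026, 0.0771, 0.0002, 0.0000]
print(predicted_categories(example))
```

The card does not state how the 0.6 cutoff was chosen; raising it trades recall for precision, so treat it as an adjustable parameter rather than a calibrated value.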

Limitations

Trained only on lipid-binding protein data and may not generalize to other functions.
Model performance is best for sequences shorter than 500 residues.
Dataset size is limited compared to large-scale protein corpora.
Model may reflect biases present in training data (e.g., under-representation of certain lipid types).
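Since performance is reported to be best below 500 residues, it can help to flag longer inputs before running inference. A minimal sketch, assuming the limit counts amino-acid residues in the raw sequence (special tokens excluded); the `flag_long_sequences` helper is hypothetical, and the long sequence in the example is a synthetic placeholder, not a real protein.

```python
import warnings
from typing import Dict, List

MAX_RECOMMENDED_LENGTH = 500  # from the Limitations list; assumed to count residues


def flag_long_sequences(sequences: Dict[str, str], max_len: int = MAX_RECOMMENDED_LENGTH) -> List[str]:
    """Return IDs of sequences longer than max_len and emit a warning for each."""
    too_long = []
    for seq_id, seq in sequences.items():
        length = len(seq.strip())
        if length > max_len:
            warnings.warn(f"{seq_id}: {length} residues exceeds {max_len}; predictions may be less reliable")
            too_long.append(seq_id)
    return too_long


batch = {
    "example": "MDSNFLKYLSTAPVLFTVWLSFTASFIIEANRFFPDMLYFPM",
    "long_dummy": "M" + "A" * 600,  # synthetic placeholder, not a real protein
}
print(flag_long_sequences(batch))
```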

Citation

If you use this model, please cite:

@article{your2025paper,
  title={Deciphering the code of lipid binding by large language model},
  author={Feitong Dong,},
  journal={Bioinformatics},
  year={2025}
}

License

MIT License


Release history

1.1.1: Aug 18, 2025
1.1.0: Aug 18, 2025
1.0.0 (this version): Aug 18, 2025
0.1.1: Aug 18, 2025
0.1.0: Aug 18, 2025

Download files


Source Distribution

lpr_model-1.0.0.tar.gz (5.0 kB)

Uploaded Aug 18, 2025 Source

Built Distribution


lpr_model-1.0.0-py3-none-any.whl (5.0 kB)

Uploaded Aug 18, 2025 Python 3

File details

Details for the file lpr_model-1.0.0.tar.gz.

File metadata

Download URL: lpr_model-1.0.0.tar.gz
Upload date: Aug 18, 2025
Size: 5.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for lpr_model-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`f6c4d3d5ce420ab0e9dac04e2f6c6ac6c45ba4ec3731b6bdd4f5f35dd6e0e8a3`
MD5	`1a7b896088e45076f8841a3464a83f84`
BLAKE2b-256	`c58ece5a0a31349b506091cdf6c93994ae8451695e375337d1d5fa8653634d22`
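The published digests make it possible to check that a downloaded archive is intact. A minimal sketch using Python's standard `hashlib`, streaming the file in chunks and comparing against the SHA256 value listed above for the source distribution; the archive path is illustrative and assumes the file was downloaded to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the hex SHA256 digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare the computed digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 published above for lpr_model-1.0.0.tar.gz
EXPECTED = "f6c4d3d5ce420ab0e9dac04e2f6c6ac6c45ba4ec3731b6bdd4f5f35dd6e0e8a3"

if __name__ == "__main__":
    archive = Path("lpr_model-1.0.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if matches_published(archive, EXPECTED) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can perform the same check at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).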


File details

Details for the file lpr_model-1.0.0-py3-none-any.whl.

File metadata

Download URL: lpr_model-1.0.0-py3-none-any.whl
Upload date: Aug 18, 2025
Size: 5.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for lpr_model-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ffad0a43c74db6962b4b90185a77dfd78d9d29a65fbdbe9062e99ce84de79685`
MD5	`e52ee777566581005fb98b2476531198`
BLAKE2b-256	`84a0af319ebd03177d2573602f962197db9adac542b785c1db85f82b66a64f29`


lpr-model 1.0.0
