Prediction of amyloid propensity from amino acid sequences using ensemble deep learning and LLM models
AmyloDeep
Prediction of amyloid propensity from amino acid sequences using deep learning
AmyloDeep is a Python package that predicts amyloidogenic regions in protein sequences by scoring rolling windows with a 5-model ensemble. The ensemble combines ESM2 transformers, UniRep embeddings, SVM, and XGBoost models to produce amyloid propensity predictions.
Features
- Multi-model ensemble: Combines 5 different models for robust predictions
- Rolling window analysis: Analyzes sequences using sliding windows of configurable size
- Pre-trained models: Uses models trained on amyloid sequence databases
- Calibrated probabilities: Includes probability calibration for better confidence estimates
- Easy-to-use API: Simple Python interface and command-line tool
- Streamlit web interface: Optional web interface for interactive predictions
Installation
From PyPI (recommended)
pip install amylodeep
From source
git clone https://github.com/AlisaDavtyan/protein_classification.git
cd protein_classification
pip install .
Quick Start
Python API
from amylodeep import predict_ensemble_rolling
# Predict amyloid propensity for a protein sequence
sequence = "MKTFFFLLLLFTIGFCYVQFSKLKLENLHFKDNSEGLKNGGLQRQLGLTLKFNSNSLHHTSNL"
result = predict_ensemble_rolling(sequence, window_size=6)
print(f"Average probability: {result['avg_probability']:.4f}")
print(f"Maximum probability: {result['max_probability']:.4f}")
# Access position-wise probabilities
for position, probability in result['position_probs']:
    print(f"Position {position}: {probability:.4f}")
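The position-wise output from the loop above can also be grouped into contiguous high-propensity regions. The following is a minimal sketch and not part of AmyloDeep: `high_propensity_regions` and the 0.5 threshold are hypothetical, and it assumes `position_probs` is a list of (position, probability) tuples with integer positions in increasing order.

```python
def high_propensity_regions(position_probs, threshold=0.5):
    """Group consecutive positions whose probability meets the threshold.

    Hypothetical helper, not an AmyloDeep function.
    Returns a list of (start, end) position pairs, both inclusive.
    """
    regions = []
    start = prev = None
    for position, probability in position_probs:
        if probability >= threshold:
            if start is None:
                start = position               # open a new region
            elif position != prev + 1:
                regions.append((start, prev))  # gap in positions: close and reopen
                start = position
            prev = position
        elif start is not None:
            regions.append((start, prev))      # probability dropped: close the region
            start = prev = None
    if start is not None:
        regions.append((start, prev))          # close a region that runs to the end
    return regions


# Made-up probabilities, for illustration only
example = [(0, 0.1), (1, 0.7), (2, 0.9), (3, 0.2), (4, 0.8)]
print(high_propensity_regions(example))
```

With real predictions, `result['position_probs']` would be passed in place of `example`. The threshold is a free choice here; the package does not document a recommended cutoff.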
Command Line Interface
# Basic prediction
amylodeep "MKTFFFLLLLFTIGFCYVQFSKLKLENLHFKDNSEGLKNGGLQRQLGLTLKFNSNSLHHTSNL"
# With custom window size
amylodeep "SEQUENCE" --window-size 10
# Save results to file
amylodeep "SEQUENCE" --output results.json --format json
# CSV output
amylodeep "SEQUENCE" --output results.csv --format csv
Model Architecture
AmyloDeep uses an ensemble of 5 models:
- ESM2-150M: Fine-tuned ESM2 transformer (150M parameters)
- UniRep: UniRep-based neural network classifier
- ESM2-650M: Custom classifier using ESM2-650M embeddings
- SVM: Support Vector Machine with ESM2 embeddings
- XGBoost: Gradient boosting with ESM2 embeddings
The models are combined using probability averaging, with some models using probability calibration (Platt scaling or isotonic regression) for better confidence estimates.
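The combination step described above can be sketched in a few lines. This is an illustration under stated assumptions, not AmyloDeep's implementation: it assumes an unweighted mean over the per-model probabilities, and it shows Platt scaling as a sigmoid with slope `a` and intercept `b` that would be fitted on held-out data. The function names and the example numbers are hypothetical.

```python
import math


def platt_scale(score, a, b):
    """Map a raw classifier score to a probability with a fitted sigmoid.

    a and b are hypothetical parameters that would be fitted on held-out data.
    """
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def average_probabilities(model_probs):
    """Unweighted mean of per-model probabilities for one window (assumed scheme)."""
    if not model_probs:
        raise ValueError("need at least one model probability")
    return sum(model_probs) / len(model_probs)


# Made-up outputs of five models for a single window
probs = [0.75, 0.5, 0.25, 1.0, 0.0]
print(average_probabilities(probs))
```

Averaging probabilities only makes sense when the models report values on a comparable scale, which is the motivation for calibrating some of them first.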
Requirements
- Python >= 3.8
- PyTorch >= 1.9.0
- Transformers >= 4.15.0
- NumPy >= 1.20.0
- scikit-learn >= 1.0.0
- XGBoost >= 1.5.0
- jax-unirep >= 2.0.0
- wandb >= 0.12.0
Main Functions
predict_ensemble_rolling(sequence, window_size=6)
Predict amyloid propensity for a protein sequence using rolling window analysis.
Parameters:
- sequence (str): Protein sequence (amino acid letters)
- window_size (int): Size of the rolling window (default: 6)
Returns: Dictionary containing:
- position_probs: List of (position, probability) tuples
- avg_probability: Average probability across all windows
- max_probability: Maximum probability across all windows
- sequence_length: Length of the input sequence
- num_windows: Number of windows analyzed
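To make the returned fields concrete, here is a rough sketch of how a rolling-window summary with these keys could be assembled. It is not the package's code. It assumes a stride of 1, only full-length windows (so `len(sequence) - window_size + 1` windows), and 0-based window start positions; none of these is confirmed by the documentation. `rolling_windows`, `summarize`, and `toy_score` are hypothetical names, and `toy_score` is a placeholder that stands in for the ensemble.

```python
def rolling_windows(sequence, window_size=6):
    """Yield (start, window) for every full-length window, stride 1."""
    for start in range(len(sequence) - window_size + 1):
        yield start, sequence[start:start + window_size]


def summarize(sequence, score_window, window_size=6):
    """Score each window and build a dictionary with the documented keys."""
    position_probs = [
        (start, score_window(window))
        for start, window in rolling_windows(sequence, window_size)
    ]
    if not position_probs:
        raise ValueError("sequence is shorter than the window size")
    probs = [p for _, p in position_probs]
    return {
        "position_probs": position_probs,
        "avg_probability": sum(probs) / len(probs),
        "max_probability": max(probs),
        "sequence_length": len(sequence),
        "num_windows": len(position_probs),
    }


def toy_score(window):
    """Placeholder scorer: fraction of hydrophobic residues. NOT a real model."""
    return sum(residue in "AVILMFWY" for residue in window) / len(window)


result = summarize("MKTFFFLLLL", toy_score, window_size=6)
print(result["num_windows"], result["max_probability"])
```

Under these assumptions a sequence shorter than the window produces no windows, so the sketch raises an error rather than returning empty statistics.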
Individual model classes for ESM and UniRep-based predictions.
Contributing
We welcome contributions! Please see our contributing guidelines for more information.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use AmyloDeep in your research, please cite:
@software{amylodeep2025,
title={AmyloDeep: Prediction of amyloid propensity from amino acid sequences using deep learning},
author={Alisa Davtyan},
year={2025},
url={https://github.com/AlisaDavtyan/protein_classification}
}
Support
For questions and support:
- Open an issue on GitHub
- Contact: alisadavtyan7@gmail.com
Changelog
v0.1.0
- Initial release
- 5-model ensemble implementation
- Rolling window prediction
- Command-line interface
- Python API
File details
Details for the file amylodeep-0.2.6.tar.gz.
File metadata
- Download URL: amylodeep-0.2.6.tar.gz
- Upload date:
- Size: 14.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a1d59c6d33461a9c1b4fa7512eae8d174829c61f1c949c59557fac4a01a3c04b |
| MD5 | 7f545807f8ad7fca643cfcd52cbe69a5 |
| BLAKE2b-256 | 17b191b94f3e8d74462612b0e617d321e8cae04d1c9e4452a3ff71bb4ef249ac |
File details
Details for the file amylodeep-0.2.6-py3-none-any.whl.
File metadata
- Download URL: amylodeep-0.2.6-py3-none-any.whl
- Upload date:
- Size: 14.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 23463a9ae47bf0ea6eba7d1415eecb5d6e956d3629e10a4ee69993e013a701cc |
| MD5 | 84427dc70c93725ff0df48989c776201 |
| BLAKE2b-256 | e54896d528c20f9c1eb4606ce16d4ff2a40c2e41fe0f1418dff1ddb0c5787959 |