Prediction of amyloid propensity from amino acid sequences using ensemble deep learning and LLM models

AmyloDeep

Prediction of amyloid propensity from amino acid sequences using deep learning

AmyloDeep is a Python package that predicts amyloidogenic regions in protein sequences by applying a 5-model ensemble over a rolling window. The ensemble combines ESM2 transformer models, a UniRep-based classifier, an SVM, and XGBoost, and averages their outputs into a single amyloid propensity score per window.

Features

  • Multi-model ensemble: Combines 5 different models for robust predictions
  • Rolling window analysis: Analyzes sequences using sliding windows of configurable size
  • Pre-trained models: Uses models trained on amyloid sequence databases
  • Calibrated probabilities: Includes probability calibration for better confidence estimates
  • Easy-to-use API: Simple Python interface and command-line tool
  • Streamlit web interface: Optional web interface for interactive predictions

Installation

From PyPI (recommended)

pip install amylodeep

From source

git clone https://github.com/AlisaDavtyan/protein_classification.git
cd protein_classification
pip install -e .

For development:

pip install "amylodeep[dev]"

Quick Start

Python API

from amylodeep import predict_ensemble_rolling

# Predict amyloid propensity for a protein sequence
sequence = "MKTFFFLLLLFTIGFCYVQFSKLKLENLHFKDNSEGLKNGGLQRQLGLTLKFNSNSLHHTSNL"
result = predict_ensemble_rolling(sequence, window_size=6)

print(f"Average probability: {result['avg_probability']:.4f}")
print(f"Maximum probability: {result['max_probability']:.4f}")

# Access position-wise probabilities
for position, probability in result['position_probs']:
    print(f"Position {position}: {probability:.4f}")
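The position-wise output lends itself to simple post-processing. The sketch below is not part of AmyloDeep: `find_hotspots` is a hypothetical helper, the 0.5 threshold is an arbitrary choice, and the sample data is made up. It assumes only that `position_probs` is a list of (position, probability) tuples, and it groups consecutive positions whose probability meets the threshold.

```python
from typing import List, Tuple


def find_hotspots(
    position_probs: List[Tuple[int, float]],
    threshold: float = 0.5,
) -> List[Tuple[int, int, float]]:
    """Group consecutive positions at or above `threshold`.

    Returns (first_position, last_position, peak_probability) tuples.
    Hypothetical helper, not part of the amylodeep package.
    """
    hotspots = []
    current = None  # [first, last, peak] of the run being built
    for position, prob in sorted(position_probs):
        if prob >= threshold:
            if current is not None and position == current[1] + 1:
                # Extend the current run
                current[1] = position
                current[2] = max(current[2], prob)
            else:
                # Close any previous run and start a new one
                if current is not None:
                    hotspots.append(tuple(current))
                current = [position, position, prob]
        elif current is not None:
            # Probability dropped below the threshold: close the run
            hotspots.append(tuple(current))
            current = None
    if current is not None:
        hotspots.append(tuple(current))
    return hotspots


# Made-up probabilities, for illustration only
example = [(0, 0.1), (1, 0.7), (2, 0.9), (3, 0.2), (4, 0.6)]
hotspots = find_hotspots(example)
for first, last, peak in hotspots:
    print(f"Positions {first}-{last}: peak probability {peak:.2f}")
```

With real predictions, `result['position_probs']` would be passed in place of `example`.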

Command Line Interface

# Basic prediction
amylodeep "MKTFFFLLLLFTIGFCYVQFSKLKLENLHFKDNSEGLKNGGLQRQLGLTLKFNSNSLHHTSNL"

# With custom window size
amylodeep "SEQUENCE" --window-size 10

# Save results to file
amylodeep "SEQUENCE" --output results.json --format json

# CSV output
amylodeep "SEQUENCE" --output results.csv --format csv
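A similar export can be done from Python with the standard library. This is a sketch, not the CLI's implementation: `save_json` and `save_csv` are hypothetical helpers, the sample `result` is made up, and the file layout written here is an assumption that may differ from what `amylodeep --output` produces.

```python
import csv
import json
import os
import tempfile


def save_json(result: dict, path: str) -> None:
    """Write the whole result dictionary as JSON (tuples become lists)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(result, handle, indent=2)


def save_csv(result: dict, path: str) -> None:
    """Write one row per (position, probability) pair."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["position", "probability"])
        writer.writerows(result["position_probs"])


# Made-up result shaped like the dictionary returned by the Python API
result = {
    "position_probs": [(0, 0.12), (1, 0.87)],
    "avg_probability": 0.495,
    "max_probability": 0.87,
    "sequence_length": 7,
    "num_windows": 2,
}

# Write both formats to a temporary directory and read them back
with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "results.json")
    csv_path = os.path.join(tmp, "results.csv")
    save_json(result, json_path)
    save_csv(result, csv_path)
    with open(json_path, encoding="utf-8") as handle:
        loaded = json.load(handle)
    with open(csv_path, encoding="utf-8") as handle:
        csv_lines = handle.read().splitlines()
```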

Model Architecture

AmyloDeep uses an ensemble of 5 models:

  1. ESM2-150M: Fine-tuned ESM2 transformer (150M parameters)
  2. UniRep: UniRep-based neural network classifier
  3. ESM2-650M: Custom classifier using ESM2-650M embeddings
  4. SVM: Support Vector Machine with ESM2 embeddings
  5. XGBoost: Gradient boosting with ESM2 embeddings

The models are combined using probability averaging, with some models using probability calibration (Platt scaling or isotonic regression) for better confidence estimates.
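As a rough illustration of probability averaging with calibration, the sketch below applies a Platt-style sigmoid to one raw score and averages it with probabilities that are assumed to be calibrated already. It is not AmyloDeep's code: the function names, the coefficients `a` and `b`, and the input numbers are all made up, and equal weighting across models is an assumption.

```python
import math
from typing import Sequence


def platt_scale(score: float, a: float, b: float) -> float:
    """Map a raw classifier score to a probability.

    Uses the Platt form 1 / (1 + exp(a * score + b)). In practice a and b
    are fitted on held-out data; the values used below are placeholders.
    """
    return 1.0 / (1.0 + math.exp(a * score + b))


def average_probabilities(probs: Sequence[float]) -> float:
    """Unweighted mean of per-model probabilities."""
    if not probs:
        raise ValueError("need at least one probability")
    return sum(probs) / len(probs)


# One raw score passed through the placeholder calibration
calibrated = platt_scale(0.0, a=-1.0, b=0.0)

# Four made-up probabilities from the other models in the ensemble
per_model = [calibrated, 0.9, 0.7, 0.8, 0.6]
ensemble_probability = average_probabilities(per_model)
print(f"Ensemble probability: {ensemble_probability:.4f}")
```

Isotonic regression would replace `platt_scale` with a fitted monotone step function; the averaging step stays the same.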

Requirements

  • Python >= 3.8
  • PyTorch >= 1.9.0
  • Transformers >= 4.15.0
  • NumPy >= 1.20.0
  • scikit-learn >= 1.0.0
  • XGBoost >= 1.5.0
  • jax-unirep >= 2.0.0
  • wandb >= 0.12.0
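To see which of these dependencies are present in an environment, a short standard-library check like the one below can help. It is only a sketch: the distribution names (for example `torch` for PyTorch) are assumptions about how each package is published, and the script reports installed versions without comparing them against the minimums.

```python
from importlib import metadata

# Minimum versions copied from the requirements list; the distribution
# names used as keys are assumptions.
REQUIREMENTS = {
    "torch": "1.9.0",
    "transformers": "4.15.0",
    "numpy": "1.20.0",
    "scikit-learn": "1.0.0",
    "xgboost": "1.5.0",
    "jax-unirep": "2.0.0",
    "wandb": "0.12.0",
}


def installed_versions(names):
    """Return {name: installed version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIREMENTS)
for name, minimum in REQUIREMENTS.items():
    print(f"{name}: installed={versions[name]}, required>={minimum}")
```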

Main Functions

predict_ensemble_rolling(sequence, window_size=6)

Predict amyloid propensity for a protein sequence using rolling window analysis.

Parameters:

  • sequence (str): Protein sequence (amino acid letters)
  • window_size (int): Size of the rolling window (default: 6)

Returns: Dictionary containing:

  • position_probs: List of (position, probability) tuples
  • avg_probability: Average probability across all windows
  • max_probability: Maximum probability across all windows
  • sequence_length: Length of the input sequence
  • num_windows: Number of windows analyzed
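To make the return structure concrete, here is a minimal sketch of how a rolling-window result of this shape could be assembled. It is not the package's implementation: `rolling_window_summary` and `toy_score` are hypothetical, the toy scorer (fraction of hydrophobic residues) merely stands in for the ensemble, and indexing windows by their 0-based start position is an assumption.

```python
def toy_score(window: str) -> float:
    """Stand-in scorer: fraction of hydrophobic residues in the window."""
    hydrophobic = set("AVILMFWY")
    return sum(residue in hydrophobic for residue in window) / len(window)


def rolling_window_summary(sequence: str, window_size: int = 6, score=toy_score) -> dict:
    """Score every window of `window_size` residues and summarize the results."""
    if window_size < 1 or window_size > len(sequence):
        raise ValueError("window_size must be between 1 and len(sequence)")
    # A sequence of length n has n - window_size + 1 windows
    position_probs = [
        (start, score(sequence[start:start + window_size]))
        for start in range(len(sequence) - window_size + 1)
    ]
    probs = [prob for _, prob in position_probs]
    return {
        "position_probs": position_probs,
        "avg_probability": sum(probs) / len(probs),
        "max_probability": max(probs),
        "sequence_length": len(sequence),
        "num_windows": len(position_probs),
    }


summary = rolling_window_summary("KVVFFAED", window_size=6)
print(summary["num_windows"], summary["sequence_length"])
```

With a window of 6, the 8-residue example yields 3 windows, which is why `num_windows` is smaller than `sequence_length`.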

The package also provides individual model classes for ESM-based and UniRep-based predictions.

Contributing

We welcome contributions! Please see our contributing guidelines for more information.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use AmyloDeep in your research, please cite:

@software{amylodeep2025,
  title={AmyloDeep: Prediction of amyloid propensity from amino acid sequences using deep learning},
  author={Alisa Davtyan},
  year={2025},
  url={https://github.com/AlisaDavtyan/protein_classification}
}

Support

For questions and support:

Changelog

v0.1.0

  • Initial release
  • 5-model ensemble implementation
  • Rolling window prediction
  • Command-line interface
  • Python API
