
Application of deep learning to molecular dynamics trajectories

Project description

Molecular Dynamics & Machine Learning (MDML)

This repository contains the graduation project for the Master in Data Science for Life Sciences at Hanze University of Applied Sciences.

Proteins play a vital role in many biological processes and are essential to the structure and function of cells. Dysfunctional proteins can lead to disease, and studying them can help explain the underlying cause of a disease and potentially lead to treatments and medication. In this study, a custom Convolutional Neural Network (CNN) was used to analyze molecular dynamics (MD) simulation trajectories of the active and inactive states of the EGFR protein, a key player in cancer. The CNN was able to identify key residues that define the active and inactive states of the protein, specifically the DFG-Asp motif, with 100% accuracy. Our image transformation method represents the 3D coordinates of the atoms in a protein as 2D images, which differs from existing methods in the literature. The results of this study demonstrate the potential of applying deep learning methods to MD simulation trajectories, but also highlight the need to carefully evaluate the methods used and their utility to ensure meaningful insights.
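The exact pixel layout is not spelled out here. As a rough illustration only, the sketch below assumes one image per trajectory frame, with each atom mapped to one pixel of a square grid and its x, y and z coordinates min-max scaled into the R, G and B channels. The function name `frame_to_rgb` and the square-grid layout are assumptions, not necessarily the method used in MDML.

```python
import math

import numpy as np


def frame_to_rgb(coords: np.ndarray) -> np.ndarray:
    """Encode one frame of atomic coordinates (n_atoms x 3) as an RGB image.

    Hypothetical layout: atoms fill a square grid row by row, and each
    coordinate axis (x, y, z) is min-max scaled to 0-255 and stored in the
    R, G and B channel respectively. Unused trailing pixels stay black.
    """
    coords = np.asarray(coords, dtype=np.float64)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("expected an array of shape (n_atoms, 3)")

    # Scale each axis independently to [0, 1]; guard against a zero range.
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0
    scaled = (coords - lo) / span

    # Use the smallest square grid that holds every atom.
    side = math.ceil(math.sqrt(len(coords)))
    pixels = np.zeros((side * side, 3), dtype=np.uint8)
    pixels[: len(coords)] = np.round(scaled * 255).astype(np.uint8)
    return pixels.reshape(side, side, 3)
```

Scaling each axis separately keeps all three channels in the full 0-255 range, at the cost of losing the absolute distances between frames unless a shared min/max is used for the whole trajectory.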

Student: Stylianos Mavrianos, s.mavrianos@st.hanze.nl
Supervisor: Tsjerk Wassenaar, t.a.wassenaar@pl.hanze.nl
Daily supervisor: Vilacha Madeira R Santos, j.f.vilacha@rug.nl

Research questions

  1. Is a Convolutional Neural Network (CNN) classification approach useful and relevant in the field of molecular dynamics (MD)?
  2. Is it possible to predict long-term simulations from short-term ones?
  3. How short is short enough?

Requirements

  • Python 3.8.10
  • NumPy
  • Matplotlib
  • Plotly
  • scikit-learn
  • TensorFlow
  • Keras
  • MDAnalysis
  • OpenCV (cv2)
  • PyYAML (yaml)
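Before running the pipeline, it can help to confirm that these packages are importable. The sketch below only checks whether each import name can be found with `importlib.util.find_spec`; it does not import the packages or check versions. The import names (`sklearn` for scikit-learn, `cv2` for OpenCV, `yaml` for PyYAML) are the standard ones, but the helper `missing_modules` is hypothetical and not part of MDML.

```python
import importlib.util

# Import names for the packages listed under Requirements.
# Assumption: Keras is checked as the standalone ``keras`` package.
REQUIRED_MODULES = [
    "numpy",
    "matplotlib",
    "plotly",
    "sklearn",      # scikit-learn
    "tensorflow",
    "keras",
    "MDAnalysis",
    "cv2",          # OpenCV
    "yaml",         # PyYAML
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```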

Setup

  1. Clone the repository to your local machine and move into it:

git clone https://github.com/StevetheGreek97/MD_ML.git
cd MD_ML

  2. Create and activate a new virtual environment (Linux/macOS shown):

virtualenv MD_ML
source MD_ML/bin/activate

  3. Install the required packages:

pip install -r requirments.txt

  4. Example tutorials for each module are in the Examples folder. These are all Jupyter notebooks.

Usage

The pipeline consists of three modules: Preprocessing.py, Machinelearning.py and Mapping.py.

To get started, create a YAML configuration file (conf.yml) that specifies:

  1. 'masterpath': the path to a folder containing one subfolder per classification state (e.g., active and inactive) -> str

Each subfolder should contain a topology (.pdb) file and a coordinates (.xtc) file for the corresponding state.

< EGFR >
    |
    |--data
         |
         |--active
         |     |
         |     |--topology file (.pdb)
         |     |
         |     |--coordinates file (.xtc)
         |
         |--inactive
               |
               |--topology file (.pdb)
               |
               |--coordinates file (.xtc)
  2. 'savepath': the folder in which all results will be saved -> str

  3. 'downsample_to': how many images should be created for each state -> int

downsample_to: 1659
masterpath: /path/to/data/
savepath: /path/to/save
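A configuration like the one above could be loaded and checked along these lines. This is a minimal sketch, not MDML's own code: the helper names (`validate_config`, `load_config`, `find_state_files`, `downsample_indices`) are hypothetical, the keys are taken from the example conf.yml, and selecting evenly spaced frames to honour `downsample_to` is an assumption about how down-sampling works.

```python
from pathlib import Path

import numpy as np

# Keys taken from the example conf.yml, with their expected Python types.
REQUIRED_KEYS = {"masterpath": str, "savepath": str, "downsample_to": int}


def validate_config(conf: dict) -> dict:
    """Check that every required key is present and has the expected type."""
    if not isinstance(conf, dict):
        raise TypeError("conf.yml must contain a mapping of keys to values")
    for key, expected in REQUIRED_KEYS.items():
        if key not in conf:
            raise KeyError(f"conf.yml is missing '{key}'")
        if not isinstance(conf[key], expected):
            raise TypeError(f"'{key}' should be of type {expected.__name__}")
    return conf


def load_config(path: str) -> dict:
    """Read and validate a YAML config file (requires PyYAML)."""
    import yaml  # imported here so the other helpers work without PyYAML

    with open(path) as handle:
        return validate_config(yaml.safe_load(handle))


def find_state_files(masterpath: str) -> dict:
    """Map each state subfolder (e.g. active, inactive) to its (.pdb, .xtc) pair."""
    states = {}
    for folder in sorted(Path(masterpath).iterdir()):
        if not folder.is_dir():
            continue
        pdbs = sorted(folder.glob("*.pdb"))
        xtcs = sorted(folder.glob("*.xtc"))
        if len(pdbs) != 1 or len(xtcs) != 1:
            raise ValueError(f"{folder} needs exactly one .pdb and one .xtc file")
        states[folder.name] = (pdbs[0], xtcs[0])
    return states


def downsample_indices(n_frames: int, downsample_to: int) -> np.ndarray:
    """Pick up to `downsample_to` evenly spaced frame indices out of `n_frames`."""
    n = min(downsample_to, n_frames)
    return np.linspace(0, n_frames - 1, num=n).round().astype(int)
```

`load_config` uses `yaml.safe_load`, which only builds plain Python objects from the file; the other helpers need only the standard library and NumPy.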

The final output includes a series of down-sampled images, a performance image, a confusion matrix and, for each state, a saliency map, a .txt file listing the important residues, and a .pdb file whose B-factor column highlights the important residues.

< output >
    |
    |-- imgs
    |     |
    |     |-- active
    |     |     |
    |     |     |-- active_X.jpg       # The down-sampled active trajectory encoded as RGB values
    |     |
    |     |-- inactive
    |           |
    |           |-- inactive_X.jpg     # The down-sampled inactive trajectory encoded as RGB values
    |
    |-- models
    |     |
    |     |-- model.h5               # The trained model
    |
    |-- performance
    |     |
    |     |-- peformancne.jpg        # An image of the model's performance (loss, accuracy)
    |     |-- confusion_matrix.jpg   # An image of the model's confusion matrix
    |
    |-- results
          |
          |-- inactive.txt           # The important residues for the inactive state
          |-- inactive.pdb           # B-factor information highlighting the important residues for the inactive state
          |-- sal_map_inactive.jpg   # The saliency map of the inactive state
          |-- active.txt             # The important residues for the active state
          |-- active.pdb             # B-factor information highlighting the important residues for the active state
          |-- sal_map_active.jpg     # The saliency map of the active state
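How the saliency map is turned into the residue list is not described here. One plausible approach, sketched below under the assumption that the saliency values have already been mapped back to one value per atom, is to score each residue by its most salient atom, rank the residues, and rescale the scores to 0-100 so they fit a PDB B-factor column. The function `rank_residues` and this max-per-residue rule are hypothetical, not MDML's documented behaviour.

```python
import numpy as np


def rank_residues(atom_saliency, atom_resids, top_k=10):
    """Aggregate per-atom saliency into per-residue scores and rank them.

    atom_saliency: one saliency value per atom (hypothetical input, e.g. the
        saliency map mapped back to atoms).
    atom_resids: the residue number of each atom, in the same order.
    Returns (top residue numbers, {resid: score rescaled to 0-100}).
    """
    atom_saliency = np.asarray(atom_saliency, dtype=float)
    atom_resids = np.asarray(atom_resids)
    if atom_saliency.shape != atom_resids.shape:
        raise ValueError("need exactly one residue number per saliency value")

    resids = np.unique(atom_resids)  # sorted unique residue numbers
    # Score each residue by its most salient atom.
    scores = np.array([atom_saliency[atom_resids == r].max() for r in resids])

    # Rescale to 0-100 so the scores fit comfortably in a B-factor column.
    span = scores.max() - scores.min()
    if span > 0:
        scaled = (scores - scores.min()) / span * 100
    else:
        scaled = np.zeros_like(scores)

    order = np.argsort(scores)[::-1][:top_k]  # highest score first
    top = [int(r) for r in resids[order]]
    return top, {int(r): float(s) for r, s in zip(resids, scaled)}
```

The rescaled scores could then be written to the B-factor column, for example through the `tempfactors` attribute of an MDAnalysis `AtomGroup`, before saving the .pdb file.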
    

To run the full pipeline, use:

python3 main.py -c path/to/conf.yml

Alternatively, you can run the three modules separately; they also serve as checkpoints.
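A driver with this checkpoint behaviour might look roughly like the sketch below. Only the `-c` option comes from the command above; the stage list, the use of output subfolders as checkpoints, and the helpers `parse_args` and `run_stages` are assumptions for illustration, not the actual contents of main.py.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the `-c path/to/conf.yml` option."""
    parser = argparse.ArgumentParser(description="Run the MDML pipeline.")
    parser.add_argument("-c", dest="config", required=True, help="path to conf.yml")
    return parser.parse_args(argv)


def run_stages(savepath, stages):
    """Run (name, output_subfolder, func) stages in order, skipping finished ones.

    A stage counts as done (a checkpoint) when its output subfolder under
    `savepath` already exists and is non-empty. Returns the names that ran.
    """
    ran = []
    for name, subfolder, func in stages:
        out = Path(savepath) / subfolder
        if out.is_dir() and any(out.iterdir()):
            print(f"skipping {name}: found existing output in {out}")
            continue
        out.mkdir(parents=True, exist_ok=True)
        func(out)  # the stage writes its results into `out`
        ran.append(name)
    return ran
```

With hypothetical stages such as `("preprocessing", "imgs", ...)`, `("training", "models", ...)` and `("mapping", "results", ...)`, a rerun would skip any stage whose folder under 'savepath' already holds output. Because the folder is created before the stage runs, a stage that crashes before writing anything leaves an empty folder and is simply run again next time.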



Download files


Source Distribution

MDML-0.0.8.tar.gz (5.5 kB)

Uploaded Source

Built Distribution


MDML-0.0.8-py3-none-any.whl (5.5 kB)

Uploaded Python 3

File details

Details for the file MDML-0.0.8.tar.gz.

File metadata

  • Download URL: MDML-0.0.8.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/5.0.0 pkginfo/1.8.3 requests/2.28.1 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.10.8

File hashes

Hashes for MDML-0.0.8.tar.gz:

  • SHA256: 4ab94bfb5b8b29fdf333d96420fdc9c8905cb82b17781351a5198b33a56d8f46
  • MD5: 620edd46c129aad3c826468af2f0324a
  • BLAKE2b-256: 4423b4ad1ae2de1574871ed01a24dbadc22507879c27b4a91d3192cf94f10901


File details

Details for the file MDML-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: MDML-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 5.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/5.0.0 pkginfo/1.8.3 requests/2.28.1 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.10.8

File hashes

Hashes for MDML-0.0.8-py3-none-any.whl:

  • SHA256: 6da0d1640c92a24e8f579ab5be89b9d804123f00ad44c0fac4486c4dbd5a4f63
  • MD5: 67b833d7ea2d722dc49d5341157f4d82
  • BLAKE2b-256: a8e0c5777a4d09f8ec9857eb70b6bd02d3cc57f831a2cbfa5bdfd5abad217883

