MDSubSampler: Molecular Dynamics SubSampler

MDSubSampler is a Python library and toolkit for a posteriori subsampling of data from multiple trajectories for further analysis. It implements uniform, random and stratified sampling, as well as bootstrapping and targeted sampling, so that the subsample preserves the original distribution of relevant geometrical properties.
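To make the sampling strategies named above concrete, here is a minimal sketch that works on plain frame indices. It is not MDSubSampler's API: the function names are hypothetical, and the stratified variant assumes that strata are equal-count bins of a per-frame property, which may differ from how the library defines them.

```python
import random


def uniform_sample(n_frames, size):
    """Pick `size` evenly spaced frame indices."""
    step = n_frames / size
    return [int(i * step) for i in range(size)]


def random_sample(n_frames, size, seed=0):
    """Pick `size` distinct frame indices at random, returned in trajectory order."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_frames), size))


def bootstrap_sample(n_frames, size, seed=0):
    """Pick `size` frame indices at random WITH replacement."""
    rng = random.Random(seed)
    return [rng.randrange(n_frames) for _ in range(size)]


def stratified_sample(values, size, n_strata=4, seed=0):
    """Sample the same number of frames from each equal-count bin of `values`.

    Assumption: strata are defined by ranking frames on the property value.
    """
    n_frames = len(values)
    if size > n_frames or size % n_strata != 0:
        raise ValueError("size must be <= n_frames and a multiple of n_strata")
    rng = random.Random(seed)
    order = sorted(range(n_frames), key=lambda i: values[i])
    bounds = [k * n_frames // n_strata for k in range(n_strata + 1)]
    picked = []
    for k in range(n_strata):
        stratum = order[bounds[k]:bounds[k + 1]]
        picked.extend(rng.sample(stratum, size // n_strata))
    return sorted(picked)


# Toy "property" for 100 frames: a steadily increasing value.
values = [0.1 * i for i in range(100)]
print(uniform_sample(100, 10))
print(stratified_sample(values, 8))
```

Each function returns frame indices only; in practice these would be used to select frames from a trajectory.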

Citation

When using MDSubSampler, please cite the following paper:

N. Oues, S. C. Dantu, R. J. Patel, A. Pandini. MDSubSampler: a posteriori sampling of important protein conformations from biomolecular simulations. Bioinformatics (2023), btad427, https://doi.org/10.1093/bioinformatics/btad427

Prerequisites

This project requires Python (version 3.9.1 or later). To make sure you have the right version available on your machine, try running the following command.

$ python --version
Python 3.9.1
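The same check can be done from inside Python, which is useful when several interpreters are installed. This is a small sketch; the helper name `meets_minimum` is hypothetical and not part of MDSubSampler.

```python
import sys

# Minimum version stated in the prerequisites above.
MIN_VERSION = (3, 9, 1)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor, micro) version is at least `minimum`."""
    return tuple(version_info[:3]) >= minimum


print("Python version OK" if meets_minimum() else "Python 3.9.1 or later is required")
```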

Getting Started

These instructions will get you a copy of the project up and running on your local machine for analysis and development purposes.

Installation

BEFORE YOU INSTALL: please read the prerequisites

To install and set up the library, run:

pip install MDSubSampler
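To confirm that the installation succeeded, you can query the installed distribution's version with the standard library. The sketch below uses only `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name="MDSubSampler"):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"MDSubSampler {version}" if version else "MDSubSampler is not installed")
```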

Usage

Workflow

Input:

  • Molecular Dynamics trajectory
  • Geometric property
  • Atom selection [optional - default is "name CA"]
  • Reference structure [optional]
  • Sample size or range of sizes
  • Dissimilarity measure [optional - default is "Bhattacharyya"]

Output:

  • .dat file with calculated property for full trajectory (user input)
  • .dat file(s) with calculated property for one or all sample sizes input
  • .xtc file(s) with sample trajectory for one or all sample sizes
  • .npy file(s) with sample trajectory for one or all sample sizes
  • .npy training set for ML purposes for sample trajectory (optional)
  • .npy testing set for ML purposes for sample trajectory (optional)
  • .png file with overlapped property distribution of reference and sample
  • .json file report with important statistics from the analysis
  • .txt log file with essential analysis steps and information
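The dissimilarity measure compares the property distribution of the full trajectory with that of a sample. As an illustration only, the sketch below computes the Bhattacharyya distance between two histograms in pure Python. The binning scheme, the function names and the toy data are assumptions, and the library's own implementation may differ.

```python
import math


def histogram(values, n_bins, lo, hi):
    """Return normalised bin frequencies of `values` over [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that v == hi (or any value outside the range) lands in an edge bin.
        idx = max(0, min(int((v - lo) / width), n_bins - 1))
        counts[idx] += 1
    return [c / len(values) for c in counts]


def bhattacharyya_distance(p, q):
    """Distance -ln(BC), where BC = sum_i sqrt(p_i * q_i) is the Bhattacharyya coefficient."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return -math.log(bc) if bc > 0 else math.inf


# Toy property values standing in for a full trajectory and a uniform subsample.
full = [math.sin(0.05 * i) for i in range(1000)]
sample = full[::10]

# Use the same bin edges for both so the histograms are comparable.
lo, hi = min(full), max(full)
p = histogram(full, 20, lo, hi)
q = histogram(sample, 20, lo, hi)
print(bhattacharyya_distance(p, q))
```

A distance close to zero means the sample reproduces the distribution of the full trajectory well; fully disjoint distributions give an infinite distance.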

Scenarios

To run scenario 1, 2 or 3, place your protein trajectory and topology files (.xtc and .gro) in the data folder, then run the following (shown here for scenario 1):

python mdss/scenarios/scenario_1.py data/<YourTrajectoryFile>.xtc data/<YourTopologyfile>.gro <YourPrefix>

Scenarios 1, 2 and 3 are also available as Jupyter notebooks, which can be used as templates and modified interactively to suit your needs. More advanced scenarios are in the cookbook directory. If you have cloned the library to your machine, run the following command before running the notebook cells.

%cd <pathToMDSubSamplerDirectory>

Parser

If you prefer the terminal, you can run the code from the command line and choose the parser arguments yourself. To see all options and choices, run:

python mdss/run.py --help

Once you have chosen your arguments, your command will look similar to the following example:

python mdss/run.py \
  --traj "data/<YourTrajectoryFile>.xtc" \
  --top "data/<YourTopologyFile>.gro" \
  --prefix "<YourPrefix>" \
  --output-folder "data/<YourResultsFolder>" \
  --property='DistanceBetweenAtoms' \
  --atom-selection='G55,P127' \
  --sampler='BootstrappingSampler' \
  --n-iterations=50 \
  --size=<SampleSize> \
  --dissimilarity='Bhattacharyya'
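When the same command has to be launched for many trajectories or sample sizes, it can help to assemble it in Python. The sketch below only builds the argument list from the flags shown in the example above and prints it. The helper name `build_run_command` and the file names are hypothetical placeholders.

```python
import shlex


def build_run_command(traj, top, prefix, output_folder, size,
                      prop="DistanceBetweenAtoms",
                      atom_selection="G55,P127",
                      sampler="BootstrappingSampler",
                      n_iterations=50,
                      dissimilarity="Bhattacharyya"):
    """Build the argument list for mdss/run.py using the flags from the example."""
    return [
        "python", "mdss/run.py",
        "--traj", traj,
        "--top", top,
        "--prefix", prefix,
        "--output-folder", output_folder,
        f"--property={prop}",
        f"--atom-selection={atom_selection}",
        f"--sampler={sampler}",
        f"--n-iterations={n_iterations}",
        f"--size={size}",
        f"--dissimilarity={dissimilarity}",
    ]


# Placeholder paths; replace them with your own files.
cmd = build_run_command("data/traj.xtc", "data/top.gro", "demo", "data/results", size=100)
print(shlex.join(cmd))
```

To execute the command instead of printing it, pass the list to `subprocess.run(cmd, check=True)` from the repository root.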

Development

With Poetry

Start by either downloading the tarball from https://github.com/alepandini/MDSubSampler or cloning the repository to your local machine:

git clone git@github.com:alepandini/MDSubSampler.git
cd MDSubSampler

Next, download and install Poetry from https://python-poetry.org/docs/#installation

Finally, run the following:

poetry install
poetry build
poetry shell

You can now start developing the library.

With Docker

Start by installing Docker, following the instructions at https://docs.docker.com/get-docker/.

First, build a Docker image by running the following command:

docker build -t <image name> .

Then run the following command to get access to a shell with all dependencies installed:

docker run -it -v $(pwd):/app -e PYTHONPATH=/app <image name> /bin/bash

This also mounts the current local directory at /app inside the container as a Docker volume, so that any local change is reflected in the running container, and vice versa.

The repo also includes two handy scripts to run all of the above faster (an image called subsampler will be created):

./build-docker
./run-docker

Once you are inside the Docker shell, all dependencies are installed and the package scripts are available (the mdss command and all scenarios declared in pyproject.toml).

License

The library is licensed under GPL-3.0.
