parallel analysis for ensemble simulations
Ensemble Parallel MDAnalysis
Warning: This package is still under construction.
ENPMDA is a parallel analysis package for ensemble simulations powered by MDAnalysis.
It stores metadata in pandas.DataFrame and distributes computation jobs in dask.DataFrame so that the parallel analysis can be performed not only for one single trajectory but also across simulations and analyses.
It can be used for an initial inspection of raw trajectories, as well as a framework for extracting features from final production simulations for further use in, e.g., machine learning and Markov state modeling. It automatically fixes PBC issues, and aligns and centers the protein inside the simulation box. It also works for multimeric proteins!
The framework is intended to be adaptable: MDAnalysis analysis functions can simply be wrapped without worrying about the parallel machinery behind them.
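The metadata-plus-partitions idea described above can be sketched in a few lines. This is only an illustration of the pattern, not ENPMDA's implementation: it uses the standard-library concurrent.futures as a stand-in for the dask scheduler, and the table columns, the split helper, and the placeholder analysis function are all hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Hypothetical metadata table: one row per trajectory (names are made up).
metadata = pd.DataFrame({
    "system": ["apo", "apo", "holo", "holo"],
    "trajectory": ["apo_0.xtc", "apo_1.xtc", "holo_0.xtc", "holo_1.xtc"],
    "n_frames": [100, 120, 90, 110],
})


def analyse(partition: pd.DataFrame) -> pd.DataFrame:
    """Placeholder analysis; a real one would open each trajectory with MDAnalysis."""
    # Assume 10 ps between frames, purely for illustration.
    return partition.assign(total_ps=partition["n_frames"] * 10)


def split(frame: pd.DataFrame, npartitions: int) -> list:
    """Cut the table into at most `npartitions` contiguous row blocks."""
    size = -(-len(frame) // npartitions)  # ceiling division
    return [frame.iloc[i:i + size] for i in range(0, len(frame), size)]


# Each partition is analysed independently, then the pieces are reassembled.
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(analyse, split(metadata, npartitions=2)))

results = pd.concat(parts)
print(results)
```

Because each partition only needs its own rows of the metadata table, the same analysis function can run on one trajectory or on many simulations without modification.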
Free software: GNU General Public License v3
Documentation: https://ENPMDA.readthedocs.io.
Features
Parallel analysis for ensemble simulations.
Dataframe for storing and accessing results.
dask-based task scheduler, suitable for both workstations and clusters.
Expandable analysis library powered by MDAnalysis.
Example Code Snippet
from ENPMDA import MDDataFrame
from ENPMDA.preprocessing import TrajectoryEnsemble
from ENPMDA.analysis import get_backbonetorsion, rmsd_to_init
# construct trajectory ensemble
# (ensemble_top_list and ensemble_traj_list are user-supplied lists,
# assumed here to hold topology and trajectory file paths)
traj_ensembles = TrajectoryEnsemble(
    ensemble_name='ensemble',
    topology_list=ensemble_top_list,
    trajectory_list=ensemble_traj_list
)
# initialize dataframe and add trajectory ensemble
md_dataframe = MDDataFrame(dataframe_name='dataframe')
md_dataframe.add_traj_ensemble(traj_ensembles, npartitions=16)
# add analyses
md_dataframe.add_analysis(get_backbonetorsion)
md_dataframe.add_analysis(rmsd_to_init)
# save dataframe
md_dataframe.save('results')
# retrieve features
feature_dataframe = md_dataframe.get_feature([
    'torsion',
    'rmsd_to_init'
])
# plot analysis results
import seaborn as sns
sns.barplot(data=feature_dataframe,
            x='system',
            y='rmsd_to_init')
sns.lineplot(data=feature_dataframe,
             x='traj_time',
             y='0_phi_cos',
             hue='system')
Workflow Illustration
TODO
option to add more than one ensemble
more analysis functions
unit testing
benchmarking
documentation
add functions to cancel running tasks
See Also
MDAnalysis: https://www.mdanalysis.org/
dask: https://dask.org/
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.1.0 (2022-05-09)
First release on PyPI.