PyFraME: Python framework for Fragment-based Multiscale Embedding
PyFraME: Python framework for Fragment-based Multiscale Embedding calculations
Copyright (C) 2017-2021 Jógvan Magnus Haugaard Olsen and Peter Reinholdt
PyFraME is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
PyFraME is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with PyFraME. If not, see https://www.gnu.org/licenses/.
Description
PyFraME is a Python package providing a framework for managing fragment-based multiscale embedding calculations. In such calculations, a molecular system is divided into two principal domains: a central core and its environment. The core part is treated at the highest level of theory while the effects from the environment are included effectively through an embedding potential. With PyFraME, the user can automate the workflow from an initial structure to the final embedding potential. It can be used to build a multilayer description of the molecular environment. Each layer can be described either by a standard embedding potential, i.e., a predefined set of parameters, or by embedding-potential parameters derived from first-principles calculations. For the latter, a fragmentation method is used to subdivide large molecular structures into smaller, computationally manageable fragments. The number of layers, as well as the composition and level of theory used for each layer, can be fully customized.
The basic workflow consists of three main steps. First, a molecular structure is given as input. Currently, PyFraME supports input files in the PDB format. The input file reader extracts information about the structure and composition of the system and defines its basic units, i.e., fragments. Small molecules typically constitute a fragment on their own, but larger molecules need to be broken down into smaller fragments. For example, for proteins, a fragment would usually consist of an amino-acid residue. Second, the molecular system to be used for the embedding calculation is built by extracting subsets from the full list of fragments according to user-specified criteria, such as name, chain ID, distance, or a combination thereof, and placing them into separate regions. As mentioned above, any number of regions may be added, and each can be fully customized. Third, once the system has been built, the embedding potential is derived. Depending on the specifics, this may involve a large number of separate calculations on the individual fragments in order to compute the embedding-potential parameters. For large molecules, where the parameters cannot be computed directly, PyFraME uses a fragmentation method based on the molecular fractionation with conjugate caps (MFCC) approach to derive the parameters. The individual fragment calculations are typically performed by Dalton and the LoProp Python package, but this can be customized. The fragmentation of the system, the fragment calculations, and the subsequent joining of parameters to build the embedding potential are fully automated and can make full use of large-scale HPC resources.
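The joining of fragment parameters in an MFCC-style scheme can be illustrated with a toy calculation. In MFCC, a large molecule is cut into fragments, each fragment is capped with groups taken from its neighbours, and the resulting double counting is removed by subtracting "concap" molecules built from each pair of caps. The sketch below applies that sum-and-subtract rule to made-up per-atom charges. The function name `mfcc_combine`, the atom labels, and all numbers are hypothetical; PyFraME's own implementation (including how cap atoms are handled and what `mfcc_order` controls) is not shown here and may differ.

```python
from collections import defaultdict


def mfcc_combine(capped_fragments, concaps):
    """Sum per-atom values from capped fragments and subtract those from concaps."""
    combined = defaultdict(float)
    for calculation in capped_fragments:
        for atom, value in calculation.items():
            combined[atom] += value
    for calculation in concaps:
        for atom, value in calculation.items():
            combined[atom] -= value
    return dict(combined)


# Toy chain of two residues, R1-R2, cut at the bond between them.
# Each dict maps a (hypothetical) atom label to a made-up atomic charge
# obtained from one separate fragment calculation.
capped_fragments = [
    {'R1:a': -0.50, 'R1:b': 0.25, 'R2:a': 0.25},  # R1 plus a cap taken from R2
    {'R1:b': 0.25, 'R2:a': 0.50, 'R2:b': -0.75},  # R2 plus a cap taken from R1
]
concaps = [
    {'R1:b': 0.25, 'R2:a': 0.25},  # the two caps joined into one molecule
]

charges = mfcc_combine(capped_fragments, concaps)
for atom in sorted(charges):
    print(atom, charges[atom])
```

Atoms near the cut appear in two capped fragments and in one concap, so after the subtraction each atom is effectively counted once.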
For an example showing how PyFraME can be used, see Usage example.
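The region-building step of the workflow, in which fragments are picked by name, chain ID, distance, or a combination of these, can be pictured as a simple filter over a list of fragments. The following is a minimal, self-contained sketch of that idea only. The `Fragment` class, the `select` function, and the sample coordinates are hypothetical and are not part of PyFraME's API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical stand-in for a fragment: name, chain ID, atom positions."""
    name: str
    chain_id: str
    coordinates: tuple  # tuple of (x, y, z) atom positions


def min_distance(a, b):
    """Smallest atom-atom distance between two fragments."""
    return min(math.dist(p, q) for p in a.coordinates for q in b.coordinates)


def select(fragments, names=None, chain_ids=None, reference=None, distance=None):
    """Return the fragments that satisfy every criterion that was supplied."""
    selected = []
    for fragment in fragments:
        if names is not None and fragment.name not in names:
            continue
        if chain_ids is not None and fragment.chain_id not in chain_ids:
            continue
        if reference is not None and distance is not None:
            # Skip fragments that are farther than `distance` from all references.
            if all(min_distance(fragment, ref) > distance for ref in reference):
                continue
        selected.append(fragment)
    return selected


fragments = [
    Fragment('RET', 'A', ((0.0, 0.0, 0.0),)),
    Fragment('LYS', 'A', ((2.0, 0.0, 0.0), (3.0, 0.0, 0.0))),
    Fragment('SOL', 'W', ((0.0, 2.5, 0.0),)),
    Fragment('SOL', 'W', ((0.0, 9.0, 0.0),)),
]

core = select(fragments, names=['RET'])
nearby_solvent = select(fragments, names=['SOL'], reference=core, distance=3.0)
print(len(core), len(nearby_solvent))
```

Combining criteria in one pass mirrors the "name and distance" style of selection; each selected subset would then be assigned to its own region.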
How to cite
To cite PyFraME please use a format similar to the following:
J. M. H. Olsen, P. Reinholdt, and contributors, PyFraME: Python framework for Fragment-based Multiscale Embedding (version 0.4.0), 2021. DOI: 10.5281/zenodo.4899311. See https://gitlab.com/FraME-projects/PyFraME.
where the version and DOI should correspond to the actual version that was used. Note that the DOI 10.5281/zenodo.775113 represents all versions, and will always resolve to the latest one. A possible BibTeX entry can be found in the CITATION file. Alternatively, BibTeX and other formats can be generated by Zenodo.
Contributors
The list of past and current contributors is found here.
Requirements
To use PyFraME you need:
For certain functionality you will need one or more of the following:
- Dalton and LoProp for Dalton
- Molcas (not tested recently)
- OpenMolcas (not tested recently)
To run the test suite you need:
Installation
The PyFraME package can be installed directly from PyPI using pip:
python -m pip install PyFraME
This will also install required dependencies (see above) that are available on PyPI, i.e., not Dalton, Molcas, etc.
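After installation, one way to confirm which version ended up in the current environment is to query the package metadata from Python. This is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical, and the distribution name `PyFraME` is taken from the pip command above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version('PyFraME')
print('PyFraME version:', version if version is not None else 'not installed')
```

The reported version is the one to quote when citing the package.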
The entire source including history can be found at GitLab. All releases are also deposited at Zenodo.
Testing
If you installed from PyPI, the unit tests can be executed by typing
python -m pytest --pyargs pyframe
in a terminal. To execute the full test suite (unit tests and integration tests), which can be obtained by downloading the source from GitLab, run
python -m pytest
from the PyFraME root directory.
Issues
Please report issues here.
Contributing
Please take a look at the contribution guide.
Usage example
The following commented example is based on a molecular system consisting of a channelrhodopsin protein dimer embedded in a lipid membrane. For examples of how PyFraME can be integrated in computational studies of response and transition properties of molecular systems, we refer to our tutorial review article.
import pyframe
# Create MolecularSystem() object. Currently only PDB and fixed-format PQR files
# are supported (you can, however, give your own reader as an argument).
system = pyframe.MolecularSystem(input_file='/path/to/input/file.pdb')
# By default fragments are defined by the input but fragments can be modified
# as shown here. This will affect all fragments with the given names.
system.split_fragment_by_name(
    name='RETK',
    new_names=['LYSB', 'LYSS', 'RET'],
    fragment_definitions=[['N', 'H', 'CA', 'HA', 'C', 'O'],
                          ['CB', 'HB1', 'HB2', 'CG', 'HG1', 'HG2', 'CD',
                           'HD1', 'HD2', 'CE', 'HE1', 'HE2'],
                          ['.*']])
system.split_fragment_by_name(
    name='POPE',
    new_names=['POP1', 'POP2', 'POP3', 'POP4', 'POP5'],
    fragment_definitions=[['N', 'HN1', 'HN2', 'HN3', 'C12', 'H12A', 'H12B',
                           'C11', 'H11A', 'H11B', 'P', 'O13', 'O14', 'O11',
                           'O12', 'C1', 'HA', 'HB', 'C2', 'HS', 'O21',
                           'C3', 'HX', 'HY', 'O31'],
                          ['C21', 'O22', 'C22', 'H2R', 'H2S', 'C23', 'H3R',
                           'H3S', 'C24', 'H4R', 'H4S', 'C25', 'H5R', 'H5S',
                           'C26', 'H6R', 'H6S', 'C27', 'H7R', 'H7S', 'C28',
                           'H8R', 'H8S', 'C29', 'H91'],
                          ['0C21', '1H10', '1C21', 'H11R', 'H11S', '2C21',
                           'H12R', 'H12S', '3C21', 'H13R', 'H13S', '4C21',
                           'H14R', 'H14S', '5C21', 'H15R', 'H15S', '6C21',
                           'H16R', 'H16S', '7C21', 'H17R', 'H17S', '8C21',
                           'H18R', 'H18S', 'H18T'],
                          ['C31', 'O32', 'C32', 'H2X', 'H2Y', 'C33', 'H3X',
                           'H3Y', 'C34', 'H4X', 'H4Y', 'C35', 'H5X', 'H5Y',
                           'C36', 'H6X', 'H6Y', 'C37', 'H7X', 'H7Y', 'C38',
                           'H8X', 'H8Y', 'C39', 'H9X', 'H9Y'],
                          ['0C31', 'H10X', 'H10Y', '1C31', 'H11X', 'H11Y',
                           '2C31', 'H12X', 'H12Y', '3C31', 'H13X', 'H13Y',
                           '4C31', 'H14X', 'H14Y', '5C31', 'H15X', 'H15Y',
                           '6C31', 'H16X', 'H16Y', 'H16Z']])
# Extract fragments and put them in core region.
core = system.get_fragments_by_identifier(identifiers=['248_A_RET'])
core += system.get_fragments_by_distance(distance=3.0, reference=core,
                                         use_center_of_mass=False,
                                         protect_molecules=False)
system.set_core_region(core, basis='pcseg-2')
# Extract the protein (the chain ID is used here because all protein fragments
# in this case have the same ID).
protein = system.get_fragments_by_chain_id(chain_ids=['A'])
# Add a region containing the protein. Note that each of these settings has a
# default and that more settings exist than those shown here.
system.add_region(name='protein', fragments=protein, use_mfcc=True,
                  mfcc_order=2, use_multipoles=True, multipole_order=2,
                  use_polarizabilities=True, basis='loprop-6-31+G*')
# Here we repeat for lipids, ions, and solvent.
lipids = system.get_fragments_by_distance_and_name(
    distance=8.0,
    names=['POP1', 'POP2', 'POP3', 'POP4', 'POP5'],
    reference=protein)
system.add_region(name='lipid', fragments=lipids, use_mfcc=True, mfcc_order=2,
                  use_multipoles=True, multipole_order=2,
                  use_polarizabilities=True, basis='loprop-6-31+G*')
ions = system.get_fragments_by_distance_and_name(distance=8.0,
                                                 names=['NA', 'CL'],
                                                 reference=protein)
system.add_region(name='ion', fragments=ions, use_multipoles=True,
                  multipole_order=0, use_polarizabilities=True,
                  basis='6-31+G*')
solvents = system.get_fragments_by_distance_and_name(distance=8.0,
                                                     names=['SOL'],
                                                     reference=protein)
system.add_region(name='solvent', fragments=solvents, use_multipoles=True,
                  multipole_order=2, use_polarizabilities=True,
                  basis='loprop-6-31+G*')
# Create Project() object that is used to create embedding potentials and write
# input files.
project = pyframe.Project()
# Set path to scratch directory. This will be used by the auxiliary programs,
# e.g. Dalton or Molcas.
project.scratch_dir = '/path/to/scratch'
# Set path to working directory (it will be created if it does not exist).
# This directory will contain the final output files from PyFraME (e.g. Dalton
# mol and pot files), and the output from the auxiliary program. In addition,
# during execution it will contain temporary directories for each fragment.
project.work_dir = '/path/to/work'
# Specifies the number of jobs that will be run on each node. A fragment may
# require one or more calculations run by an auxiliary program. Each of these
# counts as a job.
project.jobs_per_node = 2
# Specifies memory per job. Note that this amount will be shared by MPI processes.
project.memory_per_job = 2048 * 12
# Number of MPI processes per job.
project.mpi_procs_per_job = 12
# PyFraME will attempt to autodetect nodes from SLURM and PBS/Torque queuing
# systems, but you can also specify the names of the nodes that should be used
# to run jobs manually. For example (this needs "import os" at the top):
# project.node_list = ['{0}'.format(os.environ['HOSTNAME'])]
# Prints all the details regarding the setup. Note that all of the settings
# demonstrated above have defaults which are shown with the method below.
project.print_info()
# This will start the fragment calculations using the auxiliary programs and
# settings defined when creating the regions.
project.create_embedding_potential(system)
# Write potential file containing all parameters of the embedding potential.
project.write_potential(system)
# Write molecule file containing the core quantum region.
project.write_core(system)
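The resource settings in the example interact, so it can help to work out what they imply per node before submitting a job. The arithmetic below reuses the values from the example; it assumes that the memory of a job is divided evenly among its MPI processes, which is an interpretation of the "shared by MPI processes" comment, and it leaves the memory unit unspecified because the example does not state it.

```python
# Values copied from the usage example.
jobs_per_node = 2
mpi_procs_per_job = 12
memory_per_job = 2048 * 12

# Assumption: memory is split evenly among the MPI processes of a job.
memory_per_process = memory_per_job // mpi_procs_per_job
# Each node runs jobs_per_node jobs at the same time.
processes_per_node = jobs_per_node * mpi_procs_per_job
memory_per_node = jobs_per_node * memory_per_job

print(memory_per_process, processes_per_node, memory_per_node)
# prints: 2048 24 49152
```

With these settings, a node therefore needs at least 24 cores and twice the per-job memory to run both jobs concurrently.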