Skip to main content

Utility scripts for MD data

Project description

============
MD Utilities
============

.. image:: https://img.shields.io/travis/cmayes/md_utils.svg
:target: https://travis-ci.org/cmayes/md_utils

.. image:: https://img.shields.io/pypi/v/md_utils.svg
:target: https://pypi.python.org/pypi/md_utils

.. image:: https://coveralls.io/repos/cmayes/md_utils/badge.svg?branch=master&service=github
:target: https://coveralls.io/github/cmayes/md_utils?branch=master

Utility scripts for MD data

* Free software: BSD license
* Documentation: https://md_utils.readthedocs.org.


Package organization
--------------------

This packaging process is based on https://jeffknupp.com/blog/2013/08/16/open-sourcing-a-python-project-the-right-way/

Scripts added to the package aim have at least 90% test coverage.


Installation
------------

1. Make sure python is installed. Python 2.7 is recommended. The package is meant to be python 3 compatible,
but primarily tested with Python 2.7. There are many ways to install python.
For example, see http://conda.pydata.org/miniconda.html

2. From the base folder where you would like the set of files (a new folder will be created, by default called md_utils):
::
git clone https://github.com/team-mayes/md_utils.git

3. From the git repository folder

a. see https://pip.pypa.io/en/stable/installing/ to install if you don't have pip installed
b. run:
::
pip install md_utils --user
c. alternately (the * below will change based on current version; this is created in the first step):
::
python setup.py sdist
pip install dist/md_utils-*.tar.gz --user
4. To allow the scripts to be found anywhere on your computer using your terminal screen,
making sure the following path is in your .bashrc or .bash_profile path, and remember to source that file
after an update:
::
$HOME/.local/bin

Upgrade
-------

From the location of your cloned git repository, make sure you have the latest files, then use pip to update:
::
git pull
pip install --upgrade md_utils --user

Example
-------

1. Update a configuration file like one found in the folder
https://github.com/team-mayes/md_utils/tree/master/tests/test_data/data_edit
(FYI, config files end with "ini" and there are a variety here, meant
to work or fail, to test functionality).
2. To see if the installation worked, try running with the help option. All scripts in
this package have such an option, which will briefly tell you about the code::
data_edit -h

3. To only have the program print interactions "owned" by atom numbers 1
and 2 in a data file called "my_file.data", make a file called (for example)
"print_owned_atoms.ini", with
the following text::
[main]
data_file = my_file.data
print_interactions_owned_by_atoms = 1,2

This assumes that my_file.data is in the same file as the
configuration file. You can have as many atom numbers as you wish;
just separate them by commas.

4. Give it a try!
Run::
data_edit -c print_owned_atoms.ini


When the whole git repository is cloned, there will many example input files in the tests/test_data folder.


-------
Scripts
-------

For combining/processing output:
--------------------------------

align_on_col
For combining data from multiple files based on a common timestep. All other data will be ignored or, if in logging
mode, printed to a log file.

fes_combo
Combines multiple FES output files into a single file so that the first
column's value is sequential. Files with higher starting index numbers
are favored.

filter_col
Produces a file in which only rows are reproduced that pass filtering criteria set in the configuration file. The
config file allows specifying max and/or min values for any column heading (with a "max_vals" and/or "max_vals"
section in the config file). Additionally, a "bin" section
allows the user to specify a column name on which to bin data. The user should provide a list of integers:
initial_bin_value, final_bin_value, num_bins, and (optionally) a max number of rows per bin.

path_bin
Creates a summary VMD XYZ file (and separate log file) that averages a
set of coordinates (one set of XYZ coordinates per line) from an input file.

per_col_stats
Given a file with columns of data, returns the min, max, avg, and std dev per column. Optionally, it can return
the maximum value from each column plus a "buffer" length (useful for preparing CP2K input for FitEVB).

press_dups
Compresses lines in a given CSV based on duplicate values in a specified
column (RMSD by default) Compressed lines have their values averaged.


For PMF calculations:
---------------------

md_init
Initializes a location for running md utilities. Specifically, it makes template files for creating wham input.

wham_split
Breaks wham input into increasingly smaller blocks (divide initial data set
by 2, then 3, 4...) and creates wham input (meta) files and submit scripts.

wham_rad
Calculates the radially-corrected free energy values from WHAM output.

calc_pka
From the wham_rad output, calculates the pKa.


For processing LAMMPS output:
-----------------------------

data_edit
offers a range of options to:

* produce a new, edited data file (such as renumbering interactions types)

see example scripts in tests/test_data/data_edit: data_reorder.ini, data_retype.ini, data_sort.ini

* output selected data (i.e. interactions involving or owned by a particular atom number)

see example scripts in tests/test_data/data_edit: data_print_impt_atoms.ini, data_print_own_atoms.ini

* compare two data files and output only "meaningful" differences (ignore formatting differences,
order of bonds, angles, dihedrals, atom XYZ coords, notes...)

see example script tests/test_data/data_edit/data_compare.ini

data2pdb
* produces a pdb file with the coordinates from a data file, and everything else as in the pdb file

see example scripts in tests/test_data/data2odb: data2pdb.ini, data2pdb_glu_dict.ini ...

* specify a pdb template file with 'pdb_tpl_file'
* specify a single data file with 'data_file'
* specify a file that lists any number of data file names with 'data_list_file'
* specify an output directory with 'output_directory'
* make a dictionary by lining up the rows of the data and pdb files with 'make_dictionary_flag = True'
* use a dictionary to check alignment (proper ordering) of data file with 'use_atom_dict_flag = True'

dump_edit
available options include renumbering atoms or molecules and producing a new file with a subset of timesteps

lammps_dist
Calculates the distances between a given set of atom pairs for each
time step in a given LAMMPS dump file

lammps_proc_data
From lammps dump file(s), finds key distance, such as the hydroxyl OH distance on the protonatable residue
(when protonated). This script assumes we care about one protonatable residue in a simulation with a PBC.

pdb_edit
Creates a new version of a pdb file applying options such as renumbering molecules.
* use the option "add_element_types = true" to fill in the column of element types (VMD dropped them for the protein section; CP2K wants them)
* by default, it will check all atoms. You can specify a range on which to perform this action with 'first_atom_add_element' and 'last_atom_add_element'
* it will only add the element type if it is in the internal atom_type/element dictionary (a warning will show if a type is not in the dictionary). Otherwise, it will leave those columns as they originally were.
* by default, it loads a dictionary I made based on charmm atom types (charmm_atom_type,element; one per line). The user can specify a different dictionary file with "atom_type_element_dict_file"
* if the user specifies a 'first_wat_atom' and 'last_wat_atom', the program will check that the atoms are printed in the order OH2, H1, H2
* when using this option, if the first protein atom is not 1 (numbering begins at 1, like in a PDB, not 0, like VMD index), use the option "last_prot_atom = " to indicate the first protein atom num
* this options requires inputting the last protein atom id (add "last_prot_atom = X" to the configuration file, where X is the integer (decimal) atom number)
* by default, the output pdb name of a pdb file called 'struct.pdb' will be 'struct_new.pdb'. You can specify a new name with the keyword 'new_pdb_name'
* by default, the output directory will be the same as that for the input pdb. This can be changed with the 'output_directory' keyword
* the program will renumber atoms starting from 1 (using hex for atom numbers greater than 99999), using a dictionary to change order if a csv dictionary of "old,new" indexes is specified with 'atom_reorder_old_new_file'
* the program will renumber molecules starting from 1 if 'mol_renum_flag = True' is included in the configuration file. Molecules may also be renumbered with by specifying a csv dictionary of "old,new" indexes with 'mol_renum_old_new_file'

psf_edit
Currently only has limited functionality:
* Can be used to renumber residues/molecules starting from 1 using "mol_renum_flag = True" (no reordering of atoms)
* can map old molecule numbers to new ones by specifying a mapping dictionary with "mol_renum_old_new_file" (no reordering of atoms)
* Mapping of old atom numbers to new ones is not fully implemented.
* there is no option to reorder the psf
* the current most useful part of psf_edit is to help prepare files for CP2K, by specifying residue IDs that will be included in a qm region, i.e. "resids_qmmm_ca_cb_link = 1,5"
* note: to do so, it uses a default dictionary that can map between CHARMM atom types and elements, and between CHARMM atom types and MM_KIND radii (radii for water and hydronium from http://pubs.acs.org/doi/abs/10.1021/ct6001169; all other radii from http://xlink.rsc.org/?DOI=b801115j). If a mapping is needed that is not in the default dictionaries, the program will print a warning and exit. Users may supply their own dictionaries with the "atom_type_element_dict_file" and "atom_type_radius_dict_file"
* it assumes that all residues will be broken between the CA and CB atoms (if they exist), with all backbone atoms outside the QM region (types [CA, C, O, NT, HNT, CAT, HT1, HT2, HT3, HA, CAY, HY1, HY2, HY3, CY, OY, N, HN]); a different exclude list can be specified with 'exclude_atom_types_from_QM'
* it will output an "amino_id.dat" file that lists the atom ids (numbering from 1) for each element in the QM region from the non-excluded residue/molecule atoms. It will also print a link section noting the break between teh CA and CB atoms, and capping with H
* it will output an "mm_kinds.dat" file that notes the radius for each atom type found in the psf (see above to specify the dictionary to use)
* it will print a "vmd_protein_atoms.dat" file that lists the indices (atom_num - 1) of each atom in the QM region (useful for a VMD script that is part of converting RAPTOR output to VMD input)


For RAPTOR or EVBFit/RMDFit:
----------------------------

convert_cp2k_forces
cp2k force output files

evb_get_info
collects selected data form evb output files such as the number of states, the maximum ci^2 value for a protonated
state, and the max ci^2 value for a deprotonated state

fitevb_setup
provided a allowable ranges of parameters and results of a previous fitting step, creates a new fitevb input file

process_cv_file
converts plumed cv output to evb cv output style





History
-------

0.9.2 (2016-09-07)
------------------

* Added functions to help select configurations based on a criteria (such as CV value) by adding a "bin_settings" option to filter_col
* added a "comb_col" script to combine two column values with optional "prefix", "suffix", and "middle"

0.9.1 (2016-07-20)
------------------

* Updated pdb_edit to fill in the element column of a pdb file (sometime VMD removes it). The program uses
a dictionary of atom_type,element based on CHARMM36. Users can specify a different mapping of PDB atom types to
element types.

0.9.0 (2016-07-17)
------------------

* Now uses the csv.QUOTE_NONNUMERIC option for writing and reading, to automatically convert to floats

0.8.2 (2016-06-22)
------------------

* Updated data_edit so that output is formatted into columns. Additionally, when comparing data files, lines that
differ only by more a specified tolerance (~1e-5) are now ignored.

0.8.1 (2016-06-14)
------------------

* Now correctly sorts differences between data files, ignoring comments, unimportant ordering...

0.8.0
------------------

* Added capabilty to data_edit: now will show differences between data files, ignoring comments, unimportant ordering...


0.7.3 (2016-04-20)
------------------

* Changed data_reorder to data_edit, reflecting its additional capabilities.

0.7.2 (2016-04-14)
------------------

* Added data2data, data_reorder to distribution

0.7.1 (2016-03-31)
------------------

* Added dump_edit, pdb_edit, evbdump2data to distribution; these allow changes to, for example, the atom numbers


0.7.0 (2016-02-20)
------------------

* Added lammps_proc_data.py to calculate g(r) from a lammps dump file

0.6.0 (2015-11-22)
------------------

* Added lammps_dist to calculate atom pair distances as found in a LAMMPS dump file.


0.5.0 (2015-11-21)
------------------

* Added press_dups to compress CSV rows that have duplicate values for a given column.

0.4.2 (2015-10-24)
------------------

* Added path_bin for creating an averaged VMD file from a larger set of coordinates.

0.4.1 (2015-10-10)
------------------

* Added max_loc and max_val columns to calc_pka output.
* Fixed problem with 0-start files in fes_combo.

0.4.0 (2015-10-01)
------------------

* Changed `wham_rad` to set the zero point at the the largest CV free E, rather than the max CV

0.3.0 (2015-09-07)
------------------

* Added `wham_block.py`
* Added `wham_split.py`
* Refactored common code to `common` and `wham` modules.

0.2.0 (2015-09-03)
------------------

* Added `wham_rad.py`

0.1.0 (2015-09-01)
------------------

* Renamed to `md_utils`
* Added `fes_combo.py`

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

md_utils-0.10.0.tar.gz (116.6 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page