Skip to main content

A lite version of CG2AT for functionality with CCD2MD.

Project description

CG2AT2 — Lite Version

Citation: Vickery ON, Stansfeld PJ. CG2AT2: an Enhanced Fragment-Based Approach for Serial Multi-scale Molecular Dynamics Simulations. J Chem Theory Comput. 2021;17(10):6472–6482. doi:10.1021/acs.jctc.1c00295


Table of Contents

  1. Overview
  2. Installation
  3. Requirements
  4. Quick Start
  5. Flags Reference
  6. How It Works
  7. Input
  8. Output Structure
  9. Output Conversion Modes
  10. Detailed Flag Guide
  11. Database Structure
  12. Adding Fragments to the Database

Overview

CG2AT2 converts Coarse-Grained (CG) molecular systems (e.g. Martini) back to atomistic resolution using a fragment-based fitting approach. It is designed to require minimal input from the user — in most cases only the CG coordinate file and the original atomistic protein structure are needed, and CG2AT2 produces all files required to run further atomistic simulations.

CG2AT2 workflow


Installation

CG2AT2 requires no compilation. Download it directly from GitHub and run it immediately.

Recommended: add CG2AT2 to your PATH for convenience by appending the following to your ~/.profile or ~/.bashrc:

export PATH="/path/to/CG2AT2:$PATH"

Requirements

Category Requirement
Python v3.0 or higher
GROMACS v5 or higher
NumPy any recent version
SciPy any recent version

All other dependencies (argparse, copy, glob, math, multiprocessing, os, pathlib, re, shutil, subprocess, sys, time, etc.) are included in the Python standard library.

Note: distutils is no longer required as of this release.


Quick Start

Minimal run (CG structure only):

python cg2at.py -c cg_input.gro

Recommended run (with original atomistic structure for improved quality):

python cg2at.py -c cg_input.gro -a atomistic_input.gro

Fully automated run (no interactive prompts):

python cg2at.py -c cg_input.gro -a atomistic_input.gro \
    -w tip3p -fg martini_2-2_charmm36 -ff charmm36-jul2022

Flags Reference

Required

Flag Type Description
-c pdb/gro/tpr Coarse-grained input coordinates

Optional

Flag Type Default Description
-a pdb/gro/tpr Atomistic input coordinates
-d 0:2 1:3 Duplicate atomistic chains (chain:copies)
-loc str CG2AT_<timestamp> Output folder name
-o all/align/de_novo/none all Output conversion mode
-group 0,1 2,3 / all / chain Rigid-body fitting groups
-ncpus int 8 (or max) Number of CPU cores
-mod flag off Treat fragments individually (disable grouping)
-sf float 0.9 Fragment scale factor before fitting
-cys float 7.0 Disulphide bond search cutoff (Å)
-ter flag off Interactively choose terminal species
-nt flag off Neutral N-terminus on all chains
-ct flag off Neutral C-terminus on all chains
-vs flag off Apply virtual sites to protein
-swap str Swap residues or beads during conversion
-box float float float New box dimensions in Å (0 = keep original)
-w str Water model (e.g. tip3p, spce)
-ff str Force field (e.g. charmm36-jul2022)
-fg str Fragment library (e.g. martini_2-2_charmm36)
-gmx str gmx GROMACS executable name
-messy flag off Keep all temporary files
-silent flag off Auto-accept disulphide bond suggestions
-disre flag off Disable backbone distance restraint matrix
-ov float 0.3 Minimum allowed atom overlap (Å)
-posre str Generate position restraint file for a residue
-compare str Compare an .itp file against the database
-info flag off Print available force fields and fragments
-version flag off Print CG2AT2 version
-v flag (stackable) off Increase output verbosity (up to -vvv)
-h flag Show help and exit

How It Works

CG2AT2 uses a fragment-based fitting workflow. Each fragment is placed and rotated independently with no knowledge of neighbouring beads:

Fragment fitting diagram

  1. Centre each fragment at the centre of mass (COM) of its heavy atoms.
  2. Rotate the fragment to minimise the distance between atoms that connect to adjacent beads.
  3. Minimise each residue individually.
  4. Merge all residues and minimise the complete system.
  5. Check for threaded lipids (bonds > 0.2 nm) and correct them.
  6. Run NVT equilibration (where applicable).
  7. Morph the protein to the user-supplied atomistic structure via steered MD (where applicable).

Threaded Lipid Correction

Lipid tails can accidentally thread through aromatic residues during CG-to-AT conversion. CG2AT2 detects this by checking all bond lengths in the minimised system. Any bond exceeding 0.2 nm is treated as a threading event — the offending atoms are corrected and the system is re-minimised.

Threaded lipid correction


Input

CG2AT2 supports two rebuilding strategies:

  • De novo: Fragments are fitted to the CG bead positions from scratch.
  • Flexible fitting: A user-supplied atomistic structure is sequence-aligned to the CG system and then morphed into place via steered MD. This produces the highest quality output and should be used whenever the original atomistic structure is available.

If only a partial atomistic structure is supplied (e.g. the main protein but not a flexible linker), CG2AT2 uses the de novo method to build in the missing regions and seamlessly combines the two.

Hybrid de novo + flexible fitting

Multiple atomistic input files are accepted:

python cg2at.py -c cg_input.gro -a chain_A.pdb chain_B.pdb

Output Structure

CG2AT_<timestamp>/
├── INPUT/
│   ├── CG_INPUT.pdb          # CG structure converted to PDB
│   ├── AT_INPUT_X.pdb        # Supplied atomistic structure(s) as PDB
│   └── script_inputs.dat     # All flags used, saved for reproducibility
│
├── <RESIDUE_TYPE>/           # One folder per residue type (e.g. PROTEIN, POPE)
│   ├── *.pdb                 # Individually converted residues
│   ├── *.top / *.itp         # Residue topologies
│   ├── *.mdp                 # Minimisation run files
│   ├── gromacs_outputs       # Full GROMACS log
│   └── MIN/                  # Minimised residue PDBs
│
├── MERGED/
│   ├── merged_cg2at_*.pdb    # All residue types merged into one PDB
│   ├── *.top / *.itp         # Combined topology
│   ├── MIN/                  # Merged minimisation files
│   ├── NVT/                  # NVT equilibration files
│   └── STEER/                # Steered MD alignment files
│
└── FINAL/
    ├── <forcefield>.ff/      # Force field directory
    ├── *.top / *.itp         # Final topology files
    ├── final_cg2at_de_novo.pdb     # De novo output
    ├── final_cg2at_aligned.pdb     # Aligned output (if applicable)
    ├── script_timings.dat    # Per-stage runtime breakdown
    └── RMSD data             # RMSD between CG and atomistic output

Output Conversion Modes

Select the output mode with -o. The default is all.

Mode NVT Steered MD Output file(s)
none final_cg2at_de_novo.pdb
de_novo ✓ (5 ps) final_cg2at_de_novo.pdb
align final_cg2at_aligned.pdb
all ✓ (5 ps) both of the above

All modes start with fragment fitting, minimisation, and threaded-lipid correction.


Detailed Flag Guide

Disulphide Bonds

CG2AT2 searches for disulphide bonds in both the user atomistic structure (S–S < 2.1 Å) and the CG representation (SC1–SC1 < 7 Å, more than 4 residues apart). When a bond is found only in the CG structure it will prompt for confirmation.

-silent          # Auto-accept all disulphide suggestions without prompting
-cys 10.0        # Expand search radius to 10 Å for stretched martini bonds

If you see a topology/atom-count mismatch that is off by exactly 2, it is almost always an undetected disulphide bond — try increasing -cys.


Residue and Bead Swapping

The -swap flag allows simple mutations or residue substitutions during conversion, without editing the input file. The general syntax is:

-swap FROM_residue,bead:TO_residue,bead:resid_range
Use case Example
Swap all ASP to ASN -swap ASP:ASN
Swap ASP to ASN in resid range 0–10 and 30–40 -swap ASP:ASN:0-10,30-40
Swap residues with different bead names -swap POPC,NC3:POPG,GL0
Swap beads within the same residue -swap POPG,D2B:POPG,C2B
Skip a bead -swap GLU,SC2:ASP,skip
Skip an entire residue -swap POPG:skip
Multiple swaps at once -swap POPE,NH3:POPG,GL0 POPG,GL0:POPE,NH3
Skip specific NA+ ions by resid -swap NA+:skip:4000-4100

Correcting residue name truncation: PDB and GRO formats truncate residue names to four characters, which causes ambiguity with closely related residues such as phosphoinositides. CG2AT2 internally supports longer names. For example, the three PI headgroup variants are stored as POPI1_3, POPI1_4, and POPI1_5, and a four-character POPI bead in the input can be redirected:

-swap POPI:POPI1_4

Protein Chain Fitting

For multimeric proteins, CG2AT2 offers several rigid-body fitting strategies:

# Default: fit each atomistic chain to its CG counterpart independently
python cg2at.py -c cg.gro -a protein.pdb

# Fit by CG chain (one group per CG chain)
python cg2at.py -c cg.gro -a protein.pdb -group chain

# Fit the entire atomistic structure as a single rigid body
python cg2at.py -c cg.gro -a protein.pdb -group all

# Fit specific chains as groups (chains 0+2 together, chains 1+3 together)
python cg2at.py -c cg.gro -a protein.pdb -group 0,2 1,3

Chain Duplication

For homomeric complexes only one chain needs to be supplied; the -d flag copies it:

# Use 3 copies of chain 0 and 2 copies of chain 1
python cg2at.py -c cg.gro -a monomer.pdb -d 0:3 1:2

Terminal Residues

CG2AT2 uses charged termini by default (most force fields require this). Override with:

-nt       # Neutral N-terminus on all chains
-ct       # Neutral C-terminus on all chains
-ter      # Interactive selection per chain

Fragment Scale Factor

Fragments are shrunk to 90% of their original size before fitting, reducing atom clashes. Decrease further if minimisation fails:

-sf 0.8   # Shrink to 80%

Fragment Grouping

By default, adjacent beads within a residue are fitted as a group, improving geometry (especially for sugars). To disable:

-mod      # Treat every bead fragment independently

Grouping vs modular fitting


PBC Box Resizing

Box resizing is currently supported for cubic boxes only. Dimensions are in Ångströms; use 0 to preserve the original value on a given axis:

-box 100 100 100    # Set all axes to 100 Å
-box 0 0 100        # Shrink only the Z-axis to 100 Å

Distance Restraint Matrix

When an atomistic structure is supplied, CG2AT2 generates a backbone hydrogen-bond distance restraint matrix that is applied during NVT. This refines backbone atom orientations with minimal effect on overall structure. Disable with:

-disre

Atom Overlap Checker

Atoms closer than 0.3 Å will cause minimisation to fail. Increase the cutoff if needed:

-ov 0.5   # Allow up to 0.5 Å overlap

Resuming a Failed Run

CG2AT2 skips any step for which output files already exist, so a failed run can be resumed after fixing the error simply by re-running the same command with the same -loc directory:

-loc CG2AT   # Use a fixed directory name instead of a timestamp

Miscellaneous

-gmx gmx_mod     # Use a specific GROMACS binary
-messy           # Keep all temporary files
-vs              # Apply virtual sites to protein hydrogen atoms
-ncpus 4         # Limit to 4 CPU cores (default: 8 or system max)
-v / -vv / -vvv  # Increase verbosity
-info            # List available force fields and fragment libraries
-info -fg martini_2-2_charmm36   # List residues in a specific library
-posre POPE      # Generate a position restraint file for POPE
-compare martini_v3.itp   # Compare an itp against the database

Database Structure

database/
├── forcefields/
│   └── charmm36.ff/          # Standard GROMACS force field directory
│
└── fragments/
    └── <forcefield_type>/    # e.g. martini_2-2_charmm36
        ├── protein/
        │   └── <AA>/         # e.g. ASP, PHE
        │       ├── <AA>.pdb  # Fragment (bead sections)
        │       └── <AA>.top  # Topology (optional)
        ├── protein_modified/
        │   └── <residue>/    # e.g. CYSD
        ├── non_protein/
        │   └── <residue>/    # e.g. POPC
        │       ├── <RES>.pdb
        │       ├── <RES>.itp
        │       └── <RES>.top (optional)
        ├── solvent/
        │   └── <bead>/       # e.g. W
        │       └── TIP3P.pdb, SPCE.pdb, ...
        ├── ions/
        │   └── <ion>/        # e.g. NA+
        └── other/
            └── <residue>/    # e.g. DA (DNA)

Tip: Prefix any file or folder name with _ to prevent CG2AT2 from reading it.


Adding Fragments to the Database

Fragment PDB Format

Each fragment PDB uses [ bead_name ] section headers followed by standard ATOM records:

[ BB ]
ATOM      1  N   PHE     1      42.030  16.760  10.920  2.00  0.00           N
ATOM      2  CA  PHE     1      42.770  17.920  11.410  3.00  1.00           C
ATOM     10  C   PHE     1      44.240  17.600  11.550  2.00  0.00           C
ATOM     11  O   PHE     1      44.640  16.530  12.080  6.00  0.00           O
[ SC1 ]
ATOM      3  CB  PHE     1      42.220  18.360  12.800  1.00  1.00           C
...
  • Protein amino acids should not include adjustable hydrogens — these are added automatically by pdb2gmx.
  • Modified protein and non-protein fragments should include all hydrogens, as they are not processed by pdb2gmx.

Optional Topology File

A .top file of the same name can provide connectivity, grouping, chirality, and hydration information:

[ CONNECT ]
# bead_1  atom_1  bead_2  direction
   BB       N      BB        -1
   BB       C      BB         1

[ GROUPS ]
SC1 SC2 SC3

[ CHIRAL ]
# central  move  atom1  atom2  atom3
   CA       HA    CB     N      C

[ HYDRATION ]
W

Solvent Fragments

Each solvent model is a separate PDB file inside the bead folder (e.g. solvent/W/TIP3P.pdb). A single CG water bead can expand to multiple atomistic molecules:

[ TIP3P ]
ATOM      1  OW  SOL     1      20.910  21.130  75.300  1.00  0.00
ATOM      2  HW1 SOL     1      20.580  21.660  76.020  1.00  0.00
ATOM      3  HW2 SOL     1      21.640  21.640  74.940  1.00  0.00
ATOM      4  OW  SOL     2      21.000  22.960  77.110  1.00  0.00
...

Ion Fragments

Each ion is stored as a single-atom fragment. The [ HYDRATION ] topology entry controls which solvent bead is overlaid to model the first hydration shell:

[ NA+ ]
ATOM      1  NA   NA     1      21.863  22.075  76.118  1.00  0.00
[ HYDRATION ]
W

Generating Position Restraint Files

CG2AT2 can generate a position restraint .itp for any non-protein residue in the database:

python cg2at.py -posre POPE -fg martini_2-2_charmm36

This creates POPE_posre.itp and adds the corresponding #ifdef NP block to the residue's .itp file.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cg2at_lite-0.3.1.tar.gz (16.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cg2at_lite-0.3.1-py3-none-any.whl (16.6 MB view details)

Uploaded Python 3

File details

Details for the file cg2at_lite-0.3.1.tar.gz.

File metadata

  • Download URL: cg2at_lite-0.3.1.tar.gz
  • Upload date:
  • Size: 16.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for cg2at_lite-0.3.1.tar.gz
Algorithm Hash digest
SHA256 d26462987050a5764e135569fd24c442648d63c111e76796d123986d0edf9ca1
MD5 575ff542ef7cd6ae284d065eba74a50e
BLAKE2b-256 8b486984f8d67f3345e169559e012d2869f7400546ebd34b04a323bd0badb409

See more details on using hashes here.

File details

Details for the file cg2at_lite-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: cg2at_lite-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 16.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for cg2at_lite-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 495668deaf134d4862aef2c74163abcfb3404fd9d2b04a1debc6dca4c35f6fb0
MD5 1e2c09dfba09c6dec8295686be2139d2
BLAKE2b-256 b01b6d1b2c90a46bd2a656b748be2c50e2ec743d4eddd83ac79ace8951d2987d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page