Skip to main content

MetAromatic algorithm

Project description

Met-aromatic

Code for the following publications:

  • Weber, D. S.; Warren, J. J. The Interaction between Methionine and Two Aromatic Amino Acids Is an Abundant and Multifunctional Motif in Proteins. Arch. Biochem. Biophys. 2019, 672, 108053.
  • Weber, D. S.; Warren, J. J. A Survey of Methionine-Aromatic Interaction Geometries in the Oxidoreductase Class of Enzymes: What Could Met-Aromatic Interactions be Doing Near Metal Sites? J. Inorg. Biochem. 2018, 186, 34-41.

Table of Contents

Synopsis

This program returns a list of closely spaced methionine-aromatic residue pairs for structures in the Protein Data Bank (PDB). The program supports running queries on single PDB entries or large scale multithreaded batch jobs consisting of hundreds of thousands of queries.

How it works

Step 1: Data preprocessing

Files in the PDB are organized using the SMCRA hierarchy: Structure, Model, Chain, Residue and Atom. For example, here is the fifth line for the entry 1RCY:

1         2  3   4   5   6     7         8      9       10   11              12
ATOM      5  CB  THR A   5     -13.081   2.366  23.788  1.00 37.95           C

Each line in a file corresponds to a single atom. In the above example, we have a carbon atom that is labelled CB (column 3) on a threonine residue (THR, column 4) located on the A chain (column 5). Columns 7-9 specify the $x$, $y$, $z$ coordinates of the carbon atom.

The Met-aromatic program starts by downloading a *.pdb file from the PDB over FTP. The file is then stripped down to the subset of coordinates corresponding to a chain [A-Z] of choosing. Most PDB entries consist of A and B chains. The program then strips the dataset down to the residues tyrosine (TYR), tryptophan (TRP), phenylalanine (PHE), and methionine (MET). In the last step of preprocessing, the program further strips the dataset down to any of the following atoms:

MET: CE, SD, CG
TYR: CD1, CE1, CZ, CG, CD2, CE2
TRP: CD2, CE3, CZ2, CH2, CZ3, CE2
PHE: CD1, CE1, CZ, CG, CD2, CE2

The above atoms are the aromatic carbon atoms in the aromatic residues tyrosine (TYR), tryptophan, (TRP) and phenylalanine (PHE).

Step 2: The distance condition

The program applies the distance condition to find methionine-aromatic pairs that are physically near each other. To do so, the program first finds the midpoints between all neighbouring aromatic carbon atoms in all of tyrosine, tryptophan and phenylalanine. These midpoints are denoted using a *:

TYR: CD1*|CE1*|CZ*|CG*|CD2*|CE2*
TRP: CD2*|CE3*|CZ2*|CH2*|CZ3*|CE2*
PHE: CD1*|CE1*|CZ*|CG*|CD2*|CE2*

Next, the program finds all the vectors projecting from all methionine SD atoms to all the midpoints *. As an example, a substructure consisting of one methionine and two phenylalanines would have 12 possible such vectors, depicted here as $\vec{v}$, owing to the fact there can exist a total of six midpoints between neighbouring atoms in a hexagonal arrangement of atoms and there are two such arrangements for two phenylalanine residues.

To apply the distance condition, the program simply banks those methionine-aromatic pairs where one or more vectors $\vec{v}$ are less than or equal to some cutoff $c$. As a motivating example, CD1* in TYR is the midpoint between the CD1 and CE1 carbon atoms. A CD1* / SD on an aromatic / methionine pair would meet the distance condition if the following held:

$$ \lVert \vec{v} \rVert = \sqrt{(CD1_x * -SD_x)^2 + (CD1_y * -SD_y)^2 + (CD1_z * -SD_z)^2} \leq c $$

Step 3: The angular condition

Any methionine-aromatic pairs meeting the distance condition are subjected to the angular condition. The angular condition can be loosely interpreted as "find all the methionine-aromatic pairs where the methionine lone pairs are pointing into or near the region(s) of highest electron density on a corresponding aromatic moiety." To apply the angular condition, two new vectors are introducted: vector $\vec{a}$ and vector $\vec{g}$, which describe the orientation of the SD lone pairs in three dimensional space. To find the lone pairs, the program considers the SD atom in a methionine atom to be the center of a regular tetrahedron with vertices CE and CG. Solving for the position of the remaining two vertices yields vectors $\vec{a}$ and $\vec{g}$. Next, the program obtains the angles between the lone pairs and $\vec{v}$:

$$ \theta = \cos^{-1}\left (\frac{\lVert \vec{a} \rVert \lVert \vec{v} \rVert}{\vec{a}\cdot\vec{v}} \right ) $$

$$ \phi = \cos^{-1}\left (\frac{\lVert \vec{g} \rVert \lVert \vec{v} \rVert}{\vec{g}\cdot\vec{v}} \right ) $$

Last, a methionine-aromatic pair is deemed interacting if any of $\theta$ or $\phi$ is less than or equal to some cutoff angle $\delta$, that is, if:

$$ \theta \leq \delta \vee \phi \leq \delta $$

holds.

Summary

The end result is a dataset consisting of methionine-aromatic pairs whereby one or both of the methionine lone pairs are pointing into or near the region of highest electron density on the corresponding aromatic residues. A representative figure is shown below:

Setup

Simply run:

pip install MetAromatic

Basic usage

Finding Met-aromatic pairs

The easiest means of performing Met-aromatic calculations is to run jobs in a terminal session. The simplest query follows:

runner pair 1rcy

Here, the pair argument specifies that we want to run a single aromatic interaction calculation. The query will yield the following results:

-------------------------------------------------------------------
ARO        POS        MET POS    NORM       MET-THETA  MET-PHI
-------------------------------------------------------------------
TYR        122        18         4.211      75.766     64.317
TYR        122        18         3.954      60.145     68.352
TYR        122        18         4.051      47.198     85.151
TYR        122        18         4.39       53.4       95.487
TYR        122        18         4.62       68.452     90.771
TYR        122        18         4.537      78.568     76.406
PHE        54         148        4.777      105.947    143.022
PHE        54         148        4.61       93.382     156.922
PHE        54         148        4.756      93.287     154.63
-------------------------------------------------------------------

Above we have an order VI interaction between TYR 122 and MET 18, that is, all six vectors $\vec{v}$ projecting from the SD on MET 18 to the midpoints on TYR 122 meet Met-aromatic criteria. We also have an order IV interaction between PHE 54 and MET 148. The NORM column specifies the actual distance (in $\overset{\circ}{\mathrm {A}}$) between the MET residue and one of the midpoints between two carbon atoms in an aromatic ring, or $\lVert v \rVert$. The default cutoff $c$ was applied in the above example, at 4.9 $\overset{\circ}{\mathrm {A}}$. The cutoff can be adjusted, however, using the --cutoff-distance option:

runner --cutoff-distance 4.0 pair 1rcy

Reducing the cutoff distance yields an order I interaction between TYR 122 and MET 18.

-------------------------------------------------------------------
ARO        POS        MET POS    NORM       MET-THETA  MET-PHI
-------------------------------------------------------------------
TYR        122        18         3.954      60.145     68.352
-------------------------------------------------------------------

MET-THETA and MET-PHI refer to $\theta$ and $\phi$, respectively. In the above example, the default cutoff angle $\delta$ is used ( $109.5^\circ$ ). The cutoff angle can be adjusted by using the --cutoff-angle option:

runner --cutoff-distance 4.5 --cutoff-angle 60 pair 1rcy

The --cutoff-angle option ensures that at least one of $\theta$ or $\phi$ angles fall below the cutoff $\delta$. This is seen in the below order II interaction:

-------------------------------------------------------------------
ARO        POS        MET POS    NORM       MET-THETA  MET-PHI
-------------------------------------------------------------------
TYR        122        18         4.051      47.198     85.151
TYR        122        18         4.39       53.4       95.487
-------------------------------------------------------------------

The default lone pair interpolation model is cp or Cross Product (a thorough description is available in my master's thesis: Applications of numerical linear algebra to protein structural analysis: the case of methionine-aromatic motifs). There exists another model, rm or Rodrigues Method for predicting the positions of lone pairs. This model is based on the Rodrigues' Rotation Formula. The model type can be passed as follows:

runner --cutoff-distance 4.5 --cutoff-angle 60 --model rm pair 1rcy

Which yields similar results:

-------------------------------------------------------------------
ARO        POS        MET POS    NORM       MET-THETA  MET-PHI
-------------------------------------------------------------------
TYR        122        18         3.954      57.237     64.226
TYR        122        18         4.051      45.097     80.768
TYR        122        18         4.39       52.505     91.841
-------------------------------------------------------------------

Note that the Euclidean distances between TYR aromatic carbon atoms and MET remain unchanged. By default, this program searches for "A" delimited chains. Some researchers may, however, be interested in searching for aromatic interactions in a different chain within a multichain protein. The --chain option can be used to specify the chain:

runner --cutoff-distance 4.5 --cutoff-angle 60 --model rm --chain B pair 1rcy

In this case, no results are returned because the PDB entry 1rcy does not contain a "B" chain.

Finding "bridging interactions"

Bridging interactions are interactions whereby two or more aromatic residues meet the criteria of the Met-aromatic algorithm, for example, in the example below (PDB entry 6C8A):

We can specify a search for bridging interactions, instead of conventional aromatic interactions, using the bridge argument. For example, to search for bridging interactions with a 7.0 $\overset{\circ}{\mathrm {A}}$ $\lVert v \rVert$ cutoff in 6LU7:

runner --cutoff-distance 7.0 bridge 6lu7

Which will return a list as follows:

-------------------------------------------------------------------
{MET264}-{PHE219}-{TYR209}
{MET276}-{TRP207}-{TRP218}
{PHE185}-{PHE181}-{MET165}
-------------------------------------------------------------------

Where each row corresponds to a bridge. This program treats bridging interactions as networks with a defined set of vertices. For example, the above examples are 2-bridges with 3 vertices: ARO - MET - ARO. The --vertices option can be passed to search for n-bridges:

runner --cutoff-distance 6.0 bridge 6lu7 --vertices 4

Batch jobs and MongoDB integration

A brief overview

This software is normally used for large scale Protein Data Bank mining efforts and stores the results in a MongoDB database (https://www.mongodb.com/). To start a batch job, ensure that the host is running a valid MongoDB installation then supply a batch file. A batch file can be a regular text file consisting of delimited PDB codes:

1xak, 1uw7, 2ca1
1qz8, 2fxp, 2fyg
2cme, 6mwm, spam

The command follows:

runner batch </path/batch/file> --threads <num-threads> --database <db> --collection <collection>

The MongoDB dump database is specified using the --database option. The collection is specified with the --collection option. The --threads option specifies how many threads to use for processing the batch. The recommended number of threads is 12 on a 300 Mbps network and on a machine that is idle. By default, mining jobs are run on localhost and on port 27017. A "healthy" batch job will log as follows:

1970-01-01T00:00:00 MainThread I Handshaking with MongoDB at "mongodb://localhost:27017/"
1970-01-01T00:00:00 MainThread I Imported pdb codes from file /tmp/foo.txt
1970-01-01T00:00:00 Batch_0 I Getting Met-aromatic interactions for PDB entry 1xak
1970-01-01T00:00:00 Batch_1 I Getting Met-aromatic interactions for PDB entry 1uw7
1970-01-01T00:00:00 Batch_2 I Getting Met-aromatic interactions for PDB entry 2ca1
1970-01-01T00:00:00 Batch_0 E No methionine residues found for entry "1xak"
1970-01-01T00:00:00 Batch_0 I Processed 1xak. Count: 1
1970-01-01T00:00:00 Batch_0 I Getting Met-aromatic interactions for PDB entry 1qz8
1970-01-01T00:00:00 Batch_2 I Processed 2ca1. Count: 2
1970-01-01T00:00:00 Batch_2 I Getting Met-aromatic interactions for PDB entry 2fyg
1970-01-01T00:00:00 Batch_1 I Processed 1uw7. Count: 3
1970-01-01T00:00:00 Batch_1 I Getting Met-aromatic interactions for PDB entry 2fxp
1970-01-01T00:00:01 Batch_2 I Processed 2fyg. Count: 4
1970-01-01T00:00:01 Batch_2 I Getting Met-aromatic interactions for PDB entry 1rcy
1970-01-01T00:00:01 Batch_0 I Processed 1qz8. Count: 5
1970-01-01T00:00:01 Batch_0 I Getting Met-aromatic interactions for PDB entry 2cme
1970-01-01T00:00:01 Batch_1 E No methionine residues found for entry "2fxp"
1970-01-01T00:00:01 Batch_1 I Processed 2fxp. Count: 6
1970-01-01T00:00:01 Batch_1 I Getting Met-aromatic interactions for PDB entry 6mwm
1970-01-01T00:00:02 Batch_2 I Processed 1rcy. Count: 7
1970-01-01T00:00:02 Batch_0 E Found no Met-aromatic interactions for entry "2cme"
1970-01-01T00:00:02 Batch_0 I Processed 2cme. Count: 8
1970-01-01T00:00:02 Batch_1 E No aromatic residues found for entry "6mwm"
1970-01-01T00:00:02 Batch_1 I Processed 6mwm. Count: 9
1970-01-01T00:00:02 MainThread I Batch job complete!
1970-01-01T00:00:02 MainThread I Results loaded into database: default_ma
1970-01-01T00:00:02 MainThread I Results loaded into collection: default_ma
1970-01-01T00:00:02 MainThread I Batch job statistics loaded into collection: default_ma_info
1970-01-01T00:00:02 MainThread I Batch job execution time: 2.213000 s

A batch job will generate a collection secondary to the collection specified by --collection. This secondary collection will house all the batch job parameters and other statistics and the collection name will be suffixed with _info. For example, the above scenario would generate the default_ma_info collection:

{
        "_id" : ObjectId("6145d54c6f016f61e0afcaa5"),
        "num_workers" : 3,
        "cutoff_distance" : 4.9,
        "cutoff_angle" : 109.5,
        "chain" : "A",
        "model" : "cp",
        "data_acquisition_date" : ISODate("1970-01-01T00:00:02.077Z"),
        "batch_job_execution_time" : 2.213,
        "number_of_entries" : 9
}

Designing a distributed mining strategy

A user may be tempted to mine data on a high performance machine and then route the results to a storage server. This software supports this. As mentioned before, running a batch job will export data to a MongoDB server listening on localhost:27017 by default. However, the --uri option can be used to route results to another server. The --uri option accepts a valid MongoDB URI string and overrides both the host and port specified using the --host and --port options, respectively. For example, to run a batch job against a MongoDB server listening on localhost and port 27018, one would pass:

runner batch --uri=mongodb://localhost:27018/ </path/to/batch.txt>

To run a batch job against a server ma-results.local:27017, with MongoDB user "abc" and password "password," one would pass:

runner batch --uri=mongodb://abc:password@ma-results.local:27017/ </path/to/batch.txt>

More information regarding valid URI connection string formats can be found at Connection String URI Format — MongoDB Manual.

Using the MetAromatic API

One may be interested in extending the Met-aromatic project into a customized workflow. The instructions provided in the Setup section install MetAromatic source into site-packages. Therefore, the API can be exposed for use in a custom script.

Example: programmatically obtaining Met-aromatic pairs

The command:

runner --cutoff-distance=4.1 pair 1rcy

Will return:

-------------------------------------------------------------------
ARO        POS        MET POS    NORM       MET-THETA  MET-PHI
-------------------------------------------------------------------
TYR        122        18         3.954      60.145     68.352
TYR        122        18         4.051      47.198     85.151
-------------------------------------------------------------------

To programmatically emulate this command, the following script applies:

from json import dumps
from MetAromatic import MetAromatic

def main() -> None:
    arguments = {
        'cutoff_distance': 4.1,
        'cutoff_angle': 109.5,
        'chain': 'A',
        'model': 'cp'
    }

    results = MetAromatic(**arguments).get_met_aromatic_interactions('1rcy')

    for interaction in results.interactions:
        print(dumps(interaction, indent=4))

if __name__ == '__main__':
    main()

Which will print to stdout:

{
    "aromatic_residue": "TYR",
    "aromatic_position": 122,
    "methionine_position": 18,
    "norm": 3.954,
    "met_theta_angle": 60.145,
    "met_phi_angle": 68.352
}
{
    "aromatic_residue": "TYR",
    "aromatic_position": 122,
    "methionine_position": 18,
    "norm": 4.051,
    "met_theta_angle": 47.198,
    "met_phi_angle": 85.151
}

This output roughly matches that of the aforementioned runner invocation.

Example: programmatically obtaining bridging interactions

The command:

runner bridge 6lu7

Will return:

-------------------------------------------------------------------
{PHE134}-{TYR182}-{MET130}
-------------------------------------------------------------------

To programmatically emulate this command, the following script applies:

from MetAromatic import GetBridgingInteractions

def main() -> None:
    arguments = {
        'cutoff_distance': 4.9,
        'cutoff_angle': 109.5,
        'chain': 'A',
        'model': 'cp'
    }

    results = GetBridgingInteractions(arguments).get_bridging_interactions('6lu7', 3)
    print(results.bridges[0])

if __name__ == '__main__':
    main()

Running this script will return:

{'MET130', 'PHE134', 'TYR182'}

Which roughly matches the output of the aforementioned runner invocation.

Tests and automation

To test the program, run the following target:

make test

This target will run all unit tests within a nox generated virtual environment.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

MetAromatic-1.1.0.tar.gz (39.6 kB view hashes)

Uploaded Source

Built Distribution

MetAromatic-1.1.0-py3-none-any.whl (31.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page