Computing contact map and Solvent Accessibility for protein structures

Project description

pcmap : A python module to compute contact map of proteins

pcmap is a Python 3.x library designed to compute pairwise amino acid contacts and residue Solvent Accessible Surface Area in protein structures. Structures must be provided as PDB coordinates or MDAnalysis trajectory files. Contacts are computed within a single PDB structure or across two PDB structures. The library can compute from one to thousands of sets of contacts. Results are produced in JSON format, and contacts are encoded in a simple dictionary structure described in the OUTPUT section.

Contact Map release candidate

This is the JOSS release version of the pcmap package. It includes the contact map features only.

Installation

pip install pcmap

or install the package in editable mode, if required

git clone pcmap
pip install -e pcmap

Dependencies

pcmap uses the ccmap package to compute contact maps. This package is a C extension currently available for the following architectures:

  • Python3.9 to Python3.14 Linux
  • Python3.9 to Python3.14 MacOS/ARM

Data and testing

The installation folder provides a data folder which stores the necessary elements for testing.

.
+-- README.md
+-- data
    +-- 1A2K_r_u.pdb
    +-- 1A2K_l_u.pdb
    +-- 1A2K_transformations_sample.json

Where,

  • 1A2K_r_u.pdb is a single-chain protein, later referred to as the RECEPTOR.
  • 1A2K_l_u.pdb is a single-chain protein, later referred to as the LIGAND.
  • 1A2K_transformations_sample.json is a set of transformations described in terms of rotations and translations of the LIGAND protein.

CLI executable

Once installed, the module can be called as an executable.

Straight contact map computation on structures

One or several PDB coordinate files can be passed to the CLI to compute single-body or two-body contact maps. Each PDB file defines one body, even if it contains several polypeptide chains.

Computing one-body contact map

This will compute the pairwise amino acid contacts within the molecule.

Single one-body contact map

Just pass the name of the PDB file to the pcmap-monomer command.

pcmap-monomer data/1A2K_r_u.pdb

Single two-body contact map

Just pass the names of two PDB files to the pcmap-dimer command.

pcmap-dimer data/1A2K_l_u.pdb data/1A2K_r_u.pdb

Many one-body contact maps

Create a text file listing the proteins, with one PDB file per line:

sample.lst

data/1A2K_r_u.pdb
data/1A2K_l_u.pdb

And pass it to the pcmap-many command along with a result file name.

pcmap-many sample.lst output.json

Computing many two-body contact maps

Again, pass a file containing the list of proteins to compute in tabulated format with two PDB files per line:

sample_dimer.lst

data/1A2K_r_u.pdb   data/1A2K_l_u.pdb
data/1A2K_r_u.pdb   data/1A2K_l_u.pdb

And pass it to the CLI:

pcmap-many sample_dimer.lst output.json
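
The two list files described above can also be generated from Python. The following is a minimal sketch: the helper names `write_monomer_list` and `write_dimer_list` are hypothetical (they are not part of pcmap), and the use of a tab as the column separator in the two-body list is an assumption based on the "tabulated format" wording.

```python
import tempfile
from pathlib import Path


def write_monomer_list(path, pdb_files):
    """Write one PDB path per line (one-body list file)."""
    Path(path).write_text("".join(f"{pdb}\n" for pdb in pdb_files))


def write_dimer_list(path, pdb_pairs, sep="\t"):
    """Write two PDB paths per line (two-body list file).

    ASSUMPTION: columns are separated by a tab character.
    """
    lines = (f"{first}{sep}{second}\n" for first, second in pdb_pairs)
    Path(path).write_text("".join(lines))


# Demonstration in a temporary directory, using the file names of the data folder.
with tempfile.TemporaryDirectory() as workdir:
    monomer_path = Path(workdir) / "sample.lst"
    dimer_path = Path(workdir) / "sample_dimer.lst"
    write_monomer_list(monomer_path, ["data/1A2K_r_u.pdb", "data/1A2K_l_u.pdb"])
    write_dimer_list(dimer_path, [("data/1A2K_r_u.pdb", "data/1A2K_l_u.pdb")] * 2)
    monomer_text = monomer_path.read_text()
    dimer_text = dimer_path.read_text()
```

The resulting files can then be passed to pcmap-many as shown in the commands above.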

Two-body contact map: applying transformation prior to computation

When dealing with a two-body system, it is often convenient to provide the initial conformation of the bodies along with transformations to be applied to generate specific conformations. It is customary to call the first PDB file the RECEPTOR and the second one the LIGAND. The transformations are applied to the LIGAND coordinates, the RECEPTOR remaining unchanged. They are described in terms of rotations and translations of the LIGAND coordinates, using Euler angles and translation vectors. For mathematical simplicity, the LIGAND and RECEPTOR structures can be centered on the origin of the coordinate system.

Single two-body contact map

As an example consider the following command:

pcmap-dimer-z data/1A2K_r_u.pdb data/1A2K_l_u.pdb \
--euler -1.961,2.066,-2.354 --trl 7.199,16.800,28.799 \
--ctr1 -27.553,-8.229,-80.604 --ctr2 -67.006,0.11,-77.27

  1. 1A2K_r_u.pdb coordinates will be centered onto the origin by the translation [-27.553,-8.229,-80.604].
  2. 1A2K_l_u.pdb coordinates will be centered onto the origin by the translation [-67.006,0.11,-77.27].
  3. 1A2K_l_u.pdb will then be rotated according to the [-1.961,2.066,-2.354] Euler angles.
  4. 1A2K_l_u.pdb will then be translated by [7.199,16.800,28.799] to obtain the actual conformation.
  5. A contact map will be computed across the two structures
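
The centering, rotation and translation steps listed above can be sketched in plain Python. This is an illustration only, not the pcmap or ccmap implementation, and it rests on three assumptions that the text does not confirm: the centering vector is added to the coordinates, the angles are in radians, and the rotation follows the ZYZ convention R = Rz(α)·Ry(β)·Rz(γ). All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def euler_matrix(alpha, beta, gamma):
    # ASSUMPTION: ZYZ convention, angles in radians; pcmap may use another one.
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))


def transform_ligand(coords, euler, translation, offset):
    """Center (add offset), rotate, then translate every (x, y, z) triplet."""
    rotation = euler_matrix(*euler)
    moved = []
    for atom in coords:
        centered = [atom[i] + offset[i] for i in range(3)]
        rotated = [sum(rotation[i][k] * centered[k] for k in range(3))
                   for i in range(3)]
        moved.append(tuple(rotated[i] + translation[i] for i in range(3)))
    return moved


# Hypothetical one-atom ligand: centered to (1, 0, 0), rotated by 90 degrees
# around z, then shifted by 5 along z.
demo = transform_ligand([(2.0, 0.0, 0.0)], euler=(math.pi / 2, 0.0, 0.0),
                        translation=(0.0, 0.0, 5.0), offset=(-1.0, 0.0, 0.0))
```

In this picture the receptor only undergoes the centering step; the contact map is then computed between the centered receptor and the transformed ligand.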

Obtaining the three-dimensional coordinates of the transformed complex

Use the --apply flag to generate the corresponding PDB records. They will be stored into new_receptor.pdb and new_ligand.pdb files.

Many two-body contact maps

When needed, several contact maps can be computed by applying a sequence of transformations to the provided RECEPTOR and LIGAND PDB files. Transformations should be described in a JSON file, as in the following example describing two transformations.

{
    "euler": [ [-1.96, 2.07, -2.35], [-0.70, 0.95, -0.53] ],
    "translation": [ [7.2, 16.8, 28.8], [21.6, -7.2, -20.4] ],
    "recOffset": [-27.6, -8.2, -80.6],
    "ligOffset": [-67.1, 0.1, -77.3]
}

Where,

  • euler references a list of the α, β, γ Euler angles. Each triplet is transformation specific.
  • translation references a list of x,y,z components of one translation vector. It is transformation specific.
  • recOffset is the translation vector centering the receptor to coordinates origin. It is common to all transformations.
  • ligOffset is the translation vector centering the ligand to coordinates origin. It is common to all transformations.

An example file is provided as data/1A2K_transformations_sample.json
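
Such a file can be assembled with the standard json module, which also guarantees syntactically valid JSON. The sketch below uses the four keys described above; the helper name `build_transformations` and its consistency checks are hypothetical additions, not part of pcmap.

```python
import json


def build_transformations(eulers, translations, rec_offset, lig_offset):
    """Assemble the transformation description as a plain dictionary."""
    if len(eulers) != len(translations):
        raise ValueError("one translation vector is needed per Euler triplet")
    for vector in [*eulers, *translations, rec_offset, lig_offset]:
        if len(vector) != 3:
            raise ValueError(f"expected three components, got {vector!r}")
    return {
        "euler": [list(triplet) for triplet in eulers],
        "translation": [list(vector) for vector in translations],
        "recOffset": list(rec_offset),
        "ligOffset": list(lig_offset),
    }


# Values taken from the two-transformation example above.
payload = build_transformations(
    eulers=[[-1.96, 2.07, -2.35], [-0.70, 0.95, -0.53]],
    translations=[[7.2, 16.8, 28.8], [21.6, -7.2, -20.4]],
    rec_offset=[-27.6, -8.2, -80.6],
    lig_offset=[-67.1, 0.1, -77.3],
)
json_text = json.dumps(payload, indent=4)
```

Writing `json_text` to a file yields a transformation file with the layout shown above.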

CLI options

--apply

Usable in single two-body contact map mode. Dumps the transformed LIGAND coordinates into a PDB file named new_ligand.pdb (the original RECEPTOR molecule is written to new_receptor.pdb).

--dist [default=4.5]

Defines the maximal pairwise distance between two heavy atoms for an amino acid contact to be registered.

--encode [default=False]

If True, contacts are returned as integers. Each integer encoding one pair of atoms/residues positions in contact with this simple formula.

--atomic [default=False]

If True, all atomic contacts are reported.

--rich [default=False]

If True, add cartesian coordinates to contact map, only compatible with one single body atomic computation
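
The criterion behind the --dist option can be pictured with a brute-force sketch: two residues are in contact when at least one pair of their heavy atoms lies within the threshold. This is not the ccmap algorithm. The inclusive comparison (`<=`), the identification of hydrogens by the element symbol "H", and the toy coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations


def heavy_atoms(atoms):
    """Keep the coordinates of non-hydrogen atoms; atoms are (element, x, y, z)."""
    return [(x, y, z) for element, x, y, z in atoms if element != "H"]


def in_contact(atoms_a, atoms_b, dist=4.5):
    """True if any heavy-atom pair is within `dist` (ASSUMPTION: inclusive)."""
    return any(math.dist(p, q) <= dist
               for p in heavy_atoms(atoms_a) for q in heavy_atoms(atoms_b))


def contact_pairs(residues, dist=4.5):
    """List residue pairs in contact, each pair reported once."""
    return [(a, b) for a, b in combinations(sorted(residues), 2)
            if in_contact(residues[a], residues[b], dist)]


# Hypothetical three-residue toy system (not taken from a real PDB file).
toy = {
    1: [("N", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 1.0)],
    2: [("C", 3.0, 0.0, 0.0)],
    3: [("O", 10.0, 0.0, 0.0), ("H", 4.0, 0.0, 0.0)],
}
default_pairs = contact_pairs(toy)         # default 4.5 threshold
loose_pairs = contact_pairs(toy, dist=8)   # larger threshold
```

With the default threshold only residues 1 and 2 are in contact; the hydrogen of residue 3 is close to both but is ignored.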

PYTHON module

The library can be used as a Python module to assemble the pipeline/program of your choice. First we need to import it.

import pcmap

The pcmap module exposes the following two functions:

  • contactMap
  • contactMapThroughTransform

The contactMap API

contactMap(proteinA, proteinB=None, **kwargs)

The type of the positional parameters controls the function behaviour.

First parameter can be a PDB file OR a list of PDB files. Second parameter is optional and can also be a PDB file OR a list of PDB files.

Provided with a PDB file as single parameter:

Compute the internal amino acid contact map of the structure.

Provided with another PDB file as optional second parameter:

Compute the amino acid contact map between the two structures.

Provided with a list of PDB files as single parameter:

Compute the internal amino acid contact maps of each structure individually.

Provided with a list of PDB files as second parameter:

Compute the amino acid contact map of each pair of structures at identical positions across the two lists.

Examples

# This will compute an internal contact map
c2 = pcmap.contactMap("data/1A2K_r_u.pdb")
# This will compute the contact map across the two structures
c1 = pcmap.contactMap("data/1A2K_r_u.pdb", "data/1A2K_l_u.pdb")
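
The way the argument types select the computation can be summarised in a small sketch. The function `describe_mode` below is hypothetical and only mirrors the rules stated above; it does not call pcmap. Rejecting mixed inputs (a single file with a list) or lists of different lengths is an assumption, since that case is not described.

```python
def describe_mode(proteinA, proteinB=None):
    """Return the kind of computation implied by the argument types."""
    many_a = isinstance(proteinA, (list, tuple))
    if proteinB is None:
        return "many one-body" if many_a else "single one-body"
    many_b = isinstance(proteinB, (list, tuple))
    if many_a != many_b:
        # ASSUMPTION: mixing a single file with a list is treated as an error.
        raise TypeError("pass two files or two lists of files")
    if many_a and len(proteinA) != len(proteinB):
        # ASSUMPTION: structures are paired by position, so lengths must match.
        raise ValueError("both lists must have the same length")
    return "many two-body" if many_a else "single two-body"


modes = [
    describe_mode("data/1A2K_r_u.pdb"),
    describe_mode("data/1A2K_r_u.pdb", "data/1A2K_l_u.pdb"),
    describe_mode(["data/1A2K_r_u.pdb", "data/1A2K_l_u.pdb"]),
    describe_mode(["data/1A2K_r_u.pdb"], ["data/1A2K_l_u.pdb"]),
]
```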

The contactMapThroughTransform API

contactMapThroughTransform(proteinA, proteinB,
    eulers, translations,
    offsetRec, offsetLig,
    **kwargs)

Computes several contact maps across the two provided proteins by applying the provided transformations. Transformations are applied to the SECOND structure.

The eulers and translations parameters take equal-length lists of rotation and translation vectors, which are applied to the second structure to generate dimeric conformations.

The offsetRec parameter takes a single translation vector that centers the barycenter of the first structure onto the origin.

The offsetLig parameter takes a single translation vector that centers the barycenter of the second structure onto the origin.
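
A transformation file written for the CLI can plausibly be reused with this function. The sketch below maps the JSON keys onto the parameter names of the signature above; this key-to-parameter correspondence (euler to eulers, translation to translations, recOffset to offsetRec, ligOffset to offsetLig) is an assumption, and the helper `transform_arguments` is hypothetical. The pcmap call itself is left as a comment so that the block runs without the package or the data files.

```python
import json


def transform_arguments(json_text):
    """Turn a transformation description into keyword arguments."""
    spec = json.loads(json_text)
    if len(spec["euler"]) != len(spec["translation"]):
        raise ValueError("euler and translation lists must have the same length")
    # ASSUMPTION: the JSON keys correspond to these parameter names.
    return {
        "eulers": spec["euler"],
        "translations": spec["translation"],
        "offsetRec": spec["recOffset"],
        "offsetLig": spec["ligOffset"],
    }


example_text = """
{
    "euler": [[-1.96, 2.07, -2.35], [-0.70, 0.95, -0.53]],
    "translation": [[7.2, 16.8, 28.8], [21.6, -7.2, -20.4]],
    "recOffset": [-27.6, -8.2, -80.6],
    "ligOffset": [-67.1, 0.1, -77.3]
}
"""
arguments = transform_arguments(example_text)

# Possible use, not executed here:
# import pcmap
# maps = pcmap.contactMapThroughTransform(
#     "data/1A2K_r_u.pdb", "data/1A2K_l_u.pdb", **arguments)
```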

module API options

Both the contactMap and contactMapThroughTransform functions share the same set of named parameters:

  • dist: contact threshold distance [default=4.5]
  • encode: integer contact encoding [default=False]
  • threadNum: maximal number of allowed threads [default=8]
  • deserialize: get results as a Python dictionary if True, as a string otherwise [default=True]

OUTPUT

Amino acids are ranked according to their residue number and chain identifier in the PDB record. Contacts are registered in a one-versus-many fashion: one "root" residue and its many "partners" residues, the root being the residue with the lowest rank. This ensures that each pairwise contact is registered only once. The corresponding JSON format is the following:

{'type': 'contactList',
 'data': [ 
        {'root': {'resID': '2 ', 'chainID': 'A'},
            'partners': [ {'resID': '7 ', 'chainID': 'A'},
                        {'resID': '74 ', 'chainID': 'A'}
                    ]
        },
        {'root': {'resID': '9 ', 'chainID': 'A'},
        'partners': [ {'resID': '77 ', 'chainID': 'A'},
                        {'resID': '78 ', 'chainID': 'A'}
                    ]
        }
    ]
}

In this example, residues 2 and 9 of chain A form contacts with residues 7, 74 and 77, 78 of the same chain, respectively.
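
The root/partners layout is easy to flatten into explicit residue pairs. The sketch below works on the example dictionary shown above; the helper name `iter_contact_pairs` is hypothetical, and stripping the trailing space of resID is done for display only.

```python
contact_map = {
    "type": "contactList",
    "data": [
        {"root": {"resID": "2 ", "chainID": "A"},
         "partners": [{"resID": "7 ", "chainID": "A"},
                      {"resID": "74 ", "chainID": "A"}]},
        {"root": {"resID": "9 ", "chainID": "A"},
         "partners": [{"resID": "77 ", "chainID": "A"},
                      {"resID": "78 ", "chainID": "A"}]},
    ],
}


def residue_key(residue):
    """Return a (chainID, resID) tuple, with the resID padding removed."""
    return (residue["chainID"], residue["resID"].strip())


def iter_contact_pairs(result):
    """Yield one (root, partner) pair per registered contact."""
    if result.get("type") != "contactList":
        raise ValueError("expected a 'contactList' result")
    for entry in result["data"]:
        root = residue_key(entry["root"])
        for partner in entry["partners"]:
            yield root, residue_key(partner)


pairs = list(iter_contact_pairs(contact_map))
```

Here `pairs` holds four contacts, the first being `(("A", "2"), ("A", "7"))`.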

Enriched atomic contact map output

In addition to their names and contact distances, the Cartesian coordinates of atoms in contact can be obtained by passing the enrich parameter with a True value.

This option is only available for the computation of a single one-body contact map. As an example, consider the following call,

contactMap("data/nyxB_monomerB.pdb", atomic=True, enrich=True)

which will return (only a sample is shown here):

{'type': 'atomic_rich',
 'data': [
    (('N', 'ALA', '16 ', 'B', 29.392, -64.804, 30.479),
     ('CA', 'ALA', '16 ', 'B', 30.18, -65.518, 29.475),
    1.46),
    (('N', 'ALA', '16 ', 'B', 29.392, -64.804, 30.479),
    ('N', 'PHE', '17 ', 'B', 30.994, -63.28, 28.948),
    2.69),
    (('CA', 'ALA', '16 ', 'B', 30.18, -65.518, 29.475),
    ('N', 'ALA', '16 ', 'B', 29.392, -64.804, 30.479),
    1.46)
    ]
}
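
Each record of the enriched output can be unpacked to recover atom labels and to cross-check the reported distance against the coordinates. The sketch below uses two records from the sample above; the field order (atom name, residue name, resID, chainID, x, y, z) is inferred from that sample, and the helper names are hypothetical.

```python
import math

sample = {
    "type": "atomic_rich",
    "data": [
        (("N", "ALA", "16 ", "B", 29.392, -64.804, 30.479),
         ("CA", "ALA", "16 ", "B", 30.18, -65.518, 29.475),
         1.46),
        (("N", "ALA", "16 ", "B", 29.392, -64.804, 30.479),
         ("N", "PHE", "17 ", "B", 30.994, -63.28, 28.948),
         2.69),
    ],
}


def atom_label(atom):
    """Build a compact chain:residue:atom label from the first four fields."""
    name, res_name, res_id, chain_id = atom[:4]
    return f"{chain_id}:{res_name}{res_id.strip()}:{name}"


def recompute_distances(atomic_map):
    """Return (label1, label2, reported, recomputed) for every record."""
    rows = []
    for first, second, reported in atomic_map["data"]:
        recomputed = math.dist(first[4:7], second[4:7])
        rows.append((atom_label(first), atom_label(second), reported, recomputed))
    return rows


rows = recompute_distances(sample)
```

For both records the distance recomputed from the coordinates agrees with the reported value to within rounding.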
