
A package for making graph representations of protein structures.


protein-graph

Computes a molecular graph for protein structures.

why?

Proteins fold into 3D structures, and have a natural graph representation: amino acids are nodes, and biochemical interactions are edges.

I wrote this package as part of a larger effort to apply graph convolutional neural networks to protein structures (represented as graphs). However, that's not the only use I can foresee for it.

One may also be interested in how the topology of proteins varies across species and over evolutionary time. This package can aid in studying that question.

how do I install this package?

Currently only pip-installable:

$ pip install proteingraph

how do I use this package?

This package assumes that you have a standard protein structure file (e.g. a PDB file). This may be a file generated after solving the NMR or crystal structure of a protein, or it may be generated from homology modelling.

Once you have such a file, the molecular graph can be generated in Python:

from proteingraph import read_pdb

p = read_pdb('my_model.pdb')

The returned object is a NetworkX Graph, which means all of NetworkX's graph-theoretic methods are available on it.
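As a rough illustration of what having a NetworkX Graph makes possible, here is a minimal sketch on a hand-built toy graph standing in for the output of `read_pdb`. The node names and the `kind` edge attribute are hypothetical and are not the package's actual naming scheme; only the NetworkX calls themselves are real.

```python
import networkx as nx

# Toy stand-in for the graph returned by read_pdb. Node names and the
# "kind" edge attribute are hypothetical, not the package's real schema.
G = nx.Graph()
G.add_edge("A1MET", "A2LYS", kind="backbone")
G.add_edge("A2LYS", "A3GLU", kind="backbone")
G.add_edge("A1MET", "A3GLU", kind="hydrophobic")
G.add_edge("A3GLU", "A4ASP", kind="backbone")

# Number of interaction partners per residue.
degrees = dict(G.degree())

# Fewest interaction hops between two residues.
path = nx.shortest_path(G, "A1MET", "A4ASP")

# Residue with the highest degree centrality in the interaction network.
centrality = nx.degree_centrality(G)
most_central = max(centrality, key=centrality.get)
```

The same calls should work unchanged on a graph built from a real structure file, with the node names replaced by whatever the package assigns.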

converting graphs to tensors

To convert the graph into tensors for use as inputs to graph neural networks, there are three functions provided.

Here's how they are used, starting first with converting node metadata to matrices:

import pandas as pd

from proteingraph.conversion import (
    generate_feature_dataframe,
    format_adjacency,
    generate_adjacency_tensor
)

# You provide a collection of functions
# that take in the node name and metadata dictionary,
# and return a pandas Series:
def my_func(n, d):
    return pd.Series({"key_name": d["key_name"]}, name=n)

def my_func2(n, d):
    return pd.Series(..., name=n)

def my_func3(n, d):
    return pd.Series(..., name=n)

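# If you have a function that depends on outside information,
# be sure to scope the variables correctly
# or use functools.partial to bind the extra argument:
from functools import partial

def _myfunc4(n, d, argname):
    return pd.Series(..., name=n)

# partial needs the function as its first argument,
# so it cannot be applied as a bare decorator with only keywords.
myfunc4 = partial(_myfunc4, argname=some_variable)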

# seriously though, give the functions more informative names!

funcs = [
    my_func,
    my_func2,
    my_func3,
]

# Now get a pandas DataFrame version of the tensor
feats = generate_feature_dataframe(p, funcs=funcs)
# You can also return a NumPy array version:
F = generate_feature_dataframe(p, funcs=funcs, return_array=True)

# Same goes for adjacency matrices, or even adjacency tensors!
# To facilitate return as XArray DataArrays (for inspection),
# we provide a `format_adjacency` function.
def my_adj_func(G):
    adj = some_func(G)
    return format_adjacency(G, adj, "xarray_coord_name")

def my_adj_func2(G):
    adj = some_func2(G)
    return format_adjacency(G, adj, "another_xarray_coord_name")

funcs = [
    my_adj_func,
    my_adj_func2,
]

# Now, generate the xarray adjacency tensor
adj_da = generate_adjacency_tensor(p, funcs)
# You can also generate a NumPy array version:
A = generate_adjacency_tensor(p, funcs, return_array=True)
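The `functools.partial` pattern mentioned in the comments above can be sketched on its own as follows. This is only an illustration of binding outside information into a node feature function with the `(n, d)` signature. The `residue_name` metadata key, the function names, and the small lookup table are assumptions made for the example, not part of the package.

```python
from functools import partial

import pandas as pd

# Hypothetical lookup table that lives outside the graph
# (values are for illustration only).
hydrophobicity = {"MET": 1.9, "LYS": -3.9}


def residue_hydrophobicity(n, d, scale):
    """Return one feature for node n, looked up in an external scale."""
    # "residue_name" is an assumed metadata key, used only for illustration.
    return pd.Series({"hydrophobicity": scale[d["residue_name"]]}, name=n)


# Bind the external table so the resulting callable takes only (n, d).
hydrophobicity_feature = partial(residue_hydrophobicity, scale=hydrophobicity)

# Call it the way a feature function is described above: node name, metadata.
row = hydrophobicity_feature("A1MET", {"residue_name": "MET"})
```

A callable built this way can sit in the `funcs` list alongside ordinary two-argument functions, since it accepts the same `(n, d)` arguments.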

