
Protein & Interactomic Graph Construction for Machine Learning

Project description

DOI: 10.1101/2020.07.15.204701 | License: MIT | Code style: black | Project status: Active (the project has reached a stable, usable state and is being actively developed)



Documentation | Paper | Tutorials | Installation

Protein & Interactomic Graph Library

This package provides functionality for producing geometric representations of protein and RNA structures, and biological interaction networks. We provide compatibility with standard PyData formats, as well as graph objects designed for ease of use with popular deep learning libraries.


Example usage

Creating a Protein Graph

Tutorial (Residue-level) | Tutorial (Atomic) | Docs

from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph

config = ProteinGraphConfig()
g = construct_graph(config=config, pdb_code="3eiy")
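A residue-level graph has one node per residue, and its edges come from edge construction functions selected in the config. The sketch below shows one common kind of edge, a distance cutoff on C-alpha coordinates, using only the standard library. It is an illustration, not graphein's implementation: the node labels, coordinates and the 5 Å cutoff are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical C-alpha coordinates (in Å) for four residues. Real values would
# come from a parsed structure file, not from a hand-written dictionary.
CA_COORDS = {
    "A:MET:1": (0.0, 0.0, 0.0),
    "A:SER:2": (3.8, 0.0, 0.0),
    "A:LEU:3": (7.6, 0.0, 0.0),
    "A:GLY:4": (3.8, 4.5, 0.0),
}


def distance_edges(coords, threshold=5.0):
    """Return residue pairs whose C-alpha atoms lie within `threshold` Å."""
    edges = []
    # combinations() yields each unordered pair of residues exactly once
    for (u, cu), (v, cv) in combinations(coords.items(), 2):
        if math.dist(cu, cv) <= threshold:
            edges.append((u, v))
    return edges


if __name__ == "__main__":
    for u, v in distance_edges(CA_COORDS):
        print(u, "--", v)
```

In graphein itself, edge types are chosen through the config rather than written by hand; see the Docs for the available edge construction functions.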

Creating a Protein Graph from the AlphaFold Protein Structure Database

Tutorial | Docs

from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph
from graphein.protein.utils import download_alphafold_structure

config = ProteinGraphConfig()
fp = download_alphafold_structure("Q5VSL9", aligned_score=False)
g = construct_graph(config=config, pdb_path=fp)

Creating a Protein Mesh

Tutorial | Docs

from graphein.protein.config import ProteinMeshConfig
from graphein.protein.meshes import create_mesh

config = ProteinMeshConfig()
verts, faces, aux = create_mesh(pdb_code="3eiy", config=config)
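A mesh is a list of vertex coordinates plus faces that index into it. As a sketch of how the two fit together, the function below sums triangle areas over a mesh. It assumes plain Python sequences: `verts` as (x, y, z) tuples and `faces` as vertex-index triples. The objects returned by `create_mesh` depend on the mesh backend (PyTorch3D, per the installation notes), so converting them to this shape, for example with `.tolist()` on tensors, is an assumption to check against the Docs.

```python
import math


def triangle_mesh_area(verts, faces):
    """Total surface area of a triangle mesh.

    verts: sequence of (x, y, z) coordinates.
    faces: sequence of (i, j, k) vertex-index triples into `verts`.
    """
    total = 0.0
    for i, j, k in faces:
        a, b, c = verts[i], verts[j], verts[k]
        # Two edge vectors of the triangle, both starting at vertex a
        u = [b[n] - a[n] for n in range(3)]
        v = [c[n] - a[n] for n in range(3)]
        # Triangle area is half the norm of the cross product u x v
        cross = (
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0],
        )
        total += 0.5 * math.hypot(*cross)
    return total


if __name__ == "__main__":
    # A unit square split into two triangles
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    print(triangle_mesh_area(square, [(0, 1, 2), (0, 2, 3)]))  # 1.0
```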

Creating an RNA Graph

Tutorial | Docs

from graphein.rna.graphs import construct_rna_graph
# Build the graph from a dotbracket & optional sequence
rna = construct_rna_graph(dotbracket='..(((((..(((...)))..)))))...',
                          sequence='UUGGAGUACACAACCUGUACACUCUUUC')
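The dotbracket string encodes secondary structure: each "(" pairs with its matching ")", and "." marks an unpaired nucleotide. The stdlib sketch below decodes it with a stack and adds backbone edges between consecutive nucleotides, to illustrate the kinds of edges such a graph can carry. It is not graphein's own parser, and it handles only "(", ")" and "." (pseudoknot brackets such as "[" raise an error).

```python
def dotbracket_pairs(dotbracket):
    """Return the (i, j) base pairs encoded by a dot-bracket string (0-indexed, i < j)."""
    stack = []
    pairs = []
    for j, symbol in enumerate(dotbracket):
        if symbol == "(":
            stack.append(j)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError(f"Unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))  # close the most recent opening
        elif symbol != ".":
            raise ValueError(f"Unsupported symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"Unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def rna_edges(dotbracket):
    """Backbone edges between consecutive nucleotides, plus base-pair edges."""
    backbone = [(i, i + 1) for i in range(len(dotbracket) - 1)]
    return backbone, dotbracket_pairs(dotbracket)


if __name__ == "__main__":
    backbone, pairs = rna_edges("..(((((..(((...)))..)))))...")
    print(len(backbone), "backbone edges;", len(pairs), "base pairs:", pairs)
```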

Creating a Protein-Protein Interaction Graph

Tutorial | Docs

from graphein.ppi.config import PPIGraphConfig
from graphein.ppi.graphs import compute_ppi_graph
from graphein.ppi.edges import add_string_edges, add_biogrid_edges

config = PPIGraphConfig()
protein_list = ["CDC42", "CDK1", "KIF23", "PLK1", "RAC2", "RACGAP1", "RHOA", "RHOB"]

g = compute_ppi_graph(config=config,
                      protein_list=protein_list,
                      edge_construction_funcs=[add_string_edges, add_biogrid_edges]
                     )
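Passing two functions in `edge_construction_funcs` combines interactions from two databases in one graph. The stdlib sketch below illustrates that merge step in isolation: it treats interactions as undirected and records which source reported each one. The protein names and edge lists are placeholders, not STRING or BioGRID results, and the sketch does not show how graphein itself stores edge provenance.

```python
from collections import defaultdict

# Placeholder edge lists keyed by source name (hypothetical data)
SOURCES = {
    "string": [("P1", "P2"), ("P3", "P4")],
    "biogrid": [("P2", "P1"), ("P4", "P5")],
}


def merge_edge_sources(sources):
    """Merge {source: [(u, v), ...]} into {frozenset({u, v}): {sources}}."""
    merged = defaultdict(set)
    for name, edges in sources.items():
        for u, v in edges:
            # frozenset makes (u, v) and (v, u) the same undirected edge
            merged[frozenset((u, v))].add(name)
    return dict(merged)


if __name__ == "__main__":
    for edge, names in merge_edge_sources(SOURCES).items():
        print(sorted(edge), sorted(names))
```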

Creating a Gene Regulatory Network Graph

Tutorial | Docs

from functools import partial

from graphein.grn.config import GRNGraphConfig
from graphein.grn.graphs import compute_grn_graph
from graphein.grn.edges import add_regnetwork_edges, add_trrust_edges

config = GRNGraphConfig()
gene_list = ["AATF", "MYC", "USF1", "SP1", "TP53", "DUSP1"]

g = compute_grn_graph(
    gene_list=gene_list,
    edge_construction_funcs=[
        partial(add_trrust_edges, trrust_filtering_funcs=config.trrust_config.filtering_functions),
        partial(add_regnetwork_edges, regnetwork_filtering_funcs=config.regnetwork_config.filtering_functions),
    ],
)
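The GRN example relies on `functools.partial`: the graph builder calls each function in `edge_construction_funcs` itself, so extra keyword arguments such as the filtering functions have to be bound in advance. The toy sketch below shows the pattern on its own. `add_placeholder_edges`, `drop_edges_from_g1` and the gene names are hypothetical, and the function takes a gene list as its single positional argument purely for illustration.

```python
from functools import partial


def add_placeholder_edges(gene_list, filtering_funcs=None):
    """Hypothetical edge function: keep candidate pairs among `gene_list`,
    then apply each filter in `filtering_funcs` in turn."""
    candidates = [("G1", "G2"), ("G2", "G3"), ("G3", "G9")]
    edges = [(u, v) for u, v in candidates if u in gene_list and v in gene_list]
    for f in filtering_funcs or []:
        edges = f(edges)
    return edges


def drop_edges_from_g1(edges):
    """Hypothetical filter: remove edges whose regulator is G1."""
    return [(u, v) for u, v in edges if u != "G1"]


# partial pre-binds the keyword argument, so a caller only supplies gene_list
configured = partial(add_placeholder_edges, filtering_funcs=[drop_edges_from_g1])

if __name__ == "__main__":
    genes = ["G1", "G2", "G3"]
    print(add_placeholder_edges(genes))  # unfiltered
    print(configured(genes))  # filtered via the pre-bound argument
```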

Installation

Pip

The simplest install is via pip. N.B. this does not install the ML/DL libraries required for converting graphs to those libraries' data formats or for generating protein structure meshes with PyTorch3D.

pip install graphein # For base install
pip install graphein[extras] # For additional featurisation dependencies
pip install graphein[dev] # For dev dependencies
pip install graphein[all] # To get the lot

However, there are a number of (optional) utilities (DSSP, PyMol, GetContacts) that are not available via PyPI:

conda install -c salilab dssp # Required for computing secondary structural features
conda install -c schrodinger pymol # Required for PyMol visualisations & mesh generation

# GetContacts - used as an alternative way to compute intramolecular interactions
conda install -c conda-forge vmd-python
git clone https://github.com/getcontacts/getcontacts

# Add folder to PATH
echo "export PATH=\$PATH:`pwd`/getcontacts" >> ~/.bashrc
source ~/.bashrc

To test the installation, run:

cd getcontacts/example/5xnd
get_dynamic_contacts.py --topology 5xnd_topology.pdb \
                        --trajectory 5xnd_trajectory.dcd \
                        --itypes hb \
                        --output 5xnd_hbonds.tsv
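Because DSSP, PyMOL and GetContacts are installed outside pip, a quick way to confirm they are visible from Python is to look them up on PATH with `shutil.which`. The executable names below (`mkdssp`, `pymol`, `get_dynamic_contacts.py`) are assumptions; the names installed by your packages may differ.

```python
import shutil

# Assumed executable names for the optional tools; adjust to your installation.
OPTIONAL_TOOLS = {
    "DSSP": "mkdssp",
    "PyMOL": "pymol",
    "GetContacts": "get_dynamic_contacts.py",
}


def find_optional_tools(tools=OPTIONAL_TOOLS):
    """Map each tool name to its resolved path on PATH, or None if not found."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    for name, path in find_optional_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```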

Conda environment

The dev environment includes GPU builds (CUDA 11.1) of each of the deep learning libraries integrated into graphein.

git clone https://www.github.com/a-r-j/graphein
cd graphein
conda env create -f environment-dev.yml
pip install -e .

A lighter install can be performed with:

git clone https://www.github.com/a-r-j/graphein
cd graphein
conda env create -f environment.yml
pip install -e .

Dockerfile

We also provide a Dockerfile.

Citing Graphein

Please consider citing graphein if it proves useful in your work.

@article{Jamasb2020,
  doi = {10.1101/2020.07.15.204701},
  url = {https://doi.org/10.1101/2020.07.15.204701},
  year = {2020},
  month = jul,
  publisher = {Cold Spring Harbor Laboratory},
  author = {Arian Rokkum Jamasb and Pietro Lio and Tom Blundell},
  title = {Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Protein Structures}
}



Download files


Source Distribution

graphein-1.0.5.tar.gz (104.9 kB)


File details

Details for the file graphein-1.0.5.tar.gz.

File metadata

  • Download URL: graphein-1.0.5.tar.gz
  • Upload date:
  • Size: 104.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for graphein-1.0.5.tar.gz
Algorithm Hash digest
SHA256 d0b0aa1143a160ac5453ab23a8eb616a63673168256c34f49f7d532b4d68f1ff
MD5 622e6c41fdcc2a39ee9f9ae0e170f6b1
BLAKE2b-256 7cd252f3e53b623ec02094c2c4f9f6709b653af510fceb8c3afa5dc856c877e2

