
Spatial analysis of protein structures


Faltwerk

faltwerk is a library for spatial exploratory data analysis of protein structures. It helps parse structures, select items of interest, and generate and visualise various protein annotations, and it provides convenient interfaces for downstream tools to run, for example, spatial regression. The most convenient way to run faltwerk is in a Jupyter notebook. This REPL-like execution gives rapid feedback and can help form hypotheses. However, faltwerk also integrates nicely into workflows (no point and click required).

Antifreeze protein (PDB 3OTM)

The curious thing about proteins is that they come as linear strings of residues but then fold up into 3D structures, which represent the functionally active form of the molecule (the thing that "does stuff"). Correspondingly, nature most often selects on structure. However, many analyses (positive selection etc.) require the linear sequence. faltwerk bridges this gap (see specific use cases below).

PRs and suggestions welcome! The awesome Anvio has a structure module, should you be dissatisfied with faltwerk.

Install

Quick start using the colab notebook.

chmod +x install.sh
./install.sh
# ... or just follow the steps therein manually

# Get Pfam database (Version v31 -- this matters!)
wget http://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam31.0/Pfam-A.hmm.dat.gz
wget http://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam31.0/Pfam-A.hmm.gz
gunzip -q Pfam*
hmmpress Pfam-A.hmm

Now you should be able to run the local notebook:

jupyter notebook examples/example_local.ipynb

Usage

We assume that every protein structure contains a single model. Unless you are analysing protein complexes, each structure should also contain only a single chain. A simple way to clean a query PDB file is with pdb-tools (http://www.bonvinlab.org/pdb-tools/). AlphaFold2 predictions as generated by ColabFold are ideal.
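If pdb-tools is not at hand, the cleaning step can be approximated in a few lines of plain Python. The sketch below is not part of faltwerk or pdb-tools; the function name `keep_first_model_and_chain` and the helper `atom_line` are made up for illustration. It assumes a standard fixed-width PDB file, where the chain identifier sits in column 22, keeps only ATOM/HETATM records of one chain, and stops at the first ENDMDL record.

```python
def keep_first_model_and_chain(pdb_text, chain="A"):
    """Keep ATOM/HETATM records of one chain from the first model only."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # drop every model after the first one
        # Column 22 (index 21) holds the chain identifier in the PDB format.
        if record in ("ATOM", "HETATM") and len(line) > 21 and line[21] == chain:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"


def atom_line(serial, chain):
    """Hypothetical helper: build a minimal, correctly aligned ATOM record."""
    return f"ATOM  {serial:>5}  CA  GLY {chain}{serial:>4}"


# Toy input: two models, the first one with two chains.
toy = "\n".join([
    "MODEL        1", atom_line(1, "A"), atom_line(2, "B"), "ENDMDL",
    "MODEL        2", atom_line(3, "A"), "ENDMDL", "END",
])
cleaned = keep_first_model_and_chain(toy, chain="A")
print(cleaned)
```

Note that this sketch drops all other record types (headers, TER, CONECT), which is usually fine for a quick look but loses metadata; for anything beyond that, a dedicated tool is the safer choice.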

To get an overview of what you can do with faltwerk and to interact with the data and code, we provide a notebook:

jupyter notebook example.ipynb

To give an idea of what it looks like to use faltwerk:

# Load data
from faltwerk.models import Fold, AlphaFold
# ... further imports (Layout, Binding, find_hotspots, cluster); for details see notebook

fp = 'data/alphafold2/transferrin/test_08df6_unrelaxed_rank_1_model_3.pdb'
model = Fold(fp)

# or
fp = 'data/alphafold2/transferrin/'
af = AlphaFold(fp)
model = af.best

# Visualise pLDDT (how good the AF2 prediction is)
ly = Layout(model).geom_ribbon('plddt', palette='rainbow_r')

# Predict ligand binding sites using the "InteracDome" approach
b = Binding(model, 'representable')
b.predict_binding_(pfam)
binding = b.get_binding('PF00405.16', 'FE')
fe = [i for i, j in enumerate(binding) if j > .5]

ly = Layout(model)
# select
fe_ = ly.select(residues=fe)
# style
ly.geom_ribbon(color='#ffffff')
ly.geom_ribbon(selection=fe_, color='red')
ly.render().show()

# Test for spatial signal in residues that are e. g. mutated or under positive
# natural selection. Here we use residues from Barber et al., Science, 2014
# (https://www.science.org/doi/10.1126/science.1259329) that are under positive
# selection.
original = [153, 253, 382, 434, 435, 436, 439, 558, 574, 575, 576, 591, 592, 593, 614, 617, 619, 625]

# Subtract 1 because positions in the manuscript are 1-based
barber2014 = [i-1 for i in original]
selection = [1 if i in barber2014 else 0 for i in range(len(model))]

# (1) Spatial autocorrelation, i. e. "hotspots" in selected features, here using
# the Getis-Ord metric.
FDR = 0.05
hotspots = find_hotspots(
    model,
    selection,
    method='getis_ord',
    angstrom=8,
    false_discovery_rate=FDR,
    test_two_sided=False)

# (2) Point density analysis, here using HDBSCAN.
clusters = cluster(model, hotspots, min_cluster_size=5)

# Annotate model
model.annotate_many_({
    'selection': selection,
    'hotspots': hotspots,
    'clusters': clusters})

# Build figure like a layer cake
ly = Layout(model, panel_size=(200, 200), grid=(1, 3), linked=True)

pos = ly.select(residues=barber2014, elements=['CA'], chain='A')

ly.geom_ribbon(color='#ffffff')
ly.geom_sphere(selection=pos, color='black')
ly.geom_surface('hotspots', palette='binary', panel=(0, 1))
ly.geom_surface('clusters', palette='Set2_r', panel=(0, 2))
ly.render().show()
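To make the hotspot step less of a black box, here is a minimal, standard-library sketch of the Getis-Ord Gi* statistic with a distance band, followed by a Benjamini-Hochberg correction. It is not faltwerk's implementation, whose internals are not shown here; the function names, the binary weighting scheme and the toy coordinates are assumptions made for illustration. Each point gets weight 1 for every point within `radius` (itself included), and the resulting z-score is tested one-sided for "hot" spots only.

```python
import math


def getis_ord_star(coords, values, radius=8.0):
    """Return one Gi* z-score per point, using binary distance-band weights."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for ci in coords:
        # Weight 1 for every point within `radius`, including the point itself.
        neighbours = [j for j, cj in enumerate(coords) if math.dist(ci, cj) <= radius]
        w_sum = len(neighbours)  # for 0/1 weights, sum(w) equals sum(w**2)
        local_sum = sum(values[j] for j in neighbours)
        numerator = local_sum - mean * w_sum
        denominator = std * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


def upper_tail_p(z):
    """One-sided p-value under a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff = rank  # remember the largest rank that passes
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Toy data: 50 points on a line, 3.8 Angstrom apart, with five adjacent "hits".
coords = [(3.8 * i, 0.0, 0.0) for i in range(50)]
values = [1 if 10 <= i <= 14 else 0 for i in range(50)]

z = getis_ord_star(coords, values, radius=8.0)
significant = benjamini_hochberg([upper_tail_p(s) for s in z], fdr=0.05)
hot = [i for i, flag in enumerate(significant) if flag]
print(hot)
```

The sketch divides by zero if all values are identical or if every point lies within `radius` of a given point, so real data would need a guard for those cases.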

From here, faltwerk makes it easy to interact with downstream tools such as altair for visualising features or pysal for spatial regression (see notebook example.ipynb):

import pandas as pd  # flatten comes from faltwerk (see notebook)
df = pd.DataFrame.from_dict(
    flatten(model.annotation, expected_track_length=len(model)))
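The idea behind this hand-off is that every annotation is a per-residue track of equal length, so the tracks line up as columns of a table. The following standard-library sketch illustrates that idea only; `tracks_to_rows` is a hypothetical name and does not reproduce faltwerk's `flatten`, whose exact behaviour is not described here.

```python
def tracks_to_rows(annotation, expected_track_length):
    """Turn {'track': [v0, v1, ...]} into one dict per residue position."""
    for name, track in annotation.items():
        if len(track) != expected_track_length:
            raise ValueError(
                f"track {name!r} has {len(track)} values, "
                f"expected {expected_track_length}"
            )
    return [
        {"position": i, **{name: track[i] for name, track in annotation.items()}}
        for i in range(expected_track_length)
    ]


# Toy annotation with two tracks for a four-residue protein.
annotation = {
    "selection": [0, 1, 1, 0],
    "hotspots": [False, True, True, False],
}
rows = tracks_to_rows(annotation, expected_track_length=4)
for row in rows:
    print(row)
```

A list of dicts like `rows` can be passed directly to `pd.DataFrame`, which is one way to reach the tabular form that plotting and regression libraries expect.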

Use cases

  • Annotate solvent access, active centers and more
  • Some regions in the amino acid sequence are more conserved than others across species. Which regions in the 3D structure do they correspond to?
  • Are there any significant spatial hotspots where residues experience positive selection?
  • Do observed mutations cluster in any part of the protein, for example in regions that interface with other proteins (protein binding sites) or in active sites of the protein?
  • A hotspot or cluster has been identified. Which protein features, if any, can explain this occurrence? For example, is the cluster associated with known protein-binding sites?
  • What's the spatial relationship between mutations deemed pathogenic and the annotated functional domains of a protein?


