Paper - PyTorch

AlphaFold3

Implementation of AlphaFold 3 from the paper "Accurate structure prediction of biomolecular interactions with AlphaFold 3" in PyTorch.

Install

$ pip install alphafold3

Input Tensor Size Example

import torch

# Define the batch size, number of nodes, and number of features
batch_size = 1
num_nodes = 5
num_features = 64

# Generate random pair representations using torch.randn
# Shape: (batch_size, num_nodes, num_nodes, num_features)
pair_representations = torch.randn(
    batch_size, num_nodes, num_nodes, num_features
)

# Generate random single representations using torch.randn
# Shape: (batch_size, num_nodes, num_features)
single_representations = torch.randn(
    batch_size, num_nodes, num_features
)
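Before passing tensors like these to a model, it can help to confirm that the pair and single representations agree on batch size, number of nodes, and feature size. The following is a minimal sketch; `check_representation_shapes` is a hypothetical helper, not part of the alphafold3 package, and it assumes (as in the examples in this README) that both representations use the same feature size.

```python
import torch


def check_representation_shapes(pair: torch.Tensor, single: torch.Tensor) -> None:
    """Raise ValueError unless pair is (B, N, N, F) and single is (B, N, F).

    Hypothetical helper: assumes both representations share the feature size F,
    as in the examples in this README.
    """
    if pair.dim() != 4 or single.dim() != 3:
        raise ValueError(
            f"expected 4-D pair and 3-D single tensors, got {pair.dim()}-D and {single.dim()}-D"
        )
    batch, n_i, n_j, features = pair.shape
    # The pair representation holds one feature vector per (node, node) pair.
    if n_i != n_j:
        raise ValueError(f"pair representation must be square over nodes, got {n_i} x {n_j}")
    # The single representation holds one feature vector per node.
    if tuple(single.shape) != (batch, n_i, features):
        raise ValueError(
            f"single shape {tuple(single.shape)} does not match expected {(batch, n_i, features)}"
        )


pair_representations = torch.randn(1, 5, 5, 64)
single_representations = torch.randn(1, 5, 64)
check_representation_shapes(pair_representations, single_representations)  # passes silently
```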

Genetic Diffusion

This module still needs review, but in essence it operates on atomic coordinates.

import torch
from alphafold3.diffusion import GeneticDiffusion

# Create an instance of the GeneticDiffusion module
model = GeneticDiffusion(channels=3, training=True)

# Generate random input coordinates
input_coords = torch.randn(10, 100, 100, 3)

# Generate random ground truth coordinates
ground_truth = torch.randn(10, 100, 100, 3)

# Pass the input coordinates and ground truth coordinates through the model
output_coords, loss = model(input_coords, ground_truth)

# Print the output coordinates
print(output_coords)

# Print the loss value
print(loss)
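The notes below describe the standard diffusion objective: the model receives noised atomic coordinates and is trained to predict the true coordinates. The sketch below illustrates that objective in plain PyTorch. It does not use or describe the internals of `GeneticDiffusion`; `ToyCoordinateDenoiser` and `diffusion_training_step` are hypothetical names, coordinates are assumed to be shaped (batch, num_atoms, 3) rather than the 4-D tensors above, and the log-uniform noise schedule and unweighted MSE loss are simplifying assumptions, not the paper's exact recipe.

```python
import math

import torch
from torch import nn


class ToyCoordinateDenoiser(nn.Module):
    """Hypothetical stand-in denoiser: maps noisy (x, y, z) plus noise level to clean (x, y, z)."""

    def __init__(self, hidden: int = 64) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 3))

    def forward(self, noisy: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
        # noisy: (batch, num_atoms, 3); sigma: (batch,). Append sigma to every atom as a 4th feature.
        sigma_feat = sigma.view(-1, 1, 1).expand(noisy.shape[0], noisy.shape[1], 1)
        return self.net(torch.cat([noisy, sigma_feat], dim=-1))


def diffusion_training_step(
    model: nn.Module,
    true_coords: torch.Tensor,
    optimizer: torch.optim.Optimizer,
    sigma_min: float = 0.05,
    sigma_max: float = 10.0,
) -> float:
    """Noise the ground-truth coordinates, predict them back, and take one MSE gradient step."""
    batch = true_coords.shape[0]
    # One noise level per structure, sampled log-uniformly (an assumed schedule).
    sigma = torch.empty(batch).uniform_(math.log(sigma_min), math.log(sigma_max)).exp()
    noisy = true_coords + sigma.view(-1, 1, 1) * torch.randn_like(true_coords)
    pred = model(noisy, sigma)
    loss = nn.functional.mse_loss(pred, true_coords)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = ToyCoordinateDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
coords = torch.randn(8, 100, 3)  # (batch, num_atoms, xyz)
for step in range(5):
    print(step, diffusion_training_step(model, coords, optimizer))
```

A real denoiser would also be conditioned on the pair and single representations; this toy model sees only the noisy coordinates and the noise level.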

Full Model Example: Forward Pass

import torch 
from alphafold3 import AlphaFold3

# Create random tensors
x = torch.randn(1, 5, 5, 64)  # Shape: (batch_size, seq_len, seq_len, dim)
y = torch.randn(1, 5, 64)  # Shape: (batch_size, seq_len, dim)

# Initialize AlphaFold3 model
model = AlphaFold3(
    dim=64,
    seq_len=5,
    heads=8,
    dim_head=64,
    attn_dropout=0.0,
    ff_dropout=0.0,
    global_column_attn=False,
    pair_former_depth=48,
    num_diffusion_steps=1000,
    diffusion_depth=30,
)

# Forward pass through the model
output = model(x, y)

# Print the shape of the output tensor
print(output.shape)

Citation

@article{Abramson2024-fj,
  title    = "Accurate structure prediction of biomolecular interactions with
              {AlphaFold} 3",
  author   = "Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans,
              Richard and Green, Tim and Pritzel, Alexander and Ronneberger,
              Olaf and Willmore, Lindsay and Ballard, Andrew J and Bambrick,
              Joshua and Bodenstein, Sebastian W and Evans, David A and Hung,
              Chia-Chun and O'Neill, Michael and Reiman, David and
              Tunyasuvunakool, Kathryn and Wu, Zachary and {\v Z}emgulyt{\.e},
              Akvil{\.e} and Arvaniti, Eirini and Beattie, Charles and
              Bertolli, Ottavia and Bridgland, Alex and Cherepanov, Alexey and
              Congreve, Miles and Cowen-Rivers, Alexander I and Cowie, Andrew
              and Figurnov, Michael and Fuchs, Fabian B and Gladman, Hannah and
              Jain, Rishub and Khan, Yousuf A and Low, Caroline M R and Perlin,
              Kuba and Potapenko, Anna and Savy, Pascal and Singh, Sukhdeep and
              Stecula, Adrian and Thillaisundaram, Ashok and Tong, Catherine
              and Yakneen, Sergei and Zhong, Ellen D and Zielinski, Michal and
              {\v Z}{\'\i}dek, Augustin and Bapst, Victor and Kohli, Pushmeet
              and Jaderberg, Max and Hassabis, Demis and Jumper, John M",
  journal  = "Nature",
  month    =  may,
  year     =  2024
}

Notes

-> pairwise representation -> explicit atomic positions

-> within the trunk, MSA processing is de-emphasized, using a simpler MSA module of 4 blocks

-> MSA processing uses pair-weighted averaging

-> Pairformer: replaces the Evoformer; operates on the pair representation and the single representation

-> the Pairformer has 48 blocks

-> the pair and single representations, together with the input representation, are passed to the diffusion module

-> the diffusion module takes in 3 tensors: the pair representation, the single representation, and the new Pairformer representation

-> the diffusion module operates directly on raw atom coordinates

-> standard diffusion approach: the model is trained to receive noised atomic coordinates and then predict the true coordinates

-> the network learns protein structure at a variety of length scales: the denoising task at small noise emphasizes local stereochemistry, while the denoising task at high noise emphasizes the large-scale structure of the system

-> at inference time, random noise is sampled and then recurrently denoised to produce a final structure

-> diffusion module produces a distribution of answers

-> for each answer the local structure will be sharply defined

-> diffusion models are prone to hallucination, where the model may hallucinate plausible-looking structures

-> to counteract hallucination, they use a novel cross-distillation method that enriches the training data with structures predicted by AlphaFold-Multimer v2.3

-> confidence measures predict the atom-level and pairwise errors in final structures; this is done by regressing the error in the output of the structure module during training

-> uses a diffusion rollout procedure for full-structure generation during training (with a larger step size than normal)

-> the rollout's predicted structure is used to permute the ground-truth chains and ligands and to compute the metrics that train the confidence head

-> the confidence head uses the pairwise representation to predict the lDDT (pLDDT), a predicted aligned error (PAE) matrix as in AlphaFold 2, and a distance error matrix, which is the error in the distance matrix of the predicted structure compared with the true structure

-> confidence measures also predict atom-level and pairwise errors

-> early stopping uses a weighted average of all the above metrics

-> AF3 can predict structures from input polymer sequences, residue modifications, and ligand SMILES

-> uses structures below 1000 residues

-> AlphaFold 3 is able to predict protein-nucleic acid structures with thousands of residues

-> Covalent modifications (bonded ligands, glycosylation, and modified protein residues and nucleic acid bases) are also accurately predicted by AF3

-> distills AlphaFold 2 predictions

-> a key problem in protein structure prediction is that models predict static structures, not dynamic behavior

-> multiple random seeds for either the diffusion head or the network do not produce an approximation of the solution ensemble

-> in future: generate large number of predictions and rank them

-> inference: the top-confidence sample is chosen from 5 seed runs with 5 diffusion samples per model seed, for a total of 25 samples

-> interface accuracy is measured via interface lDDT, which is calculated from distances between atoms across different chains in the interface

-> uses an LDDT-to-polymer metric, which considers differences from each atom of an entity to any Cα or C1′ polymer atom within a radius
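The notes above rely on lDDT (the local distance difference test) and its variants. The sketch below is a simplified, global version of the standard lDDT score in PyTorch, assuming the usual 15 Å inclusion radius and the 0.5, 1, 2, and 4 Å thresholds. It is not taken from this package: `lddt` is a hypothetical name, and the sketch omits per-atom averaging, stereochemistry checks, and the Cα/C1′ restriction of the LDDT-to-polymer variant.

```python
import torch


def lddt(
    pred: torch.Tensor,
    true: torch.Tensor,
    cutoff: float = 15.0,
    thresholds: tuple = (0.5, 1.0, 2.0, 4.0),
) -> torch.Tensor:
    """Simplified global lDDT between predicted and true coordinates of shape (num_atoms, 3).

    Returns NaN if no atom pairs fall within the inclusion radius.
    """
    d_true = torch.cdist(true, true)
    d_pred = torch.cdist(pred, pred)
    n = true.shape[0]
    # Only score pairs within the inclusion radius in the true structure, excluding self-pairs.
    mask = (d_true < cutoff) & ~torch.eye(n, dtype=torch.bool, device=true.device)
    diff = (d_true - d_pred).abs()[mask]
    # Fraction of distances preserved within each threshold, averaged over the thresholds.
    scores = [(diff < t).float().mean() for t in thresholds]
    return torch.stack(scores).mean()


true = torch.randn(50, 3) * 5.0
print(lddt(true, true))  # identical structures score 1.0
print(lddt(true + 3.0, true))  # a rigid translation preserves all distances
print(lddt(true + 0.5 * torch.randn(50, 3), true))  # a noisy prediction scores lower
```

Because lDDT compares internal distances, it needs no superposition. Restricting `mask` to atom pairs from different chains would give an interface-style variant like the one mentioned in the notes.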

Todo

Model Architecture

Implement input Embedder from Alphafold2 openfold implementation LINK
Implement the template module from openfold LINK
Implement the MSA embedding from openfold LINK
Fix residuals and make sure pair representation and generated output goes into the diffusion model
Implement recycling to fix residuals

Training pipeline

Get all datasets pushed to Hugging Face

Resources

Datasets

Smaller, start here

Benchmarks

RoseTTAFold(https://www.ipd.uw.edu/2021/07/rosettafold-accurate-protein-structure-prediction-accessible-to-all/0)

Related Projects

NeuroFold

Tools

Community

Agora

Books

Thinking in Systems

Release history

0.0.8 (this version): May 15, 2024
0.0.6: May 15, 2024
0.0.4: May 9, 2024
0.0.2: May 9, 2024
0.0.1: May 8, 2024