This is a program for fast protein structure search.

Project description

ADAMS: Align Distance Matrix with SIFT algorithm enables GPU-accelerated protein structure comparison

Requirements

opencv == 4.7.0.72

numpy >= 1.17.2

cuda 11.x or later, matching your cupy build

cupy-cuda111 == 12.2.0, or the cupy build that matches your cuda version

biopython == 1.81

scipy == 1.11.2

tqdm == 4.66.1

pickle (part of the Python standard library)

Installation

pip install adams

Please contact: guozy23@mails.tsinghua.edu.cn for more information

Tutorial and description

Introduction

We've developed a method to address the problem that many proteins share high structural similarity despite having no sequence similarity. This problem has become increasingly critical as AlphaFold2 continues to predict new structures, resulting in a massive database (23 TiB, version 4) that lacks an effective data mining tool.

Foldseek offers a solution by embedding local structure into the sequence and transforming this issue into a sequence alignment problem. It's significantly faster than DALI, TM-Align, and CE-Align and outperforms them on structure comparison benchmarks.

However, results reported in the Foldseek paper show that Foldseek occasionally underperforms DALI, suggesting that some 'overall information' may not be captured by local structure embedding.

Our Align Distance Matrix with SIFT algorithm (ADAMS) is similar to DALI but uses an enhanced version of a well-known computer vision algorithm, the Scale Invariant Feature Transform (SIFT). It extracts key features from protein distance matrices at different scales and compares their similarities. Most calculations can benefit from GPU acceleration. This zero-shot model enables more precise structure comparisons at speeds comparable to Foldseek-TM tools. Users can create their own PDB databases on PCs for all-vs-all comparisons with increased speed and reduced memory usage (approximately 500 MB to 3 GB of GPU memory for a 20,000-structure all-vs-all comparison).
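As a rough illustration of the input ADAMS works on, the sketch below builds a distance matrix from a list of 3-D coordinates with NumPy. It assumes one representative atom per residue (for example the C-alpha atom); which atoms ADAMS actually uses is not stated here, and `distance_matrix` and the toy coordinates are hypothetical names and values, not part of the package.

```python
import numpy as np

def distance_matrix(coords):
    """Return the n x n matrix of Euclidean distances between n 3-D points."""
    coords = np.asarray(coords, dtype=np.float64)
    # Broadcast to an (n, n, 3) array of pairwise coordinate differences.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy coordinates (hypothetical, not taken from a real structure).
ca_coords = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [3.0, 4.0, 0.0]]
dm = distance_matrix(ca_coords)
print(dm)
```

The result is symmetric with a zero diagonal, so it can be treated as a grayscale image, which is what makes an image feature detector such as SIFT applicable.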

The algorithm is illustrated in Fig. 1. The original SIFT algorithm is applied to distance matrices to extract detectable features across various scales. These features are represented as 128-dimensional vectors, which are stacked into an n × 128 matrix. Two structures are compared using the cosine similarity between their feature matrices, computed with an A × B.T operation. Because the features have nearly identical lengths (512 ± 1.5), distances between them are determined by angles rather than by length differences; once the features are normalized beforehand, the similarity calculation becomes straightforward on GPUs.
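The comparison step described above can be sketched with NumPy as follows. This is a minimal illustration, not the package's implementation: `cosine_similarity_matrix` is a hypothetical helper, and random vectors stand in for real SIFT descriptors.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Cosine similarity between every row of a (n x d) and every row of b (m x d)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Normalize each feature vector to unit length first, so that the
    # matrix product A x B.T directly yields the cosine of each angle.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

# Random stand-ins for two structures' 128-dimensional feature matrices.
rng = np.random.default_rng(0)
feats_a = rng.random((5, 128))
feats_b = rng.random((7, 128))
sim = cosine_similarity_matrix(feats_a, feats_b)
print(sim.shape)  # (5, 7)
```

Because the whole comparison reduces to one normalization and one matrix product, it maps naturally onto a GPU; CuPy exposes a NumPy-like interface, so a similar expression should carry over with few changes.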

[Fig. 1: illustration of the ADAMS algorithm]

Performance: searching for the protein structure 'OSM-3' (699 aa) in a C. elegans protein structure database (19,361 structures) took 3 to 4 seconds on an NVIDIA RTX 2080 Ti (11 GiB) GPU. When the entire database was loaded onto the GPU at once, total GPU memory usage was around 4000 MB; when it was loaded separately, usage was only about 500 MB. Importantly, the choice of loading method did not affect search speed.
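How ADAMS loads its database internally is not shown here. The sketch below only illustrates the general idea behind the memory trade-off: scoring a database chunk by chunk gives the same scores as scoring it all at once, while only one chunk has to be held in memory at a time. It uses NumPy on the CPU, one feature vector per database entry as a simplification, and hypothetical function names.

```python
import numpy as np

def score_all_at_once(query, database):
    """Score every database row against the query in a single product."""
    return database @ query

def score_in_chunks(query, database, chunk_size):
    """Score the database slice by slice and concatenate the results."""
    parts = []
    for start in range(0, len(database), chunk_size):
        # Only this slice would need to reside in GPU memory at a time.
        chunk = database[start:start + chunk_size]
        parts.append(chunk @ query)
    return np.concatenate(parts)

# Random stand-ins for a feature database and a query vector.
rng = np.random.default_rng(0)
database = rng.random((1000, 128))
query = rng.random(128)

full = score_all_at_once(query, database)
chunked = score_in_chunks(query, database, chunk_size=64)
print(np.allclose(full, chunked))  # True
```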

The preprint is available here: https://www.biorxiv.org/content/10.1101/2023.11.14.566990v1.article-metrics

Tutorial

Installation
pip install adams
1. Download a PDB set and build a cuda_database (or a compatible one) from it
import adams
from adams.db_maker import DatabaseMaker
# Use GPU 0 with process=40 (40*1.5 processes, per the original note).
db = DatabaseMaker(device=0, process=40)
# Put your PDB dataset in one folder and build the database in another.
db.make('./pdb', './pdb_db')
2. Match your protein structure against a database
import adams
from adams.matcher import ADAMS_match
# Use GPU 0 and GPU 1.
matcher = ADAMS_match('./protein.pdb', gpu_usage=[0, 1], threshold=0.95)
# Search the database for similar protein structures; returns a pandas DataFrame.
# A temp folder ('tmp' here) is required and is created if it does not exist.
result = matcher.match('./pdb_db', 'tmp', prefilter_threshold=0.01)

First, check the compare_all.py script:

compare_all.py

If permission is denied:

chmod +x path/to/compare_all.py

Download files

Source Distribution

adams-1.0.1.tar.gz (2.9 MB)

Uploaded Source

Built Distribution

adams-1.0.1-py3-none-any.whl (23.3 kB)

Uploaded Python 3

File details

Details for the file adams-1.0.1.tar.gz.

File metadata

  • Download URL: adams-1.0.1.tar.gz
  • Upload date:
  • Size: 2.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.1

File hashes

Hashes for adams-1.0.1.tar.gz
Algorithm Hash digest
SHA256 00c3831975c13e9d10c64e725ca5e548d440133b2ae9fc8179fdc7af6a87c731
MD5 477b9f1482dc60db9543f75695a787e0
BLAKE2b-256 f1ab30bb3ee22d622dda40c77f196779ef19e810d15cde7ae26b12bdf1507386

File details

Details for the file adams-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: adams-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 23.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.1

File hashes

Hashes for adams-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 638ece632f28f30b9c452c7e237afb79b98b964d6d18d67e87ed2b6fada13749
MD5 ce88cbdc8b7e143ef1478cc2997fa665
BLAKE2b-256 243e0a987355c9a49942a5182b3281f2cb30be8d859bc53832a41c1437373c3f
