A toolkit for protein structure analysis and design.
Protein-Design Tools
Overview
Protein-Design Tools is a Python library for structural bioinformatics, with a specific focus on protein design and engineering. It provides tools for analyzing and manipulating protein structures, so that researchers and practitioners can compare structures, design new proteins, and engineer existing ones.
Whether you're conducting research in protein folding, designing novel enzymes, or engineering therapeutic proteins, Protein-Design Tools offers the functionalities you need to advance your projects.
Features
Protein Structure Representation
- Core Classes:
- ProteinStructure: Represents the entire protein structure.
- Chain: Represents individual chains within the protein.
- Residue: Represents residues within chains.
- Atom: Represents individual atoms within residues.
- File Parsing:
- PDB Support: Parse and read PDB files seamlessly.
- CIF Support: Future support planned for CIF files.
- Programmatic Construction:
- Build idealized protein structures (e.g., alpha helices) programmatically.
Structural Metrics
Calculate structural metrics across multiple computational frameworks for flexibility and performance optimization:
- RMSD (Root Mean Square Deviation): Measure the average distance between atoms of superimposed proteins.
- TM-score: Assess structural similarity normalized by protein length.
- GDT-TS (Global Distance Test - Total Score): Evaluate global structural similarity using multiple distance thresholds.
- LDDT (Local Distance Difference Test): Measure local structural accuracy.
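To make two of these definitions concrete, here is a minimal NumPy sketch of RMSD and a simplified GDT-TS. It is not the library's implementation: the function names `rmsd` and `gdt_ts` are illustrative, the inputs are assumed to be (N, 3) arrays that are already superimposed with matching atom order, and the full GDT-TS procedure searches over many superpositions, which this sketch skips.

```python
import numpy as np

def rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays that are already superimposed."""
    diff = np.asarray(P, dtype=float) - np.asarray(Q, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def gdt_ts(P, Q, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS: mean fraction of atoms within each cutoff (angstroms), times 100."""
    dist = np.linalg.norm(np.asarray(P, dtype=float) - np.asarray(Q, dtype=float), axis=1)
    return float(100.0 * np.mean([(dist <= c).mean() for c in cutoffs]))

# Toy example: every atom is shifted by 3 angstroms along x.
P = np.zeros((4, 3))
Q = P + np.array([3.0, 0.0, 0.0])
print(rmsd(P, Q))    # 3.0
print(gdt_ts(P, Q))  # 50.0 (within the 4 and 8 angstrom cutoffs, outside 1 and 2)
```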
Utilities
- Radius of Gyration: Compute the radius of gyration for protein structures to assess compactness.
- Sequence Analysis: Extract and manipulate amino acid sequences from structures.
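As a rough illustration of what the radius of gyration measures, the sketch below computes the mass-weighted root-mean-square distance of atoms from their center of mass. The function name `radius_of_gyration` is hypothetical and independent of the library's own routines; with no masses supplied, all atoms are assumed to weigh the same.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Mass-weighted RMS distance of atoms from their center of mass."""
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        # Assumption: equal weights when no masses are given
        masses = np.ones(len(coords))
    masses = np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - center) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))

# Four atoms at unit distance from the origin
coords = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]], dtype=float)
print(radius_of_gyration(coords))  # 1.0
```

A smaller value for the same number of atoms indicates a more compact structure.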
Input/Output Support
- File Operations:
- Read and write protein structures in PDB format.
- Write FASTA sequences derived from 3D structure files.
- Data Export:
- Export coordinates and other structural data in various formats, including HDF5.
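For orientation, FASTA output from a structure amounts to one header line per chain followed by the wrapped sequence. The sketch below shows that formatting in plain Python; `format_fasta` is a hypothetical helper, not the library's writer, and it assumes sequences arrive as a `{chain_id: sequence}` mapping of one-letter codes.

```python
def format_fasta(sequences, name="protein", width=60):
    """Format a {chain_id: sequence} mapping as FASTA text."""
    lines = []
    for chain_id, seq in sequences.items():
        lines.append(f">{name}_{chain_id}")
        # Wrap the sequence at a fixed line width
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

fasta = format_fasta({"A": "MKTAYIAKQR", "B": "GSHM"}, name="Sample_Protein", width=5)
print(fasta)
```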
Extensible Architecture
- Modular Design: Easily add new metrics, file formats, and functionalities without disrupting existing components.
- Multiple Frameworks: Leverage the strengths of NumPy, PyTorch, and JAX for computational tasks.
Installation
1. Choose the right requirements file
To keep the repo platform-agnostic, dependencies are split into small files in
requirements/. Pick the one that matches your hardware/accelerator:
| File | When to use it | Key extra deps |
|---|---|---|
| requirements/cpu.txt | CPU-only | jax[cpu] |
| requirements/cuda12.txt | NVIDIA GPU, CUDA 12 toolchain | jax[cuda12] (installs a CUDA-enabled jaxlib wheel) |
| requirements/tpu.txt | Google Cloud TPU VMs | jax[tpu] + libtpu link |
All three files include -r requirements/base.txt, which lists NumPy 1.26,
PyTorch (CPU wheel by default), FreeSASA, etc.
2. Create a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate.bat # Windows CMD
# .\.venv\Scripts\Activate.ps1 # Windows PowerShell
3. Install
CPU-only:
pip install -r requirements/cpu.txt
NVIDIA GPU:
pip install -r requirements/cuda12.txt
TPU VM:
pip install -r requirements/tpu.txt
4. Verify
import numpy, torch, jax, jaxlib, freesasa
print("NumPy:", numpy.__version__)
print("Torch:", torch.__version__, "| CUDA:", torch.cuda.is_available())
print("JAX :", jax.__version__, "| jaxlib:", jaxlib.__version__)
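If one of those imports fails, it can help to see which packages are missing without stopping at the first error. The following is a small standard-library sketch for that; `installed_versions` is an illustrative helper, and it assumes each import name matches its distribution name on PyPI, which holds for the five packages listed here.

```python
from importlib import metadata, util

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        if util.find_spec(name) is None:
            versions[name] = None
            continue
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name
            versions[name] = "unknown"
    return versions

report = installed_versions(["numpy", "torch", "jax", "jaxlib", "freesasa"])
for name, version in report.items():
    print(f"{name:10s} {version or 'MISSING'}")
```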
Quick Start
Here's a quick example to get you started with Protein-Design Tools:
from protein_design_tools.core.protein_structure import ProteinStructure
from protein_design_tools.io.pdb_io import read_pdb
from protein_design_tools.metrics import compute_rmsd_numpy, compute_gdt_pytorch
from protein_design_tools.utils.coordinate_utils import get_coordinates, get_masses
# Reading a PDB file
protein = read_pdb("path/to/file.pdb", chains=['A', 'B'], name="Sample_Protein")
# Getting sequences
sequences = protein.get_sequence_dict()
print(sequences)
# Getting coordinates of all backbone atoms in chain A
coords = get_coordinates(protein, atom_type="backbone", chains={'A': range(1, 21)})
# Getting masses of all non-hydrogen atoms
masses = get_masses(protein, atom_type="non-hydrogen")
Several structural metrics are available, each accessible through multiple computational frameworks:
from protein_design_tools.metrics import compute_rmsd_numpy, compute_gdt_pytorch
# Computing RMSD using NumPy
import numpy as np
import torch
P = np.random.rand(1000, 3)
Q = np.random.rand(1000, 3)
rmsd = compute_rmsd_numpy(P, Q)
print(f"RMSD (NumPy): {rmsd:.4f}")
# Computing GDT-TS using PyTorch
P_pt = torch.tensor(P)
Q_pt = torch.tensor(Q)
gdt = compute_gdt_pytorch(P_pt, Q_pt)
print(f"GDT-TS (PyTorch): {gdt:.2f}")
Detailed Usage
Reading Protein Structures
Protein-Design Tools supports reading and parsing protein structures from PDB files. Future updates will include CIF file support.
from protein_design_tools.io.pdb_io import read_pdb
# Read all chains
protein = read_pdb("path/to/file.pdb")
# Read specific chains
protein = read_pdb("path/to/file.pdb", chains=['A', 'B'], name="My_Protein")
Analyzing Sequences
Extract amino acid sequences from the protein structure.
# Get the sequence of each chain in the protein
sequence_dict = protein.get_sequence_dict()
for chain_id, sequence in sequence_dict.items():
    print(f"Chain {chain_id}: {sequence}")
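Once the sequences are available as a dictionary, ordinary Python tools apply. As a sketch, the snippet below counts residue types per chain with `collections.Counter`. It assumes the dictionary maps chain IDs to one-letter sequence strings, as the printing loop above suggests; `composition` and the example data are illustrative only.

```python
from collections import Counter

def composition(sequence_dict):
    """Count residue types per chain in a {chain_id: sequence} mapping."""
    return {chain_id: Counter(seq) for chain_id, seq in sequence_dict.items()}

# Stand-in for the output of protein.get_sequence_dict()
example = {"A": "MKKLA", "B": "GGS"}
counts = composition(example)
print(counts["A"].most_common(1))  # [('K', 2)]
```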
Computing Structural Metrics
Leverage multiple frameworks to compute various structural metrics.
from protein_design_tools.metrics import compute_rmsd_numpy, compute_gdt_pytorch
# Example data
import numpy as np
import torch
P = np.random.rand(1000, 3)
Q = np.random.rand(1000, 3)
# Compute RMSD using NumPy
rmsd = compute_rmsd_numpy(P, Q)
print(f"RMSD (NumPy): {rmsd:.4f}")
# Compute GDT-TS using PyTorch
P_pt = torch.tensor(P)
Q_pt = torch.tensor(Q)
gdt = compute_gdt_pytorch(P_pt, Q_pt)
print(f"GDT-TS (PyTorch): {gdt:.2f}")
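RMSD is only meaningful for superimposed structures, and the random arrays above are not aligned. Whether `compute_rmsd_numpy` superimposes its inputs is not stated here, so the sketch below shows one common way to do it beforehand, the Kabsch algorithm, in plain NumPy. The function name `kabsch_superimpose` is an assumption for illustration and is not part of the library.

```python
import numpy as np

def kabsch_superimpose(P, Q):
    """Return P rotated and translated onto Q (both (N, 3), same atom order)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_center, q_center = P.mean(axis=0), Q.mean(axis=0)
    P0, Q0 = P - p_center, Q - q_center
    # Covariance matrix between the centered coordinate sets, and its SVD
    H = P0.T @ Q0
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return P0 @ R.T + q_center

# Q is P rotated by 30 degrees about z and then translated
rng = np.random.default_rng(0)
P = rng.random((50, 3))
angle = np.deg2rad(30.0)
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0,            0.0,           1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])

aligned = kabsch_superimpose(P, Q)
residual = np.sqrt(((aligned - Q) ** 2).sum(axis=1).mean())
print(f"RMSD after superposition: {residual:.6f}")  # close to zero for a pure rigid-body move
```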
Generating Idealized Structures
Create idealized protein structures programmatically, such as an alpha helix.
from protein_design_tools.io.builder import build_ideal_alpha_helix
# Build an idealized alpha helix with 10 residues
ideal_helix = build_ideal_alpha_helix(sequence_length=10, chain_id='A', start_res_seq=1)
# Display sequence
sequence_dict = ideal_helix.get_sequence_dict()
print(sequence_dict)
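To give a sense of the geometry behind an idealized helix, the sketch below places only the C-alpha atoms on a helical path using textbook alpha-helix parameters (about 1.5 angstroms rise and 100 degrees of rotation per residue, with C-alpha atoms roughly 2.3 angstroms from the axis). It is a simplified illustration, not how `build_ideal_alpha_helix` works internally; `ideal_helix_ca_trace` is a hypothetical name.

```python
import numpy as np

def ideal_helix_ca_trace(n_residues, radius=2.3, rise=1.5, twist_deg=100.0):
    """C-alpha coordinates (angstroms) of an ideal alpha helix along the z axis."""
    i = np.arange(n_residues)
    theta = np.deg2rad(twist_deg) * i
    return np.column_stack([radius * np.cos(theta),
                            radius * np.sin(theta),
                            rise * i])

ca = ideal_helix_ca_trace(10)
print(ca.shape)  # (10, 3)
# Consecutive C-alpha atoms should sit about 3.8 angstroms apart
print(round(float(np.linalg.norm(ca[1] - ca[0])), 2))
```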
Examples
Calculating the Radius of Gyration
Calculate the radius of gyration for a protein and compare it to an idealized alpha helix.
from protein_design_tools.core.protein_structure import ProteinStructure
from protein_design_tools.io.pdb_io import read_pdb
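# compute_radgyr_alanine_helix is called below; it is assumed to be exported
# from the same module as the other radius-of-gyration helpers
from protein_design_tools.metrics import (
    compute_radgyr,
    compute_radgyr_alanine_helix,
    compute_radgyr_ratio,
)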
# Read the protein structure
protein = read_pdb("example.pdb")
# Display the amino acid sequence of the protein
sequence_dict = protein.get_sequence_dict()
for chain_id, sequence in sequence_dict.items():
    print(f"Chain {chain_id}: {sequence}")
# Calculate the radius of gyration of the backbone of chain A
rgA = compute_radgyr(protein, chains={'A'}, atom_type="backbone")
print(f"Protein Structure Chain A Radius of Gyration: {rgA:.4f}")
# Calculate the radius of gyration of an ideal alanine helix
ideal_helix_seq_length = len(sequence_dict['A'])
rg_ideal_helix = compute_radgyr_alanine_helix(ideal_helix_seq_length, atom_type="backbone")
print(f"Ideal Alanine Helix Radius of Gyration: {rg_ideal_helix:.4f}")
# Calculate the radius of gyration ratio
rg_ratio = compute_radgyr_ratio(protein, chains={'A'}, atom_type="backbone")
print(f"Radius of Gyration Ratio: {rg_ratio:.4f}")
Comparing Protein Structures Using TM-score
Assess the structural similarity between two protein structures.
import numpy as np
from protein_design_tools.metrics import compute_tmscore_numpy
# Assume P and Q are numpy arrays of shape (N, D) representing atom coordinates
P = np.random.rand(1000, 3)
Q = np.random.rand(1000, 3)
# Compute TM-score using NumPy
tm_score = compute_tmscore_numpy(P, Q)
print(f"TM-score (NumPy): {tm_score:.4f}")
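For reference, the standard TM-score formula averages 1 / (1 + (d_i / d0)^2) over aligned residue pairs, with the length-dependent scale d0 = 1.24 * (L - 15)^(1/3) - 1.8. The sketch below evaluates that formula for a fixed superposition. It is not the library's `compute_tmscore_numpy`: the name `tm_score` is illustrative, both inputs are assumed to have the same length L, the lower bound on d0 for short chains is an assumption, and the search for the best superposition is omitted.

```python
import numpy as np

def tm_score(P, Q, d0_min=0.5):
    """TM-score of two already-superimposed (N, 3) coordinate arrays."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    L = len(Q)
    # Length-dependent distance scale; clamped for short chains (assumption)
    d0 = max(1.24 * np.cbrt(L - 15) - 1.8, d0_min)
    dist = np.linalg.norm(P - Q, axis=1)
    return float(np.mean(1.0 / (1.0 + (dist / d0) ** 2)))

rng = np.random.default_rng(0)
P = rng.random((100, 3)) * 30.0
print(tm_score(P, P))        # 1.0 for identical structures
print(tm_score(P, P + 5.0))  # lower once every atom is displaced
```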
Contributing
Contributions are welcome! Whether you're fixing bugs, improving documentation, or adding new features, your help is greatly appreciated.
- Fork the Repository: Click the "Fork" button at the top right of the repository page.
- Clone Your Fork:
git clone https://github.com/your-username/protein-design-tools.git
- Create a New Branch:
git checkout -b feature/YourFeatureName
- Make Your Changes: Implement your feature or fix.
- Commit Your Changes:
git commit -m "Add feature: YourFeatureName"
- Push to Your Fork:
git push origin feature/YourFeatureName
- Create a Pull Request: Go to the original repository and create a pull request from your fork.
For major changes, please open an issue first to discuss what you would like to change.
Development Guidelines
- Follow PEP 8 style guidelines.
- Write clear and concise docstrings for all functions and classes.
- Include unit tests for new features or bug fixes.
- Ensure that existing tests pass before submitting a pull request.
License
This project is licensed under the MIT License.
Contact
For any questions, suggestions, or contributions, please reach out:
- Author: Andrew Schaub
- LinkedIn: https://www.linkedin.com/in/andrewjschaub
- GitHub: https://github.com/drewschaub/protein-design-tools
Thank you for using Protein-Design Tools! We hope it serves as a valuable resource in your structural bioinformatics and protein engineering endeavors.