Sandbox for Computational Protein Design
Project description
  _____________________.___.____    .____
  \__    ___/\______   \   |    |   |    |
    |    |    |       _/   |    |   |    |
    |    |    |    |   \   |    |___|    |___
    |____|    |____|_  /___|_______ \_______ \
                      \/            \/       \/
Intro
TRILL (TRaining and Inference using the Language of Life) is a sandbox for creative protein engineering and discovery. As a bioengineer myself, I am greatly interested in deep-learning-based approaches to protein design and analysis. However, many of these deep-learning models are unwieldy because of their sheer size, especially for non-ML practitioners. TRILL not only lets researchers perform inference on their proteins of interest using a variety of models, it also democratizes the efficient fine-tuning of large language models. Whether using Google Colab with one GPU or a supercomputer with many, TRILL empowers scientists to leverage models with millions to billions of parameters without worrying (too much) about hardware constraints. As of v1.8.0, TRILL supports the following commands and models:
Breakdown of TRILL's Commands
Command | Function | Available Models |
---|---|---|
Embed | Generates numerical representations or "embeddings" of protein sequences for quantitative analysis and comparison. | ESM2, ProtT5-XL, ProstT5, Ankh |
Visualize | Creates interactive 2D visualizations of embeddings for exploratory data analysis. | PCA, t-SNE, UMAP |
Finetune | Finetunes protein language models for specific tasks. | ESM2, ProtGPT2, ZymCTRL |
Language Model Protein Generation | Generates proteins using pretrained language models. | ESM2, ProtGPT2, ZymCTRL |
Inverse Folding Protein Generation | Designs proteins to fold into specific 3D structures. | ESM-IF1, LigandMPNN, ProstT5 |
Diffusion Based Protein Generation | Uses denoising diffusion models to generate proteins. | RFDiffusion |
Fold | Predicts 3D protein structures. | ESMFold, ProstT5 |
Dock | Simulates protein-ligand interactions. | DiffDock, Smina, AutoDock Vina, LightDock, GeoDock |
Classify | Predicts protein properties with pretrained models or trains custom classifiers. | TemStaPro, EpHod, ECPICK, LightGBM, XGBoost, Isolation Forest |
Regress | Trains custom regression models. | LightGBM, Linear |
Simulate | Uses molecular dynamics to simulate protein-ligand interactions. | OpenMM |
Score | Scores protein sequences with ESM1v or ESM2, or protein structures with ProteinMPNN, in a zero-shot manner. | COMPSS |
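To make the "quantitative analysis and comparison" in the Embed row concrete, here is a minimal sketch of comparing two embedding vectors with cosine similarity. It does not use TRILL's API: the `cosine_similarity` helper and the three-dimensional `toy_embeddings` values are assumptions made up for illustration, and real protein language model embeddings typically have hundreds to thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: the names and numbers are invented, not model output.
toy_embeddings = {
    "protein_A": [0.9, 0.1, 0.3],
    "protein_B": [0.8, 0.2, 0.4],
    "protein_C": [-0.5, 0.9, 0.0],
}

if __name__ == "__main__":
    reference = toy_embeddings["protein_A"]
    for name, vector in toy_embeddings.items():
        # Values near 1 mean the vectors point in nearly the same direction.
        print(name, round(cosine_similarity(reference, vector), 3))
```

The same comparison applies unchanged to full-size embeddings once they are loaded as lists of floats; for large collections, a vectorized library would be the more practical choice.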
Documentation
Check out the documentation and examples at https://trill.readthedocs.io/en/latest/index.html
Download files
File details
Details for the file trill_proteins-1.8.0.tar.gz.
File metadata
- Download URL: trill_proteins-1.8.0.tar.gz
- Upload date:
- Size: 11.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.14 Linux/6.5.0-1022-azure
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | d60089ace59b9e87d73f471ecd37708ad716c010b5ede8d172977581629f7f6d |
MD5 | d09eddacda1a2b98ff0764c4e39134d4 |
BLAKE2b-256 | 157d16a01e729ce0612f56fdb139d5d66a01e842287120a62b30ca0efb2ac3b0 |
File details
Details for the file trill_proteins-1.8.0-py3-none-any.whl.
File metadata
- Download URL: trill_proteins-1.8.0-py3-none-any.whl
- Upload date:
- Size: 11.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.14 Linux/6.5.0-1022-azure
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | d5de0d6e81a64fc6205039c27087fb817478cb4e15781ed5888dcea99a28d1fb |
MD5 | f670c67a49bd09a88880f24e6f15f830 |
BLAKE2b-256 | 604d0e9f62db6f324099e87630f2a3413efd3c974a2bd1510f09be395f01f2eb |
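The published digests can be used to check that a downloaded file was not corrupted or altered. Below is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the script assumes the wheel has been saved to the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for trill_proteins-1.8.0-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = (
    "d5de0d6e81a64fc6205039c27087fb817478cb4e15781ed5888dcea99a28d1fb"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the file was saved.
    wheel = Path("trill_proteins-1.8.0-py3-none-any.whl")
    if wheel.exists():
        ok = matches_published_digest(wheel, EXPECTED_WHEEL_SHA256)
        print("digest matches" if ok else "digest MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```

SHA256 is used here because it is the strongest of the three listed algorithms that `hashlib` always exposes under a fixed name; MD5 is best treated as a checksum against accidental corruption only.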