
PLINDER: The Protein-Ligand INteraction Dataset and Evaluation Resource


plinder

The Protein Ligand INteractions Dataset and Evaluation Resource



📚 About

PLINDER, short for Protein-Ligand INteractions Dataset and Evaluation Resource, is a comprehensive, annotated, high-quality dataset and resource for training and evaluating protein-ligand docking algorithms:

  • > 400k protein-ligand interaction (PLI) systems across > 11k SCOP domains and > 50k unique small molecules
  • 500+ annotations for each system, including protein and ligand properties, quality, matched molecular series and more
  • Automated curation pipeline to keep up with the PDB
  • 14 PLI metrics and over 20 billion similarity scores
  • Unbound (apo) and predicted AlphaFold2 structures linked to holo systems
  • Train-val-test splits and the ability to tune splitting based on the learning task
  • Robust evaluation harness to simplify and standardize performance comparison between models

The PLINDER project is a community effort launched by the University of Basel, SIB Swiss Institute of Bioinformatics, VantAI, NVIDIA, and MIT CSAIL. It will be regularly updated.

To accelerate community adoption, PLINDER will be used as the field's new protein-ligand interaction dataset standard in a competition at the upcoming 2024 Machine Learning in Structural Biology (MLSB) Workshop at NeurIPS, one of the field's premier academic gatherings. More details about the competition, along with practical tips, can be found in our recent workshop repo: Moving Beyond Memorization.

👋 Join the P(L)INDER user group Discord Server!

🔢 Plinder versions

We version the plinder dataset with two controls:

  • PLINDER_RELEASE: the month stamp of the last RCSB sync
  • PLINDER_ITERATION: value that enables iterative development within a release
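
To illustrate how the two controls combine, a dataset version can be written as `<PLINDER_RELEASE>/<PLINDER_ITERATION>`, for example `2024-06/v2`. The following is a minimal sketch that assumes this label format; `DatasetVersion` and `parse_version` are hypothetical helper names and are not part of the plinder package.

```python
from typing import NamedTuple


class DatasetVersion(NamedTuple):
    """Sortable view of a label such as "2024-06/v2" (hypothetical helper)."""

    year: int
    month: int
    iteration: int


def parse_version(label: str) -> DatasetVersion:
    # Assumed format: "<YYYY-MM>/v<N>", i.e. PLINDER_RELEASE/PLINDER_ITERATION.
    release, iteration = label.split("/")
    year, month = release.split("-")
    if not iteration.startswith("v"):
        raise ValueError(f"unexpected iteration: {iteration!r}")
    return DatasetVersion(int(year), int(month), int(iteration[1:]))


# Pick the most recent label: release month first, then iteration.
labels = ["2024-04/v1", "2024-06/v2", "2024-04/v0"]
newest = max(labels, key=parse_version)
print(newest)
```

Comparing the parsed tuples rather than the raw strings keeps the ordering correct if an iteration number ever reaches two digits.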

We version the plinder application using an automated semantic versioning scheme based on the git commit history. The plinder.data package is responsible for generating a dataset release and the plinder.core package makes it easy to interact with the dataset.

Changelog:

  • 2024-06/v2 (Current):

    • New systems added based on the 2024-06 RCSB sync
    • Updated system definition to be more stable and depend only on ligand distance rather than PLIP
    • Added annotations for crystal contacts
    • Improved ligand handling and saving to fix some bond order issues
    • Improved covalency detection and annotation to reference each bond explicitly
    • Added linked apo/pred structures to v2/links and v2/linked_structures
    • Added binding affinity annotations from BindingDB
    • Added statistics requirement and other changes in the split to enrich test set diversity
  • 2024-04/v1: Version described in the preprint, with updated redundancy removal by protein pocket and ligand similarity.

  • 2024-04/v0: Version used to re-train DiffDock in the paper, with redundancy removal based on <pdbid>_<ligand ccd codes>
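
The `<pdbid>_<ligand ccd codes>` key mentioned for 2024-04/v0 can be pictured as plain key-based deduplication. The sketch below is only an illustration under stated assumptions: the record shape, the sorting of CCD codes, the `_` separator between codes, and the function names are all hypothetical, and the actual pipeline may build and apply the key differently. The identifiers in the example are placeholders, not real dataset entries.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (pdb_id, ligand CCD codes).
System = Tuple[str, Tuple[str, ...]]


def redundancy_key(pdb_id: str, ccd_codes: Iterable[str]) -> str:
    # Assumption: codes are sorted so that ligand order does not matter,
    # then joined with "_" after the PDB ID.
    return f"{pdb_id}_{'_'.join(sorted(ccd_codes))}"


def remove_redundant(systems: Iterable[System]) -> List[System]:
    # Keep the first system seen for each key and drop later repeats.
    kept: Dict[str, System] = {}
    for pdb_id, codes in systems:
        kept.setdefault(redundancy_key(pdb_id, codes), (pdb_id, codes))
    return list(kept.values())


# Placeholder identifiers for illustration only.
example = [
    ("1abc", ("ATP",)),
    ("1abc", ("ATP",)),        # same key as the first entry, dropped
    ("1abc", ("MG", "ATP")),   # different ligand set, kept
    ("2xyz", ("ATP",)),        # different entry, kept
]
unique = remove_redundant(example)
print(len(unique))
```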

🏅 Gold standard benchmark sets

As part of the PLINDER resource, we provide train, validation and test splits that are curated to minimize information leakage based on protein-ligand interaction similarity. In addition, we have prioritized systems that have a linked experimental apo structure or matched molecular series, to support realistic inference scenarios for hit discovery and optimization. Finally, particular care is taken with the test set, which is further prioritized to contain high-quality structures that provide unambiguous ground truths for performance benchmarking.


Moreover, as we anticipate this resource being used to benchmark a wide range of methods, including those that simultaneously predict protein structure (also known as co-folding) and those that generate novel ligand structures, we further stratified the test set (by novel ligand, pocket, protein or all) to cover a wide range of tasks.

👨💻 Getting Started

The PLINDER dataset is provided in two ways:

  • You can either use the files from the dataset directly with your preferred tooling by downloading the data from the public bucket,
  • or you can use the dedicated plinder Python package to interface with the data.

Downloading the dataset

The dataset can be downloaded from the bucket with gsutil.

$ export PLINDER_RELEASE=2024-06 # Current release
$ export PLINDER_ITERATION=v2 # Current iteration
$ mkdir -p ~/.local/share/plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/
$ gsutil -m cp -r "gs://plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/*" ~/.local/share/plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/
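
For scripts that need the same locations as the commands above, the bucket URI and local directory can be rebuilt from the two environment variables. This is a minimal sketch using only the Python standard library; `dataset_locations` is a hypothetical helper name, not a function of the plinder package, and the fallback values simply mirror the current release and iteration shown above.

```python
import os
from pathlib import Path
from typing import Tuple


def dataset_locations(release: str, iteration: str) -> Tuple[str, Path]:
    """Return (bucket URI, local directory) for one dataset version."""
    remote = f"gs://plinder/{release}/{iteration}"
    local = Path.home() / ".local" / "share" / "plinder" / release / iteration
    return remote, local


# Fall back to the current release and iteration when the variables are unset.
release = os.environ.get("PLINDER_RELEASE", "2024-06")
iteration = os.environ.get("PLINDER_ITERATION", "v2")

remote, local = dataset_locations(release, iteration)
print(f"remote: {remote}")
print(f"local:  {local} (exists: {local.is_dir()})")
```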

For details on the sub-directories, see Documentation.

Installing the Python package

plinder is available on PyPI.

pip install plinder
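
One way to confirm that the installation succeeded is to query the installed distribution's version through the standard library, without importing the package itself. This is a small sketch; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("plinder")
print(found if found is not None else "plinder is not installed")
```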

📝 Documentation

A more detailed description is available on the documentation website.

📃 Citation

Durairaj, Janani, Yusuf Adeshina, Zhonglin Cao, Xuejin Zhang, Vladas Oleinikovas, Thomas Duignan, Zachary McClure, Xavier Robin, Gabriel Studer, Daniel Kovtun, Emanuele Rossi, Guoqing Zhou, Srimukh Prasad Veccham, Clemens Isert, Yuxing Peng, Prabindh Sundareson, Mehmet Akdel, Gabriele Corso, Hannes Stärk, Gerardo Tauriello, Zachary Wayne Carpenter, Michael M. Bronstein, Emine Kucukbenli, Torsten Schwede, Luca Naef. 2024. “PLINDER: The Protein-Ligand Interactions Dataset and Evaluation Resource.” bioRxiv ICML'24 ML4LMS

Please see the citation file for details.

