
Deep learning ready datasets of 3D protein structures.



ProteinShake provides one-line imports of large-scale, preprocessed protein structure tasks and datasets for various model types and frameworks.

We provide a collection of preprocessed and cleaned protein 3D structure datasets from RCSB and AlphaFoldDB, including annotations. Structures are easily converted to graphs, voxels, or point clouds and loaded natively into PyTorch, TensorFlow, NumPy, JAX, PyTorch-Geometric, DGL, and NetworkX. The task API enables standardized benchmarking and model evaluation on a variety of biologically relevant prediction tasks at the protein and residue level.

Find more information on the Website and the Documentation, or check out the Tutorials.

Installation:

Note: this is a pre-release version. There may be unannounced changes to the API and datasets. We expect some bugs as well; please open an issue if you find one.

pip install proteinshake

Data workflow:

In one line you can import large datasets of protein 3D structures, encode them as graphs/voxel grids/point clouds, and port them to your favorite learning framework.
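
To make the representations mentioned above concrete, here is a minimal standard-library sketch of what encoding a structure as a graph or a voxel grid can mean. It is not the ProteinShake API (see the Documentation for the actual calls); the helper names `epsilon_graph` and `voxelize`, the distance cutoff, the voxel size, and the toy coordinates are all assumptions made for illustration.

```python
import math

def epsilon_graph(coords, eps):
    """Connect every pair of points closer than eps (hypothetical helper)."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < eps:
                edges.append((i, j))
    return edges

def voxelize(coords, size):
    """Return the set of occupied grid cells for cubic voxels of edge `size`."""
    return {tuple(math.floor(c / size) for c in point) for point in coords}

# Toy residue coordinates in angstroms (made-up values, not real data).
# The raw coordinate list itself is already a point cloud.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]

edges = epsilon_graph(coords, eps=8.0)   # residue graph
voxels = voxelize(coords, size=10.0)     # occupied voxel cells
```

In this toy case the first three residues form a fully connected cluster, while the fourth is too far away to receive any edge and falls into its own voxel.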

Task workflow:

The task API lets you easily access the underlying data for several tasks, get random, sequence-based, or structure-based splits, and evaluate your predictions.
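
As a rough illustration of the split-and-evaluate pattern described above, the following standard-library sketch builds a seeded random split and scores predictions with plain accuracy. It is not the ProteinShake task API; `random_split`, `accuracy`, the 80/20 ratio, and the toy labels are assumptions for illustration only. Sequence-based and structure-based splits would additionally need a similarity measure to keep related proteins on the same side of the split, which is not shown here.

```python
import random

def random_split(n_items, train_frac=0.8, seed=0):
    """Shuffle item indices with a fixed seed and cut them into train/test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(n_items * train_frac)
    return indices[:cut], indices[cut:]

def accuracy(targets, predictions):
    """Fraction of predictions that match their targets."""
    correct = sum(t == p for t, p in zip(targets, predictions))
    return correct / len(targets)

# Split ten hypothetical proteins into train and test indices.
train_idx, test_idx = random_split(10)

# Score made-up class predictions against made-up targets.
score = accuracy([1, 0, 2, 2], [1, 0, 1, 2])
```

Fixing the seed keeps the split reproducible, so results from different models can be compared on the same test items.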


Legal Note

Code in this repository is licensed under BSD-3; the dataset files on Zenodo are licensed under CC-BY-4.0.

We obtained and modified data from the following sources:

The AlphaFold protein structures were downloaded from the AlphaFold Structure Database, available under CC-BY-4.0.

The RCSB protein structures were downloaded from RCSB, available under CC0 1.0.

Protein-ligand binding structures and annotations were downloaded from PDBbind-CN, available under the End User Agreement for Access to the PDBbind-CN Database and Web Site.

The Gene Ontology was downloaded from the Gene Ontology Consortium, available under CC-BY-4.0.

The SCOP data was downloaded from the Structural Classification of Proteins, available under CC-BY-4.0.

