The official implementation of the WelQrate dataset and benchmark

WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery

Installation

We provide the recommended environment, which was used for benchmarking in the original paper. Users can also build their own environment to suit their needs.

conda create -n welqrate python=3.9
conda activate welqrate
pip install welqrate
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
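
After installation, it can be useful to confirm that the packages are visible to the interpreter before downloading any data. The following is a minimal sketch and not part of the WelQrate package: the helper name check_installed is hypothetical, and the import names are assumed to match the pip package names used above.

```python
import importlib.util

# Import names assumed to match the pip package names from the install step.
REQUIRED = ["welqrate", "pyg_lib", "torch_scatter", "torch_sparse",
            "torch_cluster", "torch_spline_conv"]


def check_installed(names=REQUIRED):
    """Return {module_name: bool} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_installed().items():
        print(f"{name:20s} {'found' if found else 'MISSING'}")
```

Any module reported as MISSING usually means the environment was not activated or the wheel index did not match the installed PyTorch and CUDA versions.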

Load the Dataset

Users can download and preprocess the datasets by instantiating the WelQrateDataset class. Available datasets include AID1798, AID435008, AID435034, AID1843, AID2258, AID463087, AID488997, AID2689, and AID485290. Please refer to our website for more details. Users can also choose between 2D and 3D molecular representations by setting mol_repr to '2d_graph' or '3d_graph'.

from welqrate.dataset import WelQrateDataset

# Load the 2D dataset
AID1798_dataset_2d = WelQrateDataset(dataset_name='AID1798', root='./datasets', mol_repr='2d_graph')

# Load the 3D dataset
AID1843_dataset_3d = WelQrateDataset(dataset_name='AID1843', root='./datasets', mol_repr='3d_graph')

# Load a split dictionary.
# split_scheme can be 'random_cv1' to 'random_cv5' or 'scaffold_seed1' to 'scaffold_seed5'.
split_dict = AID1798_dataset_2d.get_idx_split(split_scheme='random_cv1')
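
Before training, a quick sanity check on a split dictionary can catch data leakage early. The sketch below assumes the dictionary maps the keys 'train', 'valid', and 'test' (the keys used in this README) to sequences of integer indices; the helper check_split and the toy data are hypothetical and do not come from the WelQrate package.

```python
def check_split(split_dict, keys=("train", "valid", "test")):
    """Return the size of each subset; raise ValueError if subsets share an index."""
    seen = set()
    sizes = {}
    for key in keys:
        indices = {int(i) for i in split_dict[key]}
        overlap = seen & indices
        if overlap:
            raise ValueError(
                f"{len(overlap)} indices in '{key}' also appear in an earlier subset"
            )
        seen |= indices
        sizes[key] = len(indices)
    return sizes


# Split scheme names described above: five random CV folds and five scaffold seeds.
SPLIT_SCHEMES = [
    f"{kind}{i}" for kind in ("random_cv", "scaffold_seed") for i in range(1, 6)
]

# Toy stand-in for the dictionary returned by get_idx_split.
toy_split = {"train": [0, 1, 2, 3], "valid": [4, 5], "test": [6, 7]}
print(check_split(toy_split))
print(SPLIT_SCHEMES)
```

The same function could be called on a real split_dict, and looping over SPLIT_SCHEMES is one way to check every provided split in turn.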

Train a model

Hyperparameters for the model, training scheme, and dataset are stored in configuration files; see the files in ./config/ for the different models. After loading the configuration, we build the model and start training by calling the train function.

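import torch
import yaml
from welqrate.dataset import WelQrateDataset
# get_train_loader, get_valid_loader, get_test_loader, GCN_Model, and train are
# assumed to come from the welqrate package; their module paths are not shown here.

dataset_name = 'AID1798'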
split_scheme = 'random_cv1'
AID1798_2d = WelQrateDataset(dataset_name=dataset_name, root='./datasets', mol_repr='2d_graph',
                             source='inchi')
split_dict = AID1798_2d.get_idx_split(split_scheme)

train_loader = get_train_loader(AID1798_2d[split_dict['train']], batch_size=128, num_workers=0, seed=1)
valid_loader = get_valid_loader(AID1798_2d[split_dict['valid']], batch_size=128, num_workers=0)
test_loader = get_test_loader(AID1798_2d[split_dict['test']], batch_size=128, num_workers=0)


config = {}
# Load the default training config, then the GCN model config
for config_file in ['./config/train.yaml', './config/gcn.yaml']:
    with open(config_file) as file:
        config.update(yaml.safe_load(file))

# initialize model
hidden_channels = config['model']['hidden_channels']
num_layers = config['model']['num_layers']
model = GCN_Model(hidden_channels = hidden_channels, 
                  num_layers = num_layers)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

train(model = model,
      train_loader = train_loader,
      valid_loader = valid_loader,
      test_loader = test_loader,
      config = config,
      device = device,
      save_path = f'./results/{dataset_name}/{split_scheme}/gcn'
      )
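
Note that dict.update merges only top-level keys: if two configuration files define the same top-level section, the file loaded later replaces that section entirely. Whether the shipped files overlap is not stated here, so this may never matter in practice. If it does, a recursive merge is one alternative. The sketch below uses a hypothetical deep_merge helper and made-up configuration values, not the contents of the real YAML files.

```python
def deep_merge(base, override):
    """Return a new dict in which nested dicts from `override` are merged into `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold a dict for this key: merge them recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the value from `override` wins.
            merged[key] = value
    return merged


# Illustrative values only; the key names under 'train' are assumptions.
train_cfg = {"train": {"num_epochs": 100, "peak_lr": 0.01}}
model_cfg = {
    "train": {"peak_lr": 0.001},
    "model": {"hidden_channels": 32, "num_layers": 3},
}

merged_config = deep_merge(train_cfg, model_cfg)
print(merged_config)
```

With a plain update, the 'train' section from the second dictionary would drop num_epochs; the recursive version keeps it and overrides only peak_lr.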

Citation

If you find our work helpful, please cite our paper:

@article{dong2024welqrate,
  title={WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking},
  author={Liu, Yunchao and Dong, Ha and Wang, Xin and Moretti, Rocco and Wang, Yu and Su, Zhaoqian and Gu, Jiawei and Bodenheimer, Bobby and Weaver, Charles David and Meiler, Jens and Derr, Tyler and others},
  journal={arXiv preprint arXiv:2411.09820},
  year={2024}
}

Download files

Source Distribution

welqrate-0.1.4.tar.gz (41.7 kB view details)

Uploaded Source

Built Distribution

welqrate-0.1.4-py3-none-any.whl (49.2 kB view details)

Uploaded Python 3

File details

Details for the file welqrate-0.1.4.tar.gz.

File metadata

  • Download URL: welqrate-0.1.4.tar.gz
  • Upload date:
  • Size: 41.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for welqrate-0.1.4.tar.gz
Algorithm    Hash digest
SHA256       668cd5c57aa2f99f8d241e1f407238f6952aa4bd5965bbfc356a7890eaf33c5a
MD5          4df1c7a4a05495d2b47f0d94aaf39eb8
BLAKE2b-256  265674539e2576079b1127b67b15d698256256ffd7be32daf07ef68733c92f75
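
A published digest can be compared against a locally downloaded file. The following is a minimal sketch using only the Python standard library; the helper names sha256_of_file and matches_published are hypothetical, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed for welqrate-0.1.4.tar.gz.
EXPECTED_SDIST_SHA256 = "668cd5c57aa2f99f8d241e1f407238f6952aa4bd5965bbfc356a7890eaf33c5a"

# Usage, assuming the archive was saved in the current directory:
# matches_published("welqrate-0.1.4.tar.gz", EXPECTED_SDIST_SHA256)
```

pip can perform the same comparison automatically when a requirements file pins hashes, so a manual check like this is mainly useful for files downloaded by hand.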

File details

Details for the file welqrate-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: welqrate-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 49.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for welqrate-0.1.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       eb021e0452483c598bc37a5ebf8459d0de28f69ad3dfbfc0c29ca0fb70a51d0c
MD5          dff6b99452abe8d9c4a4302afd825947
BLAKE2b-256  943960542034c9bf9f1ab3f9e921776056697f02e9284a85b09cf3d9dc2f274b
