Graph-based pKa prediction for small molecules

pKa-predictor

Leveraging our Teaching Experience to Improve Machine Learning: Application to pKa Prediction

Jérôme Genzling, Ziling Luo, Benjamin Weiser, Nicolas Moitessier (nicolas.moitessier@mcgill.ca)

2023-12-07, revised 2025-05-16

Graphical Abstract

🔍 What is this?

A Graph Neural Network (GNN) model for:

  • Predicting pKa values of ionizable centers
  • Identifying protonation sites
  • Estimating dominant protonation states at a given pH
  • Supporting iterative protonation/deprotonation of polyprotic molecules
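To make the link between a pKa value and the protonation state at a given pH concrete, here is a minimal sketch of the textbook Henderson-Hasselbalch relation for a single ionizable site. It is not taken from this package and does not describe how the GNN works internally; the function name `fraction_protonated` is hypothetical.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a single ionizable site that carries its proton at this pH.

    Textbook Henderson-Hasselbalch relation, used only as an illustration;
    this is NOT the package's own code.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Acetic acid has a pKa of roughly 4.76 (approximate literature value).
    # Well above its pKa, the site is almost entirely deprotonated.
    print(f"Protonated fraction at pH 7.4: {fraction_protonated(4.76, 7.4):.4f}")
```

When the pH equals the pKa, the two forms are equally populated; each pH unit away from the pKa shifts the ratio by a factor of ten.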

🧪 Core Functionalities

  • Input: CSV with SMILES and (optionally) ionizable atom indices
  • Output: pKa value(s) and the major protonated species at a given pH
  • Iterative inference for molecules with multiple ionizable centers
  • Easily extendable to new datasets or re-trainable on custom data
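For molecules with several ionizable centers, the "major species at a given pH" can be illustrated with the standard equilibrium expression for a polyprotic acid. The sketch below assumes you already have a list of stepwise (macroscopic) pKa values, for example from a prediction run; the function name `species_fractions` is hypothetical and this is not the iterative procedure implemented in the package.

```python
def species_fractions(pkas, ph):
    """Population of each protonation state of a polyprotic molecule.

    Index i of the result is the species that has lost i protons relative
    to the fully protonated form. Standard equilibrium chemistry, shown as
    an illustration only; NOT the package's inference code.
    """
    pkas = sorted(pkas)
    n = len(pkas)
    # log10 of the unnormalised weight of species i:
    #   (n - i) * log10[H+] + sum of log10(Ka_j) for j <= i
    log_weights = []
    cumulative_pka = 0.0
    for i in range(n + 1):
        if i > 0:
            cumulative_pka += pkas[i - 1]
        log_weights.append(-(n - i) * ph - cumulative_pka)
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_weights)
    weights = [10.0 ** (w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Approximate textbook pKa values for phosphoric acid.
    fractions = species_fractions([2.15, 7.20, 12.35], ph=7.4)
    dominant = max(range(len(fractions)), key=fractions.__getitem__)
    print(f"Dominant species has lost {dominant} proton(s)")
```

Working in log space keeps the calculation stable even when the pH is far from every pKa.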

📦 Required Libraries

Install with pip:

pip install torch torch_geometric pandas numpy rdkit seaborn hyperopt

You can also recreate our virtual environment from environment.yml.

📁 Repository Structure

Datasets/ : All cleaned, split, and raw datasets

Baseline_Models/Descriptors/ : Code to generate traditional descriptors

Baseline_Models/RF, Baseline_Models/XGB : Training scripts for the traditional models (Random Forest, XGBoost)

GNN/ : All code related to GNN/GAT models

MolGpKa_retrained/ : Code and data for retraining MolGpKa

🚀 Getting Started with the GNN

1. See available options

python main.py --mode usage

All possible arguments and their default values will be printed.

2. Predict pKa on a sample set

Your CSV needs at least two columns: 'Name' and 'Smiles'.
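A minimal sketch of how such an input file could be produced with Python's standard library. Only the 'Name' and 'Smiles' headers come from this README; the helper name `write_input_csv` and the example molecules are illustrative, and the optional column for ionizable atom indices is left out because its header is not stated here.

```python
import csv
import io


def write_input_csv(molecules, handle):
    """Write (name, smiles) pairs as a CSV with 'Name' and 'Smiles' columns."""
    writer = csv.writer(handle)
    writer.writerow(["Name", "Smiles"])
    for name, smiles in molecules:
        writer.writerow([name, smiles])


if __name__ == "__main__":
    examples = [("acetic_acid", "CC(=O)O"), ("pyridine", "c1ccncc1")]
    buffer = io.StringIO()
    write_input_csv(examples, buffer)
    print(buffer.getvalue())
    # To write a real file instead, open it with newline="" as the csv
    # module recommends:
    #   with open("your_input.csv", "w", newline="") as fh:
    #       write_input_csv(examples, fh)
```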

On Windows:

python main.py --mode infer --input your_input.csv > infer_your_input.out

On Linux:

python main.py --mode infer --data_path ../Datasets/ --input your_input.csv --infer_pickled ../Datasets/pickled_data/infer_pickled.pkl --model_dir ../Model/ > infer_your_input.out
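If you launch inference from a script, building the argument list in Python sidesteps the path-separator differences between Windows and Linux. The sketch below only assembles the command; the flag names are the ones shown in this README, while the helper `build_infer_command` is hypothetical. Directory arguments keep a trailing separator, as in the examples above, on the assumption that the program may rely on it.

```python
import os
import sys


def build_infer_command(input_csv, data_path=None, infer_pickled=None,
                        model_dir=None, verbose=None):
    """Assemble the argument list for `main.py --mode infer`.

    Optional flags are only added when a value is given, so the
    program's own defaults apply otherwise.
    """
    cmd = [sys.executable, "main.py", "--mode", "infer", "--input", str(input_csv)]
    optional = {
        "--data_path": data_path,
        "--infer_pickled": infer_pickled,
        "--model_dir": model_dir,
        "--verbose": verbose,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # os.path.join with a trailing "" keeps the final separator
    # ("../Datasets/" on Linux, "..\\Datasets\\" on Windows).
    datasets = os.path.join("..", "Datasets", "")
    cmd = build_infer_command(
        "your_input.csv",
        data_path=datasets,
        infer_pickled=os.path.join(datasets, "pickled_data", "infer_pickled.pkl"),
        model_dir=os.path.join("..", "Model", ""),
    )
    print(" ".join(cmd))
    # To run it and capture the output in a file:
    #   import subprocess
    #   with open("infer_your_input.out", "w") as out:
    #       subprocess.run(cmd, stdout=out, check=True)
```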

3. Predict from a CSV in Python

You can also use the predict() function directly:

from predict import predict

predicted_pkas, protonated_smiles = predict("your_dataset.csv", pH=7.4)

4. Verbose Levels

Use the --verbose flag to control output detail:

--verbose 0: Silent mode; no details are printed.

--verbose 1: Summary of predictions plus some cleaning details.

--verbose 2: Detailed view of every deprotonation step.

📖 Citation

If you use this code or model, please cite:

Genzling J, Luo Z, Weiser B, Moitessier N. Leveraging our Teacher’s Experience to Improve Machine Learning: Application to pKa Prediction. ChemRxiv. 2024; doi:10.26434/chemrxiv-2024-bpd53-v2

This content is a preprint and has not been peer-reviewed.

🧠 Tips

Use the Cheminfo SMILES viewer to visualize and debug SMILES (https://www.cheminfo.org/Chemistry/Cheminformatics/Smiles/index.html).

If protonation states are off, check atom indexing or consider using neutral forms.

You can retrain on your own dataset by modifying train_pKa_predictor.py.

🛠 Support

Feel free to reach out via email or GitHub issues if you need help using or adapting the model.

Download files

Source Distribution

pka_predictor_moitessier-0.1.5.tar.gz (64.7 kB view details)

Uploaded Source

Built Distribution

pka_predictor_moitessier-0.1.5-py3-none-any.whl (73.2 kB view details)

Uploaded Python 3

File details

Details for the file pka_predictor_moitessier-0.1.5.tar.gz.

File metadata

  • Download URL: pka_predictor_moitessier-0.1.5.tar.gz
  • Size: 64.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for pka_predictor_moitessier-0.1.5.tar.gz
Algorithm Hash digest
SHA256 be599b29634825173ffb25fb2b3e03dbfbf38d62230c121909d1e0450c476e4a
MD5 40c2ff3b44bb2e8926863b4e0f2facdc
BLAKE2b-256 bcb9b928e599e383e29d613e1fada2815e64a4eeabf0b945f6e9816dd8656b35
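A downloaded archive can be checked against the SHA256 digest listed above with Python's `hashlib`. This is a generic sketch, not part of the package; the helper name `sha256_of_file` is hypothetical and the local file path is assumed to be wherever you saved the download.

```python
import hashlib

# SHA256 digest listed above for pka_predictor_moitessier-0.1.5.tar.gz
EXPECTED_SHA256 = "be599b29634825173ffb25fb2b3e03dbfbf38d62230c121909d1e0450c476e4a"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage, assuming the archive sits in the current directory:
#   ok = sha256_of_file("pka_predictor_moitessier-0.1.5.tar.gz") == EXPECTED_SHA256
```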

File details

Details for the file pka_predictor_moitessier-0.1.5-py3-none-any.whl.

File hashes

Hashes for pka_predictor_moitessier-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 0fa9f26ad4d9d2b10ab990c95564efabe89d5189a42da050113829f881def4e5
MD5 16943507b3d744fa0842ce1a1ce90d99
BLAKE2b-256 9983971145a4eeb741c70fb7b7a9dc6f73afeb3efd9ce2dfda1ea9bbcd3502a4
