This is the code for the paper "SolvBERT for solvation free energy and solubility prediction: a demonstration of an NLP model for predicting the properties of molecular complexes". A preprint of the paper is available on ChemRxiv at https://doi.org/10.26434/chemrxiv-2022-0hl5p
SolvBERT
Installation
- python 3.7.12
- transformers 2.11.0
- wandb 0.12.15
- tokenizers 0.7.0
- rxnfp 0.0.7
To install this package, run pip install solv-bert==0.0.5
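As a quick sanity check after installation, the pinned versions listed above can be compared against what is actually installed. The sketch below is not part of the package: the names PINNED, installed_version and find_mismatches are hypothetical, and it assumes importlib.metadata is available (Python 3.8+; on the Python 3.7 listed above, the importlib_metadata backport provides the same interface).

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: requires the importlib_metadata backport
    import importlib_metadata as metadata

# Versions copied from the Installation list above.
PINNED = {
    "transformers": "2.11.0",
    "wandb": "0.12.15",
    "tokenizers": "0.7.0",
    "rxnfp": "0.0.7",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result from find_mismatches(PINNED) means every listed dependency matches its pinned version.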
Dataset
- CombiSolv-QM. This dataset originally came from a study by Vermeire et al., who computed it using the commercial software COSMOtherm. It consists of 1 million data points randomly selected from all possible combinations of 284 commonly used solvents and 11,029 solutes.
- CombiSolv-Exp-8780. Experimental solvation free energy data for 8,780 different solute and solvent combinations, from Vermeire et al.
- Solubility. This dataset originally came from Boobier et al., who curated it from the Open Notebook Science Challenge water solubility dataset and the Reaxys database. It covers ethanol with 695 solutes, benzene with 464 solutes, acetone with 452 solutes, and water with 900 solutes, for a total of 2,511 different combinations, with solubility expressed as log S.
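The stated Solubility total can be checked against the per-solvent counts quoted above. This is only an illustrative tally; the variable names are hypothetical and the numbers are taken directly from the description.

```python
# Number of solutes per solvent in the Solubility dataset, as quoted above.
solutes_per_solvent = {
    "ethanol": 695,
    "benzene": 464,
    "acetone": 452,
    "water": 900,
}

total_combinations = sum(solutes_per_solvent.values())
print(total_combinations)  # 2511, matching the stated total
```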
Train
Run SMILESlanguagemodel.py for pre-training and Finetune.py for fine-tuning. Each pre-training run (SMILESlanguagemodel.py) must save its best model, because the fine-tuning step (Finetune.py) starts from it.
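The requirement to keep the best model from each stage can be illustrated with a minimal tracker. This is a sketch under assumptions, not the logic used in SMILESlanguagemodel.py or Finetune.py: the class BestModelTracker, the use of validation loss as the selection criterion, and the save_fn callback are all hypothetical.

```python
class BestModelTracker:
    """Remember the lowest validation loss seen and save only on improvement."""

    def __init__(self, save_fn):
        self.save_fn = save_fn  # hypothetical callback that writes a checkpoint
        self.best_loss = float("inf")
        self.best_epoch = None

    def update(self, epoch, val_loss):
        """Save and return True if val_loss improves on the best so far."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.save_fn(epoch)
            return True
        return False


if __name__ == "__main__":
    saved = []
    tracker = BestModelTracker(save_fn=saved.append)
    # Made-up validation losses, for illustration only.
    for epoch, loss in enumerate([0.90, 0.70, 0.75, 0.60, 0.65]):
        tracker.update(epoch, loss)
    print(tracker.best_epoch, saved)  # 3 [0, 1, 3]
```

Because the checkpoint is overwritten only on improvement, the last saved model is the one to hand to the next stage.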
Test
Run Predictandeval.py to start testing. This step requires the best model saved in the previous step (Finetune.py), which is used for prediction and evaluation.
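Taken together, the Train and Test steps run three scripts in a fixed order. The sketch below assumes the scripts sit in the current working directory and need no command-line arguments, neither of which the description confirms; STAGES and run_pipeline are hypothetical names.

```python
import subprocess
import sys

# Script names as given in the Train and Test sections, in execution order.
STAGES = [
    "SMILESlanguagemodel.py",  # pre-training
    "Finetune.py",             # fine-tuning from the best pre-trained model
    "Predictandeval.py",       # prediction and evaluation with the best fine-tuned model
]


def run_pipeline(stages=STAGES, runner=subprocess.run):
    """Run each stage in order, stopping at the first failure."""
    for script in stages:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a later stage never starts after a failed earlier one.
        runner([sys.executable, script], check=True)
```

Calling run_pipeline() launches the stages one after another with the current Python interpreter; the runner parameter exists only so the ordering can be exercised without starting real training.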