ChemTSv2 is a flexible and versatile molecule generator based on reinforcement learning with natural language processing.

Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
- Python :: 3.11

ChemTSv2

ChemTSv2[^13] is a refined and extended version of ChemTS[^1] and MPChemTS[^2]. The original implementations are available at https://github.com/tsudalab/ChemTS and https://github.com/yoshizoe/mp-chemts, respectively.

ChemTSv2 provides:

- an easy-to-run interface that requires only a configuration file
- an easy-to-define framework for any user-defined reward function, molecular filter, and tree policy
- various usage examples in the GitHub repository

[^13]: Ishida, S., Aasawat, T., Sumita, M., Katouda, M., Yoshizawa, T., Yoshizoe, K., Tsuda, K., & Terayama, K. (2023). ChemTSv2: Functional molecular design using de novo molecule generator. WIREs Computational Molecular Science. https://wires.onlinelibrary.wiley.com/doi/10.1002/wcms.1680

[^1]: Yang, X., Zhang, J., Yoshizoe, K., Terayama, K., & Tsuda, K. (2017). ChemTS: an efficient python library for de novo molecular generation. Science and Technology of Advanced Materials, 18(1), 972–976. https://doi.org/10.1080/14686996.2017.1401424

[^2]: Yang, X., Aasawat, T., & Yoshizoe, K. (2021). Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design. In International Conference on Learning Representations. https://openreview.net/forum?id=6k7VdojAIK

How to set up :pushpin:

Requirements :memo:

python: 3.11
rdkit: 2023.9.1
tensorflow: 2.14.1
pyyaml
pandas: 2.1.3
joblib
mpi4py: 3.1.5 (for massive parallel mode)

ChemTSv2 with single process mode :red_car:

cd YOUR_WORKSPACE
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade chemtsv2

ChemTSv2 with massive parallel mode :airplane:

NOTE: You need to run ChemTSv2-MP on a server where OpenMPI or MPICH is installed. If you can't find the `mpiexec` command, please ask your server administrator to install such an MPI library.

If you can use or prepare a server with an MPI environment, please follow instruction (a); otherwise, please follow instruction (b).

(a) Installation on a server WITH an MPI environment

cd YOUR_WORKSPACE
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade chemtsv2
pip install mpi4py==3.1.5

(b) Installation on a server WITHOUT an MPI environment

conda create -n mpchem python=3.11 -c conda-forge
# switch to the `mpchem` environment
conda activate mpchem
conda install -c conda-forge openmpi cxx-compiler mpi mpi4py=3.1.5
pip install --upgrade chemtsv2

How to run ChemTSv2 :pushpin:

1. Clone this repository and move into it

git clone git@github.com:molecule-generator-collection/ChemTSv2.git
cd ChemTSv2

2. Prepare a reward file

Please refer to reward/README.md. An example reward definition for the LogP maximization task is as follows.

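from rdkit.Chem import Descriptors
import numpy as np
from reward.reward import Reward

class LogP_reward(Reward):
    def get_objective_functions(conf):
        # Each objective function takes an RDKit Mol and returns a raw value.
        def LogP(mol):
            return Descriptors.MolLogP(mol)
        return [LogP]

    def calc_reward_from_objective_values(objective_values, conf):
        # objective_values follows the order of the list returned above.
        logp = objective_values[0]
        # Squash LogP into the range (-1, 1).
        return np.tanh(logp/10)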
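
To see what the `np.tanh(logp/10)` scaling above does, the following sketch evaluates it for a few LogP values. It uses `math.tanh` from the standard library in place of NumPy so that it runs without extra dependencies; the helper name `scaled_reward`, the `scale` parameter, and the sample LogP values are illustrative assumptions, not part of ChemTSv2.

```python
import math

def scaled_reward(logp: float, scale: float = 10.0) -> float:
    """Squash a LogP value into (-1, 1), mirroring np.tanh(logp/10) above."""
    return math.tanh(logp / scale)

# Illustrative LogP values (not taken from any real run).
for logp in (-5.0, 0.0, 2.5, 10.0):
    print(f"LogP={logp:5.1f} -> reward={scaled_reward(logp):.3f}")
```

Because tanh is monotonic, a higher LogP always gives a higher reward, while very large values saturate close to 1 instead of dominating the search.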

3. Prepare a config file

The options are explained in the Support option/function section. The prepared reward file needs to be specified in reward_setting. For details, please refer to the sample file (config/setting.yaml). If you want to pass any value to calc_reward_from_objective_values (e.g., weights for each value), add it to the config file.
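
As a rough illustration of passing extra values through the config, the sketch below combines two objective values using weights read from `conf`. The key name `objective_weights` and the use of a standalone function (rather than a method on a `Reward` subclass) are assumptions made for this example; they are not taken from config/setting.yaml.

```python
import math

def calc_reward_from_objective_values(objective_values, conf):
    """Weighted sum of objective values, squashed into (-1, 1).

    `conf["objective_weights"]` is a hypothetical key: in practice it would
    be whatever name you add to your own config file.
    """
    weights = conf["objective_weights"]
    if len(weights) != len(objective_values):
        raise ValueError("need one weight per objective value")
    total = sum(w * v for w, v in zip(weights, objective_values))
    return math.tanh(total / 10)

# Stand-in for the dictionary loaded from the YAML config file.
conf = {"objective_weights": [1.0, 0.5]}
reward = calc_reward_from_objective_values([2.0, 4.0], conf)
print(round(reward, 3))
```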

4. Generate molecules

ChemTSv2 with single process mode :red_car:

chemtsv2 -c config/setting.yaml

ChemTSv2 with massive parallel mode :airplane:

mpiexec -n 4 chemtsv2-mp --config config/setting_mp.yaml

ChemTSv2 with Docker

docker build -t chemtsv2:1.0.0 .
docker run -u $(id -u):$(id -g) \
           --rm \
           --mount type=bind,source=./,target=/app/ \
           chemtsv2:1.0.0 \
           chemtsv2 -c config/setting.yaml

Example usage :pushpin:

Target	Reward	Config	Additional requirement	Ref.
LogP	logP_reward.py	setting.yaml	-	-
Jscore	Jscore_reward.py	setting_jscore.yaml	-	[^1]
Absorption wavelength	chro_reward.py	setting_chro.yaml	Gaussian 16[^3] via QCforever[^10]	[^4]
Absorption wavelength	chro_gamess_reward.py	setting_chro_gamess.yaml	GAMESS 2022.2[^12] via QCforever[^10]	-
Upper-absorption & fluorescence wavelength	fluor_reward.py	setting_fluor.yaml	Gaussian 16[^3] via QCforever[^10]	[^5]
Kinase inhibitory activities	dscore_reward.py	setting_dscore.yaml	LightGBM[^6]	[^7]
Docking score	Vina_binary_reward.py	setting_vina_binary.yaml	AutoDock Vina[^8]	[^9]
Pharmacophore	pharmacophore_reward.py	setting_pharmacophore.yaml	-	[^11]

[^3]: Frisch, M. J. et al. Gaussian 16 Revision C.01. 2016; Gaussian Inc. Wallingford CT.

[^4]: Sumita, M., Yang, X., Ishihara, S., Tamura, R., & Tsuda, K. (2018). Hunting for Organic Molecules with Artificial Intelligence: Molecules Optimized for Desired Excitation Energies. ACS Central Science, 4(9), 1126–1133. https://doi.org/10.1021/acscentsci.8b00213

[^5]: Sumita, M., Terayama, K., Suzuki, N., Ishihara, S., Tamura, R., Chahal, M. K., Payne, D. T., Yoshizoe, K., & Tsuda, K. (2022). De novo creation of a naked eye–detectable fluorescent molecule based on quantum chemical computation and machine learning. Science Advances, 8(10). https://doi.org/10.1126/sciadv.abj3906

[^6]: Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., … Liu, T.-Y. (2017). Lightgbm: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30, 3146–3154.

[^7]: Yoshizawa, T., Ishida, S., Sato, T., Ohta, M., Honma, T., & Terayama, K. (2022). Selective Inhibitor Design for Kinase Homologs Using Multiobjective Monte Carlo Tree Search. Journal of Chemical Information and Modeling, 62(22), 5351–5360. https://doi.org/10.1021/acs.jcim.2c00787

[^8]: Eberhardt, J., Santos-Martins, D., Tillack, A. F., & Forli, S. (2021). AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. Journal of Chemical Information and Modeling, 61(8), 3891–3898. https://doi.org/10.1021/acs.jcim.1c00203

[^9]: Ma, B., Terayama, K., Matsumoto, S., Isaka, Y., Sasakura, Y., Iwata, H., Araki, M., & Okuno, Y. (2021). Structure-Based de Novo Molecular Generator Combined with Artificial Intelligence and Docking Simulations. Journal of Chemical Information and Modeling, 61(7), 3304–3313. https://doi.org/10.1021/acs.jcim.1c00679

[^10]: Sumita, M., Terayama, K., Tamura, R., & Tsuda, K. (2022). QCforever: A Quantum Chemistry Wrapper for Everyone to Use in Black-Box Optimization. Journal of Chemical Information and Modeling, 62(18), 4427–4434. https://doi.org/10.1021/acs.jcim.2c00812

[^11]: Ishida, S., Yoshizawa, T., & Terayama, K. (2023). De novo molecular design based on deep learning and tree search (in Japanese). SAR News, 44.

[^12]: Barca, Giuseppe M. J. et al. (2020). Recent developments in the general atomic and molecular electronic structure system. The Journal of Chemical Physics, 152(15), 154102. https://doi.org/10.1063/5.0005188

Support option/function :pushpin:

Option	Single process	Massive parallel	Description
`c_val`	:white_check_mark:	:white_check_mark:	Exploration parameter to balance the trade-off between exploration and exploitation. A larger value (e.g., 1.0) prioritizes exploration, and a smaller value (e.g., 0.1) prioritizes exploitation.
`threshold_type`	:white_check_mark:	:heavy_check_mark:	Stopping criterion for molecule generation: either a time limit (`hours`) or a number of generated molecules (`generation_num`). Massive parallel mode currently supports only the `hours` option.
`hours`	:white_check_mark:	:white_check_mark:	Time for molecule generation in hours
`generation_num`	:white_check_mark:	:white_large_square:	Number of molecules to be generated. Please note that the specified number is usually exceeded.
`expansion_threshold`	:white_check_mark:	:white_large_square:	(Advanced) Expansion threshold of the cumulative probability. The default is set to 0.995.
`simulation_num`	:white_check_mark:	:white_large_square:	(Advanced) Number of rollout runs in one cycle of MCTS. The default is set to 3.
`flush_threshold`	:white_check_mark:	:white_large_square:	Threshold for saving the progress of a molecule generation. When the number of generated molecules exceeds this value, the result is saved. The default is -1, which means that no progress is saved.
Molecule filter	:white_check_mark:	:white_check_mark:	Molecule filter to skip reward calculation of unfavorable generated molecules. Please refer to filter/README.md for details.
RNN model replacement	:white_check_mark:	:white_check_mark:	Users can switch the RNN models used in the expansion and rollout steps of ChemTSv2. The model needs to be trained using TensorFlow. `model_json` specifies the JSON file that contains the architecture of the RNN model, and `model_weight` specifies the H5 file that contains the weight values.
Reward replacement	:white_check_mark:	:white_check_mark:	Users can use any reward function as long as they follow the reward base class (reward/reward.py). Please refer to reward/README.md for details.
Policy replacement	:white_check_mark:	:white_large_square:	(Advanced) Users can use any policy function as long as they follow the policy base class (policy/policy.py). Please refer to policy/README.md for details.
Restart	:beginner:	:beginner:	Users can save a checkpoint file and restart from it. (SP mode) To save a checkpoint, set `save_checkpoint` to True and specify the file name in `checkpoint_file`. To restart from a checkpoint, set `restart` to True and specify the checkpoint file in `checkpoint_file`. (MP mode) Under development.

:white_check_mark: indicates that the option/function is supported.
:heavy_check_mark: indicates that the option/function is partially supported.
:beginner: indicates that the option/function is in beta.
:white_large_square: indicates that the option/function is NOT supported.
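
The effect of `c_val` described in the table above can be illustrated with a generic UCB1-style score, in which an exploitation term (the mean reward) is added to an exploration bonus scaled by `c_val`. This textbook form is used here as an assumption; the exact formula in ChemTSv2's default policy may differ, and all function names, node names, and numbers below are hypothetical.

```python
import math

def ucb_score(total_reward, visits, parent_visits, c_val):
    """Generic UCB1-style score: mean reward plus a c_val-scaled bonus."""
    exploitation = total_reward / visits
    exploration = c_val * math.sqrt(2 * math.log(parent_visits) / visits)
    return exploitation + exploration

# Two hypothetical child nodes under a parent visited 100 times.
children = {
    "well_explored": {"total_reward": 40.0, "visits": 50},  # mean reward 0.80
    "rarely_visited": {"total_reward": 1.2, "visits": 2},   # mean reward 0.60
}

def best_child(c_val, parent_visits=100):
    """Return the name of the child with the highest score for this c_val."""
    return max(
        children,
        key=lambda name: ucb_score(
            children[name]["total_reward"],
            children[name]["visits"],
            parent_visits,
            c_val,
        ),
    )

for c_val in (0.1, 1.0):
    print(f"c_val={c_val}: selected {best_child(c_val)}")
```

With the small value the node with the higher mean reward is selected, whereas the larger value shifts the choice to the node that has been visited only twice.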

Filter functions are described in filter/README.md.
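
The mode-specific support listed in the table above can be turned into a small sanity check to run before launching a job. The sketch below assumes that the config has been loaded into a plain dictionary with the option names as top-level keys; the function name and the `massive_parallel` flag are hypothetical and not part of ChemTSv2.

```python
# Options the table marks as NOT supported in massive parallel mode.
SINGLE_PROCESS_ONLY = {
    "generation_num",
    "expansion_threshold",
    "simulation_num",
    "flush_threshold",
}

def check_config(conf, massive_parallel=False):
    """Return a list of human-readable warnings found in `conf`."""
    warnings = []
    threshold_type = conf.get("threshold_type")
    if threshold_type not in ("hours", "generation_num"):
        warnings.append(f"unknown threshold_type: {threshold_type!r}")
    elif threshold_type not in conf:
        warnings.append(f"threshold_type is {threshold_type!r} but it is not set")
    if massive_parallel:
        if threshold_type == "generation_num":
            warnings.append("massive parallel mode supports only 'hours'")
        for key in sorted(SINGLE_PROCESS_ONLY.intersection(conf)):
            warnings.append(f"{key} is not supported in massive parallel mode")
    return warnings

conf = {"c_val": 1.0, "threshold_type": "generation_num", "generation_num": 300}
print(check_config(conf))                         # no warnings in single process mode
print(check_config(conf, massive_parallel=True))  # two warnings reported
```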

Advanced usage :pushpin:

Extend user-specified SMILES

ChemTSv2 can extend a SMILES string that you supply. Place the atom to extend from at the end of the string and run ChemTSv2 with the --input_smiles argument as follows.

chemtsv2 -c config/setting.yaml --input_smiles 'C1=C(C)N=CC(N)=C1C'

Specify the last atom of SMILES string using OpenBabel

OpenBabel can be used to rearrange a SMILES string so that the specified atom comes last. For example, if you want to rearrange Br in NC1=CC(Br)=CC=C1 to the last position, run the following command:

# obabel -:"<SMILES>" -osmi -xl <atom no.>
# Atom numbers correspond to the order of atoms in an input SMILES string.
# In this example, `Br` appears fifth, so we specify `5` as a <atom no.>.
obabel -:"NC1=CC(Br)=CC=C1" -osmi -xl 5
# output: Nc1cc(ccc1)Br

Please refer to the official documentation for detailed usage.
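
When many SMILES strings need to be rearranged, the command above can be assembled programmatically. The sketch below only builds the argument list for the `obabel` invocation shown above and leaves the actual call as a commented hint, since running it requires OpenBabel to be installed; the helper name is hypothetical.

```python
import shlex

def obabel_reorder_command(smiles: str, atom_no: int) -> list[str]:
    """Build the obabel argument list that moves atom `atom_no` to the end.

    `atom_no` is 1-based and counts atoms in the order they appear in
    `smiles`, as in the command-line example above.
    """
    if atom_no < 1:
        raise ValueError("atom_no is 1-based")
    return ["obabel", f"-:{smiles}", "-osmi", "-xl", str(atom_no)]

cmd = obabel_reorder_command("NC1=CC(Br)=CC=C1", 5)
print(shlex.join(cmd))
# To execute it (requires OpenBabel on PATH):
#   import subprocess
#   result = subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Passing the arguments as a list avoids shell quoting problems with characters such as parentheses and `=` in SMILES strings.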

Train RNN models using your own dataset

If you want to use RNN models trained on your own dataset, use train_model/train_RNN.py and train_model/model_setting.yaml to train them. Prepare a dataset that contains only SMILES strings and set its path in the dataset key of model_setting.yaml. Then run the following command:

cd train_model/
python train_RNN.py -c model_setting.yaml

Please note that the current version of ChemTSv2 does not support changing the RNN model structure; users can only change the parameters described in model_setting.yaml.

Once the RNN model is trained, specify the paths to the checkpoint and token files in the model_setting and token keys of the ChemTSv2 config file to run ChemTSv2 with your own RNN model.
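
Preparing the SMILES-only dataset mentioned above can be as simple as writing one string per line. The sketch below assumes that plain-text, one-per-line layout (check the dataset referenced in train_model/model_setting.yaml for the expected format); the file name and helper are hypothetical. It strips whitespace, drops empty entries, and removes exact string duplicates while preserving order; it does not canonicalize SMILES.

```python
import tempfile
from pathlib import Path

def write_smiles_dataset(smiles_list, path):
    """Write unique, non-empty SMILES strings to `path`, one per line."""
    seen = set()
    cleaned = []
    for smi in smiles_list:
        smi = smi.strip()
        if smi and smi not in seen:
            seen.add(smi)
            cleaned.append(smi)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)

# Hypothetical input; replace with your own molecules and output location.
raw = ["CCO", " c1ccccc1 ", "", "CCO", "CC(=O)O"]
dataset_path = Path(tempfile.gettempdir()) / "my_smiles.smi"
n_written = write_smiles_dataset(raw, dataset_path)
print(f"wrote {n_written} SMILES strings to {dataset_path}")
```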

GPU acceleration

If you want to use a GPU, run ChemTSv2 with the --gpu GPU_ID argument as follows.

chemtsv2 -c config/setting.yaml --gpu 0

How to cite

@article{Ishida2023,
  doi = {10.1002/wcms.1680},
  url = {https://doi.org/10.1002/wcms.1680},
  year = {2023},
  month = jul,
  publisher = {Wiley},
  author = {Shoichi Ishida and Tanuj Aasawat and Masato Sumita and Michio Katouda and Tatsuya Yoshizawa and Kazuki Yoshizoe and Koji Tsuda and Kei Terayama},
  title = {ChemTSv2: Functional molecular design using de novo molecule generator},
  journal = {{WIREs} Computational Molecular Science}
}

License :pushpin:

This package is distributed under the MIT License.

Contact :pushpin:

Shoichi Ishida (ishida.sho.nm@yokohama-cu.ac.jp)
Kei Terayama (terayama@yokohama-cu.ac.jp).

Release history

Version	Release date
1.0.8	Nov 18, 2024
1.0.7	Jul 9, 2024
1.0.6	Jul 9, 2024
1.0.5	Jun 27, 2024
1.0.4	Jun 27, 2024
1.0.3	May 27, 2024
1.0.2	Dec 15, 2023
1.0.1	Dec 7, 2023
1.0.1rc0 (pre-release, this version)	Dec 7, 2023
1.0.0	Nov 17, 2023
1.0.0rc2 (pre-release)	Nov 17, 2023
1.0.0rc1 (pre-release)	Nov 17, 2023
0.9.13	Nov 9, 2023
0.9.12	Nov 9, 2023
0.9.11	Jul 14, 2023
0.9.10	Feb 9, 2023
0.9.9	Feb 6, 2023
0.9.8	Feb 3, 2023
0.9.7	Feb 3, 2023
0.9.6	Jan 27, 2023
0.9.5	Dec 28, 2022
0.9.4	Dec 19, 2022
0.9.3	Dec 6, 2022
0.9.2	Oct 11, 2022
0.9.1	Oct 7, 2022
0.9.0	Oct 6, 2022
0.8.1	Aug 24, 2022

Download files

Source Distribution

chemtsv2-1.0.1rc0.tar.gz (31.6 kB)

Uploaded Dec 7, 2023 Source

Built Distribution

chemtsv2-1.0.1rc0-py3-none-any.whl (32.7 kB)

Uploaded Dec 7, 2023 Python 3

File details

Details for the file chemtsv2-1.0.1rc0.tar.gz.

File metadata

Download URL: chemtsv2-1.0.1rc0.tar.gz
Upload date: Dec 7, 2023
Size: 31.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.5.1 CPython/3.11.6 Linux/5.15.0-88-generic

File hashes

Hashes for chemtsv2-1.0.1rc0.tar.gz
Algorithm	Hash digest
SHA256	`eabae6ecd07d6db539986cec635104ea52034024a13310c33f0deb380a91d932`
MD5	`045efe199ef9c328ee1e5222c626219f`
BLAKE2b-256	`7d671e47059591fd6a758f2f5cad3b31ab33a898b491d4b3e419270ce0a5b08d`
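
A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. The sketch below is a generic checksum helper, not a PyPI tool; the function name is hypothetical, and the demonstration hashes a small temporary file so that it runs without the real download.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Digest listed above for chemtsv2-1.0.1rc0.tar.gz.
EXPECTED = "eabae6ecd07d6db539986cec635104ea52034024a13310c33f0deb380a91d932"

# Demonstration on a throwaway file; point sha256_of at the real archive instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"example")
    demo_digest = sha256_of(demo)
    print(demo_digest == EXPECTED)  # False: the demo file is not the archive
```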

File details

Details for the file chemtsv2-1.0.1rc0-py3-none-any.whl.

File metadata

Download URL: chemtsv2-1.0.1rc0-py3-none-any.whl
Upload date: Dec 7, 2023
Size: 32.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.5.1 CPython/3.11.6 Linux/5.15.0-88-generic

File hashes

Hashes for chemtsv2-1.0.1rc0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a31c077edafddb292bc0dd000a456928711fc4b6fc57d669fc2f5d86a19b7475`
MD5	`c186f68b8682e8407e7982174b6577ec`
BLAKE2b-256	`4b726f095cf174b2a19a2d7bf0f4e43b6dba14b1be06154ba0ae13eb9ee83111`
