Chytorch RxnMap
A semi-supervised atom-to-atom mapping model trained on the USPTO and Pistachio datasets.
Installation
Use pip install chytorch-rxnmap to install the release version.
Or run pip install . in the source code directory to install the development version.
Perform Atom-to-atom mapping
Atom-to-atom mapping (AAM) is integrated into the chython package and is available as a reaction object method. See the chython documentation for details.
from chython import smiles
r = smiles('OC(=O)C(=C)C=C.C=CC#N>>OC(=O)C1=CCCC(C1)C#N')
r.reset_mapping()
print(format(r, 'm'))
>> [C:2]([C:4](=[CH2:5])[CH:6]=[CH2:7])(=[O:3])[OH:1].[CH2:8]=[CH:9][C:10]#[N:11]>>[O:3]=[C:2]([OH:1])[C:4]=1[CH2:5][CH:9]([C:10]#[N:11])[CH2:8][CH2:7][CH:6]=1
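The mapped output above can be sanity-checked without any chemistry toolkit. The following is a minimal sketch in plain Python, not part of chython or chytorch-rxnmap: the helper names `map_numbers` and `check_mapping` are hypothetical, and the regular expression assumes that every mapped atom is written as a bracket atom ending in `:N]`, as in the output shown.

```python
import re

# Matches the map number in bracket atoms such as [CH2:5] or [C:10].
MAP_NUMBER = re.compile(r"\[[^\]]*:(\d+)\]")


def map_numbers(side: str) -> list:
    """Return the atom-map numbers found in one side of a reaction SMILES."""
    return [int(n) for n in MAP_NUMBER.findall(side)]


def check_mapping(mapped_rxn: str) -> bool:
    """True if both sides carry the same map numbers, each used once per side."""
    reactants, products = mapped_rxn.split(">>")
    left, right = map_numbers(reactants), map_numbers(products)
    no_duplicates = len(set(left)) == len(left) and len(set(right)) == len(right)
    return no_duplicates and set(left) == set(right)


# The mapped reaction printed in the example above.
mapped = (
    "[C:2]([C:4](=[CH2:5])[CH:6]=[CH2:7])(=[O:3])[OH:1]."
    "[CH2:8]=[CH:9][C:10]#[N:11]>>"
    "[O:3]=[C:2]([OH:1])[C:4]=1[CH2:5][CH:9]([C:10]#[N:11])[CH2:8][CH2:7][CH:6]=1"
)
print(check_mapping(mapped))
```

A check like this only confirms that the numbering is consistent between the two sides; it says nothing about whether the mapping is chemically correct.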
Pretrained model
To load pretrained model use:
from chytorch.zoo.rxnmap import Model
model = Model.pretrained()
To prepare data-loader use:
from chython import SMILESRead
data = []
for r in SMILESRead('data.smi'):
    r.canonicalize()  # fix aromaticity and functional groups
    data.append(r.pack())  # store in compressed format
dl = model.prepare_dataloader(data, batch_size=20)
To get embeddings use:
for b in dl:
    e = model(b)
Note: the embeddings contain the cls embedding followed by [unusable molecular embedding, list of atom embeddings] * n, where n is the number of molecules in the reaction equation.
To extract the aggregated embedding, use the cls embedding x = e[:, 0].
To extract atoms-only embeddings, use masking:
x = e[b[3] > 1] - for all atoms
x = e[b[3] == 2] - for reactants only
x = e[b[3] == 3] - for products only
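The masks above rely on boolean indexing, which behaves the same way in NumPy as in PyTorch. The sketch below illustrates the idea on toy arrays rather than real model output. It assumes that `b[3]` is an integer array with one role code per token; the codes 2 and 3 come from the text, while 0 for the cls token and 1 for the molecule tokens are assumptions made only for this illustration.

```python
import numpy as np

# Toy stand-ins (assumptions): one reaction, 10 tokens, 4-dimensional embeddings.
# Token layout: cls, molecule, 2 atoms, molecule, 1 atom, molecule, 3 atoms.
# Role codes 0 (cls) and 1 (molecule) are assumed; 2 = reactant atom and
# 3 = product atom follow the masks given in the text.
roles = np.array([[0, 1, 2, 2, 1, 2, 1, 3, 3, 3]])  # plays the part of b[3]
e = np.arange(roles.size * 4, dtype=float).reshape(1, 10, 4)  # fake embeddings

cls_embedding = e[:, 0]          # aggregated embedding, one row per reaction
all_atoms = e[roles > 1]         # every atom token, flattened over the batch
reactant_atoms = e[roles == 2]   # reactant atoms only
product_atoms = e[roles == 3]    # product atoms only

print(cls_embedding.shape, all_atoms.shape, reactant_atoms.shape, product_atoms.shape)
```

Note that a boolean mask flattens the batch and token dimensions into one, so the result no longer records which reaction each atom came from.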
To get all-to-all tokens attention matrix:
for b in dl:
    a = model(b, mapping_task=True)
Training new model
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.plugins import DDPPlugin
callback = ModelCheckpoint(monitor='trn_loss_tot', save_weights_only=True, save_top_k=3, save_last=True,
                           every_n_train_steps=10000)
trainer = Trainer(gpus=-1, precision=16, max_steps=1000000, callbacks=[callback],
                  strategy=DDPPlugin(find_unused_parameters=False))
model = Model(lr_warmup=1e4, lr_period=5e5, lr_max=1e-4, lr_decrease_coef=.01, masking_rate=.15, **kwargs)
# lr_warmup=1e4, lr_period=5e5, lr_max=1e-4, lr_decrease_coef=.01 - see chytorch.optim.lr_scheduler.WarmUpCosine.
# kwargs - see chytorch.nn.ReactionEncoder.
# masking_rate - probability of token masking.
trainer.fit(model, dl)
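To give a feel for the learning-rate parameters above, here is a rough sketch of a warm-up plus cosine schedule in plain Python. It is not the code of chytorch.optim.lr_scheduler.WarmUpCosine, whose exact formula is not given here. The assumed behaviour is: a linear ramp from 0 to lr_max over lr_warmup steps, then a cosine decay over lr_period steps down to lr_max * lr_decrease_coef, after which the rate stays at that floor. Whether the real scheduler restarts after each period is not stated in the source.

```python
import math


def warmup_cosine(step, lr_warmup=1e4, lr_period=5e5, lr_max=1e-4, lr_decrease_coef=0.01):
    """Assumed schedule: linear warm-up, then a single cosine decay to a floor."""
    if step < lr_warmup:
        # Linear ramp from 0 up to lr_max.
        return lr_max * step / lr_warmup
    lr_min = lr_max * lr_decrease_coef
    # Fraction of the cosine period completed, clamped so the rate stays at lr_min afterwards.
    progress = min((step - lr_warmup) / lr_period, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


for step in (0, 5_000, 10_000, 260_000, 510_000, 1_000_000):
    print(step, warmup_cosine(step))
```

Under these assumptions the rate peaks at lr_max at step 10,000 and reaches the floor of lr_max * lr_decrease_coef at step 510,000.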