A convenient package to manipulate SMILES strings for iterative prompting with chemical language models.
Project description
PromptSMILES: Prompting for scaffold decoration and fragment linking in chemical language models
This library contains code for manipulating SMILES strings to facilitate iterative prompting. It is designed to be coupled with a trained chemical language model (CLM) that uses SMILES notation.
Installation
The library can be installed via pip:
pip install promptsmiles
Alternatively, install it from a copy of this repository. promptsmiles requires RDKit.
git clone https://github.com/compsciencelab/PromptSMILES.git
cd PromptSMILES
pip install ./
Use
PromptSMILES is designed as a wrapper around CLM sampling that can accept a prompt (i.e., an initial string from which autoregressive token generation begins). It therefore requires two callable functions, described later. PromptSMILES has three main classes: DeNovo (a dummy wrapper to keep code consistent), ScaffoldDecorator, and FragmentLinker.
Scaffold Decoration
from promptsmiles import ScaffoldDecorator, FragmentLinker
SD = ScaffoldDecorator(
    scaffold="N1(*)CCN(CC1)CCCCN(*)",
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False,  # Whether CLM.sampler accepts a list of prompts
    optimize_prompts=True,
    shuffle=True,  # Whether to randomly select attachment points within a batch
    return_all=False,
)
smiles = SD.sample(batch_size=3, return_all=True)  # Parameters can be overridden here if desired
Fragment linking / scaffold hopping
FL = FragmentLinker(
    fragments=["N1(*)CCNCC1", "C1CC1(*)"],
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False,
    optimize_prompts=True,
    shuffle=True,
    scan=False,  # Optional when combining 2 fragments; otherwise it is set to True
    return_all=False,
)
smiles = FL.sample(batch_size=3)
Required chemical language model functions
Note the two required callable functions, CLM.sampler and CLM.evaluater. The first is a function that samples from the CLM given a prompt:
from typing import Union

def CLM_sampler(prompt: Union[str, list[str]], batch_size: int):
    """
    Input: Must have a prompt and batch_size argument.
    Output: SMILES [list]
    """
    # Encode prompt and sample as per model implementation, producing `smiles`
    return smiles
Note: For a more efficient implementation, prompt should accept a list of prompts of length batch_size, and batch_prompts should be set to True in the promptsmiles class used.
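As a rough illustration of the list-of-prompts interface, the sketch below adapts a sampler that only understands a single prompt so that it also accepts a list. The names make_batched_sampler and toy_sampler are hypothetical and not part of promptsmiles, and the convention of returning one SMILES per prompt is an assumption. The adapter only satisfies the call signature; real efficiency gains come from a model that processes all prompts in a single batched forward pass.

```python
from typing import Callable, List, Union


def make_batched_sampler(
    single_sampler: Callable[[str, int], List[str]]
) -> Callable[[Union[str, List[str]], int], List[str]]:
    """Wrap a single-prompt sampler so it also accepts a list of prompts (hypothetical helper)."""

    def sampler(prompt: Union[str, List[str]], batch_size: int) -> List[str]:
        if isinstance(prompt, str):
            # Single prompt: defer to the wrapped sampler unchanged
            return single_sampler(prompt, batch_size)
        if len(prompt) != batch_size:
            raise ValueError("Expected one prompt per requested sample")
        # Assumption: one SMILES is returned per prompt, in the same order.
        # A real CLM would handle all prompts in one batched pass instead of looping.
        return [single_sampler(p, 1)[0] for p in prompt]

    return sampler


def toy_sampler(prompt: str, batch_size: int) -> List[str]:
    """Stand-in for a CLM: appends a fixed carbon atom to the prompt."""
    return [prompt + "C"] * batch_size


batched_sampler = make_batched_sampler(toy_sampler)
print(batched_sampler("C1CC1", 3))
print(batched_sampler(["N", "O"], 2))
```

A sampler wrapped this way could then be passed as sample_fn with batch_prompts=True.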
The second is a function that evaluates the negative log-likelihood (NLL) of a list of SMILES:
def CLM_evaluater(smiles: list[str]):
    """
    Input: A list of SMILES
    Output: NLLs [list, np.array, torch.tensor](CPU w.o. gradient)
    """
    # Compute the NLL of each SMILES as per model implementation, producing `nlls`
    return nlls
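To make the expected output concrete, here is a minimal sketch of an evaluater built on a toy character-level model. TOY_PROBS and toy_evaluater are hypothetical stand-ins, not part of promptsmiles; a real CLM would tokenize the SMILES and sum the negative log-probabilities of each token conditioned on the preceding ones. The sketch only shows the shape of the return value: one NLL per input SMILES, where a lower value means a more likely string.

```python
import math
from typing import Dict, List

# Hypothetical per-character probabilities standing in for a trained CLM
TOY_PROBS: Dict[str, float] = {"C": 0.5, "N": 0.25, "O": 0.25}
UNKNOWN_PROB = 1e-6  # Assumed small probability for characters outside the toy vocabulary


def toy_evaluater(smiles: List[str]) -> List[float]:
    """Return one negative log-likelihood per SMILES, in input order."""
    nlls = []
    for s in smiles:
        # NLL is the negated sum of per-character log-probabilities
        nll = -sum(math.log(TOY_PROBS.get(ch, UNKNOWN_PROB)) for ch in s)
        nlls.append(nll)
    return nlls


nlls = toy_evaluater(["CC", "CN"])
print(nlls)
```

With a PyTorch-based model, the equivalent function would return the tensor on the CPU and detached from the computation graph, as the docstring above specifies.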