Skip to main content

A conveniant package to manipulate SMILES strings for iterative prompting with chemical language models.

Project description

PromptSMILES: Prompting for scaffold decoration and fragment linking in chemical language models

This library contains code to manipulate SMILES strings to facilitate iterative prompting to be coupled with a trained chemical language model (CLM) that uses SMILES notation.

Installation

The libary can be installed via pip

pip install promptsmiles

Or via obtaining a copy of this repo, promptsmiles requires RDKit.

git clone https://github.com/compsciencelab/PromptSMILES.git
cd PromptSMILES
pip install ./

Use

PromptSMILES is designed as a wrapper to CLM sampling that can accept a prompt (i.e., an initial string to begin autoregressive token generation). Therefore, it requires two callable functions, described later. PromptSMILES has 3 main classes, DeNovo (a dummy wrapper to make code consistent), ScaffoldDecorator, and FragmentLinker.

Scaffold Decoration

from promptsmiles import ScaffoldDecorator, FragmentLinker

SD = ScaffoldDecorator(
    scaffold="N1(*)CCN(CC1)CCCCN(*)",
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False, # CLM.sampler accepts a list of prompts or not
    optimize_prompts=True,
    shuffle=True, # Randomly select attachment points within a batch or not
    return_all=False,
    )
smiles = SD.sample(batch_size=3, return_all=True) # Parameters can be overriden here if desired

alt text

Fragment linking / scaffold hopping

FL = FragmentLinker(
    fragments=["N1(*)CCNCC1", "C1CC1(*)"],
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False,
    optimize_prompts=True,
    shuffle=True, 
    scan=False, # Optional when combining 2 fragments, otherwise is set to true
    return_all=False,
)
smiles = FL.sample(batch_size=3)

alt text

Required chemical language model functions

Notice the callable functions required CLM.sampler and CLM.evaluater. The first is a function that samples from the CLM given a prompt.

def CLM_sampler(prompt: Union[str, list[str]], batch_size: int):
    """
    Input: Must have a prompt and batch_size argument.
    Output: SMILES [list]
    """
    # Encode prompt and sample as per model implementation
    return smiles

Note: For a more efficient implementation, prompt should accept a list of prompts equal to batch_size and batch_prompts should be set to True in the promptsmiles class used.

The second is a function that evaluates the NLL of a list of SMILES

def CLM_evaluater(smiles: list[str]):
    """
    Input: A list of SMILES
    Output: NLLs [list, np.array, torch.tensor](CPU w.o. gradient)
    """
    return nlls

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

promptsmiles-1.4.tar.gz (26.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

promptsmiles-1.4-py3-none-any.whl (22.6 kB view details)

Uploaded Python 3

File details

Details for the file promptsmiles-1.4.tar.gz.

File metadata

  • Download URL: promptsmiles-1.4.tar.gz
  • Upload date:
  • Size: 26.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for promptsmiles-1.4.tar.gz
Algorithm Hash digest
SHA256 85ce794ded15bc0f2bf8b11c2d6a23c34e9bff402f53a19b30e060bb5c9fabf0
MD5 d2fa69af59af9be25ddf3a32fb99531a
BLAKE2b-256 a56c3af63f3b89f151cadeb787dd2c1d6a37eebd93a3f805f3090b8c2db99810

See more details on using hashes here.

File details

Details for the file promptsmiles-1.4-py3-none-any.whl.

File metadata

  • Download URL: promptsmiles-1.4-py3-none-any.whl
  • Upload date:
  • Size: 22.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for promptsmiles-1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 528a2add061fac15ea5dd23d99ff5eed58f66a1dcccd2ea675e516e5fa65acf7
MD5 f4215504ffcee006e49855510734a3f3
BLAKE2b-256 2a4c6d63b2c174cd7f173111975df2fe59927e26282f5e24cecb3eefb9b94435

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page