
A convenient package to manipulate SMILES strings for iterative prompting with chemical language models.

Project description

PromptSMILES: Prompting for scaffold decoration and fragment linking in chemical language models

This library contains code to manipulate SMILES strings to facilitate iterative prompting to be coupled with a trained chemical language model (CLM) that uses SMILES notation.

Installation

The library can be installed via pip:

pip install promptsmiles

Alternatively, install it from a copy of this repository. Note that promptsmiles requires RDKit.

git clone https://github.com/compsciencelab/PromptSMILES.git
cd PromptSMILES
pip install ./
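
After installing, it can be useful to confirm that both promptsmiles and its RDKit dependency are importable. The snippet below is a minimal sketch using only the standard library; the helper name check_installed is hypothetical and not part of promptsmiles.

```python
from importlib.util import find_spec


def check_installed(modules: list[str]) -> dict[str, bool]:
    """Report, for each top-level module name, whether Python can locate it."""
    # find_spec returns None when a top-level module cannot be found
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # "promptsmiles" and "rdkit" are the import names of the two packages
    for name, found in check_installed(["promptsmiles", "rdkit"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If either entry is reported as missing, install it into the same environment before running the examples.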

Use

PromptSMILES is designed as a wrapper around CLM sampling that accepts a prompt (i.e., an initial string from which autoregressive token generation begins). It therefore requires two callable functions, described under "Required chemical language model functions". PromptSMILES has three main classes: DeNovo (a dummy wrapper to keep code consistent), ScaffoldDecorator, and FragmentLinker.

Scaffold Decoration

from promptsmiles import ScaffoldDecorator, FragmentLinker

SD = ScaffoldDecorator(
    scaffold="N1(*)CCN(CC1)CCCCN(*)",
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False, # CLM.sampler accepts a list of prompts or not
    optimize_prompts=True,
    shuffle=True, # Randomly select attachment points within a batch or not
    return_all=False,
    )
smiles = SD.sample(batch_size=3, return_all=True) # Parameters can be overridden here if desired
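
In the scaffold above, each "*" appears to mark an attachment point to be decorated (in SMILES, "*" is the wildcard atom). As a quick sanity check before passing a scaffold in, the attachment points can be counted with plain string operations. This is only an illustrative sketch; the helper count_attachment_points is hypothetical and does not reflect how promptsmiles parses scaffolds internally.

```python
def count_attachment_points(smiles: str) -> int:
    """Count wildcard atoms ("*") in a SMILES string."""
    return smiles.count("*")


scaffold = "N1(*)CCN(CC1)CCCCN(*)"
n_points = count_attachment_points(scaffold)
print(n_points)  # 2

if n_points == 0:
    raise ValueError("Scaffold has no attachment points to decorate")
```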


Fragment linking / scaffold hopping

FL = FragmentLinker(
    fragments=["N1(*)CCNCC1", "C1CC1(*)"],
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False,
    optimize_prompts=True,
    shuffle=True,
    scan=False, # Optional when combining 2 fragments; otherwise it is set to True
    return_all=False,
)
smiles = FL.sample(batch_size=3)


Required chemical language model functions

Note the two required callable functions, CLM.sampler and CLM.evaluater. The first is a function that samples from the CLM given a prompt.

from typing import Union

def CLM_sampler(prompt: Union[str, list[str]], batch_size: int) -> list[str]:
    """
    Input: Must have a prompt and batch_size argument.
    Output: SMILES [list]
    """
    # Encode prompt and sample as per model implementation
    smiles = ...  # placeholder: replace with your model's sampling code
    return smiles

Note: For a more efficient implementation, the sampler should accept a list of prompts of length batch_size, and batch_prompts should be set to True in the promptsmiles class used.
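
To make the sampler interface concrete, here is a toy stand-in that satisfies the signature described above. It is a sketch under stated assumptions: TOY_TOKENS, toy_sampler, max_new_tokens, and seed are all hypothetical, and the random continuation replaces what would be a trained model's autoregressive sampling. The outputs are not guaranteed to be chemically sensible.

```python
import random
from typing import Optional, Union

# Hypothetical toy vocabulary standing in for a trained CLM's token distribution
TOY_TOKENS = ["C", "N", "O"]


def toy_sampler(
    prompt: Union[str, list[str]],
    batch_size: int,
    max_new_tokens: int = 5,
    seed: Optional[int] = None,
) -> list[str]:
    """Return batch_size strings, each continuing its prompt with random tokens."""
    rng = random.Random(seed)
    if isinstance(prompt, str):
        # A single prompt is shared by the whole batch
        prompts = [prompt] * batch_size
    else:
        # Batched prompts: one prompt per sample
        if len(prompt) != batch_size:
            raise ValueError("Expected one prompt per sample in the batch")
        prompts = list(prompt)

    smiles = []
    for p in prompts:
        n_new = rng.randint(1, max_new_tokens)
        continuation = "".join(rng.choice(TOY_TOKENS) for _ in range(n_new))
        smiles.append(p + continuation)
    return smiles


print(toy_sampler("c1ccccc1", batch_size=3, seed=0))
print(toy_sampler(["CC", "NC"], batch_size=2, seed=0))
```

Handling both the single-string and the list case in one function means the same callable can be used whether batch_prompts is False or True.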

The second is a function that evaluates the negative log-likelihood (NLL) of a list of SMILES.

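def CLM_evaluater(smiles: list[str]):
    """
    Input: A list of SMILES
    Output: NLLs [list, np.array, torch.tensor] (on CPU, without gradients)
    """
    # Score each SMILES as per model implementation
    nlls = ...  # placeholder: replace with your model's likelihood code
    return nlls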
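
For an autoregressive model, the NLL of a sequence is the sum of the negative log-probabilities of its tokens, each conditioned on the preceding tokens. The sketch below illustrates the evaluater interface with a deliberately simplified toy: it scores each character independently from a fixed table and ignores context. TOY_PROBS, OTHER_PROB, and toy_evaluater are hypothetical, and the character-level split (which would break up tokens such as "Cl") is an assumption made only for brevity.

```python
import math

# Hypothetical per-character probabilities; a real CLM would instead predict
# each token's probability from the tokens that precede it.
TOY_PROBS = {"C": 0.5, "N": 0.2, "O": 0.2}
OTHER_PROB = 0.1  # assigned to any character not listed above


def toy_evaluater(smiles: list[str]) -> list[float]:
    """Return one NLL per SMILES: the sum of -log p(char) over its characters."""
    nlls = []
    for s in smiles:
        nll = -sum(math.log(TOY_PROBS.get(ch, OTHER_PROB)) for ch in s)
        nlls.append(nll)
    return nlls


scores = toy_evaluater(["CC", "CN", "C#N"])
for smi, nll in zip(["CC", "CN", "C#N"], scores):
    print(f"{smi}: NLL = {nll:.3f}")
```

Lower NLL means the model considers the string more likely, so under this toy table "CC" scores lower than "CN".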

Download files


Source Distribution

promptsmiles-1.4.2.tar.gz (27.3 kB)

Built Distribution

promptsmiles-1.4.2-py3-none-any.whl (23.0 kB)

