
ChemSpaceAL Python package: an efficient active learning methodology applied to protein-specific molecular generation


ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation

Checked with mypy | Code style: black | codecov | License: MIT | Open In Colab

A description of the active learning methodology

Abstract

The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology that requires evaluation of only a subset of the generated data in the constructed sample space to successfully align a generative model with respect to a specified objective. We demonstrate the applicability of this methodology to targeted molecular generation by fine-tuning a GPT-based molecular generator toward a protein with FDA-approved small-molecule inhibitors, c-Abl kinase. Remarkably, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence, and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. We believe that the inherent generality of this method ensures that it will remain applicable as the exciting field of in silico molecular generation evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.
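The central idea in the abstract, evaluating only a subset of the generated data and still steering the generator, can be illustrated with a small sketch. This is not the ChemSpaceAL API: the function `select_for_fine_tuning`, its parameters, the grouping of items into regions, and the score-proportional sampling are all hypothetical choices made for illustration, and the toy scorer stands in for whatever expensive evaluation a real pipeline would use.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def select_for_fine_tuning(
    items: Sequence[str],
    group_of: Callable[[str], int],
    score: Callable[[str], float],
    per_group: int,
    n_select: int,
    rng: random.Random,
) -> List[str]:
    """Score a few items per group, then draw a training set weighted by group score.

    Hypothetical helper for illustration only; not part of the ChemSpaceAL package.
    """
    # Partition the generated items into groups (a stand-in for regions of the sample space).
    groups: Dict[int, List[str]] = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    # Run the expensive scorer on at most `per_group` items from each group.
    group_score: Dict[int, float] = {}
    for gid, members in groups.items():
        probe = rng.sample(members, min(per_group, len(members)))
        group_score[gid] = sum(score(m) for m in probe) / len(probe)

    # Draw groups in proportion to their mean score (negative scores are clipped to zero),
    # then pick a random member of each drawn group, scored or not.
    gids = list(groups)
    weights = [max(group_score[g], 0.0) for g in gids]
    if sum(weights) == 0:
        weights = [1.0] * len(gids)  # fall back to uniform sampling
    chosen = rng.choices(gids, weights=weights, k=n_select)
    return [rng.choice(groups[g]) for g in chosen]


if __name__ == "__main__":
    # Toy data: 1000 placeholder items split into 10 groups; only group 3 scores well.
    items = [f"mol_{i}" for i in range(1000)]
    picked = select_for_fine_tuning(
        items,
        group_of=lambda m: int(m.split("_")[1]) % 10,
        score=lambda m: 1.0 if int(m.split("_")[1]) % 10 == 3 else 0.0,
        per_group=5,
        n_select=20,
        rng=random.Random(0),
    )
    print(picked[:5])
```

In this toy setup the scorer runs on 50 of the 1000 items, yet every selected item can come from the high-scoring group, because each group's estimated score is applied to all of its members. Items selected this way would then serve as fine-tuning data for the next round of generation.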

Preprint

The associated preprint is available on arXiv. Note that a second version of the preprint was posted on Dec 4, 2023.

Installation

To install the ChemSpaceAL package, run:

pip install ChemSpaceAL

You can also open ChemSpaceAL.ipynb in Google Colab to see an example of how to use the package.
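To confirm that the installation succeeded, one option is to query the installed distribution metadata with the standard library. The sketch below assumes Python 3.8 or newer; `installed_version` is a hypothetical helper written for this example, not something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("ChemSpaceAL")
    if found is None:
        print("ChemSpaceAL is not installed in this environment")
    else:
        print(f"ChemSpaceAL version: {found}")
```

This reads only the package metadata, so it does not import ChemSpaceAL or any of its dependencies.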

Contact

Please feel free to reach out to us through either of the following emails if you have any questions or need any additional files:


