Infer the bioactivity of molecules using models trained on molecular 3D structures.

signaturizer3D


Bioactivity signaturizers

This package builds on the original signaturizers package (paper) by applying a state-of-the-art 3D transformer model for molecular representation (Uni-Mol) to the task of inferring bioactivity. One of the novel capabilities of signaturizer3D is the ability to infer different bioactivities for stereoisomers of a molecule.

Bioactivity signatures are multi-dimensional vectors that capture biological traits of a molecule (for example, its target profile) in a numerical format akin to the structural descriptors or fingerprints used in chemoinformatics.

The signaturizer3D models infer bioactivity in 25 bioactivity types (including target profiles, cellular response and clinical outcomes) and can be used as drop-in replacements for chemical descriptors in day-to-day chemoinformatics tasks. See the preprint for more information.

For an overview of the different bioactivity types available, see the original Chemical Checker paper or website.

Get started

Install

Create a virtual environment with Python 3.9 or higher.

conda create -n sign3D-env python=3.10
conda activate sign3D-env

Install the package with pip

python -m pip install signaturizer3d

Install PyTorch. PyTorch needs to be installed separately. Find the correct install command for your compute platform (CPU, GPU, ...) and install tool (pip, conda) on this page. For example, to install with Conda for CPU only, you would run:

conda install pytorch torchvision torchaudio cpuonly -c pytorch
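After installing, it can be useful to confirm that PyTorch is importable from the active environment and whether it sees a GPU. The snippet below is a minimal sketch, not part of signaturizer3D; the helper name `describe_torch_install` is hypothetical, and it only uses `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_install() -> str:
    """Return a one-line summary of the local PyTorch installation."""
    # find_spec looks the package up without importing it, so this
    # function also works in environments where PyTorch is missing.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    # Report which device type PyTorch can currently use.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"PyTorch {torch.__version__} found, available device: {device}"


print(describe_torch_install())
```

If the message reports that PyTorch is missing, check that the environment you installed into (here `sign3D-env`) is the one that is active.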

If you're using Singularity containers, there is an example image definition file, singularity.def, that shows how to install the package with CUDA and PyTorch.

Infer signatures for molecules

Instantiate a signaturizer for one of the 25 bioactivity spaces in the Chemical Checker:

from signaturizer3d import Signaturizer, CCSpace

CCSpace.print_spaces() # Prints a description of the 25 available spaces

signaturizer = Signaturizer(CCSpace.B4)

The first time you load a space it will download and cache the model weights locally.

Infer signatures from a list of SMILES:

smiles_list = ["C", "CCC", "CN(C)CCOC(C1=CC=CC=C1)C1=CC=CC=C1"]
signatures = signaturizer.infer_from_smiles(smiles_list)
print(signatures.shape) # -> (3, 128) a 128D vector per molecule

Infer signatures from an SDF file or a directory of SDF files by specifying a path.

signatures = signaturizer.infer_from_sdf("/path/to/file.sdf")

(Download the nicotine SDF file if you want an SDF file to test with.)

See this notebook for more detailed examples of signaturizer usage.

For a more comprehensive example of using inferred bioactivity signatures to analyse similarity between a set of compounds, have a look at the example notebook in the original signaturizers package.
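As a rough illustration of what a similarity analysis on signatures can look like, the sketch below ranks a few candidate vectors by cosine similarity to a query vector. It is a sketch under assumptions: cosine similarity is one common choice and is not necessarily the metric used in the notebook, the helper names are hypothetical, and the 4D vectors are toy stand-ins for the 128D rows returned by `infer_from_smiles`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 4D stand-ins; real signatures would be rows of the (n, 128) array
# returned by signaturizer.infer_from_smiles.
query = [1.0, 0.0, 1.0, 0.0]
candidates = {
    "mol_a": [1.0, 0.0, 1.0, 0.0],
    "mol_b": [1.0, 1.0, 0.0, 0.0],
    "mol_c": [0.0, 1.0, 0.0, 1.0],
}
ranking = rank_by_similarity(query, candidates)
print([name for name, _ in ranking])  # ['mol_a', 'mol_b', 'mol_c']
```

With real signatures, each row of the inferred array would take the place of a toy vector, keyed by its SMILES string or compound name.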

Development

Guidelines on how to set up the development environment and run tests.

Install dependencies locally with Poetry

Dependencies are managed with Poetry, a dependency management and packaging tool for Python. It lets you declare the libraries your project depends on and installs and updates them for you. Poetry is also handy for creating a virtual environment with the project dependencies. First, follow the documentation on how to install Poetry here. Then install the project dependencies by running this inside the project directory:

poetry install

This will create a virtual environment with the project dependencies. The virtual environment can be activated in your current shell by running poetry shell, or you can run any command inside the virtual environment by prefixing it with poetry run. For example, you would run the tests inside the virtual environment with poetry run pytest.

PyTorch is the only dependency not managed by Poetry, so it needs to be installed manually in the virtual environment. Find the correct install command for your compute platform (CPU, GPU, ...) on this page. Run the install command inside the Poetry environment by prefixing it with poetry run, like this:

# Replace the install command if you're installing for a GPU
poetry run pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

The project has been tested with PyTorch 2.1.

Run tests

Run all tests.

poetry run pytest

By default, tests marked with performance are excluded. These tests check the runtime of different components, which is very system dependent. If you want to use them to monitor changes in performance as you change the code, first run them on your system before making any changes and update the time thresholds to match the runtimes you measure. Run the performance tests with:

poetry run pytest -m 'performance'
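One way to pick a time threshold for your own system is to measure a baseline before changing any code and allow some slack on top of it. The helpers below are a hypothetical sketch using only the standard library; the function names, the 1.5 slack factor, and the stand-in workload are assumptions and do not reflect how the project's own performance tests set their thresholds.

```python
import statistics
import time


def median_runtime(func, repeats=5):
    """Median wall-clock runtime of func() in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def threshold_from_baseline(baseline_seconds, slack=1.5):
    """Allow the measured baseline times a slack factor before failing."""
    return baseline_seconds * slack


def example_workload():
    # Hypothetical stand-in for the component whose runtime is tested.
    return sum(i * i for i in range(10_000))


baseline = median_runtime(example_workload)
threshold = threshold_from_baseline(baseline)
print(f"baseline {baseline:.6f}s, threshold {threshold:.6f}s")
```

Using the median of several runs rather than a single timing makes the baseline less sensitive to one-off slowdowns on a busy machine.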

Documentation

For more information about the package and Uni-Mol, check out the docs.

Signaturizer3D uses the Uni-Mol architecture and pre-trained weights for fine-tuning. See the Uni-Mol article and repo for more information.
