Infer the bioactivity of molecules using models trained on molecular 3D structures.
Project description
signaturizer3D
Bioactivity signaturizers
This package builds on the original signaturizers package (paper) by applying a state-of-the-art 3D transformer model for molecular representation (Uni-Mol) to the task of inferring bioactivity. One of the novel capabilities of signaturizer3D is the ability to infer different bioactivities for stereoisomers of a molecule.
Bioactivity signatures are multi-dimensional vectors that capture biological traits of a molecule (for example, its target profile) in a numerical format akin to the structural descriptors or fingerprints used in chemoinformatics.
The signaturizer3D models infer bioactivity in 25 bioactivity types (including target profiles, cellular response and clinical outcomes) and can be used as drop-in replacements for chemical descriptors in day-to-day chemoinformatics tasks. See the preprint for more information.
For an overview of the different bioactivity types available, see the original Chemical Checker paper or website.
Get started
Install
Create a virtual environment with Python 3.9 or higher.
conda create -n sign3D-env python=3.10
conda activate sign3D-env
Install the package with pip
python -m pip install signaturizer3d
Install PyTorch. PyTorch needs to be installed separately. Find the correct install command for your compute platform (CPU, GPU, ...) and install tool (pip, conda) on this page. For example, to install with Conda for CPU only, run:
conda install pytorch torchvision torchaudio cpuonly -c pytorch
If you're using Singularity containers, there is an example image definition file, singularity.def, that shows how to install the package with CUDA and PyTorch.
Infer signatures for molecules
Instantiate a signaturizer for one of the 25 bioactivity spaces in the Chemical Checker:
from signaturizer3d import Signaturizer, CCSpace
CCSpace.print_spaces() # Prints a description of the 25 available spaces
signaturizer = Signaturizer(CCSpace.B4)
The first time you load a space it will download and cache the model weights locally.
Infer signatures from a list of SMILES:
smiles_list = ["C", "CCC", "CN(C)CCOC(C1=CC=CC=C1)C1=CC=CC=C1"]
signatures = signaturizer.infer_from_smiles(smiles_list)
print(signatures.shape) # -> (3, 128) a 128D vector per molecule
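Once you have a signature matrix, a common next step is to compare molecules by the similarity of their signatures. The sketch below computes pairwise cosine similarity. It assumes the returned signatures behave like a NumPy array of shape (n_molecules, 128), and it uses random stand-in data so that it runs without downloading model weights; `cosine_similarity_matrix` is a hypothetical helper, not part of the signaturizer3d API.

```python
import numpy as np


def cosine_similarity_matrix(signatures: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between the rows of `signatures`."""
    norms = np.linalg.norm(signatures, axis=1, keepdims=True)
    # Clip the norms to avoid dividing by zero for an all-zero row.
    unit = signatures / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Stand-in data with the same shape as the output above (3 molecules, 128 dimensions).
# In practice this would be the array returned by the signaturizer.
rng = np.random.default_rng(0)
signatures = rng.normal(size=(3, 128))

similarity = cosine_similarity_matrix(signatures)
print(similarity.shape)  # one row and one column per molecule
```

Entry (i, j) of the result is the similarity between molecule i and molecule j; the diagonal is 1 because each signature is compared with itself.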
Infer signatures from an SDF file or a directory of SDF files by specifying a path.
signatures = signaturizer.infer_from_sdf("/path/to/file.sdf")
(Download the nicotine SDF file if you want an SDF file to test with.)
See this notebook for more detailed examples of signaturizer usage.
For a more comprehensive example of using inferred bioactivity signatures to analyse similarity between a set of compounds, have a look at the example notebook in the original signaturizers package.
Development
Guidelines on how to set up the development environment and run tests.
Install dependencies locally with Poetry
Dependencies are managed via Poetry. Poetry is a tool for dependency management and packaging in Python: it lets you declare the libraries your project depends on and manages (installs/updates) them for you. Poetry is also a handy tool for creating a virtual environment with the project dependencies. First, follow the documentation on how to install Poetry here. Then install the project dependencies by running this inside the project directory:
poetry install
This will create a virtual environment with the project dependencies. You can activate the virtual environment in your current shell by running poetry shell, or you can run any command inside the virtual environment by prefixing it with poetry run. For example, you'd run the tests inside the virtual environment with poetry run pytest.
PyTorch is the only dependency not managed by Poetry, so it needs to be installed manually in the virtual environment.
Find the correct install command for your compute platform (CPU, GPU, ...) on this page.
Run the install command inside the Poetry environment by prefixing it with poetry run, like this:
# Replace the install command if you're installing for a GPU
poetry run pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
The project has been tested with PyTorch 2.1.
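Because PyTorch is installed by hand, it can be useful to confirm that it actually landed in the Poetry environment. The following is a minimal sketch using only the standard library; `installed_torch_version` is a hypothetical helper name and is not part of this project.

```python
from importlib import metadata, util
from typing import Optional


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version string, or None if torch is unavailable."""
    # find_spec returns None when the package cannot be imported from this environment.
    if util.find_spec("torch") is None:
        return None
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        # Importable but without distribution metadata (for example, a source checkout).
        return None


version = installed_torch_version()
if version is None:
    print("torch is not installed in this environment")
else:
    print(f"torch {version} is installed")
```

Saved to a file of your choosing, the script can be run with poetry run python so that it checks the Poetry environment rather than your system Python.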
Run tests
Run all tests.
poetry run pytest
By default, tests marked with performance are excluded. These tests measure the runtime of different components, which is very system dependent, so they are excluded by default.
If you want to use these tests to monitor changes in performance as you change the code, first run them on your system before making any changes and update the time thresholds to match those baseline timings.
Run the performance tests with:
poetry run pytest -m 'performance'
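One way to pick a system-specific time threshold is to measure a baseline runtime before changing any code and then allow some margin on top of it. The sketch below illustrates that idea with the standard library only. The helper names, the stand-in workload, and the 1.5x margin are all assumptions for illustration; they are not taken from this project's test suite.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run `fn` several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def threshold_from_baseline(baseline_seconds: float, margin: float = 1.5) -> float:
    """Scale a measured baseline by a safety margin to get a time threshold."""
    return baseline_seconds * margin


def workload() -> None:
    # Stand-in for the component being timed (hypothetical).
    sum(i * i for i in range(100_000))


baseline = median_runtime(workload)
threshold = threshold_from_baseline(baseline)
print(f"baseline: {baseline:.4f} s, suggested threshold: {threshold:.4f} s")
```

Using the median rather than a single run reduces the influence of one-off slowdowns, such as a cold cache on the first call.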
Documentation
For more information about the package and Uni-Mol, check out the docs.
Signaturizer3D uses the Uni-Mol architecture and pre-trained weights for fine-tuning. See the Uni-Mol article and repo for more information.
Hashes for signaturizer3d-0.1.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8c40633f182660f1ca5d1d591240418bfb6f6166c431504a46cd3735fa571512
MD5 | 7299f9a850a4037f404bd90f52be7ee5
BLAKE2b-256 | 45b48324399c09a8c89f578e6071cd4258feb3c10684d4c9c3e5fee4a586d0cb