A library to quickly build QSAR models
Ersilia's LazyQSAR
A library to quickly build supervised models for chemistry.
Installation
Install LazyQSAR from source:
git clone https://github.com/ersilia-os/lazy-qsar.git
cd lazy-qsar
python -m pip install -e .
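To confirm that the package is importable and to see which version pip registered, a small standard-library check can help. This is a sketch. It assumes that both the distribution name and the import name are `lazyqsar`, as suggested by the `lazyqsar-2.2.1` file names and the `from lazyqsar...` imports. `installed_version` is a hypothetical helper, not part of LazyQSAR.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name="lazyqsar", module_name="lazyqsar"):
    """Hypothetical helper: return the installed version string, or None.

    Returns None if the module cannot be found on the import path, or if no
    distribution metadata is registered under dist_name.
    """
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "lazyqsar does not appear to be installed")
```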
To use the default LazyQSAR descriptors, install the optional descriptors extra:
python -m pip install -e ".[descriptors]"
This enables descriptor (featurizer) calculation. The first time you run LazyQSAR, it will download the Chemeleon and CDDD checkpoints and install other dependencies. To complete this setup upfront, run:
lazyqsar-setup
Binary Classification
LazyQSAR's binary classifier can run either with default descriptors or with custom descriptors passed by the user.
Built-in descriptors
Instantiate the LazyBinaryQSAR class with the mode of your choice (fast, default or slow):
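from lazyqsar.qsar import LazyBinaryQSAR
# smiles_train, smiles_test: lists of SMILES strings; y_train: binary (0/1) labels
model = LazyBinaryQSAR(mode="fast")
model.fit(smiles_list=smiles_train, y=y_train)
# column 1 holds the predicted probability of the positive class
y_hat = model.predict_proba(smiles_list=smiles_test)[:,1]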
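The `[:,1]` indexing suggests that `predict_proba` returns a two-column array in the scikit-learn style: one row per molecule, with the positive-class probability in column 1. The sketch below substitutes a made-up array for real model output and turns the probabilities into 0/1 labels. The 0.5 threshold is an assumption, not a LazyQSAR default.

```python
import numpy as np

# Made-up stand-in for model.predict_proba(...): one row per molecule,
# column 0 = P(class 0), column 1 = P(class 1).
proba = np.array([
    [0.90, 0.10],
    [0.35, 0.65],
    [0.50, 0.50],
    [0.05, 0.95],
])

# Each row should be a probability distribution over the two classes.
assert np.allclose(proba.sum(axis=1), 1.0)

# Keep the positive-class column, as in the example above.
y_hat = proba[:, 1]

# Convert probabilities to hard 0/1 labels with an assumed decision threshold.
threshold = 0.5
y_pred = (y_hat >= threshold).astype(int)
print(y_pred)  # [0 1 1 1]
```

For imbalanced endpoints, the threshold is often tuned on validation data rather than fixed at 0.5.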
Custom-made descriptors
Pre-calculate your descriptors using your preferred method. We recommend the Ersilia Model Hub for this: the .h5 files it generates can be passed directly to the LazyQSAR pipeline. Alternatively, pass the descriptors as an in-memory array.
from lazyqsar.agnostic import LazyBinaryClassifier
# X_train, X_test: precomputed descriptors (one row per molecule); y_train: binary labels
model = LazyBinaryClassifier()
model.fit(X=X_train, y=y_train)
y_hat = model.predict_proba(X=X_test)[:,1]
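When you pass descriptors as an in-memory array, the usual convention is a 2D array with one row per molecule and one column per feature, using the same columns at training and prediction time. This convention is assumed here, following scikit-learn. `toy_featurize` is a hypothetical stand-in that only counts a few SMILES characters. It is not chemically meaningful and should be replaced with real descriptors.

```python
import numpy as np


def toy_featurize(smiles_list):
    """Hypothetical featurizer for illustration only.

    Counts a few characters in each SMILES string. This is NOT a real molecular
    descriptor; swap in descriptors from your preferred method.
    """
    tokens = ["C", "c", "N", "O", "="]
    rows = [[smi.count(t) for t in tokens] for smi in smiles_list]
    return np.asarray(rows, dtype=np.float32)


smiles_train = ["CCO", "c1ccccc1O", "CC(=O)N"]
y_train = np.array([0, 1, 1])  # made-up labels
smiles_test = ["CCN"]

X_train = toy_featurize(smiles_train)
X_test = toy_featurize(smiles_test)

# One row per molecule, aligned with the labels.
assert X_train.ndim == 2 and X_train.shape[0] == len(y_train)
# Train and test arrays must share the same feature columns.
assert X_test.shape[1] == X_train.shape[1]
```

With real descriptors, `X_train`, `y_train` and `X_test` would then go to `model.fit` and `model.predict_proba` as in the example above.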
Using saved models at inference time
By default, models are saved as ONNX files. Once a model has been trained and saved, you can load it with an artifact class. At inference time, the only essential dependency is then the ONNX Runtime.
To save a model, simply run:
model.save(model_dir)
This creates a folder containing the ONNX files, which you can then load with the artifact:
from lazyqsar.artifacts import LazyBinaryClassifierArtifact
model = LazyBinaryClassifierArtifact.load(model_dir)
y_hat = model.predict_proba(X=X)[:,1]
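Before deploying the ONNX artifact, it can be worth checking that it reproduces the in-memory model's predictions on the same inputs. The helper below is a hypothetical sketch. Its tolerance is an assumption, because an exported model may differ from the original by small floating-point amounts. The dummy lists stand in for `predict_proba(X=X)[:,1]` computed before saving and after loading.

```python
import numpy as np


def check_roundtrip(y_before, y_after, atol=1e-5):
    """Hypothetical helper: compare predictions before saving and after reloading.

    Returns (ok, max_abs_diff), where ok is True when every prediction agrees
    within atol.
    """
    y_before = np.asarray(y_before, dtype=np.float64)
    y_after = np.asarray(y_after, dtype=np.float64)
    if y_before.shape != y_after.shape:
        raise ValueError(f"shape mismatch: {y_before.shape} vs {y_after.shape}")
    if y_before.size == 0:
        return True, 0.0
    max_diff = float(np.max(np.abs(y_before - y_after)))
    return max_diff <= atol, max_diff


# Dummy values standing in for predictions from the trained model and the artifact.
y_in_memory = [0.12, 0.87, 0.45]
y_from_onnx = [0.1200001, 0.87, 0.4499999]
ok, diff = check_roundtrip(y_in_memory, y_from_onnx)
print(ok, diff)
```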
Tests and benchmarks
Quick testing
The /tests folder contains quick implementations of the methods described above, so you can easily check that the code works. They use the Bioavailability dataset and Chemeleon descriptors as an example.
python test/test_binary_classification.py
python test/test_binary_classification.py --agnostic
Benchmarking
In the benchmark repository you will find the performance of the default estimators and descriptors on the TDCommons ADMET dataset. This is a provisional benchmark. The team is working on a more exhaustive one.
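The paragraph above does not list the metrics used in the benchmark. For binary endpoints, ROC AUC is a common way to summarize how well `y_hat` ranks positives above negatives. The function below is a minimal, dependency-free sketch using the rank-sum (Mann-Whitney U) formulation, with tied scores given their average rank. It is offered as an assumption, not as the benchmark's own evaluation code.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    y_true: 0/1 labels; y_score: predicted probabilities of class 1.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(y_score, y_true))  # ascending by score
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the block of equal scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```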
Disclaimer
This library is only intended for quick-and-dirty QSAR modeling. For more complete automated QSAR modeling, please refer to ZairaChem.
About us
Learn about the Ersilia Open Source Initiative!
Download files
File details
Details for the file lazyqsar-2.2.1.tar.gz.
File metadata
- Download URL: lazyqsar-2.2.1.tar.gz
- Upload date:
- Size: 54.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2cfb8967ffa8a71ce1153a62a87ac57017442e7239945dfc84e3545a680ae16 |
| MD5 | 60f6202bef8aced04a45ca092f25a256 |
| BLAKE2b-256 | 8084799c2b368c2eeb85f16176133da78e38dc9d0802e147b2f41aa62bdad1c8 |
File details
Details for the file lazyqsar-2.2.1-py3-none-any.whl.
File metadata
- Download URL: lazyqsar-2.2.1-py3-none-any.whl
- Upload date:
- Size: 72.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0e09c32eb4f67035bea570314b0ddbc707dbbdbf6cef1e87df401e40d6a6584 |
| MD5 | bd56a7dcb706a1b840698b1ea336511d |
| BLAKE2b-256 | b9ddbd4eee2b831ffed15e72dd5be20dfbbf3718228a91890db3cb6695294464 |
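To check a downloaded file against the SHA256 digests listed above, the standard library is enough. The sketch below hashes the file in chunks and compares the result with the published digest for that file name. `sha256_of`, `verify` and the `EXPECTED` mapping are illustrative names, not part of LazyQSAR.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Published SHA256 digests for the 2.2.1 release files.
EXPECTED = {
    "lazyqsar-2.2.1.tar.gz": "a2cfb8967ffa8a71ce1153a62a87ac57017442e7239945dfc84e3545a680ae16",
    "lazyqsar-2.2.1-py3-none-any.whl": "f0e09c32eb4f67035bea570314b0ddbc707dbbdbf6cef1e87df401e40d6a6584",
}


def verify(path):
    """Return True if the file's SHA256 matches the published digest for its name."""
    name = Path(path).name
    expected = EXPECTED.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected
```

For example, `verify("lazyqsar-2.2.1.tar.gz")` returns True only if the local file is byte-for-byte identical to the published sdist.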