A library to quickly build QSAR models
Ersilia's LazyQSAR
A library to build supervised QSAR models for chemistry quickly.
Installation
Install LazyQSAR from source:
git clone https://github.com/ersilia-os/lazy-qsar.git
cd lazy-qsar
python -m pip install -e .
To use the built-in LazyQSAR descriptors, install the optional dependencies:
python -m pip install -e ".[descriptors]"
This will enable descriptor (featurizer) calculation. The first time you run LazyQSAR, it will download the Chemeleon and CDDD model checkpoints. To complete this setup in advance, run:
lazyqsar-setup
Use as a Python API
Binary Classification
LazyQSAR's binary classifier can run either with built-in descriptors (takes SMILES as input) or with custom pre-computed descriptors.
Built-in descriptors
Instantiate LazyBinaryQSAR with a mode of choice:
| Mode | Descriptors used | Speed |
|---|---|---|
| fast | RDKit, Morgan fingerprints | Fastest, no deep-learning descriptors |
| default | Chemeleon, RDKit, CDDD | Balanced |
| slow | Chemeleon, Morgan, RDKit, CDDD | Most thorough |
from lazyqsar.qsar import LazyBinaryQSAR
model = LazyBinaryQSAR(mode="default")
model.fit(smiles_list=smiles_train, y=y_train)
y_hat = model.predict_proba(smiles_list=smiles_test)[:, 1]
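The y_hat values above are positive-class probabilities, so they can be scored directly against held-out labels. The following is a minimal, dependency-free sketch of a rank-based AUROC and a 0.5 decision threshold. It is not part of LazyQSAR; the roc_auc helper, the toy y_test and y_hat values, and the threshold are all assumptions for illustration.

```python
def roc_auc(y_true, y_score):
    """Rank-based AUROC: chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy values standing in for real test labels and predict_proba output (assumed).
y_test = [1, 0, 1, 0]
y_hat = [0.9, 0.2, 0.4, 0.6]

auc = roc_auc(y_test, y_hat)

# Hard class labels from probabilities; 0.5 is an arbitrary illustrative cutoff.
y_pred = [int(score >= 0.5) for score in y_hat]
```

For larger datasets, a library implementation such as scikit-learn's roc_auc_score is a better fit than this quadratic loop.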
Custom descriptors
Pre-calculate your own descriptors and pass them directly. We recommend the Ersilia Model Hub for this, since its .h5 output format is supported natively. Alternatively, pass descriptors as a NumPy array.
from lazyqsar.agnostic import LazyBinaryClassifier
# From a NumPy array
model = LazyBinaryClassifier(mode="default")
model.fit(X=X_train, y=y_train)
y_hat = model.predict_proba(X=X_test)[:, 1]
# From an Ersilia .h5 file
model.fit(h5_file="descriptors.h5", y=y_train)
y_hat = model.predict_proba(h5_file="descriptors.h5")[:, 1]
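When passing a NumPy array, each row of X has to line up with the corresponding entry of y. The sketch below shows one way to assemble such a matrix from per-molecule descriptor vectors and drop rows with missing values while keeping labels aligned. The align_descriptors helper and the toy values are hypothetical, and whether LazyQSAR tolerates non-finite values on its own is not stated here, so the filtering is a precaution rather than a documented requirement.

```python
import numpy as np


def align_descriptors(rows, labels):
    """Stack descriptor vectors into X and drop molecules with non-finite values."""
    X = np.asarray(rows, dtype=np.float32)
    y = np.asarray(labels, dtype=int)
    if X.ndim != 2:
        raise ValueError("expected one descriptor vector per molecule")
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of rows")
    keep = np.isfinite(X).all(axis=1)  # True for rows without NaN or inf
    return X[keep], y[keep]


# Toy descriptors for three molecules; the second has a missing value (assumed data).
rows = [[0.1, 1.0, 3.0], [0.4, float("nan"), 2.0], [0.9, 0.0, 1.0]]
labels = [1, 0, 0]

X_train, y_train = align_descriptors(rows, labels)
```

The resulting X_train and y_train have the same number of rows and can be passed as X and y in the calls shown above.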
Saving and loading models
Models are saved as ONNX files by default, so inference only requires the ONNX runtime.
# Save after training
model.save(model_dir)
# Load for inference (auto-detects ONNX or raw format)
from lazyqsar.agnostic import LazyBinaryClassifier
model = LazyBinaryClassifier.load(model_dir)
y_hat = model.predict_proba(X=X)[:, 1]
You can also save and load as a .zip archive:
model.save("my_model.zip")
model = LazyBinaryClassifier.load("my_model.zip")
The same save/load interface applies to LazyBinaryQSAR:
from lazyqsar.qsar import LazyBinaryQSAR
model = LazyBinaryQSAR(mode="default")
model.fit(smiles_list=smiles_train, y=y_train)
model.save(model_dir)
model = LazyBinaryQSAR.load(model_dir)
y_hat = model.predict_proba(smiles_list=smiles_test)[:, 1]
Tests and benchmarks
Quick testing
The tests/ folder contains scripts for quickly verifying that the code works. The Bioavailability dataset is used as an example.
python tests/test_binary_classification.py
python tests/test_binary_classification.py --agnostic
Benchmarking
The benchmark repository contains performance results for the default estimators and descriptors on the TDCommons ADMET dataset.
Use as a CLI
The CLI expects a data_dir containing one CSV file per task. Each CSV must have SMILES in the first column and binary labels (0/1) in the second column, with a header row.
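As a sketch of the expected layout, the snippet below writes one task file with SMILES in the first column, 0/1 labels in the second, and a header row. The write_task_csv helper, the header names smiles and label, and the bioavailability task name are assumptions; the format description only requires that a header row is present.

```python
import csv
import tempfile
from pathlib import Path


def write_task_csv(path, smiles, labels, header=("smiles", "label")):
    """Write one task file: SMILES in column 1, binary label in column 2."""
    if len(smiles) != len(labels):
        raise ValueError("smiles and labels must have the same length")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must be 0 or 1")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)  # header names are an assumption
        writer.writerows(zip(smiles, labels))


# A temporary directory stands in for $DATA_DIR in this sketch.
data_dir = Path(tempfile.mkdtemp())
task_csv = data_dir / "bioavailability.csv"
write_task_csv(task_csv, ["CCO", "c1ccccc1", "CC(=O)O"], [1, 0, 1])
```

Each additional task would get its own CSV file in the same directory.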
Fit:
lazyqsar-binary-fit --data_dir $DATA_DIR --model_dir $MODEL_DIR --mode default
Optionally, pass a --models_txt file listing which tasks (CSV filenames without extension) to train, one per line. Without it, all CSVs in the directory are used.
lazyqsar-binary-fit --data_dir $DATA_DIR --model_dir $MODEL_DIR --models_txt models.txt
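A models.txt file can be generated from the contents of the data directory. The sketch below lists CSV filenames without their extension, one per line, optionally restricted to a subset. The write_models_txt helper and the task names are hypothetical and only illustrate the file format described above.

```python
import tempfile
from pathlib import Path


def write_models_txt(data_dir, out_path, keep=None):
    """Write task names (CSV filenames without extension), one per line."""
    names = sorted(p.stem for p in Path(data_dir).glob("*.csv"))
    if keep is not None:
        names = [name for name in names if name in keep]
    Path(out_path).write_text("".join(name + "\n" for name in names))
    return names


# Temporary directory with three placeholder task files (assumed names).
data_dir = Path(tempfile.mkdtemp())
for task in ("herg", "bioavailability", "cyp3a4"):
    (data_dir / f"{task}.csv").write_text("smiles,label\nCCO,1\n")

# Train only two of the three tasks.
selected = write_models_txt(data_dir, data_dir / "models.txt", keep={"herg", "cyp3a4"})
```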
Predict:
lazyqsar-binary-predict --input_csv $INPUT_CSV --model_dir $MODEL_DIR --output_csv $OUTPUT_CSV
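To script predictions from Python, the command above can be assembled as an argument list and handed to subprocess. This is a minimal sketch that uses only the flags shown above; the predict_command and run_predict helpers, the dry_run switch, and the file names are assumptions for illustration.

```python
import subprocess


def predict_command(input_csv, model_dir, output_csv):
    """Build the lazyqsar-binary-predict argument list."""
    return [
        "lazyqsar-binary-predict",
        "--input_csv", str(input_csv),
        "--model_dir", str(model_dir),
        "--output_csv", str(output_csv),
    ]


def run_predict(input_csv, model_dir, output_csv, dry_run=False):
    """Run the CLI, or only return the command when dry_run is True."""
    cmd = predict_command(input_csv, model_dir, output_csv)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


# Dry run: nothing is executed, the command is only assembled.
cmd = run_predict("input.csv", "models", "predictions.csv", dry_run=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.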
Disclaimer
This library is intended for quick QSAR modeling. For a more complete automated QSAR pipeline, refer to ZairaChem.
About us
Learn about the Ersilia Open Source Initiative!
Download files
File details
Details for the file lazyqsar-2.3.0.tar.gz.
File metadata
- Download URL: lazyqsar-2.3.0.tar.gz
- Upload date:
- Size: 58.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.12.3 Linux/6.14.0-1017-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5d028a396dcc4a09b984a4c50994edf5020327d0be8b254779f722108ffd9947 |
| MD5 | 4cc33187360fbee2f239e1245ef9dd1c |
| BLAKE2b-256 | 6f6714d4a6f6a02c2229768d3cc1e44a8f1a95c26aa52ccafb4539c5d2cf92ac |
File details
Details for the file lazyqsar-2.3.0-py3-none-any.whl.
File metadata
- Download URL: lazyqsar-2.3.0-py3-none-any.whl
- Upload date:
- Size: 78.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.12.3 Linux/6.14.0-1017-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5e11dc4034f6998fcfcb014fb50b38cb23225cfa28620137b1ef01d0788f77dc |
| MD5 | db2c42f72b4e6e73b5f3ba2e06cc57b1 |
| BLAKE2b-256 | 6fd5d1d0aad04c7605a3be4966bb82bc2abda9d67e5612194f48fe8b116b5eca |