
Ersilia's LazyQSAR

A library to build supervised QSAR models for chemistry quickly.

Installation

Install LazyQSAR from source:

git clone https://github.com/ersilia-os/lazy-qsar.git
cd lazy-qsar
python -m pip install -e .

To use the built-in LazyQSAR descriptors, install the optional dependencies:

python -m pip install -e .[descriptors]

This will enable descriptor (featurizer) calculation. The first time you run LazyQSAR, it will download the Chemeleon and CDDD model checkpoints. To complete this setup in advance, run:

lazyqsar-setup

Use as a Python API

Binary Classification

LazyQSAR's binary classifier can run either with built-in descriptors, taking SMILES strings as input, or with custom pre-computed descriptors.

Built-in descriptors

Instantiate LazyBinaryQSAR with a mode of choice:

Mode      Descriptors used                  Speed
fast      RDKit, Morgan fingerprints        Fastest; no deep-learning descriptors
default   Chemeleon, RDKit, CDDD            Balanced
slow      Chemeleon, Morgan, RDKit, CDDD    Most thorough

from lazyqsar.qsar import LazyBinaryQSAR

model = LazyBinaryQSAR(mode="default")
model.fit(smiles_list=smiles_train, y=y_train)
y_hat = model.predict_proba(smiles_list=smiles_test)[:, 1]
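The probabilities returned by predict_proba can be converted into hard 0/1 labels with a decision threshold. A minimal NumPy sketch; the 0.5 cutoff is a common default choice, not a LazyQSAR requirement, and the scores below are illustrative:

```python
import numpy as np

# Hypothetical positive-class probabilities, as returned by
# model.predict_proba(...)[:, 1]
y_hat = np.array([0.12, 0.87, 0.51, 0.33])

# Apply a 0.5 decision threshold to obtain binary labels
y_pred = (y_hat >= 0.5).astype(int)
print(y_pred.tolist())  # → [0, 1, 1, 0]
```

In practice the threshold can be tuned on a validation set, for example to balance sensitivity and specificity on imbalanced data.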

Custom descriptors

Pre-calculate your own descriptors and pass them directly. We recommend the Ersilia Model Hub for this, as its .h5 output format is supported natively. Alternatively, pass descriptors as a NumPy array.

from lazyqsar.agnostic import LazyBinaryClassifier

# From a NumPy array
model = LazyBinaryClassifier(mode="default")
model.fit(X=X_train, y=y_train)
y_hat = model.predict_proba(X=X_test)[:, 1]

# From an Ersilia .h5 file
model.fit(h5_file="descriptors.h5", y=y_train)
y_hat = model.predict_proba(h5_file="descriptors.h5")[:, 1]

Saving and loading models

Models are saved as ONNX files by default, so inference only requires the ONNX runtime.

# Save after training
model.save(model_dir)

# Load for inference (auto-detects ONNX or raw format)
from lazyqsar.agnostic import LazyBinaryClassifier

model = LazyBinaryClassifier.load(model_dir)
y_hat = model.predict_proba(X=X)[:, 1]

You can also save and load as a .zip archive:

model.save("my_model.zip")
model = LazyBinaryClassifier.load("my_model.zip")

The same save/load interface applies to LazyBinaryQSAR:

from lazyqsar.qsar import LazyBinaryQSAR

model = LazyBinaryQSAR(mode="default")
model.fit(smiles_list=smiles_train, y=y_train)
model.save(model_dir)

model = LazyBinaryQSAR.load(model_dir)
y_hat = model.predict_proba(smiles_list=smiles_test)[:, 1]

Tests and benchmarks

Quick testing

The tests/ folder contains scripts for quickly verifying that the code works. The Bioavailability dataset is used as an example.

python tests/test_binary_classification.py
python tests/test_binary_classification.py --agnostic

Benchmarking

The benchmark repository contains performance results for the default estimators and descriptors on the TDCommons ADMET dataset.

Use as a CLI

The CLI expects a data_dir containing one CSV file per task. Each CSV must have SMILES in the first column and binary labels (0/1) in the second column, with a header row.
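This layout can be produced with a few lines of standard-library Python. The filenames and column headers below are illustrative only; the CLI accepts any header names, as long as SMILES is in the first column and the 0/1 label in the second:

```python
import csv
import os

data_dir = "data_dir"
os.makedirs(data_dir, exist_ok=True)

# One CSV per task; header row first, then SMILES and binary label
rows = [
    ("smiles", "activity"),   # header names are illustrative
    ("CCO", 0),               # ethanol, labeled inactive
    ("c1ccccc1O", 1),         # phenol, labeled active
]
with open(os.path.join(data_dir, "my_task.csv"), "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

With this layout, lazyqsar-binary-fit would train a model for the task named my_task.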

Fit:

lazyqsar-binary-fit --data_dir $DATA_DIR --model_dir $MODEL_DIR --mode default

Optionally, pass a --models_txt file listing which tasks (CSV filenames without extension) to train, one per line. Without it, all CSVs in the directory are used.

lazyqsar-binary-fit --data_dir $DATA_DIR --model_dir $MODEL_DIR --models_txt models.txt

Predict:

lazyqsar-binary-predict --input_csv $INPUT_CSV --model_dir $MODEL_DIR --output_csv $OUTPUT_CSV

Disclaimer

This library is intended for quick QSAR modeling. For a more complete automated QSAR pipeline, refer to Zaira Chem.

About us

Learn about the Ersilia Open Source Initiative!

Download files


Source Distribution

lazyqsar-2.3.0.tar.gz (58.6 kB)

Built Distribution


lazyqsar-2.3.0-py3-none-any.whl (78.6 kB)

File details

Details for the file lazyqsar-2.3.0.tar.gz.

File metadata

  • Download URL: lazyqsar-2.3.0.tar.gz
  • Upload date:
  • Size: 58.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.12.3 Linux/6.14.0-1017-azure

File hashes

Hashes for lazyqsar-2.3.0.tar.gz:

Algorithm    Hash digest
SHA256       5d028a396dcc4a09b984a4c50994edf5020327d0be8b254779f722108ffd9947
MD5          4cc33187360fbee2f239e1245ef9dd1c
BLAKE2b-256  6f6714d4a6f6a02c2229768d3cc1e44a8f1a95c26aa52ccafb4539c5d2cf92ac


File details

Details for the file lazyqsar-2.3.0-py3-none-any.whl.

File metadata

  • Download URL: lazyqsar-2.3.0-py3-none-any.whl
  • Upload date:
  • Size: 78.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.12.3 Linux/6.14.0-1017-azure

File hashes

Hashes for lazyqsar-2.3.0-py3-none-any.whl:

Algorithm    Hash digest
SHA256       5e11dc4034f6998fcfcb014fb50b38cb23225cfa28620137b1ef01d0788f77dc
MD5          db2c42f72b4e6e73b5f3ba2e06cc57b1
BLAKE2b-256  6fd5d1d0aad04c7605a3be4966bb82bc2abda9d67e5612194f48fe8b116b5eca

