A library to quickly build QSAR models

Ersilia's LazyQSAR

A Python library for building supervised QSAR models quickly, with minimal configuration. LazyQSAR automates chemical descriptor computation and model selection to produce robust models for property and activity prediction.

Two entry points:

LazyClassifierQSAR: pass SMILES strings directly; built-in descriptors are computed automatically
LazyClassifier: bring your own pre-computed descriptor arrays

Installation
Python API
CLI
How it works
Base Models
Ersilia Model Hub integration
Disclaimer

Installation

We recommend installation from source:

git clone https://github.com/ersilia-os/lazy-qsar.git
cd lazy-qsar
pip install -e .

The base install includes only lightweight runtime dependencies (numpy, onnxruntime, etc.). This is sufficient for loading and running pre-trained ONNX models without any ML or chemistry packages (such as RDKit), so the base install assumes that descriptors are provided by the user.

You can install optional extras depending on your use case:

Extra	Command	Adds
`fit`	`pip install -e .[fit]`	Required to train models (scikit-learn, XGBoost, scipy, skl2onnx)
`descriptors`	`pip install -e .[descriptors]`	Required for built-in molecular descriptors (e.g. RDKit, FPSim2)
`all`	`pip install -e .[all]`	Everything above
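
As a rough aid for choosing an extra, the sketch below checks which of the packages named in the table can be found in the current environment. It uses only the standard library. The import names (`sklearn`, `xgboost`, `scipy`, `skl2onnx`, `rdkit`, `FPSim2`) are assumptions derived from the package names in the table, the table itself says "e.g." for the descriptor packages so the list may be incomplete, and `EXTRA_MODULES`, `missing_modules`, and `report` are hypothetical helpers, not part of LazyQSAR.

```python
from importlib.util import find_spec

# Import names assumed for the packages named in the extras table.
EXTRA_MODULES = {
    "fit": ["sklearn", "xgboost", "scipy", "skl2onnx"],
    "descriptors": ["rdkit", "FPSim2"],
}


def missing_modules(extra):
    """Return the modules of an extra that cannot be found in this environment."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]


def report():
    """Map each extra to its missing modules (an empty list means it looks usable)."""
    return {extra: missing_modules(extra) for extra in EXTRA_MODULES}


if __name__ == "__main__":
    for extra, missing in report().items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{extra}: {status}")
```

An empty list for an extra only means its modules are discoverable; it does not verify versions.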

The first time you use deep-learning descriptors (Chemeleon, CLAMP, CDDD), their checkpoints are downloaded automatically. To do this in advance:

lazyqsar setup --descriptors

Python API

LazyClassifierQSAR (SMILES)

Pass SMILES strings directly. Choose a descriptor mode:

Mode	Descriptors	Notes
`fast`	Morgan fingerprints	No deep-learning models, fastest
`slow`	Chemeleon, CLAMP, Morgan, RDKit (physchem), CDDD	Most thorough

from lazyqsar.qsar import LazyClassifierQSAR

model = LazyClassifierQSAR(mode="slow") # default is "slow"
model.fit(smiles_list=smiles_train, y=y_train)

Available prediction methods:

Method	Returns	Description
`predict(smiles_list)`	`(N,)`	Binary labels at an optimized threshold
`predict_proba(smiles_list)`	`(N, 2)`	Calibrated class probabilities
`predict_logit(smiles_list)`	`(N, 2)`	Log-odds scores
`predict_rank(smiles_list)`	`(N, 2)`	Rank quantiles (0–1)
`predict_score(smiles_list)`	`(N, 2)`	Raw model scores
`predict_lift(smiles_list)`	`(N, 2)`	Probability / population prior
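
To make the relationship between these outputs concrete, here is a minimal sketch of how a single positive-class probability could map to the other quantities, following the descriptions in the table. It is one plausible reading, not LazyQSAR's implementation: the functions are hypothetical, they work on one scalar rather than an `(N, 2)` array, and the reference probabilities, prior, and threshold are made-up values.

```python
import math


def logit(p, eps=1e-12):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def lift(p, prior):
    """Probability divided by the population prior (e.g. training prevalence)."""
    return p / prior


def rank_quantile(p, reference):
    """Fraction of reference probabilities that are less than or equal to p."""
    return sum(r <= p for r in reference) / len(reference)


def binary(p, threshold):
    """Binary label obtained by thresholding the probability."""
    return int(p >= threshold)


# Hypothetical values, for illustration only.
reference = [0.05, 0.10, 0.20, 0.40, 0.80]
prior = 0.2
p = 0.40

summary = {
    "logit": logit(p),
    "lift": lift(p, prior),
    "rank": rank_quantile(p, reference),
    "binary": binary(p, threshold=0.5),
}
print(summary)
```

Lift is often the easiest to interpret for imbalanced screens: a value of 2 means the compound is twice as likely to be active as a randomly drawn one.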

LazyClassifier (custom descriptors)

Pass your own descriptor arrays or HDF5 files. We recommend the Ersilia Model Hub for descriptor computation, since its .h5 output format is supported natively.

from lazyqsar.agnostic import LazyClassifier

# From a NumPy array
model = LazyClassifier()
model.fit(X=X_train, y=y_train)
y_hat = model.predict_proba(X=X_test)[:, 1]

# From an Ersilia .h5 file
model.fit(h5_file="descriptors.h5", y=y_train)
y_hat = model.predict_proba(h5_file="descriptors.h5")[:, 1]

The same prediction methods listed above are available, using X= (or h5_file=) instead of smiles_list=.

Saving and loading

Models are saved as ONNX files, so inference only requires numpy and onnxruntime, i.e. no scikit-learn or XGBoost at prediction time. Metadata is stored in JSON format.

To save models:

model.save(model_dir)          # directory
model.save("my_model.zip")     # or zip archive

And to load them:

model = LazyClassifierQSAR.load(model_dir)
y_hat = model.predict_proba(smiles_list=smiles_test)[:, 1]

model = LazyClassifier.load(model_dir)
y_hat = model.predict_proba(X=X_test)[:, 1]

For multi-endpoint prediction across multiple model directories, see Ersilia Model Hub integration.

CLI

All commands are available through the lazyqsar entry point.

Fit:

The --input directory must contain one CSV file per task. Each file needs a header row, with SMILES in the first column and binary labels (0/1) in the second.

lazyqsar fit --task classification --input $DATA_DIR --output $MODEL_DIR --mode slow

Pass --models_txt to train a subset of tasks (one CSV stem per line); without it, all CSVs in the directory are used.
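
The input layout described above can be prepared with the standard library alone. The sketch below writes one CSV per task and an optional task list. The helper names, the header names `smiles` and `label`, and the file name `models.txt` are placeholders chosen for illustration; the text only requires that a header row exists and that SMILES come first.

```python
import csv
import tempfile
from pathlib import Path


def write_task_csv(data_dir, task_name, smiles, labels):
    """Write one task as <task_name>.csv: header row, SMILES first, 0/1 label second."""
    if len(smiles) != len(labels):
        raise ValueError("smiles and labels must have the same length")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must be binary (0/1)")
    path = Path(data_dir) / f"{task_name}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles", "label"])  # header names are placeholders
        writer.writerows(zip(smiles, labels))
    return path


def write_models_txt(path, task_names):
    """Write one CSV stem per line, the format described for --models_txt."""
    path = Path(path)
    path.write_text("\n".join(task_names) + "\n")
    return path


# Demo in a temporary folder: one task CSV plus a task list kept outside the data folder.
root = Path(tempfile.mkdtemp())
data_dir = root / "data"
data_dir.mkdir()
csv_path = write_task_csv(data_dir, "task1", ["CCO", "c1ccccc1", "CC(=O)O"], [1, 0, 0])
txt_path = write_models_txt(root / "models.txt", ["task1"])
```

The resulting `data` folder can then be passed as `--input`, and `models.txt` as `--models_txt`.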

Predict:

lazyqsar predict --input $INPUT_CSV --model $MODEL_DIR --output $OUTPUT_CSV [--models_txt FILE] [--predict_type TYPE]

The output CSV contains one column per task, ordered alphabetically by task name, or filtered and ordered by --models_txt at predict time. --predict_type controls the output format: proba (default), rank, logit, lift, score, or binary.
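
The column-ordering rule can be summarised in a few lines. This is a sketch of the behaviour as described in the paragraph above, not the CLI's code: `output_columns` is a hypothetical helper, and raising an error for an unknown task name is an assumption about how such a case might be handled.

```python
# Values accepted by --predict_type, as listed in the text.
PREDICT_TYPES = ("proba", "rank", "logit", "lift", "score", "binary")


def output_columns(task_names, models_txt_lines=None):
    """Return the output column order: alphabetical, or as listed in a models file."""
    available = set(task_names)
    if models_txt_lines is None:
        return sorted(available)
    # Keep the order given in the file, ignoring blank lines.
    requested = [line.strip() for line in models_txt_lines if line.strip()]
    unknown = [name for name in requested if name not in available]
    if unknown:
        # Assumption: unknown names are treated as an error.
        raise ValueError(f"unknown tasks: {unknown}")
    return requested


all_columns = output_columns(["tox", "herg", "ames"])
subset_columns = output_columns(["tox", "herg", "ames"], ["tox\n", "ames\n"])
print(all_columns, subset_columns)
```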

How it works

LazyQSAR builds an ensemble for each descriptor set through five steps:

Portfolio selection: the dataset is profiled (sample count, dimensionality, sparsity, class imbalance) and a rule-based selector decides which heads to train. The default portfolio is XGBoost + Random Forest; Linear Models and Support Vector Machines are added automatically for small, high-dimensional, or low-prevalence datasets.
Preprocessing: a scaler (StandardScaler, RobustScaler, MaxAbsScaler, or PowerTransformer) and an optional correlation-based feature reducer are selected automatically from dataset statistics.
Heads: each selected head is fitted on preprocessed features. For severely imbalanced datasets, balanced sub-batches are used and the batch predictions are averaged.
Pooling: head predictions are combined via a learned gating network (InnerClassifierPooler). When using LazyClassifierQSAR, a separate ensemble is trained per descriptor type and their predictions are combined via an AUC-weighted ensemble that accounts for per-sample prediction confidence.
Export: the full pipeline is exported to ONNX for dependency-free inference.
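
The balanced sub-batch idea from the Heads step can be illustrated as follows. This is a generic sketch of the technique under simple assumptions (every batch reuses all minority samples and takes a disjoint chunk of the shuffled majority class, and predictions are combined with a plain mean); LazyQSAR's actual batching and averaging may differ, and both function names are hypothetical.

```python
import random


def balanced_batches(labels, seed=0):
    """Pair all minority samples with disjoint, minority-sized chunks of the majority."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    majority = majority[:]
    random.Random(seed).shuffle(majority)
    size = len(minority)
    batches = []
    for start in range(0, len(majority), size):
        # The last chunk may be smaller when the sizes do not divide evenly.
        chunk = majority[start:start + size]
        batches.append(sorted(minority + chunk))
    return batches


def average_predictions(batch_predictions):
    """Average per-sample scores across the models fitted on each batch."""
    n_batches = len(batch_predictions)
    return [sum(scores) / n_batches for scores in zip(*batch_predictions)]


labels = [1, 1, 0, 0, 0, 0, 0, 0]  # 2 positives, 6 negatives
batches = balanced_batches(labels)
averaged = average_predictions([[0.2, 0.8], [0.4, 0.6]])
print(batches, averaged)
```

Each batch here would be used to fit one copy of a head, so the minority class is never diluted by the majority during training.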

Base Models

The components under lazyqsar/base/ can be used independently of the full pipeline:

Module	Description
`lazyqsar.base.preprocessing`	Automatic scaler and feature reducer selection
`lazyqsar.base.xgboost`	Automatic XGBoost hyperparameter selection with portfolio comparison
`lazyqsar.base.linear`	Automatic linear model selection (logistic/ridge/SGD)
`lazyqsar.base.randomforest`	Random Forest classifier with zero-shot hyperparameter selection

Ersilia Model Hub integration

LazyQSAR models can be used inside an Ersilia Model Hub template. See eos1lb5 for an example.

In short, lazyqsar fit produces a checkpoints folder with one sub-directory per task and, within it, one per descriptor type:

checkpoints/
└── task1/
    ├── cddd/
    │   ├── featurizer.json
    │   ├── metadata.json
    │   └── batch_0/
    │       ├── preprocessor.onnx
    │       ├── xgboost.onnx
    │       └── pooler.json
    ├── chemeleon/   (same structure)
    ├── clamp/       (same structure)
    ├── morgan/      (same structure)
    └── rdkit/       (same structure)
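
To inspect such a folder before deploying it, a small standard-library walk is enough. The sketch below assumes only the layout shown above (task folders, descriptor folders, and `batch_*` folders); `list_checkpoints` is a hypothetical helper, and the mock tree it runs on is created in a temporary directory purely for demonstration.

```python
import tempfile
from pathlib import Path


def list_checkpoints(checkpoints_dir):
    """Map each task to its descriptor folders and their batch folders."""
    layout = {}
    for task_dir in sorted(p for p in Path(checkpoints_dir).iterdir() if p.is_dir()):
        layout[task_dir.name] = {
            desc_dir.name: sorted(b.name for b in desc_dir.glob("batch_*") if b.is_dir())
            for desc_dir in sorted(p for p in task_dir.iterdir() if p.is_dir())
        }
    return layout


# Build a small mock tree that mirrors part of the layout shown above.
root = Path(tempfile.mkdtemp()) / "checkpoints"
for descriptor in ("cddd", "morgan"):
    batch = root / "task1" / descriptor / "batch_0"
    batch.mkdir(parents=True)
    (batch / "pooler.json").write_text("{}")

layout = list_checkpoints(root)
print(layout)
```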

The code/main.py inference script:

import os, sys
from lazyqsar.api.classifier_predict import predict

root = os.path.dirname(os.path.abspath(__file__))
checkpoints_dir = os.path.abspath(os.path.join(root, "..", "checkpoints"))
predict(model_dir=checkpoints_dir, input_csv=sys.argv[1], output_csv=sys.argv[2], predict_type="rank")

This function computes descriptors once per descriptor type and reuses them across all tasks, making it suitable for scoring large compound libraries. predict_type controls the output format and is available in both the Python API and the CLI (--predict_type).
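
The reuse of descriptors across tasks amounts to a cache keyed by descriptor type. The following sketch shows the pattern with toy stand-ins: the featurizers, the per-task models, and the plain mean across descriptor types are all hypothetical and far simpler than what the library does. The call counters confirm that each featurizer runs once, however many tasks consume it.

```python
def score_library(smiles, tasks, featurizers, models):
    """Score every task, computing each descriptor type at most once.

    tasks maps task name -> list of descriptor types it needs;
    featurizers maps descriptor type -> function(smiles list) -> features;
    models maps (task, descriptor type) -> function(features) -> scores.
    """
    cache = {}
    results = {}
    for task, descriptor_types in tasks.items():
        per_descriptor = []
        for name in descriptor_types:
            if name not in cache:
                cache[name] = featurizers[name](smiles)  # computed once, then reused
            per_descriptor.append(models[(task, name)](cache[name]))
        # Plain mean across descriptor types (a stand-in for the real ensemble).
        results[task] = [sum(s) / len(s) for s in zip(*per_descriptor)]
    return results


calls = {"length": 0, "carbons": 0}


def length_features(smiles):
    calls["length"] += 1
    return [len(s) for s in smiles]


def carbon_features(smiles):
    calls["carbons"] += 1
    return [s.upper().count("C") for s in smiles]


featurizers = {"length": length_features, "carbons": carbon_features}
tasks = {"task1": ["length", "carbons"], "task2": ["length", "carbons"]}
models = {
    (task, name): (lambda feats: [min(1.0, f / 10) for f in feats])
    for task in tasks
    for name in featurizers
}
results = score_library(["CCO", "c1ccccc1"], tasks, featurizers, models)
print(results, calls)
```

With expensive deep-learning descriptors, this pattern keeps the featurization cost constant as the number of tasks grows.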

Multi-model prediction across directories:

model_dir also accepts a dict[str, str] mapping each individual model directory (a leaf directory containing featurizer subdirs) to its exact output column name. This is useful when models for different targets are stored under separate paths:

import sys
from lazyqsar.api.classifier_predict import predict

predict(
    model_dir={
        "checkpoints/ecoli/individual_activity_a": "E. coli activity",
        "checkpoints/mtb/individual_activity_a":   "M. tb activity",
    },
    input_csv=sys.argv[1],
    output_csv=sys.argv[2],
    predict_type="rank",
)

The output CSV will contain one column per entry, named exactly as the dict values. Descriptors are still computed once per type and shared across all models.
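
Because column names come straight from the dict values, it can help to sanity-check the mapping before scoring a large library. The sketch below is a hypothetical pre-flight check, not a LazyQSAR function: it flags duplicate column names and paths that are not directories or that contain no sub-directories. Whether the library itself performs such checks is not stated here.

```python
import tempfile
from pathlib import Path


def check_model_map(model_map):
    """Validate a {model directory: column name} mapping before prediction."""
    problems = []
    columns = list(model_map.values())
    for name in sorted(set(columns)):
        if columns.count(name) > 1:
            problems.append(f"duplicate column name: {name!r}")
    for path in model_map:
        p = Path(path)
        if not p.is_dir():
            problems.append(f"not a directory: {path}")
        elif not any(child.is_dir() for child in p.iterdir()):
            problems.append(f"no descriptor subdirectories in: {path}")
    return problems


# Demo: one valid mock model directory and one missing path with a clashing name.
root = Path(tempfile.mkdtemp())
good = root / "ecoli" / "individual_activity_a"
(good / "morgan").mkdir(parents=True)
model_map = {
    str(good): "E. coli activity",
    str(root / "missing"): "E. coli activity",
}
problems = check_model_map(model_map)
print(problems)
```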

Roadmap

We are currently working on regression models, mirroring what has been done for classification.

Disclaimer

This library is intended for quick QSAR modeling. For a more complete automated QSAR pipeline, refer to ZairaChem.

ZairaChem, together with an earlier version of LazyQSAR, was presented in the following article:

@article{Turon2023,
  author = {Turon, G. and Hlozek, J. and Woodland, J.G. and others},
  title = {First fully-automated AI/ML virtual screening cascade implemented at a drug discovery centre in Africa},
  journal = {Nat Commun},
  volume = {14},
  pages = {5736},
  year = {2023},
  doi = {10.1038/s41467-023-41512-2},
  url = {https://doi.org/10.1038/s41467-023-41512-2}
}

About the Ersilia Open Source Initiative

The Ersilia Open Source Initiative is a tech non-profit organization with the mission to equip laboratories, universities, and clinics in the Global South with AI/ML tools for infectious disease research. We work on the principles of open science, decolonized research, and egalitarian access to knowledge and research outputs. You can support Ersilia by clicking here.
