Smart consensus QSAR
QSARcons - smart searching for consensus of QSAR models
QSARcons is a package designed to identify optimal consensus combinations of QSAR models. The project is motivated by the large number of available chemical descriptors and machine learning methods, which can be combined into many different QSAR models. Selecting the most effective subset of these models and combining their predictions into a consensus can improve prediction accuracy and robustness.
Motivation
1. Simple design - unlike many existing frameworks, QSARcons focuses on simplicity and ease of use. It minimizes the number of parameters a user must adjust, making QSAR model construction more accessible and intuitive.
2. Traditional QSAR - QSARcons includes a wide range of traditional molecular descriptors and machine learning algorithms, providing a transparent baseline for comparison with more advanced approaches like deep learning-based or complex QSAR workflows.
3. Universal workflow - QSARcons can be applied to any type of chemical property modeling.
Overview
QSARcons provides a two-layer workflow.
- 1. Model generation
Build multiple QSAR models (>100) using 2D chemical descriptors and traditional machine learning algorithms. The individual model building pipeline is kept simple, without advanced data preprocessing. Optional in-house stepwise hyperparameter optimization is available for all ML methods.
- 2. Consensus search
Identify the optimal subset of QSAR models using several search strategies:
  - Random search
  - Systematic search
  - Genetic search
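To make the consensus search step concrete, here is a minimal sketch in plain Python. It is not the QSARcons implementation. It assumes that a consensus is the plain average of the selected models' predictions, that subsets are scored by RMSE on a validation set, and that "systematic search" means trying every subset of a fixed size. The function names (`consensus`, `random_search`, `exhaustive_search`) and the toy predictions are hypothetical and exist only for illustration.

```python
import itertools
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def consensus(preds, subset):
    """Average the predictions of the selected models, molecule by molecule."""
    n_molecules = len(preds[subset[0]])
    return [sum(preds[m][i] for m in subset) / len(subset) for i in range(n_molecules)]


def random_search(preds, y_val, size, n_iter=200, seed=0):
    """Score randomly drawn subsets of a fixed size and keep the best one."""
    rng = random.Random(seed)
    names = sorted(preds)
    best_subset, best_score = None, float("inf")
    for _ in range(n_iter):
        subset = tuple(sorted(rng.sample(names, size)))
        score = rmse(y_val, consensus(preds, subset))
        if score < best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


def exhaustive_search(preds, y_val, size):
    """Score every subset of a fixed size (feasible only for small model pools)."""
    names = sorted(preds)
    scored = (
        (subset, rmse(y_val, consensus(preds, subset)))
        for subset in itertools.combinations(names, size)
    )
    return min(scored, key=lambda item: item[1])


# Toy validation data: four molecules, four hypothetical models.
y_val = [1.0, 2.0, 3.0, 4.0]
preds = {
    "model_a": [1.5, 2.5, 3.5, 4.5],  # biased high by 0.5
    "model_b": [0.5, 1.5, 2.5, 3.5],  # biased low by 0.5
    "model_c": [2.0, 2.0, 2.0, 2.0],  # constant prediction
    "model_d": [1.2, 2.1, 3.3, 3.9],  # best single model
}

best_subset, best_score = exhaustive_search(preds, y_val, size=2)
print(best_subset, best_score)
```

In this toy case the pair `model_a` and `model_b` is selected because their opposite biases cancel when averaged, even though `model_d` is the best single model. This is the kind of effect a consensus search is meant to exploit. With more than 100 models the number of subsets grows too fast for exhaustive enumeration, which is one reason random and genetic strategies are useful.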
Installation
pip install qsarcons
QSARcons benchmarking
QSARcons can be benchmarked against alternative approaches by calling the default pipeline function, as in the example below. The inputs are dataframes in which the first column holds the molecule SMILES and the second column holds the molecule property (a continuous value for regression or a binary label for classification).
import polaris
from sklearn.model_selection import train_test_split
from qsarcons.cli import run_qsarcons
# Load Polaris benchmark
benchmark = polaris.load_benchmark("tdcommons/caco2-wang")
data_train, data_test = benchmark.get_train_test_split()
df_train, df_test = data_train.as_dataframe(), data_test.as_dataframe()
df_train, df_val = train_test_split(df_train, test_size=0.2, random_state=42)
# Run QSARcons
test_pred = run_qsarcons(df_train, df_val, df_test, task="regression", output_folder="results")
# Evaluate predictions
results = benchmark.evaluate(test_pred)
Colab
See an example in QSARcons pipeline.