A modern, extendible platform for ligand-based virtual screening benchmarking. Automates data downloading, preprocessing, ML training, evaluation, and statistical analysis. Supports multiple datasets, molecular embeddings (fingerprints and pretrained neural networks), similarity search, and ML classifiers for identifying bioactive compounds.

Litmus

Ligand-based virtual screening is one of the most popular chemoinformatics approaches to searching large molecular databases for bioactive compounds. It uses molecule vectorization approaches and machine learning (ML) models to identify potential bioactive molecules for novel problems. The task is challenging because of extreme data imbalance (often <1% positive class), large datasets, and frequent false positives. There is a pressing need for a modern benchmark for evaluating algorithms in this area: the most popular software was published in 2013 and is based on Python 2. The platform should have an extendible, modular structure that allows the use of many benchmarking datasets, molecular embedding algorithms, and evaluation methods. In particular, it should automate data downloading, preprocessing, training of ML algorithms, evaluation, and statistical analysis of results. Practical tests of the software should include benchmarking multiple molecular fingerprints; embeddings from pretrained neural networks can also be analyzed. Bioactive search should support both efficient similarity searching based on appropriate distance metrics and ML classifiers such as logistic regression and boosted ensembles.

A comprehensive platform for benchmarking ligand-based virtual screening methods with scikit-learn compatible interfaces.

Features

Standardized benchmarking protocols for virtual screening methods
Scikit-learn compatible API
Built-in dataset loaders and metrics
Comprehensive documentation and examples

Glossary

Dataset: A single protein-ligand pairing task, consisting of one target protein (y) and a set of associated ligands (X). It represents one learning or prediction unit, typically used in structure-based drug discovery workflows.
Benchmark: A curated collection of datasets proposed in the literature to evaluate models across a range of protein-ligand tasks. Benchmarks often represent a standard suite of datasets used to compare algorithmic performance under consistent conditions.
Platform: An overarching collection of datasets, possibly post-processed (e.g., deduplicated, filtered) to meet certain criteria such as target uniqueness. A platform is designed to serve as a consistent, reusable foundation for large-scale experimentation and model development.

Available datasets

All datasets are hosted on HuggingFace Hub and are automatically downloaded when used:

MUV - 17 targets, Parquet format
LIT-PCBA - 15 targets, Parquet format
WelQrate - 9 targets, Parquet format, 5 seeds for cross-validation
DUD-AD - 55 targets, CSV format

Datasets are available at: https://huggingface.co/datasets/scikit-fingerprints/litmus

How to use this benchmark?

All datasets are automatically downloaded from HuggingFace Hub when first used. The example below demonstrates how to use the Benchmark class to evaluate the performance of an embedding method. In this case, we will benchmark the ECFPFingerprint from the skfp library.

Note: For the WelQrate dataset, the benchmark automatically runs cross-validation across all 5 seeds and reports both per-seed and averaged results.

from skfp.fingerprints import ECFPFingerprint
from skfp.preprocessing import MolFromSmilesTransformer
from sklearn.pipeline import make_pipeline

from lbvslitmus.benchmarking.benchmark import Benchmark

# Define a pipeline that transforms SMILES strings into numerical vectors.
# The pipeline must expose a `fit_transform` method.
pipeline = make_pipeline(
   MolFromSmilesTransformer(suppress_warnings=True), ECFPFingerprint()
)

# Create a Benchmark object and provide the pipeline to the benchmark
benchmark = Benchmark(pipeline=pipeline)

# Run the benchmark
results = benchmark.run()

# Print a summary of the results
print(results.summary)
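According to the comment in the example above, the embedding step only has to expose a `fit_transform` method. The sketch below shows what a custom, dependency-free featurizer with that method could look like. `SmilesNgramHasher` is a hypothetical class, not part of Litmus or skfp, and whether `Benchmark` needs more scikit-learn estimator behavior than `fit_transform` is an assumption to check against the library documentation.

```python
import zlib


class SmilesNgramHasher:
    """Hypothetical featurizer: hashed character n-gram counts of SMILES strings."""

    def __init__(self, n=3, n_features=256):
        self.n = n
        self.n_features = n_features

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X):
        rows = []
        for smiles in X:
            row = [0] * self.n_features
            # Strings shorter than n contribute a single (shorter) n-gram.
            for i in range(max(len(smiles) - self.n + 1, 1)):
                ngram = smiles[i : i + self.n]
                # crc32 is stable across runs, unlike the built-in hash().
                row[zlib.crc32(ngram.encode()) % self.n_features] += 1
            rows.append(row)
        return rows

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


features = SmilesNgramHasher().fit_transform(["CCO", "c1ccccc1"])
print(len(features), len(features[0]))
```

Each row has a fixed length regardless of the molecule size, which is the property downstream similarity search and classifiers rely on.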

Dataset Splitting

Litmus provides advanced dataset splitting capabilities for virtual screening experiments, ensuring proper train/test separation while maintaining class balance and molecular diversity.

Max-Min Splitting Algorithm

The platform implements a max-min splitting algorithm that:

Separates active and inactive compounds independently
Maintains class balance in both train and test sets
Ensures molecular diversity through maximum dissimilarity selection
Supports reproducible splits with configurable random seeds
Handles extreme class imbalance common in virtual screening datasets
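The diversity-selection idea behind max-min splitting can be illustrated with a small, self-contained sketch. It is not the Litmus implementation: fingerprints are modeled as plain Python sets of "on" bits, the hypothetical `maxmin_pick` helper greedily picks the item farthest (in Tanimoto distance) from everything picked so far, and assigning the picked items to the test set is an assumption made only for illustration.

```python
import random


def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)


def maxmin_pick(fps, n_pick, seed=0):
    """Greedy max-min selection: indices of n_pick mutually distant items."""
    rng = random.Random(seed)
    first = rng.randrange(len(fps))  # the seed only affects the starting point
    picked = [first]
    # min_dist[i] = distance from item i to its nearest already-picked item
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    min_dist[first] = -1.0  # never pick the same item twice
    while len(picked) < n_pick:
        nxt = max(range(len(fps)), key=min_dist.__getitem__)
        picked.append(nxt)
        for i, fp in enumerate(fps):
            if min_dist[i] >= 0:
                min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[nxt]))
        min_dist[nxt] = -1.0
    return picked


def maxmin_split(fps, train_size=0.75, seed=0):
    """Split one class; the diverse picks go to the test set (an assumption)."""
    n_test = len(fps) - int(round(train_size * len(fps)))
    test_idx = sorted(maxmin_pick(fps, n_test, seed)) if n_test > 0 else []
    test_set = set(test_idx)
    train_idx = [i for i in range(len(fps)) if i not in test_set]
    return train_idx, test_idx


# Actives and inactives are split independently, so both subsets keep
# roughly the same class ratio as the full dataset.
actives = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
inactives = [{5, 6}, {5, 6, 7}, {10, 11}, {1, 5}, {6, 10}, {2, 3}, {8, 9}, {4, 11}]
train_a, test_a = maxmin_split(actives)
train_i, test_i = maxmin_split(inactives)
print(len(train_a), len(test_a), len(train_i), len(test_i))
```

Note that the returned positions refer to each class list separately; they must be mapped back to rows of the full dataset before being combined.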

Working with Dataset Splits

Loading Pre-generated Splits

All datasets come with pre-generated train/test splits that are automatically downloaded from HuggingFace Hub. These splits are generated using the max-min algorithm to ensure molecular diversity and class balance.

Access pre-generated splits for reproducible experiments:

from lbvslitmus.datasets import LITPCBADownloader, MUVDownloader, WelQrateDownloader

# Load full dataset
lit_pcba_downloader = LITPCBADownloader()
df = lit_pcba_downloader.load(target="OPRK1")

# Get split indices (automatically downloaded from HuggingFace)
splits = lit_pcba_downloader.get_splits(target="OPRK1")

# Create train and test subsets
train_df = df.iloc[splits["train"]].copy()
test_df = df.iloc[splits["test"]].copy()

# For WelQrate, you can specify a seed for cross-validation (1-5)
welqrate_downloader = WelQrateDownloader()
df = welqrate_downloader.load(target="AID2258")
splits = welqrate_downloader.get_splits(target="AID2258", seed=1)  # Use seed 1

Custom Split Generation (Not Recommended)

⚠️ Warning: Generating custom splits is not recommended for reproducible research. Use pre-generated splits whenever possible to ensure consistent benchmarking across studies.

However, if you need custom splits for specific research requirements:

import numpy as np

from lbvslitmus.model_selection.splitters.maxmin_split import maxmin_train_test_split

# Load your dataset (reuses `lit_pcba_downloader` from the previous example)
df = lit_pcba_downloader.load(target="OPRK1")

# Separate active and inactive compounds
active_smiles = df[df["OPRK1"] == 1]["SMILES"].tolist()
inactive_smiles = df[df["OPRK1"] == 0]["SMILES"].tolist()

# Generate custom splits with different parameters
train_active, test_active = maxmin_train_test_split(
    data=active_smiles,
    train_size=0.8,  # 80% for training (different from default 75%)
    random_state=123,  # Custom random seed
    show_progress=True,
)

train_inactive, test_inactive = maxmin_train_test_split(
    data=inactive_smiles,
    train_size=0.8,
    random_state=123,
    show_progress=True,
)

# Combine the active and inactive parts of each subset.
# Note: each call above only sees its own list, so any positions it returns
# refer to `active_smiles` / `inactive_smiles`, not to rows of `df`.
train_idx = train_active + train_inactive
test_idx = test_active + test_inactive

# Save custom splits (optional)
np.save("custom_train_idx.npy", train_idx)
np.save("custom_test_idx.npy", test_idx)

Important considerations for custom splits:

Document your splitting parameters and random seed
Ensure class balance is maintained
Consider molecular diversity in your split
Validate split quality before proceeding with experiments
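The checks listed above can be automated with a few lines of plain Python. The helper below is a hypothetical sketch (`check_split` is not a Litmus function). It assumes `train_idx` and `test_idx` are row positions into the same list of binary labels, and it reports overlap, unassigned rows, and the active fraction of each subset.

```python
def check_split(labels, train_idx, test_idx):
    """Return basic quality indicators for a train/test split of binary labels."""
    train, test = set(train_idx), set(test_idx)

    def active_fraction(idx):
        return sum(labels[i] for i in idx) / len(idx) if idx else 0.0

    return {
        "overlap": len(train & test),  # rows present in both subsets
        "unassigned": len(labels) - len(train | test),  # rows in neither subset
        "train_active_fraction": active_fraction(train_idx),
        "test_active_fraction": active_fraction(test_idx),
    }


# Toy data: 16 compounds, every fourth one is active.
labels = [1, 0, 0, 0] * 4
report = check_split(labels, train_idx=list(range(12)), test_idx=list(range(12, 16)))
print(report)
```

A non-zero overlap means leakage between train and test, and a large gap between the two active fractions means class balance was not preserved.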

Supported File Formats

Parquet: High-performance columnar format (MUV, LIT-PCBA, WelQrate)
CSV: Standard comma-separated values format (DUD-AD)
NumPy arrays: All splits are provided as .npy files with train/test indices

All datasets and splits are automatically downloaded from HuggingFace Hub when first accessed.

Example Scripts

Check out the example scripts in the examples/ directory:

work_with_dataset_splits.py: Comprehensive example of working with dataset splits
download_muv_dataset.py: Example of downloading and using MUV dataset
download_lit_pcba_dataset.py: Example of downloading and using LIT-PCBA dataset
download_welqrate_dataset.py: Example of downloading and using WelQrate dataset with seeds
benchmarking_example.py: Example of running benchmarks (includes WelQrate cross-validation)
compare_with_baselines_example.py: Bayesian statistical comparison of molecular fingerprints with baselines

Bayesian Fingerprint Comparison

Litmus provides a comprehensive framework for statistically rigorous comparison of molecular fingerprints using Bayesian methods. This enables you to determine whether differences between fingerprints are meaningful or negligible.

Quick Start:

from lbvslitmus.benchmarking import Benchmark, BaselineLoader
from lbvslitmus.comparison import BaselineComparator

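# Run benchmark with your fingerprint
# (`your_pipeline` is any pipeline with `fit_transform`, such as the one defined in the first example)
benchmark = Benchmark(pipeline=your_pipeline, benchmarks=["MUV"])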
results = benchmark.run()

# Compare with baseline fingerprints
comparison = results.compare_with_baseline(
    baseline=["ECFP4", "MACCS", "AtomPair"],
    metric="AUROC",
    fingerprint_name="MyFingerprint",
    control_fingerprint="MyFingerprint",  # Compare all vs yours
)

# Generate visualizations
comparison.plot(output_dir="plots/comparison")

Key Features:

Pairwise Bayesian comparisons using Bradley-Terry model
ROPE (Region of Practical Equivalence) framework for meaningful differences
Support for multiple metrics (AUROC, AUPRC, BEDROC, etc.)
Automatic generation of heatmaps and win count visualizations
Configurable MCMC sampling parameters
Pre-computed baseline results for standard fingerprints (ECFP4, MACCS, etc.)
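To make the ROPE and Bradley-Terry ideas above concrete, here is a simplified, non-Bayesian sketch. It is not the Litmus implementation, which uses MCMC sampling: pairwise outcomes are counted per target with a ROPE (differences no larger than `rope` are treated as ties and skipped), and Bradley-Terry strengths are then fitted by the classic minorization-maximization iteration. The scores, the `rope=0.01` width, and all function names are made up for illustration.

```python
from itertools import combinations


def count_wins(scores, rope=0.01):
    """scores: {name: [per-target metric]}. wins[a][b] = targets where a beats b."""
    names = list(scores)
    wins = {a: {b: 0 for b in names if b != a} for a in names}
    for a, b in combinations(names, 2):
        for sa, sb in zip(scores[a], scores[b]):
            if abs(sa - sb) <= rope:
                continue  # inside the ROPE: practically equivalent, no winner
            winner, loser = (a, b) if sa > sb else (b, a)
            wins[winner][loser] += 1
    return wins


def bradley_terry(wins, n_iter=200):
    """Maximum-likelihood Bradley-Terry strengths (MM updates), summing to 1."""
    names = list(wins)
    p = {n: 1.0 / len(names) for n in names}
    for _ in range(n_iter):
        new_p = {}
        for a in names:
            total_wins = sum(wins[a].values())
            denom = sum(
                (wins[a][b] + wins[b][a]) / (p[a] + p[b])
                for b in names
                if b != a and p[a] + p[b] > 0
            )
            new_p[a] = total_wins / denom if denom > 0 else p[a]
        norm = sum(new_p.values())
        if norm == 0:
            break  # no decisive comparisons at all
        p = {n: v / norm for n, v in new_p.items()}
    return p


# Made-up per-target AUROC values for three fingerprints on five targets.
scores = {
    "MyFingerprint": [0.80, 0.75, 0.90, 0.70, 0.655],
    "ECFP4": [0.78, 0.70, 0.85, 0.72, 0.65],
    "MACCS": [0.70, 0.66, 0.80, 0.60, 0.68],
}
wins = count_wins(scores)
strengths = bradley_terry(wins)
for name, value in sorted(strengths.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Under the Bradley-Terry model, the probability that fingerprint `a` beats `b` on a target is `p[a] / (p[a] + p[b])`. A Bayesian treatment additionally yields a posterior distribution over these strengths rather than a single point estimate.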

Available Baseline Fingerprints: All baseline results are hosted on HuggingFace Hub and automatically downloaded:

ECFP4, ECFP4_Count, ECFP6, ECFP6_Count, ECFP8, ECFP8_Count
MACCS, AtomPair, PubChem, TopologicalTorsion

See examples/compare_with_baselines_example.py for a complete example.

Visualization Guide

Litmus provides comprehensive visualization tools for analyzing and interpreting benchmark results. This section describes each chart type, what it shows, and how to interpret the results.

Quick Start

from lbvslitmus.visualization import plot_all, plot_model_comparison, plot_top_targets

# Generate all plots at once
plot_all(results, output_dir="plots")

# Or generate specific plots
plot_model_comparison(results, output_dir="plots")
plot_top_targets(results, n_targets=15, output_dir="plots")

Available Charts

1. Model Comparison (`model_comparison.png`)

What it shows: Boxplots comparing the distribution of metric scores (AUROC, AUPRC, BEDROC) across different models.

How to interpret:

Each subplot represents one metric (e.g., AUROC, AUPRC, BEDROC)
The box shows the interquartile range (IQR) - middle 50% of scores
The horizontal line inside the box is the median score
Whiskers extend to the most extreme scores within 1.5× IQR of the box edges; points beyond are outliers
Mean values are annotated on top of each box
Higher and tighter boxes indicate better and more consistent performance
Compare boxes side-by-side to see which model performs better overall

Use case: Quick comparison of overall model performance across all targets and datasets.
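The quantities read off a boxplot can be reproduced numerically. The sketch below uses only the standard library and assumes the plots follow the usual Tukey convention (1.5× IQR fences, linearly interpolated quartiles); `box_stats` and the scores are hypothetical, and the exact quartile method used by the plotting backend may differ slightly.

```python
import statistics


def box_stats(scores):
    """Quartiles, IQR, whisker ends, outliers and mean of a list of scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [s for s in scores if low_fence <= s <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme scores that are still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [s for s in scores if s < low_fence or s > high_fence],
        "mean": statistics.fmean(scores),
    }


# Made-up per-target AUROC values; 0.95 lies far above the rest.
auroc = [0.60, 0.62, 0.64, 0.66, 0.68, 0.70, 0.72, 0.74, 0.95]
stats = box_stats(auroc)
print(stats["median"], stats["whisker_high"], stats["outliers"])
```

In this toy example the mean is pulled upward by the single outlier while the median is not, which is why the charts annotate both.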

2. Benchmark Performance (`benchmark_performance.png`)

What it shows: Grouped bar charts showing average metric scores for each model across different benchmarks/datasets.

How to interpret:

Each subplot represents one metric (AUROC, AUPRC)
X-axis shows different benchmarks (MUV, LIT-PCBA, DUD-AD, WELQRATE)
Y-axis shows the average score
Bars are grouped by model, using consistent colors
Taller bars indicate better performance on that benchmark
Compare bar heights within each benchmark to see relative model performance

Use case: Understanding how models perform on different types of datasets and identifying dataset-specific strengths/weaknesses.

3. Benchmark Heatmap (`benchmark_heatmap.png`)

What it shows: Heatmaps displaying average scores for each benchmark-metric combination, with separate heatmaps per model.

How to interpret:

Rows represent benchmarks (datasets)
Columns represent different metrics
Color intensity indicates score magnitude (darker = higher for YlOrRd colormap)
Numerical values are annotated in each cell
Look for patterns: consistent high scores across metrics indicate robust performance
Red/orange cells indicate strong performance; yellow cells indicate weaker performance

Use case: Comprehensive overview of model performance across all dimensions; identifying which metrics a model excels at.

4. Violin Grid (`violin_grid.png`)

What it shows: A grid of violin plots where rows are datasets and columns are metrics, showing the full distribution of scores.

How to interpret:

Each cell shows the score distribution for one dataset-metric combination
Violin width indicates density of scores at that value (wider = more common)
Red diamond markers show the mean value
Inner points show individual data points
Wider violins at high values indicate consistently good performance
Bimodal distributions (two peaks) may indicate different behavior on different targets

Use case: Detailed analysis of score distributions; understanding variance and identifying potential outliers or subgroups.

5. Metric Bars (`metric_bars.png`)

What it shows: Grouped bar charts for each benchmark, showing average scores for different metrics side-by-side.

How to interpret:

Each subplot represents one benchmark/dataset
X-axis shows different metrics
Bars are grouped by model
Compare bar heights to see which model performs better on each metric
Compare across subplots to see how the same model-metric combination varies by dataset

Use case: Detailed metric-by-metric comparison within each benchmark.

6. Distribution Violins (`distribution_violins.png`)

What it shows: A 2×2 grid of violin plots showing how metric scores are distributed across different benchmarks.

How to interpret:

Each subplot shows one metric (AUROC, AUPRC, BEDROC, RIE)
X-axis shows different benchmarks
Split violins (when 2 models) or grouped violins show model comparison
Overlapping violins indicate similar performance
Non-overlapping violins suggest a clear performance difference
Long tails indicate high variance in performance

Use case: Comparing model performance distributions across benchmarks; statistical significance assessment.

7. Enrichment Factors (`enrichment_factors.png`)

What it shows: Bar chart showing Enrichment Factor (EF) values at 1% and 5% thresholds for each benchmark-model combination.

How to interpret:

EF measures how much better than random the model is at ranking actives
EF 1%: How enriched are actives in the top 1% of ranked compounds
EF 5%: How enriched are actives in the top 5% of ranked compounds
EF = 1 means random performance; EF = 10 means 10× better than random
Higher bars indicate better early enrichment (more actives found early)
EF 1% is more stringent; EF 5% captures broader early enrichment

Use case: Virtual screening applications where only top-ranked compounds will be tested experimentally.
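The definition above translates directly into code. The following sketch computes EF at a given fraction from binary labels and predicted scores; `enrichment_factor` is a hypothetical helper written for illustration, not the function Litmus uses, and ties in the scores are broken arbitrarily by the sort.

```python
def enrichment_factor(y_true, y_score, fraction=0.01):
    """EF = hit rate among the top `fraction` of ranked compounds / overall hit rate."""
    n = len(y_true)
    n_top = max(1, int(round(fraction * n)))
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(label for _, label in ranked[:n_top])
    total_actives = sum(y_true)
    if total_actives == 0:
        raise ValueError("EF is undefined without any active compounds")
    return (top_actives / n_top) / (total_actives / n)


# Toy screen: 1000 compounds, 10 actives (1%), scores decrease with the index,
# so the index of a compound is also its rank.
active_ranks = {0, 2, 5, 9, 20, 30, 45, 100, 500, 900}
y_true = [1 if i in active_ranks else 0 for i in range(1000)]
y_score = [1000 - i for i in range(1000)]

ef1 = enrichment_factor(y_true, y_score, fraction=0.01)
ef5 = enrichment_factor(y_true, y_score, fraction=0.05)
print(round(ef1, 2), round(ef5, 2))
```

Here 4 of the top 10 compounds and 7 of the top 50 are active, against a base rate of 1%, which gives an EF 1% of 40 and an EF 5% of 14. The maximum attainable EF at a fraction is bounded by both `1 / fraction` and `1 / base rate`, so values are not directly comparable across datasets with very different active ratios.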

8. Metric Correlation (`metric_correlation.png`)

What it shows: Correlation heatmaps showing how different metrics correlate with each other, with separate heatmaps per model.

How to interpret:

Values range from -1 (perfect negative correlation) to +1 (perfect positive correlation)
Red cells indicate positive correlation (metrics move together)
Blue cells indicate negative correlation (metrics move oppositely)
Diagonal is always 1.0 (perfect self-correlation)
High correlations between metrics suggest they measure similar aspects
Low correlations indicate the metrics capture different information

Use case: Understanding metric relationships; identifying redundant metrics; validating that chosen metrics provide diverse evaluation.
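A correlation matrix of this kind can be computed from per-target scores as sketched below. The example assumes Pearson correlation (the library may use a different coefficient), uses made-up scores, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(metrics):
    """metrics: {metric name: [score per target]} -> nested dict of correlations."""
    names = list(metrics)
    return {a: {b: pearson(metrics[a], metrics[b]) for b in names} for a in names}


# Made-up scores of one model on five targets.
metrics = {
    "AUROC": [0.62, 0.71, 0.80, 0.68, 0.90],
    "AUPRC": [0.05, 0.12, 0.30, 0.08, 0.45],
    "BEDROC": [0.10, 0.25, 0.40, 0.30, 0.55],
}
corr = correlation_matrix(metrics)
for name, row in corr.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

With only a handful of targets the coefficients are noisy, so strong conclusions about redundant metrics are better drawn from the full set of targets.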

9. Metric Histograms (`metric_histograms.png`)

What it shows: Overlapping histograms with kernel density estimation (KDE) curves showing the frequency distribution of metric scores.

How to interpret:

X-axis shows score values
Y-axis shows frequency (count of targets with that score)
Bars show histogram bins; curves show smoothed density estimates
Non-overlapping distributions indicate clear performance differences between models
Right-shifted distributions indicate overall better performance
Narrow, tall peaks indicate consistent performance
Wide, flat distributions indicate high variance

Use case: Understanding the overall score distribution; identifying whether differences are statistically meaningful.

10. Top Targets (`top_targets.png`)

What it shows: Horizontal bar charts showing the top N best-performing targets for each metric, with target labels including the dataset name.

How to interpret:

Y-axis shows target names with dataset in parentheses, e.g., "OPRK1 (LIT-PCBA)"
X-axis shows the score value
Bars are grouped by model for each target
Longer bars indicate higher scores
Similar bar lengths for both models indicate comparable performance on that target
Large differences highlight targets where one model significantly outperforms the other

Use case: Identifying which specific protein targets are easiest/hardest to predict; finding model-specific strengths on particular targets.

Comparison Visualizations

In addition to benchmark result visualizations, Litmus provides specialized visualizations for fingerprint comparison results. Example:

from lbvslitmus.comparison import plot_comparison_heatmap, plot_win_counts

# Generate comparison visualizations
comparison = results.compare_with_baseline(
    baseline=["ECFP4", "MACCS", "AtomPair"],
    metric="AUROC",
    fingerprint_name="MyFP",
    control_fingerprint="MyFP"
)

# Generate all visualizations at once
comparison.plot(output_dir="plots/comparison")

Customizing Plots

All plot functions accept the following common parameters:

| Parameter | Default | Description |
|---|---|---|
| `output_dir` | `"plots"` | Directory to save the plot |
| `style` | `"whitegrid"` | Seaborn style (`whitegrid`, `darkgrid`, `white`, `dark`) |
| `figure_dpi` | `100` | DPI for figure display |
| `savefig_dpi` | `300` | DPI for saved figures (higher = better quality) |
| `font_size` | `10` | Base font size for labels |

Some plots have additional parameters:

# Customize number of top targets
plot_top_targets(results, n_targets=20)

# Customize histogram bins
plot_metric_histograms(results, bins=20)

# Customize which metrics to show
plot_violin_grid(results, metrics=["AUROC", "AUPRC"])

Color Consistency

All plots use a consistent color palette (MODEL_PALETTE) to ensure models are represented with the same colors across all visualizations:

Model 1: Teal (#66c2a5)
Model 2: Orange (#fc8d62)
Model 3: Blue (#8da0cb)
Model 4: Pink (#e78ac3)
Model 5: Green (#a6d854)

This consistency makes it easy to track model performance across different chart types.
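One simple way to obtain this kind of consistency in your own plots is to fix the model-to-color mapping once and reuse it everywhere. The sketch below copies the hex values listed above; the `PALETTE` name and the `assign_colors` helper are hypothetical, and how Litmus handles more than five models is not specified here (this sketch simply cycles).

```python
from itertools import cycle

# Hex values copied from the list above.
PALETTE = ["#66c2a5", "#fc8d62", "#8da0cb", "#e78ac3", "#a6d854"]


def assign_colors(model_names):
    """Map each model name to a color in first-seen order, cycling after five."""
    colors = {}
    palette = cycle(PALETTE)
    for name in model_names:
        if name not in colors:
            colors[name] = next(palette)
    return colors


colors = assign_colors(["ECFP4", "MACCS", "ECFP4", "AtomPair"])
print(colors)
```

Passing the same dictionary to every plotting call keeps each model's color stable even when a chart shows only a subset of models.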

Installation

Install the latest stable release from PyPI:

pip install LBVSLitmus

Development

Setup

To install from source for development:

Clone the repository:

git clone https://github.com/your-org/virtual_screening_platform.git
cd virtual_screening_platform

Install uv using the official guide
Install dependencies:
```
uv sync
```
Install pre-commit hooks:
```
uv run pre-commit install
```

Code Quality

This project uses Ruff for code formatting and linting. To format your code:

# Format and lint all files
uv run ruff format .
uv run ruff check . --fix

# Format and lint specific files
uv run ruff format path/to/file.py
uv run ruff check path/to/file.py --fix

The code formatting will also run automatically when you commit changes, thanks to pre-commit hooks.

Running Tests

To run the tests:

# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_specific.py

# Run tests with coverage report
uv run pytest --cov=lbvslitmus --cov-report=term-missing

License

MIT License

Release history

1.2.1 - Feb 12, 2026
1.1.0 - Jan 10, 2026
1.0.1 - Jan 8, 2026
