BenchAudit
Data hygiene and similarity audits for molecular and DTI benchmarks.
BenchAudit is a lightweight pipeline for auditing molecular property and drug–target interaction benchmarks. It standardizes SMILES strings, checks split hygiene, surfaces label conflicts and activity cliffs, and can run simple baseline models. Outputs are machine‑readable summaries and drill‑down tables you can inspect or feed into other tools.
Features
- Config‑driven analysis of tabular, TDC, Polaris, and DTI datasets.
- SMILES standardization with optional REOS alerts and configurable fingerprint settings.
- Split hygiene reports: duplicates, cross‑split contamination, and nearest‑neighbor similarity.
- Conflict and activity‑cliff detection for classification and regression tasks.
- DTI extras: sequence normalization, cross-split pair conflicts, and EMBOSS stretcher alignment summaries.
- Optional simple baselines for quick performance sanity checks.
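For intuition, conflict and activity-cliff detection of this kind can be sketched in a few lines of plain Python. The tanimoto and is_activity_cliff helpers and the 0.9 / 1.0 thresholds below are illustrative assumptions, not BenchAudit's actual implementation (which operates on real molecular fingerprints):

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def is_activity_cliff(fp_a, fp_b, y_a, y_b, sim_cut=0.9, delta_cut=1.0):
    """A pair is a cliff when structures are very similar but activities differ a lot."""
    return tanimoto(fp_a, fp_b) >= sim_cut and abs(y_a - y_b) >= delta_cut


# Two identical bit sets with a 2-log-unit activity gap -> flagged as a cliff.
print(is_activity_cliff({1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}, 7.5, 5.5))  # True
```

The same similarity function also underlies nearest-neighbor contamination checks: a test compound whose best train-set Tanimoto exceeds the cutoff counts as cross-split leakage.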
Installation
From PyPI
Install the published package:
pip install benchaudit
or with uv:
uv pip install benchaudit
From source with uv
BenchAudit uses a standard pyproject.toml. The quickest source setup is with uv:
# 1) Create a virtual environment
uv venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
# 2) Install dependencies declared in pyproject.toml
uv sync
If you need optional sequence alignment support, install EMBOSS so stretcher is available (e.g., sudo apt install emboss on Debian/Ubuntu).
Automated PyPI publishing
This repo includes .github/workflows/publish-pypi.yml for automated releases.
- In PyPI, configure a Trusted Publisher for this GitHub repository and workflow file (.github/workflows/publish-pypi.yml), using environment pypi.
- Bump project.version in pyproject.toml.
- Create and push a tag vX.Y.Z matching that version (for example v0.1.1).
- GitHub Actions builds with uv build and publishes to PyPI automatically when the repository visibility is public (publishing is skipped while private).
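Before pushing a release tag, it can help to check that it matches project.version. A minimal sketch; the pyproject.toml excerpt and tag below are hypothetical:

```python
import re

# Hypothetical pyproject.toml excerpt and release tag.
pyproject = '[project]\nname = "benchaudit"\nversion = "0.1.1"\n'
tag = "v0.1.1"

# Extract the declared version and compare it to the tag.
version = re.search(r'^version\s*=\s*"([^"]+)"', pyproject, re.MULTILINE).group(1)
if tag == f"v{version}":
    print("tag matches project.version")
else:
    raise SystemExit(f"tag {tag} does not match version {version}")
```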
Detailed release and install documentation: docs/publishing_and_installation.md
References
- Package on PyPI: https://pypi.org/project/benchaudit/
- Publish workflow: .github/workflows/publish-pypi.yml
- CI workflow: .github/workflows/ci.yml
- uv docs: https://docs.astral.sh/uv/
- PyPI Trusted Publishers: https://docs.pypi.org/trusted-publishers/
Usage
The main entry point is run.py, which consumes one or more YAML configs and writes results under runs/ by default. After uv sync, you can call it via uv run python run.py ... or the installed console scripts:
- uv run benchaudit ... (primary)
- uv run bench ... (legacy alias)
# Analyze all configs in a folder
uv run python run.py --configs configs --out-root runs
# or: uv run benchaudit --configs configs --out-root runs
# Analyze a single config and train baselines
uv run python run.py --config configs/example.yml --benchmark
# or: uv run benchaudit --config configs/example.yml --benchmark
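The commands above consume YAML configs like those shipped in configs/. The fragment below is only a hypothetical illustration of the general shape; all field names are assumptions, so consult configs/example.yml for the real schema:

```yaml
# Hypothetical config sketch; field names are illustrative only.
dataset:
  source: tabular
  path: data/my_benchmark.csv
  smiles_column: smiles
  label_column: activity
  split_column: split
analysis:
  fingerprint: morgan
  similarity_threshold: 0.9
```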
Outputs per config:
- summary.json: split sizes, hygiene counts, similarity and conflict statistics.
- records.csv: per-row view with cleaned SMILES, labels, and split tags.
- conflicts.jsonl: detailed conflict rows.
- cliffs.jsonl: detailed activity cliff rows.
- sequence_alignments.jsonl: (DTI only) top alignments between splits.
- performance.json: (when --benchmark) baseline model metrics and predictions.
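Because the outputs are machine-readable, they are easy to inspect programmatically. A minimal sketch for summary.json, using hypothetical key names (the real keys may differ):

```python
import json

# Stand-in for json.load(open("runs/<config>/summary.json")); keys are assumed.
summary = json.loads("""
{
  "splits": {"train": 800, "test": 200},
  "duplicates": 12,
  "cross_split_contamination": 5
}
""")

total = sum(summary["splits"].values())
print(f"rows: {total}, duplicates: {summary['duplicates']}, "
      f"contaminated pairs: {summary['cross_split_contamination']}")
```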
Project layout
- run.py: CLI runner that loads configs, builds loaders/analyzers, and writes artifacts.
- utils/: loaders, analyzers, baseline helpers, and logging utilities.
- configs/: example YAML configurations for supported datasets.
- data/, runs/: expected data and output locations (not tracked).
Development
- Code style: keep changes simple, PEP 8-ish. Add short docstrings for public functions.
- Typing: prefer explicit, lightweight type hints when types are clear.
- Tests: run python -m unittest discover -s tests -p "test_*.py" (or pytest tests if pytest is installed).
- Test data: tiny dummy benchmark datasets live under tests/data/.
- Benchmark/analysis docs: run python scripts/generate_benchmark_analysis_class_docs.py --output docs/benchmark_and_analysis_class_reference.md to regenerate the class reference; CI enforces freshness via .github/workflows/benchmark-analysis-docs.yml.
- Optional extras: Polaris datasets require polaris-lib; sequence alignment requires pairwise-sequence-alignment and EMBOSS binaries.
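A quick way to verify the optional EMBOSS dependency is actually available on PATH before running DTI alignment analyses:

```python
import shutil

# EMBOSS's stretcher binary must be on PATH for sequence alignment summaries.
path = shutil.which("stretcher")
print(f"stretcher found at {path}" if path else "stretcher not found; install EMBOSS")
```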