Leakage checks for machine-learning pipelines using permutation tests.

Leakly: Leakage checks for any machine-learning pipeline

Leakly uses label permutation to test whether a machine-learning pipeline performs above chance when no true signal is present.

Above-chance performance after permutation may indicate leakage from preprocessing, feature selection, tuning, or another step of the pipeline.

How it works

  1. Permute labels to remove the real feature-label association.
  2. Run the full pipeline exactly as in the original analysis.
  3. Compare the permuted score distribution with chance level.
  4. Above-chance permuted performance suggests possible leakage.
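
The four steps above can be sketched in plain Python without Leakly. This is only an illustration under stated assumptions, not Leakly's implementation: `permutation_scores` and `toy_pipeline` are hypothetical names, the pipeline interface `(X, y) -> float` is assumed for the example, and the toy pipeline is a deliberately simple non-leaky stand-in for a real analysis.

```python
import random
import statistics


def permutation_scores(pipeline, X, y, n_permutations=100, seed=0):
    """Run `pipeline` on label-permuted copies of y and collect its scores.

    `pipeline` is any callable (X, y) -> float that performs the complete
    analysis internally (hypothetical interface, not Leakly's API).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_permutations):
        permuted_y = list(y)
        rng.shuffle(permuted_y)                 # step 1: break the X-y association
        scores.append(pipeline(X, permuted_y))  # step 2: rerun the full pipeline
    return scores


def toy_pipeline(X, y):
    """Toy stand-in: threshold one feature, fit on even rows, test on odd rows."""
    train = range(0, len(X), 2)
    test = range(1, len(X), 2)
    pos = [X[i][0] for i in train if y[i] == 1]
    neg = [X[i][0] for i in train if y[i] == 0]
    if not pos or not neg:
        return 0.5  # degenerate training split: report chance
    pos_mean, neg_mean = statistics.fmean(pos), statistics.fmean(neg)
    threshold = (pos_mean + neg_mean) / 2
    sign = 1 if pos_mean >= neg_mean else -1
    hits = sum((sign * (X[i][0] - threshold) > 0) == (y[i] == 1) for i in test)
    return hits / len(test)


# Pure-noise data: the single feature carries no information about y.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0)] for _ in range(60)]
y = [0] * 30 + [1] * 30

scores = permutation_scores(toy_pipeline, X, y, n_permutations=200)
mean_score = statistics.fmean(scores)  # step 3: compare with chance level
print(f"mean permuted accuracy = {mean_score:.3f} (chance = 0.5)")
```

Because the toy pipeline fits its threshold on training rows only, the mean permuted accuracy should sit near 0.5. A mean clearly above 0.5 would be the step 4 warning sign.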

Leakly includes example configurations for a leaky pipeline and a non-leaky pipeline so users can inspect the effect directly.

[Figure: example permutation AUC summary]

Install

pip install Leakly

Quick Start on Colab: Open example.ipynb in Colab

Key Python snippet

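from leakly import (
    MLPipeline,
    SummaryPlotter,
    load_example_leakage_config,
    permute_label,
)

# X (features), y (labels), and covariates must already be defined by the caller.
scores = []
for seed in range(100):  # one permutation per seed
    # Shuffle the labels to remove the real feature-label association.
    permuted_y = permute_label(y, random_state=seed)
    # Replace MLPipeline with any user-defined pipeline.
    pipeline = MLPipeline(
        X,
        permuted_y,
        covariates=covariates,
        config=load_example_leakage_config(),
    ).fit()
    scores.append(pipeline.evaluate())

# Compare the permuted-score distribution with chance level (0.5 for AUC).
SummaryPlotter(scores, chance_level=0.5).plot()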

FAQ

Can Leakly check my own pipeline?

Yes. Leakly can evaluate any pipeline that takes X, y, and optional covariates and returns a test score. The key is to run the full pipeline on the permuted labels exactly as in the real analysis, including preprocessing, feature selection, tuning, and evaluation.
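
As an illustration of that contract, here is a sketch of a self-contained pipeline function. Everything in it is an assumption made for the example: the name `my_pipeline`, the `test_fraction` and `seed` parameters, the choice to append covariates as extra feature columns, and the nearest-centroid model. It is not part of Leakly. The point is that splitting, preprocessing, fitting, and scoring all happen inside one call, so the whole analysis is rerun for every permutation.

```python
import random
import statistics


def my_pipeline(X, y, covariates=None, test_fraction=0.3, seed=0):
    """Hypothetical user pipeline: split, scale, fit, and score in one call."""
    n = len(X)
    # Optional covariates are appended as extra feature columns (one simple choice).
    rows = [
        list(x) + (list(covariates[i]) if covariates is not None else [])
        for i, x in enumerate(X)
    ]

    # 1. Split first, so no later step sees the test rows during fitting.
    order = list(range(n))
    random.Random(seed).shuffle(order)
    n_test = max(1, int(n * test_fraction))
    test, train = order[:n_test], order[n_test:]

    # 2. Preprocessing: standardize with statistics from the training rows only.
    n_cols = len(rows[0])
    means = [statistics.fmean([rows[i][j] for i in train]) for j in range(n_cols)]
    sds = [statistics.pstdev([rows[i][j] for i in train]) or 1.0 for j in range(n_cols)]
    scaled = [[(r[j] - means[j]) / sds[j] for j in range(n_cols)] for r in rows]

    # 3. Model: nearest class centroid, fitted on the training rows only.
    centroids = {}
    for label in sorted(set(y[i] for i in train)):
        members = [scaled[i] for i in train if y[i] == label]
        centroids[label] = [statistics.fmean(list(col)) for col in zip(*members)]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    # 4. Evaluation: accuracy on the held-out rows.
    return sum(predict(scaled[i]) == y[i] for i in test) / len(test)


rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(80)]
y = [rng.randint(0, 1) for _ in range(80)]
covariates = [[rng.gauss(0.0, 1.0)] for _ in range(80)]

score = my_pipeline(X, y, covariates=covariates)
print(f"held-out accuracy on noise data: {score:.2f}")
```

Calling such a function repeatedly with permuted copies of y gives the permuted-score distribution. If step 2 used all rows instead of the training rows, that would be one of the leakage sources the check is meant to expose.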

Why can a leaky pipeline score well on permuted labels?

If leakage occurs, information from test samples can enter the analysis before the train/test split or outside the cross-validation loop. Common sources include feature selection, scaling, imputation, covariate adjustment, dimensionality reduction, or hyperparameter tuning performed on all samples.

In high-dimensional data such as omics and neuroimaging, some random features appear predictive purely by chance. If a leaky step selects or fits these spurious patterns using the test samples, the pipeline may perform above chance even after labels are permuted.
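
The effect can be reproduced on pure noise with a small standard-library sketch. The setup is an assumption chosen for illustration (40 samples, 300 Gaussian noise features, top-10 feature selection by class-mean difference, a nearest-centroid classifier, 5-fold cross-validation), and the function names are hypothetical, not Leakly's. The only difference between the two variants is whether feature selection sees the test rows.

```python
import random


def class_gap(X, y, rows, j):
    """Absolute difference between the class means of feature j over `rows`."""
    pos = [X[i][j] for i in rows if y[i] == 1]
    neg = [X[i][j] for i in rows if y[i] == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def select_features(X, y, rows, k):
    """Keep the k features whose class means differ most on `rows`."""
    ranked = sorted(
        range(len(X[0])), key=lambda j: class_gap(X, y, rows, j), reverse=True
    )
    return ranked[:k]


def centroid(X, rows, feats):
    """Mean of the selected features over `rows`."""
    return [sum(X[i][j] for i in rows) / len(rows) for j in feats]


def cv_accuracy(X, y, k=10, n_folds=5, leaky=False):
    """Cross-validated nearest-centroid accuracy with feature selection."""
    everyone = list(range(len(X)))
    correct = 0
    for fold in range(n_folds):
        test = everyone[fold::n_folds]
        held_out = set(test)
        train = [i for i in everyone if i not in held_out]
        # Leaky: selection sees all rows. Clean: selection sees training rows only.
        feats = select_features(X, y, everyone if leaky else train, k)
        c1 = centroid(X, [i for i in train if y[i] == 1], feats)
        c0 = centroid(X, [i for i in train if y[i] == 0], feats)
        for i in test:
            d1 = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c1))
            d0 = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c0))
            correct += int((d1 < d0) == (y[i] == 1))
    return correct / len(X)


rng = random.Random(0)
n_samples, n_features = 40, 300
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [0] * 20 + [1] * 20

leaky_scores, clean_scores = [], []
for _ in range(10):
    permuted_y = y[:]
    rng.shuffle(permuted_y)  # labels carry no information about X
    leaky_scores.append(cv_accuracy(X, permuted_y, leaky=True))
    clean_scores.append(cv_accuracy(X, permuted_y, leaky=False))

leaky_mean = sum(leaky_scores) / len(leaky_scores)
clean_mean = sum(clean_scores) / len(clean_scores)
print(f"leaky: {leaky_mean:.2f}  clean: {clean_mean:.2f}  (chance = 0.50)")
```

In the leaky variant the selected features were chosen partly because they happen to separate the test rows, so accuracy on permuted labels rises well above 0.5. The clean variant, which selects features inside each training fold, should stay close to chance.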

How many permutations should I run?

Use 100 permutations for a quick check. Use 1,000 or more for publication-level evidence.
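
One way to see why more permutations help: the standard error of the mean permuted score shrinks with the square root of the number of permutations. The sketch below is an assumption-laden illustration, not what Leakly's SummaryPlotter computes. `summarize_permuted_scores` is a hypothetical helper, the interval uses a normal approximation that treats the permuted scores as roughly independent, and the score lists are synthetic values made up for the example.

```python
import math
import statistics


def summarize_permuted_scores(scores, chance_level=0.5, z=1.96):
    """Mean permuted score with a normal-approximation 95% interval."""
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    lower, upper = mean - z * sem, mean + z * sem
    return {
        "mean": mean,
        "sem": sem,
        "interval": (lower, upper),
        # Flag only when the whole interval lies above chance.
        "above_chance": lower > chance_level,
    }


# Synthetic scores with the same spread, centered on chance (0.5).
quick = [0.5 + 0.02 * ((i % 5) - 2) for i in range(100)]
thorough = [0.5 + 0.02 * ((i % 5) - 2) for i in range(1000)]
# Synthetic scores centered on 0.7, as a leaky pipeline might produce.
suspicious = [0.7 + 0.02 * ((i % 5) - 2) for i in range(100)]

quick_summary = summarize_permuted_scores(quick)
thorough_summary = summarize_permuted_scores(thorough)
suspicious_summary = summarize_permuted_scores(suspicious)

print(f"SEM with 100 permutations:  {quick_summary['sem']:.4f}")
print(f"SEM with 1000 permutations: {thorough_summary['sem']:.4f}")
print(f"suspicious run above chance: {suspicious_summary['above_chance']}")
```

Going from 100 to 1,000 permutations narrows the interval by a factor of about 3.2 (the square root of 10), which makes small departures from chance easier to distinguish from sampling noise.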

License

MIT. See LICENSE.
