Leakage checks for machine-learning pipelines using permutation tests.
Leakly: Leakage checks for any machine-learning pipeline
Leakly uses label permutation to test whether a machine-learning pipeline performs above chance when no true signal is present.
Above-chance performance after permutation may indicate leakage from preprocessing, feature selection, tuning, or another step of the pipeline.
How it works
- Permute labels to remove the real feature-label association.
- Run the full pipeline exactly as in the original analysis.
- Compare the permuted score distribution with chance level.
- Above-chance permuted performance suggests possible leakage.
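The four steps above can be sketched without Leakly in plain Python. This is a minimal illustration, not Leakly's implementation: `holdout_threshold_pipeline`, `permutation_check`, and the synthetic data are hypothetical names and values chosen for the example, and the toy pipeline stands in for a real analysis.

```python
import random
from statistics import mean


def holdout_threshold_pipeline(X, y):
    """Toy non-leaky pipeline (hypothetical): split first, fit on the training half only."""
    half = len(X) // 2
    X_train, y_train = X[:half], y[:half]
    X_test, y_test = X[half:], y[half:]
    # Fit: threshold at the midpoint of the two class means, using training rows only.
    mean0 = mean(x for x, label in zip(X_train, y_train) if label == 0)
    mean1 = mean(x for x, label in zip(X_train, y_train) if label == 1)
    threshold = (mean0 + mean1) / 2
    high_class = 1 if mean1 >= mean0 else 0
    # Evaluate: accuracy on the held-out half.
    predictions = [high_class if x > threshold else 1 - high_class for x in X_test]
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


def permutation_check(pipeline, X, y, n_permutations=100, seed=0):
    """Return the pipeline's score on each of n_permutations label shuffles."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_permutations):
        permuted_y = list(y)
        rng.shuffle(permuted_y)                 # step 1: break the feature-label link
        scores.append(pipeline(X, permuted_y))  # step 2: rerun the whole pipeline
    return scores


# Synthetic data with a real signal: class 1 is shifted by 3 standard deviations.
data_rng = random.Random(42)
y = [i % 2 for i in range(80)]
X = [data_rng.gauss(3.0 * label, 1.0) for label in y]

real_score = holdout_threshold_pipeline(X, y)
permuted_scores = permutation_check(holdout_threshold_pipeline, X, y, n_permutations=200)
permuted_mean = mean(permuted_scores)

# Steps 3 and 4: compare the permuted scores with the 0.5 chance level.
print(f"real score: {real_score:.2f}, mean permuted score: {permuted_mean:.2f}")
```

For this non-leaky toy pipeline the permuted scores should cluster around 0.5, the chance level for two balanced classes. A mean well above that would be the warning sign described in the last step.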
Leakly includes example configurations for a leaky pipeline and a non-leaky pipeline so users can inspect the effect directly.
Install
pip install Leakly
Quick Start on Colab: 
Key Python snippet
from leakly import (
    MLPipeline,
    SummaryPlotter,
    load_example_leakage_config,
    permute_label,
)

# X, y, and covariates are your own data, loaded beforehand.
scores = []
for seed in range(100):
    # Shuffle the labels to remove the real feature-label association.
    permuted_y = permute_label(y, random_state=seed)
    # Replace with any user-defined pipeline
    score = MLPipeline(
        X,
        permuted_y,
        covariates=covariates,
        config=load_example_leakage_config(),
    ).fit().evaluate()
    scores.append(score)

# Compare the permuted scores with the chance level.
SummaryPlotter(scores, chance_level=0.5).plot()
FAQ
Can Leakly check my own pipeline?
Yes. Leakly can evaluate any pipeline that takes X, y, and
optional covariates and returns a test score.
The key is to run the full pipeline exactly as in the real analysis,
including preprocessing, feature selection, tuning, and evaluation.
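As an illustration of that contract, the sketch below wraps a whole toy analysis in one function. Everything in it is hypothetical: the name `my_pipeline`, the single feature and single covariate, and the threshold classifier. How Leakly itself expects a custom pipeline to be passed in is not shown in this README, so the sketch only demonstrates the shape: `X`, `y`, and optional `covariates` in, one test score out, with every fitted step kept inside the function.

```python
import random
from statistics import mean


def my_pipeline(X, y, covariates=None):
    """Hypothetical pipeline: one feature per sample, optionally one covariate per sample."""
    half = len(X) // 2
    train, test = range(half), range(half, len(X))
    values = list(X)
    if covariates is not None:
        # Covariate adjustment: regression slope estimated on training rows only,
        # then applied unchanged to every row.
        c_mean = mean(covariates[i] for i in train)
        x_mean = mean(X[i] for i in train)
        c_var = sum((covariates[i] - c_mean) ** 2 for i in train)
        slope = sum((covariates[i] - c_mean) * (X[i] - x_mean) for i in train) / c_var
        values = [x - slope * (c - c_mean) for x, c in zip(X, covariates)]
    # Model: threshold halfway between the class means of the training rows.
    mean0 = mean(values[i] for i in train if y[i] == 0)
    mean1 = mean(values[i] for i in train if y[i] == 1)
    threshold = (mean0 + mean1) / 2
    high_class = 1 if mean1 >= mean0 else 0
    # Evaluation: accuracy on the held-out rows.
    correct = sum(
        (high_class if values[i] > threshold else 1 - high_class) == y[i] for i in test
    )
    return correct / len(test)


# Synthetic inputs, for illustration only.
rng = random.Random(0)
y = [i % 2 for i in range(60)]
covariates = [rng.gauss(0.0, 1.0) for _ in y]
X = [3.0 * label + 0.5 * c + rng.gauss(0.0, 1.0) for label, c in zip(y, covariates)]

permuted_y = list(y)
rng.shuffle(permuted_y)  # plays the role of permute_label(y, random_state=seed)
score = my_pipeline(X, permuted_y, covariates=covariates)
score_without_covariates = my_pipeline(X, permuted_y)
print(score, score_without_covariates)
```

Because the split, the covariate adjustment, and the model fit all live inside the function, rerunning it on permuted labels repeats the analysis end to end rather than only the final model.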
Why can a leaky pipeline score well on permuted labels?
Leakage occurs when information from test samples enters the analysis before the train/test split or outside the cross-validation loop. Common sources include feature selection, scaling, imputation, covariate adjustment, dimensionality reduction, or hyperparameter tuning performed on all samples.
In high-dimensional data such as omics and neuroimaging, some random features appear predictive by chance. If a leaky step selects or fits these spurious patterns using the test samples, the pipeline can score above chance even after labels are permuted.
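A small simulation can make this concrete. The sketch below uses pure-noise features and shuffled labels, then compares feature selection on all samples (leaky) with feature selection on the training half only. It is a hypothetical, dependency-free illustration rather than one of Leakly's example configurations; the function names, sample sizes, and the nearest-centroid classifier are assumptions made for the example.

```python
import random
from statistics import mean


def class_mean_gap(column, labels, rows):
    """Class-1 mean minus class-0 mean of one feature, computed over `rows`."""
    ones = [column[i] for i in rows if labels[i] == 1]
    zeros = [column[i] for i in rows if labels[i] == 0]
    return mean(ones) - mean(zeros)


def holdout_accuracy(columns, labels, select_rows, train_rows, test_rows, k=10):
    """Pick the k features with the largest gap on select_rows, then fit and test."""
    ranked = sorted(
        range(len(columns)),
        key=lambda j: abs(class_mean_gap(columns[j], labels, select_rows)),
        reverse=True,
    )
    chosen = ranked[:k]
    # Nearest-centroid classifier fitted on the training rows only.
    centroids = {}
    for label in (0, 1):
        members = [i for i in train_rows if labels[i] == label]
        centroids[label] = [mean(columns[j][i] for i in members) for j in chosen]
    correct = 0
    for i in test_rows:
        point = [columns[j][i] for j in chosen]
        distance = {
            label: sum((p - c) ** 2 for p, c in zip(point, centre))
            for label, centre in centroids.items()
        }
        correct += min(distance, key=distance.get) == labels[i]
    return correct / len(test_rows)


rng = random.Random(0)
n_samples, n_features = 40, 300
# Pure noise: no feature carries any information about the labels.
columns = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_features)]
all_rows, train_rows, test_rows = range(n_samples), range(20), range(20, n_samples)

leaky_scores, clean_scores = [], []
for _ in range(20):
    labels = [i % 2 for i in range(n_samples)]
    rng.shuffle(labels)  # permuted labels
    # Leaky: features are selected using all samples, test rows included.
    leaky_scores.append(holdout_accuracy(columns, labels, all_rows, train_rows, test_rows))
    # Clean: features are selected using the training rows only.
    clean_scores.append(holdout_accuracy(columns, labels, train_rows, train_rows, test_rows))

leaky_mean, clean_mean = mean(leaky_scores), mean(clean_scores)
print(f"leaky: {leaky_mean:.2f}  clean: {clean_mean:.2f}")
```

With settings like these the leaky variant typically lands well above 0.5 while the clean variant stays near it, even though the labels were shuffled and the features are noise. The only difference between the two runs is which rows the selection step was allowed to see.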
How many permutations should I run?
Use 100 for a quick check. Use 1,000 or more for publication-level evidence.
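One way to see the trade-off: the standard error of the mean permuted score shrinks roughly with the square root of the number of permutations, so 1,000 runs pin the mean down about three times more tightly than 100. The sketch below is a rough, hypothetical helper (`summarize_permuted_scores` is not part of Leakly), and the scores are simulated coin-flip accuracies, not output from a real pipeline. It treats permuted scores as independent draws, which is only an approximation.

```python
import random
from math import sqrt
from statistics import mean, stdev


def summarize_permuted_scores(scores, chance_level=0.5):
    """Mean permuted score, its standard error, and its distance from chance in SE units."""
    average = mean(scores)
    standard_error = stdev(scores) / sqrt(len(scores))
    z = (average - chance_level) / standard_error
    return average, standard_error, z


def simulated_chance_scores(n_permutations, n_test=50, seed=0):
    """Simulated accuracies of a classifier guessing at random on n_test samples."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < 0.5 for _ in range(n_test)) / n_test
        for _ in range(n_permutations)
    ]


for n_permutations in (100, 1000):
    average, standard_error, z = summarize_permuted_scores(
        simulated_chance_scores(n_permutations)
    )
    print(
        f"{n_permutations:>5} permutations: "
        f"mean={average:.3f}  SE={standard_error:.4f}  z={z:+.2f}"
    )
```

A small standard error matters most when the permuted mean sits only slightly above chance: with few permutations, a modest excess cannot be told apart from run-to-run noise.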
License
MIT. See LICENSE.