adversarial-validation
A tiny framework to perform adversarial validation of your training and test data.
What is adversarial validation?
A common workflow in machine learning projects (especially in Kaggle competitions) is to:
- train your ML model on a training dataset.
- tune and validate your ML model on a validation dataset (typically carved out as a fraction of the training dataset).
- finally, assess the actual generalization ability of your ML model on a held-out test dataset.
This strategy is widely accepted, but it relies heavily on the assumption that the training and test datasets are drawn from the same underlying distribution. This is often referred to as the "identically distributed" property in the literature.
This package helps you easily check whether the "identically distributed" property holds for your training and test datasets, or, equivalently, whether your validation dataset is a good proxy for your model's performance on unseen test instances.
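The idea behind adversarial validation can be sketched in a few lines: label every row by its origin (train vs. test), fit a classifier to predict that label, and inspect its cross-validated ROC AUC. A score near 0.5 means the classifier cannot tell the datasets apart, i.e. they plausibly come from the same distribution. The sketch below illustrates this with scikit-learn on synthetic, identically distributed data (the library's internals may differ; `RandomForestClassifier` is an arbitrary choice here):

```python
# Minimal sketch of adversarial validation using scikit-learn.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic "train" and "test" sets drawn from the same distribution.
rng = np.random.default_rng(0)
train = pd.DataFrame({"f1": rng.normal(0, 1, 500), "f2": rng.normal(0, 1, 500)})
test = pd.DataFrame({"f1": rng.normal(0, 1, 500), "f2": rng.normal(0, 1, 500)})

# Label each row by its origin: 0 = train, 1 = test.
X = pd.concat([train, test], ignore_index=True)
y = np.array([0] * len(train) + [1] * len(test))

# Cross-validated ROC AUC of the "adversarial" classifier.
# ~0.5 => the datasets are indistinguishable; close to 1.0 => they differ.
scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=5, scoring="roc_auc",
)
print(scores.mean())
```

Since the two synthetic datasets are sampled identically, the mean AUC lands near 0.5; a clear drift between train and test would push it toward 1.0.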
Install
The recommended installation is via pip:
pip install advertion
(advertion stands for adversarial validation)
Usage
import pandas as pd

from advertion import validate

train = pd.read_csv("...")
test = pd.read_csv("...")

validate(
    trainset=train,
    testset=test,
    target="label",
)
# {
#     "datasets_follow_same_distribution": True,
#     "mean_roc_auc": 0.5021320833333334,
#     "adversarial_features": ["id"],
# }
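A result like the one above can be acted on programmatically, for instance by dropping the flagged adversarial features before retraining. This is a hypothetical sketch: `result` mirrors the dictionary shown above, and `drop_adversarial` is a helper introduced here for illustration only.

```python
# Hypothetical handling of a validate(...) result (keys as shown above).
result = {
    "datasets_follow_same_distribution": True,
    "mean_roc_auc": 0.5021320833333334,
    "adversarial_features": ["id"],
}

def drop_adversarial(columns, result):
    """Return the columns that are safe to keep for modelling."""
    flagged = set(result["adversarial_features"])
    return [c for c in columns if c not in flagged]

print(drop_adversarial(["id", "f1", "f2", "label"], result))
# ['f1', 'f2', 'label']
```

Here `id` leaks the train/test split (e.g. rows were numbered sequentially), so excluding it keeps the model from exploiting an artifact that will not generalize.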
Hashes for advertion-0.1.0b0-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 373726693cf8f03c27b86ec87839567ae316a5ec8e35267c29a28d1fa313bc9a
MD5 | d706513c1f356f08f6e9d5135f82c4a6
BLAKE2b-256 | 2841068ff4a5aee22faf69a468ccabe3a11a92d1b216949e1c308cb9199c72bc