adversarial-validation

A tiny framework to perform adversarial validation of your training and test data.

What is adversarial validation?

A common workflow in machine learning projects (especially in Kaggle competitions) is:

  1. train your ML model on a training dataset.
  2. tune and validate your ML model on a validation dataset (typically a held-out fraction of the training dataset).
  3. finally, assess the actual generalization ability of your ML model on a “held-out” test dataset.

This strategy is widely accepted, but it relies heavily on the assumption that the training and test datasets are drawn from the same underlying distribution. This is often referred to as the “identically distributed” property in the literature.

This package helps you check whether the "identically distributed" property holds for your training and test datasets or, equivalently, whether your validation dataset is a good proxy for your model's performance on unseen test instances.
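To make the idea concrete, here is a minimal sketch of the principle behind adversarial validation. It is not the package's implementation: it assumes the simplest possible "adversary", one that uses a single numeric feature as its score, and it measures how well that score separates training rows from test rows with a rank-based ROC AUC. The helper names (`roc_auc`, `feature_auc`) and the synthetic data are illustrative only. An AUC close to 0.5 suggests the two samples are hard to tell apart; an AUC well above 0.5 suggests a distribution shift.

```python
import random


def roc_auc(scores, labels):
    """Rank-based ROC AUC (Mann-Whitney U), using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def feature_auc(train_values, test_values):
    """AUC of separating test rows (label 1) from training rows (label 0)
    when one feature is used directly as the score."""
    scores = list(train_values) + list(test_values)
    labels = [0] * len(train_values) + [1] * len(test_values)
    auc = roc_auc(scores, labels)
    return max(auc, 1 - auc)  # the direction of the shift does not matter


# Synthetic data (an assumption for illustration, not from the package).
rng = random.Random(0)
train_x = [rng.gauss(0, 1) for _ in range(2000)]
same_x = [rng.gauss(0, 1) for _ in range(2000)]       # same distribution
shifted_x = [rng.gauss(1.5, 1) for _ in range(2000)]  # shifted mean

auc_same = feature_auc(train_x, same_x)
auc_shifted = feature_auc(train_x, shifted_x)
print(f"same distribution: {auc_same:.3f}")
print(f"shifted:           {auc_shifted:.3f}")
```

In practice the adversary is usually a multivariate classifier evaluated with cross-validation, which can pick up shifts that no single feature reveals on its own.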

If you want the details, feel free to dive into the following companion article:

adversarial validation: can i trust my validation dataset?

Install

The recommended installation is via pip:

pip install advertion

(advertion stands for adversarial validation)

Usage

import pandas as pd

from advertion import validate

# Load the training and test sets.
train = pd.read_csv("...")
test = pd.read_csv("...")

# "label" is the name of the target column.
validate(
    trainset=train,
    testset=test,
    target="label",
)

# Example result:
# {
#     "datasets_follow_same_distribution": True,
#     "mean_roc_auc": 0.5021320833333334,
#     "adversarial_features": ["id"],
# }
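
One common follow-up, which this README does not prescribe, is to exclude the flagged features before retraining. The sketch below assumes the result is a plain dict with the keys shown in the example output above; the helper `columns_to_keep` and the column names are hypothetical.

```python
def columns_to_keep(columns, result):
    """Return the columns that were not flagged as adversarial features."""
    flagged = set(result.get("adversarial_features", []))
    return [c for c in columns if c not in flagged]


# Result copied from the example output above.
result = {
    "datasets_follow_same_distribution": True,
    "mean_roc_auc": 0.5021320833333334,
    "adversarial_features": ["id"],
}

# Hypothetical column names, for illustration only.
columns = ["id", "age", "income", "label"]
kept = columns_to_keep(columns, result)
print(kept)
```

With pandas DataFrames, the same list could be used to select columns from both the training and the test set so that they stay aligned.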

How to contribute

If you wish to contribute, this is a great place to start!

License

Distributed under the Apache License 2.0.
