A tiny framework to perform adversarial validation of your training and test data.

adversarial-validation


What is adversarial validation?

A common workflow in machine learning projects (especially in Kaggle competitions) is:

  1. Train your ML model on a training dataset.
  2. Tune and validate your ML model on a validation dataset (typically a fraction of the training dataset set aside for this purpose).
  3. Finally, assess the actual generalization ability of your ML model on a “held-out” test dataset.

This strategy is widely accepted, but it heavily relies on the assumption that the training and test datasets are drawn from the same underlying distribution. This is often referred to as the “identically distributed” property in the literature.

This package helps you easily assess whether the “identically distributed” property holds for your training and test datasets or, equivalently, whether your validation dataset is a good proxy for your model's performance on unseen test instances.
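The general idea behind adversarial validation can be sketched in a few lines. The snippet below is an illustration only and is not taken from this package's source code: it assumes scikit-learn and NumPy are installed, and the function name `adversarial_auc`, the random forest classifier, and the synthetic data are all choices made for this example. It labels training rows as 0 and test rows as 1, then measures how well a classifier can tell them apart. A mean ROC AUC close to 0.5 suggests the two datasets are hard to distinguish, while a value close to 1.0 suggests a distribution shift.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def adversarial_auc(train_X, test_X, n_splits=5, seed=0):
    """Mean cross-validated ROC AUC of a classifier that tries to tell
    training rows (label 0) apart from test rows (label 1)."""
    X = np.vstack([train_X, test_X])
    y = np.concatenate([np.zeros(len(train_X)), np.ones(len(test_X))])
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
    return float(scores.mean())


# Synthetic data (an assumption for illustration, not real results).
rng = np.random.default_rng(0)
train_X = rng.normal(0.0, 1.0, size=(500, 3))
same_X = rng.normal(0.0, 1.0, size=(500, 3))     # same distribution as train
shifted_X = rng.normal(2.0, 1.0, size=(500, 3))  # mean shifted by 2

auc_same = adversarial_auc(train_X, same_X)
auc_shifted = adversarial_auc(train_X, shifted_X)
print(f"same distribution:    {auc_same:.3f}")
print(f"shifted distribution: {auc_shifted:.3f}")
```

The threshold at which an AUC counts as "too far from 0.5" is a judgment call that depends on dataset size and the classifier used.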

If you like details, feel free to take a deep dive into the following companion article:

adversarial validation: can i trust my validation dataset?

Install

The recommended installation is via pip:

pip install advertion

(advertion stands for adversarial validation)

Usage

import pandas as pd
from advertion import validate

train = pd.read_csv("...")
test = pd.read_csv("...")

validate(
    trainset=train,
    testset=test,
    target="label",
)

# {
#     "datasets_follow_same_distribution": True,
#     "mean_roc_auc": 0.5021320833333334,
#     "adversarial_features": ["id"],
# }
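One possible way to act on the result is sketched below. It assumes that `adversarial_features` lists columns that help a classifier tell the training set from the test set; the README does not spell this out, so treat that reading as an assumption. The helper `drop_adversarial_features` is hypothetical and not part of advertion, and the result dictionary is copied from the example output above rather than produced by a real call.

```python
import pandas as pd


def drop_adversarial_features(df, result):
    """Return a copy of df without the columns listed under
    result["adversarial_features"] (hypothetical helper, not part of advertion)."""
    flagged = result.get("adversarial_features", [])
    # Only drop columns that actually exist in this frame.
    cols = [c for c in flagged if c in df.columns]
    return df.drop(columns=cols)


# Result dictionary copied from the example output above.
result = {
    "datasets_follow_same_distribution": True,
    "mean_roc_auc": 0.5021320833333334,
    "adversarial_features": ["id"],
}

# Tiny made-up frame, used only to show the helper's behavior.
train = pd.DataFrame({"id": [1, 2, 3], "x": [0.1, 0.2, 0.3], "label": [0, 1, 0]})
cleaned = drop_adversarial_features(train, result)
print(list(cleaned.columns))
```

Whether dropping such columns is appropriate depends on the problem: an identifier like `id` is usually safe to remove, while a genuinely predictive feature may deserve a closer look first.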

How to contribute

If you wish to contribute, this is a great place to start!

License

Distributed under the Apache License 2.0.
