Jenga

Overview

Jenga is an open source experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.

We design Jenga around three core abstractions:

  • Tasks contain a raw dataset, an ML model and a prediction task.
  • Data corruptions take raw input data and randomly apply certain data errors to it (e.g., missing values).
  • Evaluators take a task and data corruptions, and execute the evaluation by repeatedly corrupting the test data of the task and recording the predictive performance of the model on the corrupted test data.
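The interplay of the three abstractions can be illustrated with a minimal, self-contained sketch. It does not use Jenga's actual classes: the names `missing_values_sketch`, `threshold_model` and `evaluate_under_corruption` are hypothetical, the "model" is a hard-coded threshold rule, and the dataset is a toy list of dictionaries.

```python
import random


def missing_values_sketch(rows, column, fraction, rng):
    """Hypothetical corruption: copy the rows and set `column` to None in a random subset."""
    corrupted = [dict(row) for row in rows]
    n_corrupt = int(round(fraction * len(corrupted)))
    for idx in rng.sample(range(len(corrupted)), n_corrupt):
        corrupted[idx][column] = None
    return corrupted


def threshold_model(row):
    """Toy stand-in for a trained model: predicts 1 when `score` is at least 0.5."""
    score = row["score"]
    if score is None:
        return 0  # assumption: fall back to the negative class on missing input
    return int(score >= 0.5)


def accuracy(model, rows):
    """Fraction of rows whose prediction matches the label."""
    return sum(model(row) == row["label"] for row in rows) / len(rows)


def evaluate_under_corruption(model, test_rows, corruption, repetitions, seed=0):
    """Hypothetical evaluator: score the clean test data once, then corrupted copies repeatedly."""
    rng = random.Random(seed)
    baseline = accuracy(model, test_rows)
    corrupted_scores = [
        accuracy(model, corruption(test_rows, rng)) for _ in range(repetitions)
    ]
    return baseline, corrupted_scores


# Toy "task": ten test rows, one model, binary labels.
test_rows = [{"score": i / 10, "label": int(i >= 5)} for i in range(10)]
baseline, scores = evaluate_under_corruption(
    threshold_model,
    test_rows,
    corruption=lambda rows, rng: missing_values_sketch(rows, "score", 0.4, rng),
    repetitions=5,
)
print("clean accuracy:", baseline)
print("accuracy on corrupted copies:", scores)
```

Comparing the clean score with the spread of the corrupted scores is the kind of signal an evaluator is meant to produce; the original test data is never modified.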

Jenga's goal is to assist data scientists with detecting such errors early, so that they can protect their models against them. We provide a Jupyter notebook outlining the most basic usage of Jenga.

Note that you can implement custom tasks and data corruptions by extending the corresponding provided base classes.
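As a rough illustration of the extension pattern, the sketch below defines a hypothetical base class and one custom corruption that simulates broken character encodings. `CorruptionSketch` and `BrokenEncodingSketch` are made-up names, not Jenga's base classes, and unlike a real corruption this one is applied to every row rather than to a random fraction.

```python
from abc import ABC, abstractmethod


class CorruptionSketch(ABC):
    """Hypothetical base class; Jenga's real base class may look different."""

    def __init__(self, column):
        self.column = column

    @abstractmethod
    def corrupt_value(self, value):
        """Return the corrupted version of a single cell value."""

    def transform(self, rows):
        # Build new rows so the caller's data stays untouched.
        return [
            {**row, self.column: self.corrupt_value(row[self.column])}
            for row in rows
        ]


class BrokenEncodingSketch(CorruptionSketch):
    """Custom corruption: text written as UTF-8 but read back as Latin-1."""

    def corrupt_value(self, value):
        return value.encode("utf-8").decode("latin-1")


rows = [{"name": "café"}, {"name": "tea"}]
corrupted = BrokenEncodingSketch("name").transform(rows)
print(corrupted)
```

Only values containing non-ASCII characters change under this corruption, which is why such errors often go unnoticed on mostly-ASCII data.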

We additionally provide three advanced usage examples of Jenga:

Installation

The following options are possible:

pip install jenga             # supports most corruptions (not images)
pip install jenga[all]        # installs all dependencies, optimal for development
pip install jenga[image]      # also installs tensorflow and image corruption/augmentation libraries
pip install jenga[validation] # also installs tensorflow and tensorflow-data-validation, necessary for SchemaStresstest
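To see which optional features are likely usable in the current environment, one option is to check whether the heavy optional modules can be located. The sketch below is an assumption-laden helper, not part of Jenga: the mapping from extras to module names is inferred from the comments above (the additional image libraries are not named there and are therefore not checked), and `available_extras` is a hypothetical function name.

```python
from importlib.util import find_spec

# Assumed mapping from pip extras to importable top-level module names.
OPTIONAL_MODULES = {
    "image": ["tensorflow"],
    "validation": ["tensorflow", "tensorflow_data_validation"],
}


def available_extras(modules_by_extra=OPTIONAL_MODULES):
    """Report, per extra, whether all of its listed modules can be found.

    find_spec only locates the modules; it does not import them.
    """
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in modules_by_extra.items()
    }


print(available_extras())
```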

Research

Jenga is based on experiences and code from our ongoing research efforts:

Dependency Management & Reproducibility

  1. Always keep your abstract (unpinned) dependencies updated in environment.yaml and eventually in setup.cfg if you want to ship and install your package via pip later on.
  2. Create concrete dependencies as environment.lock.yaml for the exact reproduction of your environment with:
    conda env export -n jenga -f environment.lock.yaml
    
    For multi-OS development, consider using --no-builds during the export.
  3. Update your current environment with respect to a new environment.lock.yaml using:
    conda env update -f environment.lock.yaml --prune
    

Installation for Development

In order to set up the necessary environment:

  1. create an environment jenga with the help of conda,
    conda env create -f environment.yaml
    
  2. activate the new environment with
    conda activate jenga
    
  3. install jenga with:
    python setup.py install # or `develop`
    

Optional and needed only once after git clone:

  1. install several pre-commit git hooks with:
    pre-commit install
    
    and check out the configuration under .pre-commit-config.yaml. The -n, --no-verify flag of git commit can be used to deactivate pre-commit hooks temporarily.

Then take a look into the notebooks folder.

Note

This project has been set up using PyScaffold 3.2.2 and the dsproject extension 0.4. For details and usage information on PyScaffold see https://pyscaffold.org/.
