Automating Assumption Checks for Regression Models

PyAssume


Features

PyAssume automates the assumption checks of regression models (e.g., linear and logistic regression) on your data and displays the results in an elegant dashboard. 

This lets you easily verify whether regression modeling is justified and whether the model output can be interpreted correctly.

GIF coming soon!

  • Automatically detects regression task (and relevant assumption checks) based on target variable.

  • Automatically executes statistical tests and visual plots for all relevant assumption checks.

  • Generates clear visual output of results in a beautiful dashboard (built on Jupyter-Dash).

  • Automatically one-hot encodes categorical variables for successful regression modelling (unless manually specified otherwise).

  • Displays insightful information on assumption concepts and (possible) solutions to violations.
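To make the first feature more concrete, here is a minimal sketch of how a regression task could be inferred from the target variable. The function name `infer_task` and the rule it applies (two distinct values suggest logistic regression, more than two numeric values suggest linear regression) are assumptions made for illustration; they are not taken from PyAssume's source, and the library's actual detection logic may differ.

```python
from numbers import Real


def infer_task(target_values):
    """Guess a regression task from target values (hypothetical heuristic)."""
    values = list(target_values)
    if not values:
        raise ValueError("Target variable has no values")

    distinct = set(values)
    # A binary target (two distinct labels or numbers) suggests logistic regression.
    if len(distinct) == 2:
        return "logistic regression"

    # A numeric target with more than two distinct values suggests linear regression.
    is_numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in values
    )
    if is_numeric and len(distinct) > 2:
        return "linear regression"

    raise ValueError("Could not infer a regression task from the target variable")


continuous_task = infer_task([242.0, 290.0, 340.0, 363.0])
binary_task = infer_task([0, 1, 1, 0])
```

A heuristic like this is only a starting point; passing the task explicitly avoids any ambiguity when a numeric target happens to have few distinct values.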

Download

pip install pyassume

Usage

Quickstart

from pyassume import Check
from pyassume.datasets import load_data

df = load_data('Fish_processed')  # Get toy dataset (pre-processed)

assume = Check(df, target='Weight')  # Initiate Check class and define target variable
assume.report()  # Run assumption checks and generate dashboard report

Note: Dataset should ideally be pre-processed before running assumption checks.

Comprehensive Usage

  • Pre-processing should ideally be done beforehand, but PyAssume automatically encodes categorical variables so that you can start model runs and assumption checks quickly.
  • Here's how to put the Check class (the core object of PyAssume) to its best use:

from pyassume import Check
from pyassume.datasets import load_data

df = load_data('Fish')  # Get toy dataset (raw)

assume = Check(df=df, 
               target='Weight',
               task='linear regression',
               predictors=['Height', 'Width', 'Length1', 'Species'],
               keep=True,
               categorical_features=['Species'],
               categorical_encoder='ohe',
               mode='inline')

Attributes

  • df: pd.DataFrame
    Dataset (in pandas DataFrame format)

  • target: str
    Column name of target (dependent) variable

  • task: str
    Type of regression task to be performed. Options include: 'linear regression' (more tasks coming soon). If None, the task is determined automatically from the target variable.

  • predictors: list
    List of column names of predictor (independent) features. If None, all columns other than the target are treated as predictors.

  • keep: bool
    If True, variables in predictors list will be kept as predictor variables, and other non-target variables will be dropped. If False, variables in predictors list will be dropped, and other non-target variables will be retained. Default is True.

  • categorical_features: list
    List of column names to treat as categorical so that they are encoded appropriately. If None, categorical variables are detected automatically and encoded into numerical format for regression modelling. Default is None.

  • categorical_encoding: str
    Type of encoding technique to be performed on categorical variables. Options include: ohe (i.e. one-hot encoding) and ord (i.e. ordinal encoding). Default is ohe.

  • mode: str
    Type of display for dashboard report. Options include inline (displayed as output directly in Jupyter notebook), external (displayed in a new full-screen browser tab), or jupyterlab (displayed in separate tab right inside JupyterLab). Default is inline.
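The interaction between predictors and keep is easy to misread, so here is a small sketch of the selection rule as described above. The helper `resolve_predictors` is hypothetical and works on plain lists of column names; it only illustrates the documented behaviour and is not PyAssume's implementation.

```python
def resolve_predictors(columns, target, predictors=None, keep=True):
    """Return the predictor columns implied by `predictors` and `keep`."""
    # The target is never a predictor.
    candidates = [c for c in columns if c != target]
    if predictors is None:
        # No list given: every non-target column is a predictor.
        return candidates

    listed = set(predictors)
    unknown = listed - set(candidates)
    if unknown:
        raise ValueError(f"Not valid predictor columns: {sorted(unknown)}")

    if keep:
        # keep=True: retain only the listed columns.
        return [c for c in candidates if c in listed]
    # keep=False: drop the listed columns and retain the rest.
    return [c for c in candidates if c not in listed]


columns = ["Species", "Weight", "Length1", "Height", "Width"]
kept = resolve_predictors(columns, "Weight", ["Height", "Width"], keep=True)
dropped = resolve_predictors(columns, "Weight", ["Height", "Width"], keep=False)
everything = resolve_predictors(columns, "Weight")
```

With keep=True the result is the listed columns only; with keep=False it is every other non-target column, which is convenient when only one or two columns need to be excluded.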

Notes

  • Only the df and target attributes are required.
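To show what the two encoding options listed under Attributes (ohe and ord) produce, here is a rough equivalent written with plain pandas. The helper `encode_categoricals` and the toy values are assumptions for illustration only; how PyAssume performs the encoding internally is not shown in this README.

```python
import pandas as pd

# Toy frame: the column names mirror the Fish example, the values are made up.
df = pd.DataFrame({
    "Species": ["Bream", "Perch", "Bream", "Pike"],
    "Height": [11.5, 6.2, 12.4, 7.3],
})


def encode_categoricals(frame, categorical_features, encoding="ohe"):
    """Return a copy of `frame` with the given columns encoded numerically."""
    if encoding == "ohe":
        # One indicator column per category, e.g. Species_Bream, Species_Perch.
        return pd.get_dummies(frame, columns=categorical_features, dtype=int)
    if encoding == "ord":
        out = frame.copy()
        for col in categorical_features:
            # Integer codes follow the sorted order of the category labels.
            out[col] = out[col].astype("category").cat.codes
        return out
    raise ValueError(f"Unknown encoding: {encoding!r}")


ohe_df = encode_categoricals(df, ["Species"], encoding="ohe")
ord_df = encode_categoricals(df, ["Species"], encoding="ord")
```

Ordinal codes impose an order on the categories, which is only meaningful when the categories are genuinely ordered. For a linear model with an intercept, passing drop_first=True to pd.get_dummies avoids perfectly collinear indicator columns.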

Motivation

  • Tedious to perform assumption checks manually
  • Lack of rigour and consistency in references and notebooks online

Contributing

  1. Have a look at the existing Issues and Pull Requests that you would like to help with.
  2. Clone the repo and create a new branch: $ git clone https://github.com/kennethleungty/pyassume && cd pyassume && git checkout -b name_of_new_branch.
  3. Make changes and test them.
  4. Submit a Pull Request with a comprehensive description of the changes.

If you would like to request a feature or report a bug, please create a GitHub Issue using one of the templates provided.

See full contribution guide →

Upcoming
