Statistical IV: Statistical Hypothesis Testing for the Information Value (IV). Evaluation of the predictive power of features using the IV with specific thresholds for each dataset.
Project description
Statistical IV
Our J-Divergence test uses the following null hypothesis:
H0: The predictive power of the variable is not significant.
The null hypothesis is tested with a two-tailed test; keep this in mind when interpreting the p-value.
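For orientation, the statistic being tested can be sketched in a few lines. The Information Value of a binned feature against a binary target is the J-divergence (symmetrized Kullback-Leibler divergence) between the feature's distribution among non-events and among events. The sketch below is a minimal, pure-Python illustration of that classic formula for a categorical feature. It does not reproduce the package's hypothesis test or its dataset-specific thresholds, and the function name `information_value` and the smoothing count `eps` are assumptions made for this example.

```python
import math
from collections import Counter

def information_value(feature, target, eps=0.5):
    """Classic Information Value of a categorical feature against a 0/1 target.

    IV = sum_i (g_i - b_i) * ln(g_i / b_i), where g_i and b_i are the shares of
    non-events (target == 0) and events (target == 1) falling in category i.
    `eps` is a smoothing count added to every cell to avoid log(0); it is an
    assumption of this sketch, not something the package documents.
    """
    goods = Counter(x for x, y in zip(feature, target) if y == 0)
    bads = Counter(x for x, y in zip(feature, target) if y == 1)
    categories = sorted(set(goods) | set(bads))
    total_good = sum(goods.values()) + eps * len(categories)
    total_bad = sum(bads.values()) + eps * len(categories)

    iv = 0.0
    for c in categories:
        g = (goods[c] + eps) / total_good  # smoothed share of non-events in c
        b = (bads[c] + eps) / total_bad    # smoothed share of events in c
        iv += (g - b) * math.log(g / b)    # each term is non-negative
    return iv

# A feature unrelated to the target versus one that separates it completely.
iv_unrelated = information_value(["a", "a", "b", "b"], [0, 1, 0, 1])
iv_separating = information_value(["a", "a", "b", "b"], [0, 0, 1, 1])
print(iv_unrelated, iv_separating)
```

An IV near zero means the feature's distribution is the same in both target classes; larger values mean the two distributions differ more. The test described above asks whether an observed IV is large enough to be considered significant.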
Explanation
Optimize your machine learning models with 'Statistical-IV'. Perform automated feature selection based on statistics and customize error control.
- Import package:

      from statistical_iv import api
- Provide a DataFrame as Input:
  - Supply a DataFrame `df` containing your data for IV calculation.
- Specify Predictor Variables:
  - Provide a list of predictor variable names (`variables_names`) to analyze.
- Define the Target Variable:
  - Specify the name of the target variable (`var_y`) in your DataFrame.
- Indicate Variable Types:
  - Define the type of your predictor variables as 'categorical' or 'numerical' using the `type_vars` parameter.
- Optional: Set Maximum Bins:
  - Adjust the maximum number of bins for discretization (optional) using the `max_bins` parameter.
- Call the `statistical_iv` Function:
  - Calculate Statistical IV information by calling the `statistical_iv` function from `api` with the specified parameters (these are used by the OptimalBinning package).

        result_df = api.statistical_iv(df, variables_names, var_y, type_vars, max_bins)
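
The steps above can be put together as a short end-to-end sketch. The toy data, the column names (`income`, `age`, `default`) and the chosen `max_bins` value are hypothetical and only serve to show how the arguments fit together. The call itself mirrors the one shown above. Passing a single string for `type_vars` is an assumption based on the description of that parameter. The call is skipped when pandas or `statistical_iv` is not installed.

```python
import random

# Hypothetical toy data: column names and values are made up for illustration.
random.seed(0)
n = 500
income = [random.gauss(50, 15) for _ in range(n)]
age = [random.randint(18, 80) for _ in range(n)]
# Binary target loosely tied to income, so that one predictor carries signal.
default = [1 if x + random.gauss(0, 10) < 40 else 0 for x in income]

data = {"income": income, "age": age, "default": default}
variables_names = ["income", "age"]  # predictors to analyze
var_y = "default"                    # name of the target column
type_vars = "numerical"              # or "categorical" (assumed: one type for all)
max_bins = 5                         # upper bound on bins for discretization

try:
    import pandas as pd
    from statistical_iv import api
except ImportError:
    result_df = None  # pandas or statistical_iv is missing, so the call is skipped
else:
    df = pd.DataFrame(data)
    result_df = api.statistical_iv(df, variables_names, var_y, type_vars, max_bins)
    print(result_df)
```

Because all predictors share one `type_vars` value in this sketch, categorical and numerical predictors would be analyzed in separate calls.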
Example Result:
Full Paper:
For a comprehensive exploration of the topic, we recommend perusing the contents of the article available at this link.
Hashes for statistical_iv-0.3.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9fc5c1e2f27c86efb1ae21c33a51ee88395b70d8aaf4d5b5aa5698b558560376
MD5 | 8ecc9e9e3f6dd9520ab5330cf97c2203
BLAKE2b-256 | bf931674c8eb97ca9666b68e1699023cc02b695a84904fedcf739561eb24655a