A rigorous Python toolkit for predictive ML — feature analysis, interaction testing, and model robustness validation

These details have not been verified by PyPI

Project links

Homepage

Project description

predykt

A Python toolkit for rigorous feature interaction analysis in machine learning models. Combines cyclical optimal binning, SHAP interaction stability testing, residual representation testing, and seed robustness validation into a unified workflow for credit risk and tabular ML.

Why predykt?

Standard ML libraries treat feature analysis as a single-pass operation: fit once, read SHAP values, done. This works poorly when:

Your temporal features are cyclical (hour-of-day, month-of-year): standard binners don't know that 23:00 and 00:00 are adjacent
Your SHAP interactions are seed-dependent: a strong-looking interaction from a single fit may vanish on the next random seed
Your HPO result is lucky: the best config from your tuning run may only be best for that seed
Your engineered feature needs validation: a candidate transformation may not explain residual structure the base model missed

predykt addresses each of these failure modes with dedicated, statistically grounded tools.

Installation

pip install predykt

Dependencies: numpy, numba, scikit-learn, pandas, shap, scipy, statsmodels, optbinning, matplotlib, seaborn, joblib, tqdm

Modules

Module	What it does
`CyclicalBinner`	IV-maximizing optimal binning for circular temporal features
`InteractionTester`	SHAP interaction stability testing via refit-across-seeds
`InteractionVoter`	Cross-algorithm voting to distinguish data interactions from algorithm artifacts
`SeedRobustnessValidator`	Statistical validation of hyperparameter config robustness across seeds
`FeatureBinningAnalyzer`	IV uplift screening for feature pair interactions via OptBinning
`ResidualRepresentationTester`	Residual-based test of whether an engineered representation explains base-model residuals
`SHAPInteractionAnalyzer`	Three-layer SHAP attribution corrected for collinearity and cross-group aliasing

Quick Start

1. Cyclical Optimal Binning

Standard binners treat hour 23 and hour 0 as maximally distant. CyclicalBinner treats the domain as circular and finds the IV-maximizing partition accordingly.

import numpy as np
from predykt import CyclicalBinner

# Simulate hour-of-day data with a fraud spike at night (22:00-02:00)
rng = np.random.default_rng(42)
n = 10_000
hours = rng.integers(0, 24, size=n)
fraud_prob = np.where((hours >= 22) | (hours <= 2), 0.15, 0.04)
y = rng.binomial(1, fraud_prob)

binner = CyclicalBinner(m=24, gamma=0.02, k_max=6)
binner.fit(hours, y)

print(f"Optimal bins: {binner.n_bins_}")
print(f"Split points: {binner.split_points_}")
print(f"IV: {binner.iv_:.4f}")
print(binner.result_.summary())

Output (example):

Optimal bins: 3
Split points: [ 3  8 22]
IV: 0.1823

   bin       range  count  count_%  events  non_events  event_rate       woe        iv
0    0    [3, 8)   2082    20.82      83        1999    0.039878 -0.3421    0.011
1    1   [8, 22)  5831    58.31     227        5604    0.038932 -0.3665    0.038
2    2  [22, 3)*  2087    20.87     312        1775    0.149496  1.1803    0.133
...

Transform to WOE for scorecard:

# Bin index
binned = binner.transform(hours)

# WOE directly (for logistic regression scorecards)
woe_encoded = binner.transform_woe(hours)

# WOE lookup table for documentation
woe_table = binner.result_.woe_table()

2. SHAP Interaction Stability Testing

A SHAP interaction from a single model fit conflates a real data relationship with random seed luck. InteractionTester refits the model across N seeds and measures whether the interaction is a stable property.

import pandas as pd
from xgboost import XGBClassifier
from predykt import InteractionTester

tester = InteractionTester(
    model_class=XGBClassifier,
    base_params={
        "n_estimators": 200,
        "max_depth": 5,
        "eval_metric": "logloss",
        "verbosity": 0,
    },
    seed_param="random_state",
    n_seeds=200,
    alpha=0.05,
    n_jobs=4,
)

# Step 1: cheap single-seed screen to identify candidate pairs
top_pairs = tester.get_top_n_interactions(X, y, n=10)

# Step 2: full stability test across 200 seeds
results = tester.test_pairs(X, y, top_pairs)

# Step 3: results with optional BH multiple testing correction
df = tester.results_to_dataframe(results, correction_method="fdr_bh")
print(df[["Feature_i", "Feature_j", "Instability_Score", "Per_Interaction_AUC", "Robust"]])

What instability_score means:

Proportion of seeds where the interaction's sign opposes the majority direction (doubled for two-sidedness)
Range [0, 1]. Lower = more stable.
This is not a frequentist p-value. It measures algorithmic stability, not statistical significance under a null derived from the data-generating process.

# Visualize seed distribution for a specific pair
tester.plot_interaction_distribution(results[0])

# Check if 200 seeds was enough for convergence
tester.plot_convergence(results[0])

3. Cross-Algorithm Voting

An interaction that is stable within XGBoost may be an artifact of gradient boosting's splitting strategy, not a property of the data. InteractionVoter runs the same stability test across multiple algorithm families and tallies votes.

from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier
from predykt import InteractionVoter

configs = {
    "rf": {
        "model_class": RandomForestClassifier,
        "params": {"n_estimators": 200, "max_depth": 5, "n_jobs": -1},
        "seed_param": "random_state",
    },
    "xgb": {
        "model_class": XGBClassifier,
        "params": {"n_estimators": 200, "max_depth": 5, "eval_metric": "logloss", "verbosity": 0},
        "seed_param": "random_state",
    },
    "lgbm": {
        "model_class": LGBMClassifier,
        "params": {"n_estimators": 200, "max_depth": 5, "verbose": -1},
        "seed_param": "random_state",
    },
}

voter = InteractionVoter(configs, n_seeds=200, alpha=0.05, n_jobs=4)
vote_results = voter.vote(X, y, top_pairs)

# Summary table
summary = voter.summary(vote_results)
print(summary[["Feature_i", "Feature_j", "Votes", "Vote_Ratio", "Unanimous", "Mean_AUC"]])

# Heatmap: AUC by algorithm, * marks robust interactions
voter.plot_vote_heatmap(vote_results)

Unanimous interactions (all algorithms agree) are the most reliable candidates for feature engineering or regulatory documentation.

4. Seed Robustness Validation

HPO typically fixes a random seed during search, which means the "best" configuration may only be best for that initialization. SeedRobustnessValidator re-evaluates a fixed HP config across N seeds and runs formal statistical tests.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from predykt import SeedRobustnessValidator

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=999)

# The HP config you want to validate (found via HPO)
hp_config = {"n_estimators": 200, "max_depth": 6, "min_samples_split": 10}

def eval_fn(seed: int) -> float:
    clf = RandomForestClassifier(**hp_config, random_state=seed, n_jobs=-1)
    clf.fit(X_train, y_train)
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

validator = SeedRobustnessValidator(
    eval_fn=eval_fn,
    n_seeds=100,
    metric_name="AUC",
    higher_is_better=True,
    sigma_max=0.005,  # domain-informed: 0.5% AUC std is acceptable for production
)

report = validator.run()
validator.print_report(report)
validator.plot_diagnostics(report)

Statistical tests applied:

Test	Purpose
Shapiro-Wilk	Gates parametric vs bootstrap path
Chi-square variance test (one-sided upper)	H0: sigma^2 <= sigma^2_max
95/95 Tolerance interval	95% confidence that 95% of future seed runs fall within [L, U]
Bootstrap CI for std	Non-parametric fallback when normality is violated
Coefficient of Variation	Relative dispersion summary

Verdict categories: ROBUST / MARGINAL / UNSTABLE

Note on sigma_max: If not set, defaults to 1% of the observed mean, a conservative auto-default. You should override this with a domain-informed threshold. In credit scoring, 0.5% AUC std (sigma_max=0.005) is a reasonable production stability requirement.

5. Feature Binning IV Uplift

Quick screening for feature pair interactions using OptBinning's 2D binning. The uplift heuristic (IV_2D - (IV_1 + IV_2)) identifies pairs where joint information exceeds the sum of marginal information, a signal worth investigating further.

from predykt import FeatureBinningAnalyzer

analyzer = FeatureBinningAnalyzer(X, y)

feature_pairs = [
    ("age", "income"),
    ("loan_amount", "tenure"),
    ("utilization_rate", "delinquencies"),
]

results = analyzer.analyze_feature_combinations(feature_pairs)
print(analyzer.get_top_combinations())

# Inspect 2D binning table for a specific pair
table = analyzer.get_binning_details("age", "income")
print(table)

Interpretation note: The IV uplift measure is a screening heuristic, not a formal interaction test. Pairs with high uplift are candidates for the more rigorous InteractionTester / InteractionVoter pipeline.

6. Residual Representation Testing

After confirming an interaction pair is stable, ResidualRepresentationTester answers: does a specific engineered transformation of that pair explain structure the base model missed?

The test uses the Frisch-Waugh-Lovell theorem. Stage 1 computes out-of-fold residuals Ỹ = y - p̂ via K-fold cross-fitting. Stage 2 regresses Ỹ on the candidate feature Tk and tests H0: beta1 = 0. A significant result means Tk captures signal the base model failed to learn.

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from predykt import ResidualRepresentationTester, OLSEstimator, HSICEstimator

# Candidate representations of the (age, income) interaction
reps = pd.DataFrame({
    "product":   X["age"] * X["income"],
    "ratio":     X["age"] / (X["income"] + 1),
    "log_ratio": np.log1p(X["age"]) - np.log1p(X["income"]),
})

tester = ResidualRepresentationTester(
    model=GradientBoostingClassifier(n_estimators=200, random_state=42),
    criterion=[OLSEstimator(), HSICEstimator(n_permutations=500)],
    n_folds=5,
    alpha=0.05,
)

tester.fit(
    feature_pairs=[("age", "income")],
    X=X,
    y=y,
    representations=reps,
)

# Summary table: beta, t-stat, p-value, BH-corrected p-value, winner flag
print(tester.results_to_dataframe())

# Best representation per pair
winners = tester.winning_representations()

# Add placebo and bootstrap refutation checks
tester.refute(n_permutations=100, n_bootstrap=50)
print(tester.results_to_dataframe()[["representation", "rejected", "robust"]])

Criteria:

Criterion	What it tests
`OLSEstimator`	Linear association (HC3 robust SE, handles heteroskedastic residuals)
`HSICEstimator`	Nonlinear / non-monotone dependence (kernel-based, permutation p-value)
`CustomEstimator`	Any user-supplied callable returning a `Stage2Result`

Precomputed residuals (Mode B): If you already have OOF residuals from a prior run, pass Y_resid= directly to skip the Stage 1 cross-fitting.

tester.fit(
    feature_pairs=[("age", "income")],
    X=X, y=y,
    representations=reps,
    Y_resid=precomputed_residuals,
)

7. SHAP Interaction Analyzer

When a model contains both raw features and engineered interactions, raw SHAP values are aliased by collinearity. SHAPInteractionAnalyzer provides three progressively purer attribution layers.

from predykt import SHAPInteractionAnalyzer

groups = {
    "demographics": ["age", "income"],
    "credit":       ["utilization_rate", "delinquencies", "loan_amount"],
    "temporal":     ["tenure", "hour_bin", "month"],
}

analyzer = SHAPInteractionAnalyzer(interaction_groups=groups, layers=[1, 2, 3])
analyzer.fit(model=fitted_model, X=X_test)

# Layer 1: group total SHAP (sum within group)
l1 = analyzer.layer_1_group_total()

# Layer 2: net group effects (Layer 1 minus cross-group interaction contributions)
l2 = analyzer.layer_2_net_group_effects()

# Layer 3: pure main effects per feature (diagonal of shap_interaction_values)
l3 = analyzer.layer_3_pure_main_effects()

# Global importance summary for each layer
print(analyzer.summary(layer=1))
print(analyzer.summary(layer=2))

# Side-by-side comparison: layer_1 vs layer_2 per group
group_comparison, feature_effects = analyzer.compare_layers()
print(group_comparison)

Reading the layers:

Comparison	What it tells you
Layer 1 - Layer 2 per group	How much of the group's apparent importance comes from cross-group interactions
Layer 2 - sum(Layer 3 within group)	Within-group collinearity aliasing even after cross-group correction

Design Decisions

Why refit across seeds instead of permuting on a fixed model? Permutation tests on a fixed model test whether the interaction is non-zero for that fit. Refitting tests whether the interaction is a stable property of the model family on this data, which is what matters for deployment. See the InteractionTester docstring for the full discussion.

Why Numba for CyclicalBinner? Exhaustive enumeration of all k-partitions of a circular domain of cardinality m is O(C(m, k)) per k. For m=24, k=6, that's C(24,6) = 134,596 partitions. Numba JIT brings this from seconds to milliseconds.

Why the 95/95 tolerance interval in SeedRobustnessValidator? A confidence interval on the mean tells you where the average seed lands. A tolerance interval tells you where individual seed runs land, which is what matters when you're deploying a model trained on a single seed. The 95/95 interval is the ISO 16269-6 standard for this use case.

Why HC3 robust standard errors in OLSEstimator? For binary targets, residuals Ỹ = y - p̂ have observation-specific variance p̂(1-p̂). OLS with homoskedastic standard errors is misspecified. HC3 (MacKinnon & White 1985) corrects this and is the mandatory default.

Why HSIC alongside OLS? OLS only detects linear association. HSIC (Hilbert-Schmidt Independence Criterion) is a kernel-based nonparametric test that detects any dependence structure, including nonlinear and non-monotone relationships. Running both gives a more complete picture of whether a representation carries signal.

License

MIT

Citation

If you use predykt in research or production systems, please cite the repository.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.1.2

Apr 30, 2026

0.1.1

Mar 16, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

predykt-0.1.2.tar.gz (44.9 kB view details)

Uploaded Apr 30, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

predykt-0.1.2-py3-none-any.whl (42.6 kB view details)

Uploaded Apr 30, 2026 Python 3

File details

Details for the file predykt-0.1.2.tar.gz.

File metadata

Download URL: predykt-0.1.2.tar.gz
Upload date: Apr 30, 2026
Size: 44.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for predykt-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`3248ed61d4a5e5a030b3f277a80e7301bb62a89f43816c5c2b77bf4710d4b73a`
MD5	`fe19dcb3848f1d85a16edd1a427d2631`
BLAKE2b-256	`b6c25af0d36b6389c760da1044435e474733227c6fcb12195c35a29d377d3aeb`

See more details on using hashes here.

File details

Details for the file predykt-0.1.2-py3-none-any.whl.

File metadata

Download URL: predykt-0.1.2-py3-none-any.whl
Upload date: Apr 30, 2026
Size: 42.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for predykt-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9b907a7d41f94b8bbcfe201b68106e53e4c3db688467d7016aff8b09b0cf9399`
MD5	`6f35fee032922003cc8b6413559da9c2`
BLAKE2b-256	`401e00a92947cdaf18339a31af6989a86bd4c753253eafb64f3cc716263c68f3`

See more details on using hashes here.

predykt 0.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

predykt

Why predykt?

Installation

Modules

Quick Start

1. Cyclical Optimal Binning

2. SHAP Interaction Stability Testing

3. Cross-Algorithm Voting

4. Seed Robustness Validation

5. Feature Binning IV Uplift

6. Residual Representation Testing

7. SHAP Interaction Analyzer

Design Decisions

License

Citation

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes