Reject inference for credit scoring — 8 scikit-learn-compatible methods plus an honest benchmark that tells you whether it actually helps.

These details have not been verified by PyPI

Project links

Project description

rejectkit

Reject inference for credit scoring — scikit-learn-compatible methods, plus an honest benchmark that tells you whether reject inference actually helps on your data.

python license status

한국어 요약 (Korean)

신용 모델은 승인된 신청자(나중에 good/bad 결과를 아는 사람)로만 학습하지만, 실제로는 거절자를 포함한 전체 신청자를 평가해야 한다 — 이 표본 선택 편향을 바로잡는 기법이 reject inference다. rejectkit은 이 고전 기법 8가지를 scikit-learn 스타일의 한 API로 묶고, "그 보정이 내 데이터에서 실제로 도움이 되는지" 재는 벤치마크(MaskedRejectBenchmark)까지 제공한다. 입력은 pandas·polars·numpy 모두 지원. 자세한 한·영·일 설명은 docs/explainer.md, 실데이터 예제는 examples/real_data_home_credit.ipynb 참고.

日本語要約 (Japanese)

与信モデルは承認された申込者（後で good/bad の結果が分かる人）だけで学習するが、実際には否認者を含む全申込者を評価しなければならない — この標本選択バイアスを補正する手法が reject inference。rejectkit はこの古典的手法8種を scikit-learn 風の単一 API にまとめ、**「その補正が自分のデータで実際に役立つか」**を測るベンチマーク（MaskedRejectBenchmark）まで備える。入力は pandas・polars・numpy に対応。詳しい3言語解説は docs/explainer.md、実データ例は examples/real_data_home_credit.ipynb を参照。

Why this exists

A credit model is trained on accepted applicants, whose good/bad outcome you eventually observe. But the model has to score the whole through-the-door population — including the applicants you rejected, who never get an outcome. Training on accepts only is a textbook case of sample-selection bias. Reject inference is the family of techniques that tries to correct it.

These methods are standard in the credit-risk world, yet the Python tooling is missing:

R has scoringTools (augmentation, fuzzy_augmentation, parcelling, reclassification, twins) — GitHub only.
Python scorecard libraries — scorecardpy, optbinning, scorecardbundle — do WOE/IV binning and logistic scorecards but skip reject inference entirely.
What's left online is one-off research code, not a packaged, tested library.

rejectkit fills that gap: eight reject inference methods behind one scikit-learn-style API, a benchmark harness even scoringTools lacks, plus drift diagnostics and plotting.

Install

pip install -e .              # core: numpy, pandas, scikit-learn
pip install -e ".[plot]"      # + matplotlib plotting helpers
pip install -e ".[polars]"    # + polars input support

Quickstart

from sklearn.linear_model import LogisticRegression
from rejectkit import RejectInferenceClassifier

# X_accept, y_accept: accepted applicants and their good(0)/bad(1) outcomes
# X_reject:           rejected applicants — features only, no labels
clf = RejectInferenceClassifier(
    estimator=LogisticRegression(max_iter=1000),
    method="parcelling",
    method_params={"uplift": 1.3},   # assume rejects are ~30% worse per score band
)
clf.fit(X_accept, y_accept, X_reject)
pd_bad = clf.predict_proba(X_new)[:, 1]

Just want the augmented training sample for your own pipeline?

from rejectkit import FuzzyAugmentation

X_aug, y_aug, sample_weight = (
    FuzzyAugmentation(LogisticRegression(max_iter=1000))
    .fit_resample(X_accept, y_accept, X_reject)
)

Inputs may be pandas, polars, or numpy.

Methods

Method	Class	Core idea	Assumption
Simple augmentation	`SimpleAugmentation`	Hard 0/1 label by score cutoff	Accept model ranks rejects
Fuzzy augmentation	`FuzzyAugmentation`	Two weighted rows per reject (P(bad), P(good))	MAR; smooth labels
Parcelling	`Parcelling`	Per-score-band bad rate × `uplift`	Rejects worse by a fixed factor
Reclassification	`Reclassification`	Iteratively relabel & refit	Labels converge
Extrapolation / twins	`Extrapolation`	Local bad rate of nearest accepts	Similar applicants behave alike
Inverse-propensity reweighting	`Reweighting`	Reweight accepts by `1/P(accept)`	MAR; invents no labels
Self-training	`SelfLearning`	Pseudo-label only confident rejects	MAR; confident labels reliable
Heckman control function	`HeckmanClassifier`	Add inverse Mills ratio as a feature	Gaussian selection latent

All resamplers share fit(X_accept, y_accept, X_reject) → resample() returning (X, y, sample_weight). HeckmanClassifier augments the feature space, so it is a standalone classifier rather than a resampler.

Does reject inference actually help? Measure it.

You can never validate reject inference directly, because rejects have no outcome — the literature is genuinely split on whether it helps at all. MaskedRejectBenchmark settles the question on your own data: it hides the labels of a synthetically "rejected" subset of a labelled dataset and checks how well each method recovers a model close to the oracle (trained on the full population) versus the naive accepts-only baseline.

from rejectkit import MaskedRejectBenchmark
from rejectkit.datasets import make_credit_data

X, y = make_credit_data(n_samples=4000, random_state=0)
bench = MaskedRejectBenchmark(selection="mnar", accept_rate=0.6, random_state=0)
print(bench.compare(
    ["fuzzy", "parcelling", "reweighting", "extrapolation", "selflearning", "heckman"],
    X, y,
).round(4))

                     auc      ks    gini  auc_recovery
oracle            0.8203  0.4911  0.6406        1.0000
naive             0.7488  0.3651  0.4975        0.0000
fuzzy             0.7488  0.3663  0.4977        0.0010
parcelling        0.7404  0.3468  0.4809       -0.1161
reweighting       0.7249  0.3290  0.4498       -0.3334
extrapolation     0.6989  0.2889  0.3977       -0.6973
selflearning      0.7124  0.3093  0.4248       -0.5080
heckman           0.7457  0.3559  0.4914       -0.0424

auc_recovery: 0 = no better than the naive accepts-only model, 1 = matches the full-data oracle.

Read this honestly. Selection here is MNAR (acceptance depends on the hidden outcome), so naive is badly biased (0.749 vs the 0.820 oracle) — yet the augmentation methods barely move it and several hurt; only Heckman nearly holds the naive line. That is what theory predicts when selection depends on the outcome: reject inference is not a free lunch. Switch to selection="mar" or selection="cutoff" and the verdict often flips the other way — frequently the naive model is already at the oracle, so auc_recovery returns NaN (no gap to recover) and reject inference is simply unnecessary. The harness exists so you find out before you ship it.

Selection mechanisms: "mar" (features only), "mnar" (features + hidden outcome), "cutoff" (accept the lowest-PD fraction — a realistic credit policy).

Diagnostics & plotting

from rejectkit.diagnostics import feature_drift, swap_set, psi
feature_drift(X_accept, X_reject)        # per-feature accept-vs-reject PSI, worst first
swap_set(y, score_old, score_new, c_old, c_new)   # who a new scorecard swaps in/out

from rejectkit import plotting            # needs [plot]
plotting.plot_benchmark(results)
plotting.plot_score_distributions(score_accept, score_reject)
plotting.plot_ks(y_true, y_score)

Caveats

Augmentation methods infer reject labels from a model fitted on the (biased) accepts, so they cannot escape strong MNAR selection on their own.
Reject inference often affects calibration more than ranking (AUC). Evaluate the metric you care about.
Always benchmark before adopting. rejectkit makes that one function call.

Documentation

Build the docs site locally:

pip install -e ".[docs]"
mkdocs serve

Examples

examples/quickstart.py — 60-second tour (single model + benchmark).
examples/walkthrough.ipynb — every function on sample data (trilingual KO/EN/JA, executed).
examples/real_data_home_credit.ipynb — applied to the real Kaggle Home Credit dataset: under MNAR selection the naive model collapses (AUC 0.74 → 0.57) and reject inference recovers ~7–8% of the gap; under MAR/cutoff it is unnecessary (trilingual, executed).

Roadmap

v0.1 — core augmentation/parcelling/reweighting, RejectInferenceClassifier, benchmark. ✅
v0.2 — reclassification, extrapolation / twins. ✅
v0.3 — self-training, Heckman, polars, plotting, drift diagnostics, docs. ✅
Next — calibration-focused benchmark metrics, deep generative reject inference (optional extra), PyPI release.

References

Hand & Henley (1993), Can reject inference ever work?
Crook & Banasik (2004), Does reject inference really improve the performance of application scoring models?
Lopes, Should we "reject" Reject Inference? An empirical study.
scoringTools (R): https://github.com/adimajo/scoringTools

License

MIT — see LICENSE.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.3.0

Jun 27, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rejectkit-0.3.0.tar.gz (284.6 kB view details)

Uploaded Jun 27, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

rejectkit-0.3.0-py3-none-any.whl (26.2 kB view details)

Uploaded Jun 27, 2026 Python 3

File details

Details for the file rejectkit-0.3.0.tar.gz.

File metadata

Download URL: rejectkit-0.3.0.tar.gz
Upload date: Jun 27, 2026
Size: 284.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for rejectkit-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`b87711e67a326b639111e7c7d18b8d38c37bce4f4c282cf448b0ace2f53c95dc`
MD5	`66de0a95fa98a6cfe3edeeae6f80069c`
BLAKE2b-256	`0b0969bbb5dc3c295501e45efe6fbb01a3f8b5be1c41f315c823328fb141107d`

See more details on using hashes here.

File details

Details for the file rejectkit-0.3.0-py3-none-any.whl.

File metadata

Download URL: rejectkit-0.3.0-py3-none-any.whl
Upload date: Jun 27, 2026
Size: 26.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for rejectkit-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a7c72cdd25ada2383ce175afdb8f371456c4609e7cb4204fa191c90c5f1ae53c`
MD5	`7c710b8aae5070c18702f383b30eaa6e`
BLAKE2b-256	`99ac93f495da8530215f897d1471cb708de45b296a8b9e3b611ed7e4dd933534`

See more details on using hashes here.

rejectkit 0.3.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

rejectkit

Why this exists

Install

Quickstart

Methods

Does reject inference actually help? Measure it.

Diagnostics & plotting

Caveats

Documentation

Examples

Roadmap

References

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes