CLUE-style closed loop that measures selective-labels default detection on synthetic SMB lending cohorts and finds the PD model's operating frontier.

Maintainers

hossainpazooki

Project description

CLDD — closed-loop default detection

Stress-test a probability-of-default (PD) model under selective labels — and get the severity at which it breaks. Real lending data only labels the loans a prior underwriter approved, so you cannot measure calibration on the applicants you declined — exactly where a new model must still be right. CLDD builds synthetic lending worlds with planted ground truth, hides labels the way real approval policies do, and grades every correction against that truth.

- Deterministic: byte-identical per seed, scikit-learn-only, no services or GPUs.
- Pluggable: correction levers (IPW, retrain, exploration, reject inference) are classes; add yours by subclassing Corrector.
- Honest by construction: every number below recomputes from committed CSVs; limits are reported, not smoothed over.
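
For intuition about the selective-labels problem described above, here is a minimal standard-library sketch. It does not use cldd; the function names, the logistic default model, and the way `severity` scales the approval policy are all assumptions made for illustration.

```python
import math
import random


def simulate_cohort(n, severity, seed=42):
    """Return (risk, defaulted, approved) rows for a toy cohort.

    severity=0 approves at random; larger values tie approval more
    tightly to the risk score. Illustrative only, not cldd's generator.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        risk = rng.gauss(0.0, 1.0)
        # Planted ground truth: default probability rises with risk.
        true_pd = 1.0 / (1.0 + math.exp(-(-2.0 + 1.2 * risk)))
        defaulted = rng.random() < true_pd
        # Prior underwriter: approval odds fall as risk rises.
        p_approve = 1.0 / (1.0 + math.exp(4.0 * severity * risk))
        approved = rng.random() < p_approve
        rows.append((risk, defaulted, approved))
    return rows


def default_rate(rows, approved):
    """Default rate among rows whose approval flag matches `approved`."""
    picked = [d for _, d, a in rows if a == approved]
    return sum(picked) / len(picked)


rows = simulate_cohort(n=20_000, severity=0.6)
observed = default_rate(rows, approved=True)   # what a real lender can see
hidden = default_rate(rows, approved=False)    # visible only in a synthetic world
print(f"approved default rate: {observed:.3f}")
print(f"declined default rate: {hidden:.3f}")
```

The declined rows default more often than the approved ones, and in real data that second number is never observed. A synthetic world keeps it available for grading.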

The result it produces

The loop escalates selection severity until correction fails and reports the operating frontier — the last severity at which declined-cohort calibration still holds (target ECE ≤ 0.10). From the committed runs (artifacts/clue_frontier*.csv, seed 42):

Selection severity	0.0	0.2	0.4	0.6
Naive declined ECE (flat world)	0.021	0.045	0.108	0.161
IPW-corrected (flat world)	0.020	0.038	0.086 ✓	0.154 ✗
IPW-corrected (SCM world)	0.036	0.038	0.097 ✓	0.244 ✗
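
The frontier rule can be checked by hand against the table. The sketch below copies the corrected ECE values from the rows above and applies the stated target; `operating_frontier` is a hypothetical helper written for this example, not part of the cldd API.

```python
TARGET_ECE = 0.10
severities = [0.0, 0.2, 0.4, 0.6]
corrected_ece = {
    "flat": [0.020, 0.038, 0.086, 0.154],
    "scm": [0.036, 0.038, 0.097, 0.244],
}


def operating_frontier(severities, eces, target=TARGET_ECE):
    """Highest severity reached before the first failing round, or None."""
    frontier = None
    for severity, ece in zip(severities, eces):
        if ece > target:
            break  # escalation stops at the first failure
        frontier = severity
    return frontier


for world, eces in corrected_ece.items():
    print(world, operating_frontier(severities, eces))
```

Both worlds print 0.4, matching the ✓ and ✗ marks in the table.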

Both worlds land the frontier at severity 0.4, and the counterfactual deliverable breaks at the same boundary: across 25 seeds, g-computation cuts strong-propagation counterfactual MAE from 0.099 to 0.086 (−13.5%, positive on 24/25 seeds, Wilcoxon p = 1.5e-7) inside the frontier — and collapses to a negligible +0.0017 at full severity, where no deployable advantage is claimed. One cause explains both: selection through an unobserved confounder, which backdoor adjustment and IPW cannot fix. That single measured limit — not an unverifiable score — is the deliverable.
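
As background for the IPW lever, the sketch below shows inverse-propensity weighting in the easy case, where approval depends only on an observed feature and the propensity is known by construction. Everything in it (the linear default and approval rates, the variable names) is an assumption for illustration. It is not cldd's implementation, and it does not cover the unobserved-confounder case in which the text above says IPW fails.

```python
import random

rng = random.Random(42)
n = 20_000

population, approved = [], []
for _ in range(n):
    x = rng.random()                     # observed risk feature in [0, 1)
    defaulted = rng.random() < 0.05 + 0.40 * x
    propensity = 0.90 - 0.70 * x         # P(approved | x), known here by construction
    population.append(defaulted)
    if rng.random() < propensity:
        approved.append((defaulted, propensity))

true_rate = sum(population) / n
# Naive estimate: default rate among approved loans only.
naive_rate = sum(d for d, _ in approved) / len(approved)
# Self-normalized IPW: weight each approved loan by 1 / P(approved | x).
ipw_rate = sum(d / p for d, p in approved) / sum(1 / p for _, p in approved)

print(f"true {true_rate:.3f}  naive {naive_rate:.3f}  ipw {ipw_rate:.3f}")
```

The naive estimate understates the population default rate because risky applicants are under-represented among approvals; reweighting pulls it back toward the truth. In practice the propensity has to be estimated, and the correction only works when the features that drive approval are observed.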

Reproduce the headline from committed evidence: python scripts/paired_significance.py. The full independent assessment (methodology, all numbers, what didn't hold) is the accompanying article, FABLE.md.

Install

pip install closed-loop-default-detection

The import name is cldd. For development (tests, docs, the committed evidence), install from source:

git clone https://github.com/hossainpazooki/closed-loop-default-detection.git
cd closed-loop-default-detection
pip install -e ".[dev]"

Requires Python ≥ 3.10. Dependencies are declared as ranges (numpy>=2.0, pandas>=2.2, scikit-learn>=1.6, scipy>=1.11, matplotlib>=3.8) so cldd can sit alongside your existing stack. For float-exact reproduction, use the exact pins in requirements-dev.txt.

60-second tour

from cldd import SelectiveLabelsLoop

result = SelectiveLabelsLoop(improve_mode="both").run()   # "reweight" | "retrain" | "both"
print("Operating frontier:", result.frontier_severity)
for r in result.rounds:
    print(r.selection_severity, r.naive.declined_ece, r.passed)

flowchart TD
    A["<b>1. Generate</b><br/>synthetic cohort at a given selection severity<br/>plant true default, then hide it via the approval policy"]
    B["<b>2. Measure</b><br/>train the PD model on approved rows only,<br/>score it against planted truth on the declined subpopulation"]
    C["<b>3. Improve</b><br/>apply a correction lever:<br/>IPW reweight &middot; disjoint retrain &middot; exploration"]
    D{"Corrected declined-cohort<br/>ECE &le; target?"}
    E["<b>Operating frontier</b><br/>report the highest severity<br/>that still passes"]

    A --> B --> C --> D
    D -->|"yes &mdash; raise the severity"| A
    D -->|"no &mdash; stop"| E
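
The decision node compares an expected calibration error (ECE) against the target. Here is a minimal sketch of one common ECE definition, assuming ten equal-width probability bins. The binning scheme cldd actually uses is not stated here, so treat the bin count and the function name as assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |mean label - mean prob| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        mean_label = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_label - mean_prob)
    return ece


# Toy declined cohort: predicted PDs next to planted outcomes.
probs = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
ece = expected_calibration_error(probs, labels)
print(f"ECE = {ece:.3f}, passes target 0.10: {ece <= 0.10}")
```

In this toy cohort each bin is off by 0.05, so the ECE is about 0.05 and the round would pass.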

A runnable end-to-end demo (classic + custom-lever paths) is examples/quickstart.py. Full mechanics, diagnostics, and the feedback simulation: docs/how-it-works.md.

Scope. CLDD is a synthetic validation harness, not a production pipeline: retraining and feedback are seeded simulations inside the harness; it never acts on live data or real lending decisions.

What's in the box

Everything is importable from top-level cldd (full reference: the Sphinx docs):

Import	What it is
`SelectiveLabelsLoop`	the closed loop; `.run()` → `LoopResult` (frontier + per-round metrics)
`Corrector` + `NaiveCorrector`, `IPWReweightCorrector`, `DisjointRetrainCorrector`, `ExplorationCorrector`	the lever ABC and the four built-ins
`ReclassificationCorrector`, `AugmentationCorrector`, `FuzzyAugmentationCorrector`, `ParcellingCorrector`	four classic reject-inference methods, graded against planted truth (honest results)
`SyntheticBorrowerGenerator`, `StructuralBorrowerGenerator`	the flat and fitted-SCM synthetic worlds
`run_counterfactual_eval`, `GComputationEstimator`	counterfactual validator (g-computation vs naive conditioning)
`FeedbackLoop`	model-in-the-loop selective-labels simulation
`positivity_diagnostics`	observable regime/drift alarm — needs no declined-row labels
`CalibratedPDClassifier`	the calibrated PD detector as a scikit-learn estimator
`cldd.fidelity.run_fidelity_gate`	SCM-vs-real marginal-fidelity gate (univariate marginals only)

Add a lever by subclassing Corrector (name, control_priority, apply) and passing correctors=[NaiveCorrector(), MyCorrector()] — the legacy improve_mode API is unchanged and byte-identical. Contract details: CONTRIBUTING.md.
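
The shape of a custom lever can be sketched as follows. To stay runnable without cldd installed, the snippet defines its own stand-in base class; the real `Corrector` lives in cldd. The three members (name, control_priority, apply) come from the description above, but the `apply` signature, the priority value, and the weight-clipping behavior are assumptions. The actual contract is in CONTRIBUTING.md.

```python
from abc import ABC, abstractmethod


class Corrector(ABC):
    """Local stand-in so this snippet runs on its own; NOT cldd's base class."""

    name: str
    control_priority: int

    @abstractmethod
    def apply(self, weights):
        """Hypothetical signature: take per-row weights, return adjusted weights."""


class ClippedWeightCorrector(Corrector):
    """Illustrative lever: cap extreme sample weights at a maximum."""

    name = "clipped_weights"
    control_priority = 50  # placeholder value; its meaning is defined by cldd, not here

    def __init__(self, max_weight=5.0):
        self.max_weight = max_weight

    def apply(self, weights):
        return [min(w, self.max_weight) for w in weights]


lever = ClippedWeightCorrector(max_weight=5.0)
print(lever.name, lever.apply([1.0, 2.5, 12.0]))
```

With the real package, a subclass of cldd's `Corrector` would be passed to the loop through the `correctors=[...]` argument mentioned above.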

Use the detector from sklearn tooling — CalibratedPDClassifier is a thin, tested wrapper (binary-only; NaN features OK; the full check_estimator battery passes with zero failed checks on scikit-learn 1.7.2–1.9.0; probabilities byte-identical to the research API):

from sklearn.model_selection import cross_val_score
from cldd import CalibratedPDClassifier

# X: feature matrix, y: binary default labels (supply your own).
scores = cross_val_score(CalibratedPDClassifier(random_state=42), X, y, scoring="neg_brier_score")
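
The "neg_brier_score" scorer reports the negated Brier score, so values closer to zero are better. As a reminder of what is being measured, here is a minimal pure-Python sketch of the Brier score; `brier_score` is a helper written for this example, not a cldd or scikit-learn function.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


labels = [1, 0, 1]
sharp = brier_score([0.9, 0.1, 0.8], labels)   # confident and mostly right
vague = brier_score([0.5, 0.5, 0.5], labels)   # uninformative coin flip

# Negating turns "lower is better" into "higher is better" for model selection.
print(f"sharp: {-sharp:.3f}  vague: {-vague:.3f}")
```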

Command-line drivers

Each driver runs without installing the package (it adds src/ to the import path) and writes its output to artifacts/:

python scripts/run_clue.py                    # the closed loop → frontier table + plot (--generator scm for the SCM world)
python scripts/run_seed_sweep.py --quick      # counterfactual certification (drop --quick for all seeds)
python scripts/run_reject_inference.py        # reject-inference levers vs the frontier
python scripts/run_exploration_sweep.py       # frontier vs exploration budget
python scripts/run_feedback.py                # model-in-the-loop feedback simulation
python scripts/paired_significance.py         # recompute the headline stat from committed CSVs

Validation

pytest — 123 tests, all synthetic, no real data needed. CI runs a pinned-repro job (exact pins), a cross-version/OS compat matrix, and a strict docs build. Six float-sensitive tests reproduce only under the pins in requirements-dev.txt; the optional marginal-fidelity gate compares the SCM against a private real dataset via CLDD_DATA_DIR and is the only thing that needs it. Details, reproducibility, and troubleshooting: docs/validation.md.

Documentation

Where	What
docs/quickstart.md	run the loop, the counterfactual eval, the fidelity report
docs/how-it-works.md	loop mechanics, diagnostics, feedback simulation, repo map
docs/configuration.md	every knob (`config.py`) and the one env var
docs/validation.md	tests, gates, reproducibility, troubleshooting
docs/reject_inference.md	the four RI methods and their honest (modest) results
`FABLE.md`	the accompanying article — independent results & methodology assessment

Build locally: pip install -e ".[docs]" && sphinx-build -b html -W docs docs/_build/html.

Status

0.1.0 alpha on PyPI, changelog in CHANGELOG.md. Shipped: the loop, both worlds, all levers, the fidelity gate, the sklearn estimator, CI on three gates. CLDD began as a validation harness for the Intuit TechWeek SMB Underwriting Challenge; it is not a submission and does not alter challenge files.

Citation

Metadata in CITATION.cff (GitHub's "Cite this repository" reads it):

@software{pazooki_cldd_2026,
  author  = {Pazooki, Hossain},
  title   = {{closed-loop-default-detection}: measuring selective-labels default
             detection and the PD model's operating frontier},
  year    = {2026},
  version = {0.1.0},
  license = {MIT},
  url     = {https://github.com/hossainpazooki/closed-loop-default-detection}
}

Project details

Release history

This version

0.1.0

Jul 3, 2026

Download files

Source Distribution

closed_loop_default_detection-0.1.0.tar.gz (87.1 kB)

Uploaded Jul 3, 2026 Source

Built Distribution

closed_loop_default_detection-0.1.0-py3-none-any.whl (65.9 kB)

Uploaded Jul 3, 2026 Python 3

File details

Details for the file closed_loop_default_detection-0.1.0.tar.gz.

File metadata

Download URL: closed_loop_default_detection-0.1.0.tar.gz
Upload date: Jul 3, 2026
Size: 87.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for closed_loop_default_detection-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`90443e5da62c294b58affe7fd1b3ef00d645589d44fa345eb9ea99064ccdfe43`
MD5	`64c916d5fee87ebaf85b07fcbffb6003`
BLAKE2b-256	`0a95c6e350b9e3008715f1e20adeef1e293fdd1d433423291987fde57d534825`

Provenance

The following attestation bundles were made for closed_loop_default_detection-0.1.0.tar.gz:

Publisher: release.yml on hossainpazooki/closed-loop-default-detection

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: closed_loop_default_detection-0.1.0.tar.gz
- Subject digest: 90443e5da62c294b58affe7fd1b3ef00d645589d44fa345eb9ea99064ccdfe43
- Sigstore transparency entry: 2064168064
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: hossainpazooki/closed-loop-default-detection@6fd839e062d8125565f76a1a36f321f68d92fb4e
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/hossainpazooki
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@6fd839e062d8125565f76a1a36f321f68d92fb4e
- Trigger Event: push

File details

Details for the file closed_loop_default_detection-0.1.0-py3-none-any.whl.

File metadata

Download URL: closed_loop_default_detection-0.1.0-py3-none-any.whl
Upload date: Jul 3, 2026
Size: 65.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for closed_loop_default_detection-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b9f692601dd57ff5fa83bb0b53fe09e0299732ad08ddf62af8c0835107a5bc5e`
MD5	`37ce9d874db7ef9d8700af9cb95fff8f`
BLAKE2b-256	`59b8355ceab451faa63b39681bba7f8733291ec8ce22c72f04f0996883deca2a`

Provenance

The following attestation bundles were made for closed_loop_default_detection-0.1.0-py3-none-any.whl:

Publisher: release.yml on hossainpazooki/closed-loop-default-detection

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: closed_loop_default_detection-0.1.0-py3-none-any.whl
- Subject digest: b9f692601dd57ff5fa83bb0b53fe09e0299732ad08ddf62af8c0835107a5bc5e
- Sigstore transparency entry: 2064168079
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: hossainpazooki/closed-loop-default-detection@6fd839e062d8125565f76a1a36f321f68d92fb4e
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/hossainpazooki
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@6fd839e062d8125565f76a1a36f321f68d92fb4e
- Trigger Event: push
