fair-lending-screener

Statistical disparate impact analysis for HMDA mortgage data — the methodology federal examiners use, open-sourced for community advocates and investigative journalists.

What This Tool Does / Does Not Do

DOES:

Run adjusted denial disparity analysis on public HMDA mortgage data
Use binary logistic regression with FFIEC-standard controls (income, loan amount, LTV, DTI, property value, MSA)
Report adjusted odds ratios with 95% confidence intervals and p-values
Generate journalist-legible Markdown reports explaining what was found and what it means
Cite the regulatory methodology (FFIEC Interagency Fair Lending Examination Procedures, 2009)
Tell you clearly what it cannot conclude

DOES NOT:

Prove discrimination — it identifies statistical screening signals warranting further review
Include credit score, AUS recommendations, or asset data (not in public HMDA)
Analyze Section 1071 small business lending (different statute, different data)
Replace a full fair lending examination by federal regulators with access to internal lender data
Use black-box ML — every result is an auditable logistic regression you can reproduce

Alpha release (v0.1.0). Methodology peer review by an external fair lending expert is planned before v1.0.0. Use as a screening tool to identify cases warranting further analysis, not as a basis for enforcement or accusation.

Why It Exists

Community advocates, investigative journalists, and fair housing nonprofits routinely need to answer the question: "Are these denial rate disparities statistically suspicious after controlling for income, loan size, and geography?"

Currently that analysis requires either:

$50K+ commercial software (ComplianceTech, RiskExec, RATA Comply, Abrigo) — out of reach for nonprofits
Stata fluency — out of reach for most journalists
Months of methodology work — The Markup spent months building their 2021 analysis

This package makes that analysis installable in 30 seconds.

Limitations — Read This First

Public HMDA data does not include:

Credit score — The most predictive underwriting variable, excluded from public HMDA by industry lobbying. Its absence means results are upper-bound estimates of the unexplained disparity.
AUS recommendations — Fannie/Freddie DU/LP decisions, the primary underwriting tool, are not public.
Asset and reserve data — Not reported in HMDA.
Employment history — Not reported in HMDA.
Underwriter override data — Discretionary overrides are internal lender data.

A statistically significant adjusted disparity does not mean the lender discriminated. It means the disparity warrants further review with full loan-file data. See docs/limitations.md for the complete list.
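To see why a missing credit score can push the measured disparity upward, consider a toy calculation. All numbers below are invented for illustration and are not HMDA results: the within-tier disparity is fixed at 1.5×, and the two groups are assumed to have different mixes of high-score and low-score applicants. Pooling the tiers, which is what happens when the score is unobserved, yields a larger odds ratio than the one that holds inside each tier.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def prob(o: float) -> float:
    """Convert odds back to a probability."""
    return o / (1.0 + o)


# Hypothetical "true" disparity that holds inside every credit-score tier.
WITHIN_TIER_OR = 1.5

# Hypothetical denial rates for White applicants in each tier.
white_denial = {"high_score": 0.05, "low_score": 0.30}
# Black applicants face 1.5x the odds of denial within each tier.
black_denial = {tier: prob(WITHIN_TIER_OR * odds(p)) for tier, p in white_denial.items()}

# Hypothetical share of each group's applicants falling in each tier.
white_mix = {"high_score": 0.80, "low_score": 0.20}
black_mix = {"high_score": 0.50, "low_score": 0.50}


def pooled_rate(denial: dict, mix: dict) -> float:
    """Overall denial rate when the tier is not observed."""
    return sum(mix[tier] * denial[tier] for tier in mix)


pooled_or = odds(pooled_rate(black_denial, black_mix)) / odds(pooled_rate(white_denial, white_mix))

print(f"Odds ratio within each tier:    {WITHIN_TIER_OR:.2f}x")
print(f"Odds ratio with tier unobserved: {pooled_or:.2f}x")
```

The size of the gap depends entirely on the assumed tier mix, which is exactly the information public HMDA data cannot supply.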

Installation

pip install fair-lending-screener

Both import styles work:

import fair_lending_screener
import fairlendingscreener  # alias — identical

Quickstart

import warnings
import fair_lending_screener as fls

# Load HMDA data from the CFPB public API (real data; requires internet)
df_raw = fls.load_from_api(year=2023, state="IL", limit=20_000)

# Apply FFIEC-standard dataset filters:
# conventional, first-lien, home purchase, site-built 1-4 unit,
# principal residence, LTV ≤ 100%
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df = fls.prepare_for_analysis(df_raw)

# Run adjusted denial disparity analysis:
# logistic regression with income, loan amount, LTV, DTI, property value, MSA controls
result = fls.adjusted_denial_disparity(
    df,
    protected_class="Black or African American",
    comparison_class="White",
)

# Key numbers
print(f"Unadjusted odds ratio:    {result.unadjusted_odds_ratio:.2f}×")
print(f"Adjusted odds ratio:      {result.adjusted_odds_ratio:.2f}×")
print(f"95% CI:                   {result.confidence_interval_95[0]:.2f}–{result.confidence_interval_95[1]:.2f}×")
print(f"p-value:                  {result.p_value:.4f}")
print(f"Statistically significant: {result.is_statistically_significant}")
print(f"Sample size:              {result.sample_size:,}")

# Generate a journalist-legible Markdown report
report = fls.generate_disparity_report(
    result,
    lender_name="First Midwest Bank",  # optional — suppressed if result is not statistically significant
    geography="Illinois",
    year=2023,
)
print(report)

Using Synthetic Sample Data (no internet required)

import warnings
import fair_lending_screener as fls

# Synthetic data for testing — NOT real HMDA, NOT for conclusions
raw = fls.load_sample(n=2000, seed=42)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df = fls.prepare_for_analysis(raw)

result = fls.adjusted_denial_disparity(
    df,
    protected_class="Black or African American",
    comparison_class="White",
)
print(f"Adjusted OR: {result.adjusted_odds_ratio:.2f}×, p={result.p_value:.4f}")

Understanding the Output

Unadjusted vs. Adjusted Odds Ratio

Unadjusted: Raw denial rate disparity — no controls for income, loan size, or geography
Adjusted: Disparity after statistically holding constant income (log), loan amount (log), LTV, DTI, property value (log), and MSA fixed effects

An adjusted odds ratio of 1.8× means that among applicants who look similar on paper (same income, loan size, LTV, DTI, property value, and MSA), Black applicants faced 80% higher odds of denial than White applicants.
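Note that "80% higher odds" is not the same as "80% more likely to be denied". The short sketch below converts an odds ratio into denial rates at a few baseline rates; the helper name `apply_odds_ratio` and the baseline rates are illustrative and not part of the package. At a 10% baseline denial rate, for example, a 1.8× odds ratio corresponds to a denial rate of about 16.7%.

```python
def apply_odds_ratio(baseline_rate: float, odds_ratio: float) -> float:
    """Denial rate implied by multiplying the baseline odds by odds_ratio."""
    baseline_odds = baseline_rate / (1.0 - baseline_rate)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Illustrative baseline denial rates for the comparison group.
for baseline in (0.05, 0.10, 0.30):
    rate = apply_odds_ratio(baseline, 1.8)
    print(f"baseline {baseline:.0%} -> {rate:.1%} (rate ratio {rate / baseline:.2f}x)")
```

The rate ratio shrinks as the baseline denial rate rises, so reports should quote the odds ratio as an odds ratio.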

The difference between the two ratios shows how much of the raw disparity is explained by the available controls.
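As a rough sketch of that comparison, the unadjusted odds ratio can be computed directly from a 2×2 table of counts and set against the adjusted figure on the log-odds scale. The counts and the adjusted value below are hypothetical, and `odds_ratio_2x2` is an illustrative helper, not a package function. Because odds ratios are not collapsible, the "share explained" here is a descriptive summary only, not a formal decomposition.

```python
import math


def odds_ratio_2x2(denied_a: int, originated_a: int, denied_b: int, originated_b: int) -> float:
    """Odds of denial in group A divided by odds of denial in group B."""
    return (denied_a / originated_a) / (denied_b / originated_b)


# Hypothetical counts: group A = protected class, group B = comparison class.
unadjusted = odds_ratio_2x2(denied_a=300, originated_a=700, denied_b=1500, originated_b=8500)

# Hypothetical adjusted odds ratio, as a regression with controls might report.
adjusted = 1.8

# Attenuation of the log odds ratio once controls are added.
share_explained = 1.0 - math.log(adjusted) / math.log(unadjusted)

print(f"Unadjusted OR: {unadjusted:.2f}x")
print(f"Adjusted OR:   {adjusted:.2f}x")
print(f"Share of log-odds gap attenuated by controls: {share_explained:.0%}")
```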

What "Statistically Significant" Means

A result is flagged as statistically significant when:

p-value < 0.05 (a disparity this large would be unlikely to arise by chance if there were no true difference)
95% CI excludes 1.0 (the direction of the disparity is reliable)

Both conditions must hold. A large odds ratio with p = 0.08 is not reported as significant.
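The sketch below shows how such a flag could be derived from a logistic regression coefficient and its standard error. It assumes a Wald-type test and interval on the log-odds scale; that is an assumption about the package's internals, and `wald_summary` is a hypothetical helper. The standard errors are invented: the same 1.8× odds ratio is significant when estimated precisely and lands near p = 0.08 when the standard error is larger.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% critical value of the standard normal


def wald_summary(coef: float, std_err: float) -> dict:
    """Odds ratio, 95% CI, two-sided p-value and significance flag for one coefficient."""
    z = coef / std_err
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    ci_low = math.exp(coef - Z_95 * std_err)
    ci_high = math.exp(coef + Z_95 * std_err)
    return {
        "odds_ratio": math.exp(coef),
        "ci_95": (ci_low, ci_high),
        "p_value": p_value,
        # Flag only when both conditions hold: p < 0.05 and CI excludes 1.0.
        "significant": p_value < 0.05 and not (ci_low <= 1.0 <= ci_high),
    }


precise = wald_summary(math.log(1.8), std_err=0.15)  # hypothetical large sample
noisy = wald_summary(math.log(1.8), std_err=0.34)    # hypothetical small sample

for label, res in (("precise", precise), ("noisy", noisy)):
    low, high = res["ci_95"]
    print(f"{label}: OR={res['odds_ratio']:.2f}, CI=({low:.2f}, {high:.2f}), "
          f"p={res['p_value']:.4f}, significant={res['significant']}")
```

Under a Wald test the two conditions agree with each other, so checking both acts as a consistency guard rather than as two independent hurdles.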

What Controls Are Used

Per FFIEC Interagency Fair Lending Examination Procedures (2009):

| Control | Notes |
|---|---|
| `log(applicant_income)` | Ability-to-repay; log-transformed for skew |
| `log(loan_amount)` | Loan size |
| `loan_to_value_ratio` | Collateral coverage |
| `debt_to_income_ratio` | Binned: ≤35%, 36–42%, 43–49%, ≥50%, missing |
| `log(property_value)` | Collateral value |
| MSA fixed effects | ~300–400 dummies for local market conditions |
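A minimal sketch of how these transformations might look for a single application record follows. The function names, the `msa` key, and the handling of bin edges are assumptions for illustration; the package's own preprocessing may differ. The sketch assumes income, loan amount, and property value are strictly positive so the logarithms are defined.

```python
import math
from typing import Optional


def dti_bin(dti_percent: Optional[float]) -> str:
    """Assign a debt-to-income percentage to one of the five bins."""
    if dti_percent is None or math.isnan(dti_percent):
        return "missing"
    if dti_percent < 36:
        return "<=35%"
    if dti_percent < 43:
        return "36-42%"
    if dti_percent < 50:
        return "43-49%"
    return ">=50%"


def build_controls(row: dict) -> dict:
    """Turn one record (hypothetical dict layout) into model controls."""
    return {
        "log_income": math.log(row["applicant_income"]),
        "log_loan_amount": math.log(row["loan_amount"]),
        "ltv": row["loan_to_value_ratio"],
        "dti_bin": dti_bin(row.get("debt_to_income_ratio")),
        "log_property_value": math.log(row["property_value"]),
        # One categorical level per MSA; a regression library would expand
        # this into dummy variables (fixed effects).
        "msa": str(row["msa"]),
    }


example = build_controls({
    "applicant_income": 85_000,
    "loan_amount": 240_000,
    "loan_to_value_ratio": 80.0,
    "debt_to_income_ratio": 41.0,
    "property_value": 300_000,
    "msa": 16980,
})
print(example)
```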

How This Compares to Commercial Tools

| | fair-lending-screener | Commercial tools (ComplianceTech, RATA Comply, etc.) |
|---|---|---|
| Cost | Free, open-source | $20K–$100K+/year |
| Data | Public HMDA only | Internal lender data + HMDA |
| Credit score | Not available (public HMDA) | Available via lender data feed |
| AUS data | Not available | Available |
| Methodology | Published, citable, auditable | Proprietary |
| Target user | Advocates, journalists, researchers | Lender compliance teams |
| Intended use | Screening signals for advocacy/research | Regulatory compliance management |

Methodology

Full methodology documentation is in docs/methodology.md. Every statistical decision cites a regulatory or academic source.

Short version:

Binary logistic regression (statsmodels.api.Logit) — not sklearn, not ML
Outcome: action_taken == 3 (denied) vs. action_taken == 1 (originated)
Protected class: derived_race (self-reported per Regulation C)
Controls: FFIEC standard set (income, loan amount, LTV, DTI, property value, MSA)
Dataset filters: conventional, first-lien, home purchase, site-built 1–4 unit, owner-occupied, LTV ≤ 100%
Calibration target: The Markup (2021) found 1.8× adjusted OR for Black vs. White applicants nationally. Expected range from this tool: 1.6–2.2×. The range extends above The Markup's figure because we omit AUS and credit score, a known upward-bias direction per Wooldridge 2019 §3.3.
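The dataset filters and outcome coding above can be sketched on a single record as follows. The numeric codes follow the public HMDA data dictionary as I understand it (1 = conventional, first lien, home purchase, site-built, principal residence); the dict layout, the function names, and the choice to drop records with a missing LTV are assumptions, and `prepare_for_analysis` may behave differently.

```python
from typing import Optional


def passes_filters(row: dict) -> bool:
    """True if a record (hypothetical dict layout) is in the analysis population."""
    ltv = row.get("loan_to_value_ratio")
    return (
        row["loan_type"] == 1                  # conventional
        and row["lien_status"] == 1            # first lien
        and row["loan_purpose"] == 1           # home purchase
        and row["construction_method"] == 1    # site-built
        and row["occupancy_type"] == 1         # principal residence
        and row["total_units"] in {1, 2, 3, 4}
        and ltv is not None                    # assumption: missing LTV is dropped
        and ltv <= 100
    )


def denial_outcome(action_taken: int) -> Optional[int]:
    """1 = denied, 0 = originated, None = excluded from the model."""
    if action_taken == 3:
        return 1
    if action_taken == 1:
        return 0
    return None  # withdrawn, incomplete, purchased, etc.


record = {
    "loan_type": 1, "lien_status": 1, "loan_purpose": 1,
    "construction_method": 1, "occupancy_type": 1, "total_units": 1,
    "loan_to_value_ratio": 95.0, "action_taken": 3,
}
print(passes_filters(record), denial_outcome(record["action_taken"]))
```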

Regulatory basis: FFIEC Interagency Fair Lending Examination Procedures (2009). This is the methodology OCC, Federal Reserve, FDIC, NCUA, and CFPB examiners use.

Coming in Future Versions

v0.2.0 (planned):

Extended control set (AUS system used, credit model used, lender type, census tract demographics), working toward full replication of The Markup's 17-variable specification. The AUS recommendation itself remains unavailable in public HMDA.
Pricing disparity analysis (linear regression on rate spread or APR)

v0.3.0+:

BISG race/ethnicity proxy for non-HMDA products (auto, student)
Redlining geographic analysis (census-tract lender presence)
Peer benchmarking (lender vs. market comparison)
Multilevel/hierarchical MSA modeling

Scope

Mortgage lending (HMDA-reportable transactions) only.

Does NOT analyze Section 1071 small business lending
Does NOT include credit score (not in public HMDA)
Does NOT constitute a finding of discrimination
DOES identify statistically significant adjusted disparities warranting further review

Methodology Feedback

Open a GitHub issue tagged methodology with specific concerns and citations. All methodology changes are versioned and documented in CHANGELOG.md.

Citation

If you use this tool in research or journalism, please cite it:

Patel, Jay (2026). fair-lending-screener (v0.1.0). MIT License.
https://github.com/Jaypatel1511/fair-lending-screener

See CITATION.cff for full citation metadata.

License

MIT License. See LICENSE.

Release history

0.2.0: May 23, 2026
0.1.1 (yanked): May 22, 2026. Reason this release was yanked: release-process drift (built from uncommitted working tree, breaking API in a patch release). Superseded by v0.2.0.
0.1.0 (this version): May 22, 2026

