
Fairness testing for ML models using Australian demographic data

verosynthea-validator

Fairness testing for ML models using Census-calibrated synthetic Australian demographic data. A single call checks whether your model's accuracy, selection rates and error rates differ across demographic groups.

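pip install verosynthea-validator

from verosynthea_validator import FairnessReport

# test_data: a pandas DataFrame holding the true labels, the model's
# predictions and the protected attribute columns to audit
report = FairnessReport(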
    data=test_data,
    y_true="label",
    y_pred="prediction",
    protected_columns=["SEXP", "BPLP", "profile_name"],
)
results = report.run()
print(results.summary())

Output:

Fairness Report (n=5,000, overall accuracy=0.847)
============================================================

[PASS] SEXP (2 groups, smallest n=2,451)
  Accuracy gap:           0.012
  Demographic parity gap: 0.008
  Equalised odds gap:     0.015

[FAIL] BPLP (3 groups, smallest n=312)
  Accuracy gap:           0.073
  Demographic parity gap: 0.091
  Equalised odds gap:     0.064

============================================================
Overall: FAIL (worst gap: 0.073 on BPLP)

CI/CD gate

from verosynthea_validator import assert_fair

# Fails the build if the accuracy gap between any two groups exceeds 0.05 (5 percentage points)
assert_fair(test_data, "label", "prediction", max_accuracy_gap=0.05)
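How `assert_fair` compares gaps against its thresholds is not documented on this page. As a rough illustration of the CI-gate pattern only, the stdlib sketch below collects every gap that exceeds its limit and raises a single `AssertionError` listing them all. The helper name `check_gaps` and its dict-of-dicts input are hypothetical and not part of verosynthea_validator; the example numbers are the accuracy gaps from the sample report above.

```python
def check_gaps(gaps_by_column, thresholds):
    """Raise AssertionError if any gap exceeds its threshold (hypothetical helper).

    gaps_by_column: {protected_column: {metric_name: gap}}
    thresholds:     {metric_name: maximum allowed gap}
    """
    violations = []
    for column, gaps in gaps_by_column.items():
        for metric, limit in thresholds.items():
            value = gaps.get(metric)
            if value is not None and value > limit:
                violations.append(f"{column}: {metric} {value:.3f} > {limit:.3f}")
    if violations:
        # An uncaught exception fails the pytest test, and with it the CI job
        raise AssertionError("Fairness check failed:\n" + "\n".join(violations))


# Accuracy gaps taken from the example report above
gaps = {"SEXP": {"accuracy_gap": 0.012}, "BPLP": {"accuracy_gap": 0.073}}
try:
    check_gaps(gaps, {"accuracy_gap": 0.05})
except AssertionError as err:
    print(err)
```

Raising `AssertionError` explicitly, rather than relying on an `assert` statement, keeps the gate active even when Python runs with `-O`, which strips `assert` statements.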

In pytest:

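from verosynthea_validator import assert_fair


def test_model_fairness():
    # model, test_data and feature_columns are assumed to be defined elsewhere
    # (e.g. as fixtures); predict on the feature columns only, not the label
    df = test_data.copy()  # avoid mutating shared test data
    df["y_pred"] = model.predict(df[feature_columns])
    assert_fair(
        df, "y_true", "y_pred",
        protected_columns=["SEXP", "BPLP", "profile_name"],
        max_accuracy_gap=0.05,
        max_demographic_parity_gap=0.10,
    )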

What it measures

For each protected column (e.g. sex, birthplace, demographic profile), the validator computes:

| Metric | What it checks |
|---|---|
| Accuracy gap | Maximum accuracy difference between any two groups |
| Demographic parity gap | Maximum difference in selection rate, P(y_pred = 1), between any two groups |
| Equalised odds gap | Maximum difference in true positive rate or false positive rate between any two groups |

Groups with fewer than 30 observations are excluded (configurable via min_group_size).
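The package's internal implementation is not shown here. The pure-Python sketch below computes the three gaps for binary 0/1 labels under common definitions, assuming that each gap is the per-group maximum minus the per-group minimum, that the equalised odds gap is the larger of the true-positive-rate and false-positive-rate gaps, and that groups below `min_group_size` are dropped. The function names are hypothetical, and the package's exact rules (for example, how it handles a group with no positive labels) may differ.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups, min_group_size=30):
    """Per-group accuracy, selection rate, TPR and FPR for binary 0/1 labels.

    Groups with fewer than min_group_size rows are skipped.
    """
    rows = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        rows[g].append((t, p))

    rates = {}
    for g, pairs in rows.items():
        if len(pairs) < min_group_size:
            continue
        n = len(pairs)
        preds_on_pos = [p for t, p in pairs if t == 1]  # rows whose true label is 1
        preds_on_neg = [p for t, p in pairs if t == 0]  # rows whose true label is 0
        rates[g] = {
            "accuracy": sum(t == p for t, p in pairs) / n,
            "selection_rate": sum(p == 1 for _, p in pairs) / n,
            # TPR/FPR are undefined (None) if a group has no positives/negatives
            "tpr": sum(p == 1 for p in preds_on_pos) / len(preds_on_pos) if preds_on_pos else None,
            "fpr": sum(p == 1 for p in preds_on_neg) / len(preds_on_neg) if preds_on_neg else None,
        }
    return rates


def max_gap(values):
    """Largest difference between any two groups, i.e. max minus min."""
    defined = [v for v in values if v is not None]
    return max(defined) - min(defined) if len(defined) >= 2 else 0.0


def fairness_gaps(y_true, y_pred, groups, min_group_size=30):
    rates = group_rates(y_true, y_pred, groups, min_group_size)

    def column(key):
        return [r[key] for r in rates.values()]

    return {
        "accuracy_gap": max_gap(column("accuracy")),
        "demographic_parity_gap": max_gap(column("selection_rate")),
        # Equalised odds: the larger of the TPR gap and the FPR gap
        "equalised_odds_gap": max(max_gap(column("tpr")), max_gap(column("fpr"))),
    }


# Toy data: group "a" is predicted perfectly, group "b" is always predicted 1,
# group "c" has only 10 rows and falls below min_group_size
y_true = [1, 0] * 20 + [1, 0] * 20 + [0] * 10
y_pred = [1, 0] * 20 + [1] * 40 + [1] * 10
groups = ["a"] * 40 + ["b"] * 40 + ["c"] * 10
print(fairness_gaps(y_true, y_pred, groups))
# -> {'accuracy_gap': 0.5, 'demographic_parity_gap': 0.5, 'equalised_odds_gap': 1.0}
```

Taking the maximum minus the minimum gives the same value as the largest pairwise difference, so there is no need to compare every pair of groups.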

Why this instead of fairlearn or aif360?

Those are general-purpose fairness frameworks. This package is purpose-built for Australian demographics:

  • Pre-loaded demographic data. The free tier includes 5,000 synthetic individuals from AUSynth with 25 Census-calibrated variables. No need to source your own protected attributes.
  • 8 demographic profiles. AUSynth clusters every person into one of 8 profiles (High-earning professionals, Young singles, Retired, etc.) — a richer protected attribute than just age or sex.
  • Australia-specific calibration. Variables match ABS Census 2021 categories exactly. Income brackets, occupation codes, education levels, birthplace regions — all in Australian standard classifications.
  • One-line CI gate. assert_fair() drops into pytest with zero configuration.

Data tiers

| Tier | Data | Cost |
|---|---|---|
| Free | 5,000-row Paddington 4064 sample from Hugging Face | $0 |
| Paid | Full national dataset (32M individuals, 15,352 suburbs) via API | See verosynthea.com |

from verosynthea_validator import load_ausynth_sample

# Free tier (downloads from HF on first call)
df = load_ausynth_sample()

# Paid tier
df = load_ausynth_sample(api_key="vero_...", geography="bondi-2026-nsw")

The 8 demographic profiles

| ID | Name | Typical characteristics |
|---|---|---|
| 0 | Labourers and operators | Blue-collar, lower income |
| 1 | Young singles and non-workers | Under 25, students, not in the labour force (NILF) |
| 2 | Children | Under 15 |
| 3 | Non-earning dependants | Adults not in the workforce |
| 4 | Trades and technical workers | Certificate-qualified, mid income |
| 5 | Established partnered households | Married, mid-career |
| 6 | Retired and semi-retired | Over 60, pension income |
| 7 | High-earning professionals | Degree-qualified, professional occupations |
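If a dataset stores only the numeric profile ID, a lookup built from the table above can produce readable labels like those in a `profile_name` column. This is a minimal sketch assuming plain dict records with a hypothetical `profile_id` field; with pandas the same lookup is `df["profile_id"].map(PROFILE_NAMES)`.

```python
# Profile IDs and names as listed in the table above
PROFILE_NAMES = {
    0: "Labourers and operators",
    1: "Young singles and non-workers",
    2: "Children",
    3: "Non-earning dependants",
    4: "Trades and technical workers",
    5: "Established partnered households",
    6: "Retired and semi-retired",
    7: "High-earning professionals",
}


def add_profile_names(records, id_key="profile_id", name_key="profile_name"):
    """Return copies of records with a readable profile name added.

    id_key is a hypothetical field name; an unknown ID raises KeyError.
    """
    return [{**record, name_key: PROFILE_NAMES[record[id_key]]} for record in records]


people = [{"person": "a", "profile_id": 7}, {"person": "b", "profile_id": 2}]
print(add_profile_names(people))
```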

Installation

pip install verosynthea-validator            # core (pandas + numpy)
pip install "verosynthea-validator[hf]"      # + Hugging Face datasets loader
pip install "verosynthea-validator[paid]"    # + httpx for API access
pip install "verosynthea-validator[dev]"     # + pytest + scikit-learn for development

Citation

Verosynthea AUSynth (2026). Synthetic Australian Census Data.
https://verosynthea.com

License

MIT

Download files

Source distribution: verosynthea_validator-0.1.0.tar.gz (11.3 kB)

Built distribution (Python 3): verosynthea_validator-0.1.0-py3-none-any.whl (9.5 kB)

File details

Details for the file verosynthea_validator-0.1.0.tar.gz.

File metadata

  • Download URL: verosynthea_validator-0.1.0.tar.gz
  • Size: 11.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for verosynthea_validator-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 676d8c54c4129d541a7d3f364051f07a2094a353dcbcaa2b9b077910cbaefd12 |
| MD5 | 14e79cafc5f7bc30cab82c00db3b1b17 |
| BLAKE2b-256 | 5656261ed36f13c12eb614dbef232c201ae7bc25eb229b552bcdd0d3f7ea95eb |
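To check a downloaded source archive against the SHA256 digest listed above, a short stdlib sketch with `hashlib` is enough. `sha256_of` is a hypothetical helper name, and the archive is assumed to sit in the current directory. pip can also enforce digests itself in hash-checking mode, using `--hash` options in a requirements file together with `--require-hashes`.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_SDIST_SHA256 = "676d8c54c4129d541a7d3f364051f07a2094a353dcbcaa2b9b077910cbaefd12"

archive = Path("verosynthea_validator-0.1.0.tar.gz")
if archive.exists():
    ok = sha256_of(archive) == EXPECTED_SDIST_SHA256
    print("hash OK" if ok else "hash MISMATCH: do not install this file")
```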

File details

Details for the file verosynthea_validator-0.1.0-py3-none-any.whl.

File hashes

Hashes for verosynthea_validator-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 671d217debe0317af3b565dacd856407a7e12fd72ab3e0635bfb31c7fdf44d99 |
| MD5 | ffaf0add62ec195e8f0d0b8658f9ece4 |
| BLAKE2b-256 | 3afdc7eb1c8979a441e0efec140d503cfd3f81bfacad03029e37f14631c18e59 |
