Fairness testing for ML models using Australian demographic data

verosynthea-validator

Fairness testing for ML models using Census-calibrated synthetic Australian demographic data. One call checks whether your model treats demographic groups equally.

pip install verosynthea-validator

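from verosynthea_validator import FairnessReport

# test_data is assumed to be a pandas DataFrame containing the label,
# prediction and protected-attribute columns named below.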
report = FairnessReport(
    data=test_data,
    y_true="label",
    y_pred="prediction",
    protected_columns=["SEXP", "BPLP", "profile_name"],
)
results = report.run()
print(results.summary())

Output:

Fairness Report (n=5,000, overall accuracy=0.847)
============================================================

[PASS] SEXP (2 groups, smallest n=2,451)
  Accuracy gap:           0.012
  Demographic parity gap: 0.008
  Equalised odds gap:     0.015

[FAIL] BPLP (3 groups, smallest n=312)
  Accuracy gap:           0.073
  Demographic parity gap: 0.091
  Equalised odds gap:     0.064

============================================================
Overall: FAIL (worst gap: 0.073 on BPLP)

CI/CD gate

from verosynthea_validator import assert_fair

# Fails the build if the accuracy gap between any two groups exceeds 0.05 (5 percentage points)
assert_fair(test_data, "label", "prediction", max_accuracy_gap=0.05)

In pytest:

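from verosynthea_validator import assert_fair

def test_model_fairness():
    # `model` and `test_data` are assumed to come from your own fixtures or module scope.
    predictions = model.predict(test_data)
    test_data["y_pred"] = predictions
    # Fail the test if any protected column exceeds either threshold.
    assert_fair(
        test_data, "y_true", "y_pred",
        protected_columns=["SEXP", "BPLP", "profile_name"],
        max_accuracy_gap=0.05,
        max_demographic_parity_gap=0.10,
    )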
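
For readers wondering what such a gate does when a threshold is breached, the sketch below mimics the behaviour in plain Python. It is an illustration only: `check_gaps` is a hypothetical helper, not part of verosynthea-validator, and it assumes the gate fails by raising `AssertionError`, which is what makes a pytest test fail. The gap values are the accuracy gaps from the sample report above.

```python
def check_gaps(accuracy_gaps, max_accuracy_gap=0.05):
    """Raise AssertionError naming every column whose gap exceeds the limit."""
    failures = {
        col: gap for col, gap in accuracy_gaps.items() if gap > max_accuracy_gap
    }
    if failures:
        details = ", ".join(
            f"{col}={gap:.3f}" for col, gap in sorted(failures.items())
        )
        raise AssertionError(
            f"accuracy gap above {max_accuracy_gap:.2f}: {details}"
        )


# Accuracy gaps taken from the sample report: SEXP passes, BPLP does not.
sample_gaps = {"SEXP": 0.012, "BPLP": 0.073}

try:
    check_gaps(sample_gaps, max_accuracy_gap=0.05)
    outcome = "pass"
except AssertionError as exc:
    outcome = f"fail: {exc}"

print(outcome)
```

Reporting every failing column in one message, rather than stopping at the first, tends to save a round trip through CI when several attributes are out of tolerance.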

What it measures

For each protected column (e.g. sex, birthplace, demographic profile), the validator computes:

Metric	What it checks
Accuracy gap	Max accuracy difference between any two groups
Demographic parity gap	Max difference in selection rate (P(y_pred=1))
Equalised odds gap	Max difference in true positive rate or false positive rate
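
The three definitions in the table can be made concrete with a small, dependency-free sketch. This is not the package's implementation: the function names (`group_rates`, `max_gap`, `fairness_gaps`) and the row format (a list of dicts) are assumptions chosen for illustration, labels are assumed to be binary (0/1), and the equalised odds gap is taken here as the larger of the TPR gap and the FPR gap, which is one common reading of the table's wording.

```python
from collections import defaultdict


def group_rates(rows, y_true, y_pred, group_col):
    """Return per-group accuracy, selection rate, TPR and FPR for 0/1 labels."""
    counts = defaultdict(lambda: {
        "n": 0, "correct": 0, "selected": 0,
        "tp": 0, "pos": 0, "fp": 0, "neg": 0,
    })
    for row in rows:
        c = counts[row[group_col]]
        truth, pred = row[y_true], row[y_pred]
        c["n"] += 1
        c["correct"] += int(truth == pred)
        c["selected"] += int(pred == 1)
        if truth == 1:
            c["pos"] += 1
            c["tp"] += int(pred == 1)
        else:
            c["neg"] += 1
            c["fp"] += int(pred == 1)

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "accuracy": c["correct"] / c["n"],
            "selection_rate": c["selected"] / c["n"],
            # A rate is left as None when the group has no positives/negatives.
            "tpr": c["tp"] / c["pos"] if c["pos"] else None,
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
        }
    return rates


def max_gap(rates, key):
    """Largest difference in `key` between any two groups (max minus min)."""
    values = [r[key] for r in rates.values() if r[key] is not None]
    return max(values) - min(values) if len(values) >= 2 else 0.0


def fairness_gaps(rows, y_true, y_pred, group_col):
    """Compute the three gaps from the table for one protected column."""
    rates = group_rates(rows, y_true, y_pred, group_col)
    return {
        "accuracy_gap": max_gap(rates, "accuracy"),
        "demographic_parity_gap": max_gap(rates, "selection_rate"),
        "equalised_odds_gap": max(max_gap(rates, "tpr"), max_gap(rates, "fpr")),
    }


# Toy data: group "A" is predicted perfectly, group "B" misses one positive.
rows = [
    {"g": "A", "label": 1, "prediction": 1},
    {"g": "A", "label": 1, "prediction": 1},
    {"g": "A", "label": 0, "prediction": 0},
    {"g": "A", "label": 0, "prediction": 0},
    {"g": "B", "label": 1, "prediction": 1},
    {"g": "B", "label": 1, "prediction": 0},
    {"g": "B", "label": 0, "prediction": 0},
    {"g": "B", "label": 0, "prediction": 0},
]
gaps = fairness_gaps(rows, "label", "prediction", "g")
print(gaps)
```

In this toy case the accuracy and demographic parity gaps are both 0.25, while the equalised odds gap is 0.5 because group B's true positive rate is half of group A's.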

Groups smaller than 30 observations are excluded (configurable via min_group_size).
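
A rough sketch of that exclusion rule: groups with fewer than `min_group_size` rows are dropped before any gap is computed, since rates estimated from a handful of observations are noisy. The helper name `eligible_groups` and the toy group labels are hypothetical; only the `min_group_size` parameter name and the default of 30 come from the sentence above.

```python
from collections import Counter


def eligible_groups(group_values, min_group_size=30):
    """Return {group: count} for groups with at least `min_group_size` rows."""
    counts = Counter(group_values)
    return {g: n for g, n in counts.items() if n >= min_group_size}


# Toy column: two large groups and one with only 12 rows.
column = ["Australia"] * 400 + ["Overseas"] * 150 + ["Not stated"] * 12

kept = eligible_groups(column)          # the 12-row group is excluded
kept_loose = eligible_groups(column, 10)  # a lower threshold keeps all three
print(kept)
print(kept_loose)
```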

Why this instead of fairlearn or aif360?

Those are general-purpose fairness frameworks. This package is purpose-built for Australian demographics:

Pre-loaded demographic data: the free tier includes 5,000 synthetic individuals from AUSynth with 25 Census-calibrated variables, so there is no need to source your own protected attributes.
8 demographic profiles: AUSynth clusters every person into one of 8 profiles (High-earning professionals, Young singles, Retired, etc.), giving a richer protected attribute than age or sex alone.
Australia-specific calibration: variables match ABS Census 2021 categories exactly. Income brackets, occupation codes, education levels and birthplace regions all use Australian standard classifications.
One-line CI gate: assert_fair() drops into pytest with zero configuration.

Data tiers

Tier	Data	Cost
Free	5,000-row Paddington 4064 sample from Hugging Face	$0
Paid	Full national dataset (32M individuals, 15,352 suburbs) via API	See verosynthea.com

from verosynthea_validator import load_ausynth_sample

# Free tier (downloads from HF on first call)
df = load_ausynth_sample()

# Paid tier
df = load_ausynth_sample(api_key="vero_...", geography="bondi-2026-nsw")

The 8 demographic profiles

ID	Name	Typical characteristics
0	Labourers and operators	Blue-collar, lower income
1	Young singles and non-workers	Under 25, students, not in the labour force (NILF)
2	Children	Under 15
3	Non-earning dependants	Adults not in workforce
4	Trades and technical workers	Certificate-qualified, mid income
5	Established partnered households	Married, mid-career
6	Retired and semi-retired	Over 60, pension income
7	High-earning professionals	Degree-qualified, professional occupations
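
If profile IDs need to be turned into readable labels in a report, the table above translates directly into a lookup. This is a convenience sketch: the names `PROFILE_NAMES` and `profile_label` are hypothetical, and whether the dataset's profile column stores IDs or names is not stated here.

```python
# ID-to-name mapping copied from the profile table.
PROFILE_NAMES = {
    0: "Labourers and operators",
    1: "Young singles and non-workers",
    2: "Children",
    3: "Non-earning dependants",
    4: "Trades and technical workers",
    5: "Established partnered households",
    6: "Retired and semi-retired",
    7: "High-earning professionals",
}


def profile_label(profile_id):
    """Return the profile name for an ID, or a placeholder for unknown IDs."""
    return PROFILE_NAMES.get(profile_id, f"Unknown profile ({profile_id})")


print(profile_label(7))
print(profile_label(42))
```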

Installation

pip install verosynthea-validator            # core (pandas + numpy)
pip install "verosynthea-validator[hf]"      # + Hugging Face datasets loader
pip install "verosynthea-validator[paid]"    # + httpx for API access
pip install "verosynthea-validator[dev]"     # + pytest + sklearn for development
# Quoting the extras keeps shells such as zsh from treating [..] as a glob pattern.
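
Because the loaders depend on optional extras, it can help to check up front which ones are importable. The sketch below uses only the standard library. The mapping assumes the `hf` extra provides the Hugging Face `datasets` package and the `paid` extra provides `httpx`, as the install comments suggest; `available_extras` is a hypothetical helper, not part of the package.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to the top-level module it installs.
EXTRA_MODULES = {"hf": "datasets", "paid": "httpx"}


def available_extras(extra_modules=EXTRA_MODULES):
    """Map each extra name to True if its module can be found on the import path."""
    return {
        extra: find_spec(module) is not None
        for extra, module in extra_modules.items()
    }


status = available_extras()
for extra, ok in status.items():
    print(f"[{extra}] {'installed' if ok else 'missing'}")
```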

Citation

Verosynthea AUSynth (2026). Synthetic Australian Census Data.
https://verosynthea.com

License

MIT

This version: 0.1.0 (Jun 3, 2026)

Download files

Source Distribution

verosynthea_validator-0.1.0.tar.gz (11.3 kB), uploaded Jun 3, 2026, Source

Built Distribution

verosynthea_validator-0.1.0-py3-none-any.whl (9.5 kB), uploaded Jun 3, 2026, Python 3

File details

Details for the file verosynthea_validator-0.1.0.tar.gz.

File metadata

Download URL: verosynthea_validator-0.1.0.tar.gz
Upload date: Jun 3, 2026
Size: 11.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for verosynthea_validator-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`676d8c54c4129d541a7d3f364051f07a2094a353dcbcaa2b9b077910cbaefd12`
MD5	`14e79cafc5f7bc30cab82c00db3b1b17`
BLAKE2b-256	`5656261ed36f13c12eb614dbef232c201ae7bc25eb229b552bcdd0d3f7ea95eb`

File details

Details for the file verosynthea_validator-0.1.0-py3-none-any.whl.

File metadata

Download URL: verosynthea_validator-0.1.0-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 9.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for verosynthea_validator-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`671d217debe0317af3b565dacd856407a7e12fd72ab3e0635bfb31c7fdf44d99`
MD5	`ffaf0add62ec195e8f0d0b8658f9ece4`
BLAKE2b-256	`3afdc7eb1c8979a441e0efec140d503cfd3f81bfacad03029e37f14631c18e59`
