Fairness testing for ML models using Australian demographic data
verosynthea-validator
Fairness testing for ML models using real Australian demographic data. One line to check whether your model treats demographic groups equally.
pip install verosynthea-validator
from verosynthea_validator import FairnessReport
report = FairnessReport(
    data=test_data,
    y_true="label",
    y_pred="prediction",
    protected_columns=["SEXP", "BPLP", "profile_name"],
)
results = report.run()
print(results.summary())
Output:
Fairness Report (n=5,000, overall accuracy=0.847)
============================================================
[PASS] SEXP (2 groups, smallest n=2,451)
  Accuracy gap: 0.012
  Demographic parity gap: 0.008
  Equalised odds gap: 0.015
[FAIL] BPLP (3 groups, smallest n=312)
  Accuracy gap: 0.073
  Demographic parity gap: 0.091
  Equalised odds gap: 0.064
============================================================
Overall: FAIL (worst gap: 0.073 on BPLP)
CI/CD gate
from verosynthea_validator import assert_fair
# Fails the build if the accuracy gap between any two groups exceeds 0.05 (5 percentage points)
assert_fair(test_data, "label", "prediction", max_accuracy_gap=0.05)
In pytest:
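from verosynthea_validator import assert_fair

def test_model_fairness():
    # model and test_data come from your own project (module globals or fixtures)
    predictions = model.predict(test_data)
    test_data["y_pred"] = predictions  # test_data must already hold a "y_true" column
    assert_fair(
        test_data, "y_true", "y_pred",
        protected_columns=["SEXP", "BPLP", "profile_name"],
        max_accuracy_gap=0.05,
        max_demographic_parity_gap=0.10,
    )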
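To make the gate behaviour concrete, here is a minimal sketch of how a threshold check of this kind could work. It is not the package's implementation: check_gaps and its argument layout are hypothetical names chosen for illustration. The gap values are copied from the sample report above, and the limits are the ones passed to assert_fair in the pytest example.

```python
def check_gaps(gaps_by_column, limits):
    """Raise AssertionError naming every gap that exceeds its limit.

    Hypothetical helper for illustration only; not part of verosynthea-validator.
    """
    violations = [
        f"{column}: {metric}={value:.3f} > {limits[metric]:.3f}"
        for column, metrics in gaps_by_column.items()
        for metric, value in metrics.items()
        if metric in limits and value > limits[metric]
    ]
    if violations:
        raise AssertionError("Fairness gate failed: " + "; ".join(violations))


# Gap values taken from the sample report shown earlier.
sample_gaps = {
    "SEXP": {"accuracy_gap": 0.012, "demographic_parity_gap": 0.008},
    "BPLP": {"accuracy_gap": 0.073, "demographic_parity_gap": 0.091},
}
limits = {"accuracy_gap": 0.05, "demographic_parity_gap": 0.10}

message = ""
try:
    check_gaps(sample_gaps, limits)
except AssertionError as err:
    message = str(err)

print(message)
```

With these numbers only the BPLP accuracy gap (0.073) exceeds its limit (0.05); the BPLP demographic parity gap (0.091) stays under 0.10. Inside a pytest test the AssertionError would go uncaught, which is what fails the build.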
What it measures
For each protected column (e.g. sex, birthplace, demographic profile), the validator computes:
| Metric | What it checks |
|---|---|
| Accuracy gap | Max accuracy difference between any two groups |
| Demographic parity gap | Max difference in selection rate (P(y_pred=1)) |
| Equalised odds gap | Max difference in true positive rate or false positive rate |
Groups with fewer than 30 observations are excluded from the comparison (configurable via min_group_size).
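The three metrics can be sketched in plain Python as follows. This is an illustration of the definitions in the table, not the package's own code. It makes several assumptions: labels and predictions are binary 0/1, rows are plain dicts standing in for DataFrame rows, each gap is the maximum minus the minimum across groups, and the equalised odds gap is the larger of the TPR gap and the FPR gap. The function names (group_rates, spread, fairness_gaps) and the group labels A, B and C are hypothetical.

```python
from collections import defaultdict


def group_rates(rows, group_key, y_true="label", y_pred="prediction"):
    """Per-group size, accuracy, selection rate, TPR and FPR for 0/1 labels."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group_key]].append((row[y_true], row[y_pred]))
    rates = {}
    for group, pairs in buckets.items():
        pos = [p for t, p in pairs if t == 1]  # predictions where truth is 1
        neg = [p for t, p in pairs if t == 0]  # predictions where truth is 0
        rates[group] = {
            "n": len(pairs),
            "accuracy": sum(t == p for t, p in pairs) / len(pairs),
            "selection_rate": sum(p for _, p in pairs) / len(pairs),
            "tpr": sum(pos) / len(pos) if pos else None,
            "fpr": sum(neg) / len(neg) if neg else None,
        }
    return rates


def spread(values):
    """Largest difference between any two defined values (max minus min)."""
    present = [v for v in values if v is not None]
    return max(present) - min(present) if len(present) >= 2 else 0.0


def fairness_gaps(rows, group_key, min_group_size=30):
    """Gaps across groups, ignoring groups smaller than min_group_size."""
    kept = [r for r in group_rates(rows, group_key).values()
            if r["n"] >= min_group_size]
    return {
        "accuracy_gap": spread(r["accuracy"] for r in kept),
        "demographic_parity_gap": spread(r["selection_rate"] for r in kept),
        "equalised_odds_gap": max(spread(r["tpr"] for r in kept),
                                  spread(r["fpr"] for r in kept)),
    }


def make_rows(group, pairs):
    """Build toy rows from (label, prediction) pairs."""
    return [{"BPLP": group, "label": t, "prediction": p} for t, p in pairs]


rows = (
    make_rows("A", [(1, 1), (1, 1), (0, 0), (0, 0)])
    + make_rows("B", [(1, 1), (1, 0), (0, 1), (0, 0)])
    + make_rows("C", [(1, 0)])  # single-row group, dropped by the size filter
)
gaps = fairness_gaps(rows, "BPLP", min_group_size=4)
print(gaps)
```

In this toy data group A is classified perfectly while group B is right half the time, so the accuracy gap and the equalised odds gap are both 0.5. Both groups have the same selection rate, so the demographic parity gap is 0.0, which shows that one metric can pass while the others fail.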
Why this instead of fairlearn or aif360?
Those are general-purpose fairness frameworks. This package is purpose-built for Australian demographics:
- Pre-loaded demographic data. The free tier includes 5,000 synthetic individuals from AUSynth with 25 Census-calibrated variables. No need to source your own protected attributes.
- 8 demographic profiles. AUSynth clusters every person into one of 8 profiles (High-earning professionals, Young singles, Retired, etc.) — a richer protected attribute than just age or sex.
- Australia-specific calibration. Variables match ABS Census 2021 categories exactly. Income brackets, occupation codes, education levels, birthplace regions — all in Australian standard classifications.
- One-line CI gate. assert_fair() drops into pytest with zero configuration.
Data tiers
| Tier | Data | Cost |
|---|---|---|
| Free | 5,000-row Paddington 4064 sample from Hugging Face | $0 |
| Paid | Full national dataset (32M individuals, 15,352 suburbs) via API | verosynthea.com |
from verosynthea_validator import load_ausynth_sample
# Free tier (downloads from HF on first call)
df = load_ausynth_sample()
# Paid tier
df = load_ausynth_sample(api_key="vero_...", geography="bondi-2026-nsw")
The 8 demographic profiles
| ID | Name | Typical characteristics |
|---|---|---|
| 0 | Labourers and operators | Blue-collar, lower income |
| 1 | Young singles and non-workers | Under 25, students, not in the labour force (NILF) |
| 2 | Children | Under 15 |
| 3 | Non-earning dependants | Adults not in workforce |
| 4 | Trades and technical workers | Certificate-qualified, mid income |
| 5 | Established partnered households | Married, mid-career |
| 6 | Retired and semi-retired | Over 60, pension income |
| 7 | High-earning professionals | Degree-qualified, professional occupations |
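Because the validator excludes small groups, it can help to check how many rows each profile has before using the profile as a protected attribute. The sketch below assumes you hold a list of numeric profile IDs matching the table above. The PROFILE_NAMES mapping is copied from that table, but the small_profiles helper is hypothetical and not part of the package.

```python
from collections import Counter

# ID-to-name mapping copied from the profile table above.
PROFILE_NAMES = {
    0: "Labourers and operators",
    1: "Young singles and non-workers",
    2: "Children",
    3: "Non-earning dependants",
    4: "Trades and technical workers",
    5: "Established partnered households",
    6: "Retired and semi-retired",
    7: "High-earning professionals",
}


def small_profiles(profile_ids, min_group_size=30):
    """Return {profile name: row count} for profiles below min_group_size."""
    counts = Counter(profile_ids)
    return {
        PROFILE_NAMES[pid]: n
        for pid, n in counts.items()
        if n < min_group_size
    }


# Toy example: 40 professionals, 30 retirees, 5 children.
ids = [7] * 40 + [6] * 30 + [2] * 5
print(small_profiles(ids))
```

Here only the Children profile falls below the 30-row threshold, so it would be left out of any gap comparison on this sample.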
Installation
pip install verosynthea-validator           # core (pandas + numpy)
pip install "verosynthea-validator[hf]"     # + Hugging Face datasets loader
pip install "verosynthea-validator[paid]"   # + httpx for API access
pip install "verosynthea-validator[dev]"    # + pytest + sklearn for development
Links
- Dataset: vero-synthea/ausynth-sample on Hugging Face
- Full product: verosynthea.com
- Methodology: verosynthea.com/about
Citation
Verosynthea AUSynth (2026). Synthetic Australian Census Data.
https://verosynthea.com
License
MIT
Download files
File details
Details for the file verosynthea_validator-0.1.0.tar.gz.
File metadata
- Download URL: verosynthea_validator-0.1.0.tar.gz
- Upload date:
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 676d8c54c4129d541a7d3f364051f07a2094a353dcbcaa2b9b077910cbaefd12 |
| MD5 | 14e79cafc5f7bc30cab82c00db3b1b17 |
| BLAKE2b-256 | 5656261ed36f13c12eb614dbef232c201ae7bc25eb229b552bcdd0d3f7ea95eb |
File details
Details for the file verosynthea_validator-0.1.0-py3-none-any.whl.
File metadata
- Download URL: verosynthea_validator-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 671d217debe0317af3b565dacd856407a7e12fd72ab3e0635bfb31c7fdf44d99 |
| MD5 | ffaf0add62ec195e8f0d0b8658f9ece4 |
| BLAKE2b-256 | 3afdc7eb1c8979a441e0efec140d503cfd3f81bfacad03029e37f14631c18e59 |