This package learns fair decision tree classifiers which can then be bagged into fair random forests, following the scikit-learn API standards.

Project description

Fair tree classifier using strong demographic parity

Implementation of the algorithm proposed in:

Pereira Barata, A. et al. Fair tree classifier using strong demographic parity. Machine Learning (2023).

When incorporating FairDecisionTreeClassifier or FairRandomForestClassifier objects into scikit-learn pipelines, pass the sensitive attribute(s) z through the fit_params={"z": z} parameter.
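
How z reaches the final estimator depends on the scikit-learn entry point being used. As a rough sketch of one case: with a plain scikit-learn Pipeline in its default configuration (metadata routing disabled), fit-time keyword arguments are routed to a step by prefixing them with the step name. RecordingClassifier below is a hypothetical stand-in that only records what it receives; it is not part of fair_trees, and the step names "scale" and "clf" are assumptions made for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class RecordingClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for a fair classifier whose fit() accepts z."""

    def fit(self, X, y, z=None):
        self.classes_ = np.unique(y)
        # Keep a copy of the sensitive attribute that was routed to this step.
        self.z_seen_ = None if z is None else np.asarray(z)
        return self

    def predict(self, X):
        # Trivial prediction; only the routing of z matters in this sketch.
        return np.full(len(X), self.classes_[0])


X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
y = np.array([0, 1, 0, 1])
z = np.array([1, 1, 0, 0])

pipe = Pipeline([("scale", StandardScaler()), ("clf", RecordingClassifier())])
# "clf__z" means: pass z as the keyword argument "z" to the fit() of step "clf".
pipe.fit(X, y, clf__z=z)
z_received = pipe.named_steps["clf"].z_seen_
```

If the same pattern holds for the fair classifiers, replacing RecordingClassifier with FairRandomForestClassifier should route z in the same way, since their fit method takes z as its third argument.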

Installation

A) From PyPI:
pip install fair-trees

B) From source:
git clone https://github.com/pereirabarataap/fair_tree_classifier
cd fair_tree_classifier
pip install -r requirements.txt

Usage

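from fair_trees import FairRandomForestClassifier as FRFC, load_datasets, sdp_score

# load the example datasets provided by the package and pick "adult"
datasets = load_datasets()
X = datasets["adult"]["X"]            # features
y = datasets["adult"]["y"]            # target labels
z = datasets["adult"]["z"]["gender"]  # sensitive attribute

# fit a fair random forest; fit takes the sensitive attribute as a third argument
clf = FRFC(theta=0.5).fit(X, y, z)
y_prob = clf.predict_proba(X)[:, 1]   # scores from the second probability column
print(sdp_score(z, y_prob))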
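
To give some intuition for what a threshold-independent parity score can look like, here is a minimal sketch. It assumes the score is derived from the AUC obtained when the predicted scores are used to tell the two sensitive groups apart: an AUC of 0.5 means the groups are indistinguishable and maps to 1, while an AUC of 0 or 1 means they are perfectly separated and maps to 0. The helpers pairwise_auc and auc_parity_score are hypothetical names introduced for this illustration; the package's sdp_score may be computed differently.

```python
def pairwise_auc(z, scores):
    """AUC of `scores` for telling group z=True apart from group z=False."""
    in_group = [s for s, g in zip(scores, z) if g]
    out_group = [s for s, g in zip(scores, z) if not g]
    if not in_group or not out_group:
        raise ValueError("both groups must be present")
    wins = 0.0
    for a in in_group:
        for b in out_group:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # ties count as half a win
    return wins / (len(in_group) * len(out_group))


def auc_parity_score(z, scores):
    """Hypothetical parity score: 1 when AUC is 0.5, 0 when AUC is 0 or 1."""
    return 1.0 - abs(2.0 * pairwise_auc(z, scores) - 1.0)


z = [1, 1, 0, 0]
separated = auc_parity_score(z, [0.9, 0.8, 0.2, 0.1])  # groups fully separated
mixed = auc_parity_score(z, [0.9, 0.1, 0.8, 0.2])      # groups interleaved
print(separated, mixed)
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration but not for large datasets.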

Example

import numpy as np
import pandas as pd
import seaborn as sb
from tqdm.notebook import tqdm
from matplotlib import pyplot as plt
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold as SKF
from fair_trees import FairRandomForestClassifier as FRFC, sdp_score, load_datasets

datasets = load_datasets()

results_data = []
for dataset in tqdm(datasets):
    X = datasets[dataset]["X"]
    y = datasets[dataset]["y"]
    z = datasets[dataset]["z"]
    
    fold = 0
    skf = SKF(n_splits=5, random_state=42, shuffle=True)
    # ensuring stratified kfold w.r.t. y and z
    splitter_y = pd.concat([y, z], axis=1).astype(str).apply(
        lambda row:
            row[y.name] + "".join([row[col] for col in z.columns]),
        axis=1
    ).values
    desc_i = f"dataset={dataset} | processing folds"
    for train_idx, test_idx in tqdm(skf.split(X,splitter_y), desc=desc_i, leave=False):
        
        # skf.split yields positional indices, so select rows with .iloc
        X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
        z_train, z_test = z.iloc[train_idx], z.iloc[test_idx]

        desc_j = f"fold={fold} | fitting thetas"
        for theta in tqdm(np.linspace(0,1,11).round(1), desc=desc_j, leave=False):
            clf = FRFC(
                n_jobs=-1,
                n_bins=256,
                theta=theta,
                max_depth=None,
                bootstrap=True,
                random_state=42,
                n_estimators=500,
                min_samples_leaf=1,
                min_samples_split=2,
                max_features="sqrt",
                requires_data_processing=True
            ).fit(X_train, y_train, z_train)
            y_prob = clf.predict_proba(X_test)[:,1]

            auc = roc_auc_score(y_test, y_prob)

            sdp_min = np.inf
            for sens_att in z.columns:
                if len(np.unique(z_test[sens_att]))==2:
                    sens_val = np.unique(z_test[sens_att])[0]
                    z_true = z_test[sens_att]==sens_val
                    sdp = sdp_score(z_true, y_prob)
                    if sdp < sdp_min:
                        sdp_min = sdp
                else:
                    for sens_val in np.unique(z_test[sens_att]):
                        z_true = z_test[sens_att]==sens_val
                        sdp = sdp_score(z_true, y_prob)
                        if sdp < sdp_min:
                            sdp_min = sdp
            
            data_row = [dataset, fold, theta, auc, sdp_min]
            results_data.append(data_row)
            
        fold += 1
        
results_df = pd.DataFrame(
    data=results_data,
    columns=["dataset", "fold", "theta", "performance", "fairness"]
)

fig, ax = plt.subplots(1,1,dpi=100, figsize=(8,4))
sb.lineplot(
    # average over folds; reset_index keeps "dataset" and "theta" as columns
    data=results_df.groupby(by=["dataset", "theta"]).mean().reset_index(),
    x="fairness",
    y="performance", 
    hue="dataset",
    ax=ax
)
plt.show()

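Output: a line plot of mean performance (ROC AUC) against mean fairness (SDP score), with one line per dataset.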
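
The example stratifies the folds on a combined key built from y and every column of z, so that each fold keeps roughly the same mix of label and sensitive-attribute combinations. The idea can be sketched in plain Python as follows. The helper joint_strata is a hypothetical name, and the "|" separator is an addition of this sketch: it keeps keys unambiguous when values have different lengths (for example "1" + "12" versus "11" + "2").

```python
from collections import Counter


def joint_strata(y, z_columns):
    """Build one string key per sample from its label and sensitive attributes."""
    keys = []
    for label, attrs in zip(y, zip(*z_columns)):
        keys.append("|".join(str(v) for v in (label, *attrs)))
    return keys


y = [0, 1, 0, 1]
gender = ["F", "M", "F", "F"]
race = ["a", "a", "b", "b"]

keys = joint_strata(y, [gender, race])
counts = Counter(keys)  # size of each stratum
print(keys)
```

Keys built this way can be passed as the second argument of StratifiedKFold.split in place of y, which is what the example does with splitter_y.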

3D Figures

https://htmlpreview.github.io/?https://github.com/pereirabarataap/fair_tree_classifier/main/3d/index.html
