
Python wrappers around the R 'dsld' package via rpy2

dsldPy — Python Interface to DSLD

Statistical and graphical tools for detecting and measuring discrimination and bias in data, exposed to Python via rpy2. dsldPy wraps the R package dsld with a Python-friendly API while using the same underlying, well-tested R implementations.

Overview

DSLD supports two complementary workflows:

  • Estimation analysis: quantify possible discrimination by estimating effects of a sensitive variable S on an outcome Y, while adjusting for confounders C.
  • Prediction analysis (fair ML): build predictive models that limit the influence of S and its proxies O, trading off fairness and utility.

dsldPy provides wrappers for both, including visualization helpers. Most functions accept pandas DataFrame input; rpy2 handles conversion to R data.frames internally.
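To make "adjusting for confounders" in the estimation workflow concrete, here is a minimal sketch on made-up records. It compares the raw gap in Y between two levels of S with a gap computed within levels of a single confounder C. This stratified comparison is a simplified stand-in chosen for illustration; it is not how dsld estimates effects (dsldPyLinear fits a linear model), and the data, the function names raw_gap and adjusted_gap, and the level labels are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration only: (S, C, Y)
records = [
    ("A", "senior", 100), ("A", "senior", 100), ("A", "senior", 100),
    ("A", "junior", 60),
    ("B", "senior", 95),
    ("B", "junior", 55), ("B", "junior", 55), ("B", "junior", 55),
]


def raw_gap(rows):
    """Difference in mean Y between S == 'A' and S == 'B', ignoring C."""
    a = [y for s, _, y in rows if s == "A"]
    b = [y for s, _, y in rows if s == "B"]
    return mean(a) - mean(b)


def adjusted_gap(rows):
    """Average of the within-stratum gaps, weighted by stratum size.

    Assumes every level of C contains records from both S groups.
    """
    strata = defaultdict(list)
    for row in rows:
        strata[row[1]].append(row)
    total = len(rows)
    return sum(raw_gap(group) * len(group) / total for group in strata.values())


print("raw gap:", raw_gap(records))
print("gap adjusted for C:", adjusted_gap(records))
```

In this toy data most of the raw gap comes from the two groups having different mixes of C, so the adjusted gap is much smaller than the raw one.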

Prerequisites

  • R installed and on PATH (R 4.x recommended)
  • R package dsld installed (CRAN or GitHub)
  • Python 3.8+

Install dsld in R:

install.packages("dsld")
## or, for the latest development version (requires the remotes package):
# install.packages("remotes")
# remotes::install_github("matloff/dsld", force = TRUE)

Tip: Make sure rpy2 can find R. Running R RHOME in a terminal should print your R home directory. If Python still cannot find R, set the R_HOME environment variable as described in rpy2’s documentation.
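The same check can be run from Python before importing rpy2. The sketch below uses only the standard library. The helper name find_r_home is hypothetical, and the lookup order (R_HOME first, then R RHOME) is an assumption made for illustration, not a description of rpy2's internals.

```python
import os
import shutil
import subprocess
from typing import Optional


def find_r_home() -> Optional[str]:
    """Return the R home directory, or None if R cannot be located."""
    # An explicitly set R_HOME is used as-is.
    env_home = os.environ.get("R_HOME")
    if env_home:
        return env_home
    # Otherwise ask the R executable on PATH, like `R RHOME` in a terminal.
    r_exe = shutil.which("R")
    if r_exe is None:
        return None
    try:
        out = subprocess.run(
            [r_exe, "RHOME"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return out.stdout.strip() or None


if __name__ == "__main__":
    home = find_r_home()
    print("R home:", home if home else "not found; add R to PATH or set R_HOME")
```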

Installation

Install the Python package from this repository (subdirectory inst):

# from repo root
pip install ./inst            # regular install
# or
pip install -e ./inst         # editable install for development

Install directly from GitHub:

pip install "git+https://github.com/matloff/dsld@main#subdirectory=inst"

This installs dsldPy and its Python dependencies (pandas, numpy, rpy2, etc.). You still need R and the dsld R package installed, as noted above.

Quickstart

Below are minimal end-to-end examples using the dsld R dataset svcensus.

Load data into Python (via rpy2) and run a confounder-adjusted linear model:

from rpy2.robjects import r
from rpy2.robjects.packages import importr
from dsldPy import dsldPyLinear, dsldPyLinearSummary

# load R data into the R session
dsld = importr('dsld')
r('data(svcensus)')
svcensus_r = r['svcensus']            # R data.frame

# fit: Y = wageinc, S = gender, adjust for confounders automatically
model = dsldPyLinear(svcensus_r, 'wageinc', 'gender', interactions=False)
dsldPyLinearSummary(model)            # prints coefficient table and S comparisons

Build a fair KNN model that limits a proxy’s influence (e.g., occupation):

from dsldPy import dsldPyQeFairKNN, dsldPyQeFairML_Predict
from rpy2.robjects import r

r('data(svcensus)')
svcensus_r = r['svcensus']

# Reduce proxy impact by de-weighting a feature (e.g., 'occ') to 0.2
res = dsldPyQeFairKNN(
    svcensus_r,
    yName='wageinc',
    sNames='gender',
    deweightPars={'occ': 0.2},
    k=25,
    scaleX=True,
)

print('Train accuracy:', res['train_accuracy'])
print('Fairness correlations:', res['train_correlations'])

# Predict on new data with the same schema as the training data
# (the training data is reused here only for illustration)
pred = dsldPyQeFairML_Predict(res, svcensus_r)
print('Test correlations:', pred['test_correlations'])
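As intuition for what a de-weighting value such as {'occ': 0.2} can do in a nearest-neighbor model, the toy sketch below multiplies a feature by its weight before computing Euclidean distance, so differences in that feature count for less when neighbors are chosen. This is a generic illustration under that assumption, not dsld's implementation; weighted_distance and the example points are hypothetical.

```python
import math


def weighted_distance(x, z, weights):
    """Euclidean distance after multiplying each feature difference by its weight."""
    return math.sqrt(sum((w * (a - b)) ** 2 for a, b, w in zip(x, z, weights)))


# Two hypothetical, already-scaled points that differ only in the second
# feature, which plays the role of the proxy (e.g. 'occ').
p = [0.0, 0.0]
q = [0.0, 1.0]

full = weighted_distance(p, q, [1.0, 1.0])        # proxy at full weight
deweighted = weighted_distance(p, q, [1.0, 0.2])  # proxy de-weighted to 0.2

print("distance with full weight:", full)
print("distance with de-weighted proxy:", deweighted)
```

With a smaller weight, two records that differ only in the proxy look closer together, so the proxy has less say in which neighbors drive a prediction. That is the fairness side of the fairness-utility trade-off; pushing the weight toward 0 may cost predictive accuracy.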

Available Wrappers (selected)

  • Analytical: dsldPyLinear, dsldPyLogit (+ Summary, Coef, Vcov, Predict), dsldPyML, dsldPyMatchedATE, dsldPyTakeALookAround, dsldPyConfounders, dsldPyCHunting, dsldPyOHunting
  • Fair ML: dsldPyFrrm, dsldPyFgrrm, dsldPyNclm, dsldPyZlm, dsldPyZlrm, dsldPyQeFairKNN, dsldPyQeFairRF, dsldPyQeFairRidgeLin, dsldPyQeFairRidgeLog, dsldPyFairML_Predict, dsldPyFairUtils, dsldPyQeFairML_Predict
  • Graphical: dsldPyFreqPCoord, dsldPyScatterPlot3D, dsldPyConditDisparity, dsldPyDensitybyS, dsldPyFrequencybyS, dsldPyIamb

Function names mirror the R package; arguments are standard Python types (pandas DataFrames, dicts, bools, etc.). Internally, rpy2 converts to/from R objects.

Examples

Jupyter notebooks are available in this repository:

  • inst/examples/graphical.ipynb
  • inst/examples/tabular.ipynb
  • inst/examples/machine_learning.ipynb

These demonstrate end-to-end workflows for estimation and fair ML, including parameter tuning with dsldPyFairUtils.

Troubleshooting

  • rpy2 cannot find R: confirm R RHOME works; if not, add R to PATH or set R_HOME. See rpy2 docs for your OS.
  • dsld not installed in R: run install.packages("dsld") in an R session.
  • Dataset conversions: wrappers accept either pandas DataFrames or R data.frames; if needed, see utilities in dsldPy.Utils for explicit conversions.
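To check from Python whether the dsld R package is installed, one option is to ask Rscript directly. This standard-library sketch assumes Rscript is on PATH. The helper name r_package_installed is hypothetical, and it should only be called with trusted package names, because the name is interpolated into an R expression.

```python
import shutil
import subprocess
from typing import Optional


def r_package_installed(package: str) -> Optional[bool]:
    """Return True/False if R reports the package as available, None if R cannot be run."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    # requireNamespace() returns TRUE or FALSE without attaching the package.
    expr = f'cat(requireNamespace("{package}", quietly = TRUE))'
    try:
        out = subprocess.run(
            [rscript, "-e", expr],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return out.stdout.strip().endswith("TRUE")


if __name__ == "__main__":
    status = r_package_installed("dsld")
    if status is None:
        print("Could not run Rscript; check that R is installed and on PATH.")
    elif status:
        print("dsld is installed in R.")
    else:
        print('dsld is missing; run install.packages("dsld") in an R session.')
```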

Authors

  • Norm Matloff
  • Aditya Mittal
  • Taha Abdullah
  • Arjun Ashok
  • Shubhada Martha
  • Billy Ouattara
  • Jonathan Tran
  • Brandon Zarate
