Python wrappers around the R 'dsld' package via rpy2
dsldPy — Python Interface to DSLD
Statistical and graphical tools for detecting and measuring discrimination and bias in data, exposed to Python via rpy2. dsldPy wraps the R package dsld with a Python-friendly API while using the same underlying, well-tested R implementations.
- Quarto Book (concepts and examples): https://htmlpreview.github.io/?https://github.com/matloff/dsldBook/blob/main/_book/index.html
- Research Paper (implementation details): https://arxiv.org/abs/2411.04228
Overview
DSLD supports two complementary workflows:
- Estimation analysis: quantify possible discrimination by estimating effects of a sensitive variable S on an outcome Y, while adjusting for confounders C.
- Prediction analysis (fair ML): build predictive models that limit the influence of S and its proxies O, trading off fairness and utility.
dsldPy provides wrappers for both, including visualization helpers. Most functions accept pandas DataFrame input; rpy2 handles conversion to R data.frames internally.
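To make "adjusting for confounders" concrete, here is a small pure-Python sketch on made-up data. It is not how dsld estimates effects (the package relies on regression and related methods in R); it only shows, under the assumption of a single discrete confounder C, how a raw gap in Y between two levels of S can shrink once the comparison is made within strata of C. The data, the group labels "A" and "B", and the function names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Synthetic records (S, C, Y): S is the sensitive group, C a confounder
# (e.g. a job category), Y the outcome. All values are invented.
rows = [
    ("A", 0, 10.0), ("A", 0, 10.0), ("A", 0, 10.0), ("A", 1, 20.0),
    ("B", 0, 11.0), ("B", 1, 21.0), ("B", 1, 21.0), ("B", 1, 21.0),
]


def raw_gap(rows):
    """Difference in mean Y between group B and group A, ignoring C."""
    a = [y for s, _, y in rows if s == "A"]
    b = [y for s, _, y in rows if s == "B"]
    return mean(b) - mean(a)


def adjusted_gap(rows):
    """Average of the within-stratum B-minus-A gaps, weighted by stratum size."""
    strata = defaultdict(list)
    for s, c, y in rows:
        strata[c].append((s, y))
    total = len(rows)
    gap = 0.0
    for members in strata.values():
        a = [y for s, y in members if s == "A"]
        b = [y for s, y in members if s == "B"]
        gap += (len(members) / total) * (mean(b) - mean(a))
    return gap


print("raw gap:", raw_gap(rows))
print("gap adjusted for C:", adjusted_gap(rows))
```

In this toy data most of the raw gap comes from group B being concentrated in the higher-paying stratum; within each stratum the gap is much smaller.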
Prerequisites
- R installed and on PATH (R 4.x recommended)
- R package dsld installed (CRAN or GitHub)
- Python 3.8+
Install dsld in R:
install.packages("dsld")
## or latest development version
# install.packages("remotes")
remotes::install_github("matloff/dsld", force = TRUE)
Tip: Ensure rpy2 can find R. Running R RHOME in a terminal should print your R home directory. If Python cannot find R, set R_HOME in your environment per rpy2’s documentation.
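The same check can be scripted from Python. The sketch below uses only the standard library; the helper names find_r_home and ensure_r_home are hypothetical and not part of dsldPy or rpy2. It assumes the R launcher is called R and supports the RHOME subcommand mentioned in the tip.

```python
import os
import shutil
import subprocess


def find_r_home(r_executable="R"):
    """Return the directory printed by `R RHOME`, or None if R cannot be run."""
    exe = shutil.which(r_executable)
    if exe is None:
        return None  # R is not on PATH
    try:
        result = subprocess.run(
            [exe, "RHOME"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() or None


def ensure_r_home(r_executable="R"):
    """Keep an existing R_HOME; otherwise try to set it from `R RHOME`."""
    if os.environ.get("R_HOME"):
        return os.environ["R_HOME"]
    home = find_r_home(r_executable)
    if home is not None:
        os.environ["R_HOME"] = home
    return home
```

If you use something like this, call ensure_r_home() before the first import of rpy2, since the environment needs to be in place before rpy2 tries to locate R.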
Installation
Install the Python package from this repository (subdirectory inst):
# from repo root
pip install ./inst # regular install
# or
pip install -e ./inst # editable install for development
Install directly from GitHub:
pip install "git+https://github.com/matloff/dsld@main#subdirectory=inst"
This installs dsldPy and its Python dependencies (pandas, numpy, rpy2, etc.). You still need R and the dsld R package installed, as noted above.
Quickstart
Below are minimal end-to-end examples using the dsld R dataset svcensus.
Load data into Python (via rpy2) and run a confounder-adjusted linear model:
import pandas as pd
from rpy2.robjects import r
from rpy2.robjects.packages import importr
from dsldPy import dsldPyLinear, dsldPyLinearSummary
# load R data into the R session
dsld = importr('dsld')
r('data(svcensus)')
svcensus_r = r['svcensus'] # R data.frame
# fit: Y = wageinc, S = gender, adjust for confounders automatically
model = dsldPyLinear(svcensus_r, 'wageinc', 'gender', interactions=False)
dsldPyLinearSummary(model) # prints coefficient table and S comparisons
Build a fair KNN model that limits a proxy’s influence (e.g., occupation):
from dsldPy import dsldPyQeFairKNN, dsldPyQeFairML_Predict
from rpy2.robjects import r
r('data(svcensus)')
svcensus_r = r['svcensus']
# Reduce proxy impact by de-weighting a feature (e.g., 'occ') to 0.2
res = dsldPyQeFairKNN(
    svcensus_r,
    yName='wageinc',
    sNames='gender',
    deweightPars={'occ': 0.2},
    k=25,
    scaleX=True,
)
print('Train accuracy:', res['train_accuracy'])
print('Fairness correlations:', res['train_correlations'])
# Predict on new data with the same schema as the training data
# (the training set is reused here only to keep the example short)
pred = dsldPyQeFairML_Predict(res, svcensus_r)
print('Test correlations:', pred['test_correlations'])
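The correlation outputs relate model predictions to the sensitive variable. How dsld computes them is not spelled out in this README, so the following is only a sketch under the assumption of a plain Pearson correlation between predictions and a 0/1 encoding of S. The function name and the numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


s = [0, 0, 1, 1]                      # sensitive attribute, coded 0/1
preds_unfair = [1.0, 2.0, 3.0, 4.0]   # predictions that track S closely
preds_fairer = [2.0, 3.0, 2.0, 3.0]   # predictions unrelated to S

print(pearson(preds_unfair, s))
print(pearson(preds_fairer, s))
```

Under this reading, a correlation near zero suggests the predictions carry little linear information about S, which is the fairness side of the fairness-utility trade-off.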
Available Wrappers (selected)
- Analytical: dsldPyLinear, dsldPyLogit (+Summary, Coef, Vcov, Predict), dsldPyML, dsldPyMatchedATE, dsldPyTakeALookAround, dsldPyConfounders, dsldPyCHunting, dsldPyOHunting
- Fair ML: dsldPyFrrm, dsldPyFgrrm, dsldPyNclm, dsldPyZlm, dsldPyZlrm, dsldPyQeFairKNN, dsldPyQeFairRF, dsldPyQeFairRidgeLin, dsldPyQeFairRidgeLog, dsldPyFairML_Predict, dsldPyFairUtils, dsldPyQeFairML_Predict
- Graphical: dsldPyFreqPCoord, dsldPyScatterPlot3D, dsldPyConditDisparity, dsldPyDensitybyS, dsldPyFrequencybyS, dsldPyIamb
Function names mirror the R package; arguments are standard Python types (pandas DataFrames, dicts, bools, etc.). Internally, rpy2 converts to/from R objects.
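For readers who want to see what such a conversion looks like outside the wrappers, here is a minimal sketch using rpy2's pandas converter. It is not dsldPy's internal code. It assumes rpy2 3.5 or later (for conversion.get_conversion) and a working R installation. The helper names pandas_to_r and r_to_pandas are hypothetical.

```python
def pandas_to_r(df):
    """Convert a pandas DataFrame to an R data.frame via rpy2's pandas converter."""
    # Imported lazily so that merely defining the helper does not require R.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    with localconverter(ro.default_converter + pandas2ri.converter):
        return ro.conversion.get_conversion().py2rpy(df)


def r_to_pandas(r_df):
    """Convert an R data.frame back to a pandas DataFrame."""
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    with localconverter(ro.default_converter + pandas2ri.converter):
        return ro.conversion.get_conversion().rpy2py(r_df)
```

Scoping the converter with localconverter keeps the pandas rules active only inside the with block, so other rpy2 code in the same session is unaffected.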
Examples
Jupyter notebooks are available in this repository:
- inst/examples/graphical.ipynb
- inst/examples/tabular.ipynb
- inst/examples/machine_learning.ipynb
These demonstrate end-to-end workflows for estimation and fair ML, including parameter tuning with dsldPyFairUtils.
Troubleshooting
- rpy2 cannot find R: confirm that R RHOME works; if not, add R to PATH or set R_HOME. See the rpy2 docs for your OS.
- dsld not installed in R: run install.packages("dsld") in an R session.
- Dataset conversions: wrappers accept either pandas DataFrames or R data.frames; if needed, see the utilities in dsldPy.Utils for explicit conversions.
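Whether the dsld R package is visible to R can also be checked from Python without going through rpy2. This sketch shells out to Rscript and asks R's requireNamespace. The helper name r_package_installed is hypothetical, and it assumes Rscript is on PATH.

```python
import shutil
import subprocess


def r_package_installed(package="dsld", rscript="Rscript"):
    """Return True/False if R can (or cannot) load `package`; None if Rscript cannot be run."""
    exe = shutil.which(rscript)
    if exe is None:
        return None  # Rscript is not on PATH
    # requireNamespace() returns TRUE or FALSE; cat() prints it to stdout.
    expr = f'cat(requireNamespace("{package}", quietly = TRUE))'
    try:
        result = subprocess.run(
            [exe, "-e", expr],
            capture_output=True, text=True, check=True, timeout=60,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip().endswith("TRUE")


status = r_package_installed("dsld")
if status is None:
    print("Rscript not found; check that R is installed and on PATH")
elif status:
    print("dsld is installed")
else:
    print('dsld is missing; run install.packages("dsld") in R')
```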
Authors
- Norm Matloff
- Aditya Mittal
- Taha Abdullah
- Arjun Ashok
- Shubhada Martha
- Billy Ouattara
- Jonathan Tran
- Brandon Zarate
File details
Details for the file dsldpy-0.0.1.tar.gz.
File metadata
- Download URL: dsldpy-0.0.1.tar.gz
- Upload date:
- Size: 13.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d38d78f045a4772db31e7fc11e5d1e728d293269ff9a2f522bccf048c48a41d |
| MD5 | 319bed961848d5dcfc22516e4151d917 |
| BLAKE2b-256 | 4f58b64306827fc3b0b389f750e51cb09ca617b74ec6867c4ec3aa6545c947bc |
File details
Details for the file dsldpy-0.0.1-py3-none-any.whl.
File metadata
- Download URL: dsldpy-0.0.1-py3-none-any.whl
- Upload date:
- Size: 17.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8870fad407b66c3bc99555f0ea2bbb62c79997a7115672dcd93831200844b9b5 |
| MD5 | ee54b25eecb35e748fb50aea2b0f02e9 |
| BLAKE2b-256 | 11ba25da190ed8419ac9fb2e8a11f4ceda17f4c62a84eb6f910e611d0b8589fc |