RegLabels
ValidMLInference
This repository hosts the code for the ValidMLInference package, which implements the bias-correction methods described in Battaglia, Christensen, Hansen & Sacher (2024). The two core functions are:
ols_bca
This procedure first computes the standard OLS estimator on a design matrix (Xhat), the first column of which contains AI/ML-generated binary labels, and then applies an additive correction based on an estimate (fpr) of the false-positive rate computed externally. The method also adjusts the variance estimator with a finite-sample correction term to account for the uncertainty in the bias estimation.
Parameters
----------
Y : array_like, shape (n,)
Response variable vector.
Xhat : array_like, shape (n, d)
Design matrix, the first column of which contains the AI/ML-generated binary covariates.
fpr : float
Estimate of the classifier's false-positive rate, used to correct the OLS estimates.
m : int or float
Size of the external sample used to estimate the classifier's false-positive rate. Can be set to 'inf' when the false-positive rate is known exactly.
Returns
-------
b : ndarray, shape (d,)
Bias-corrected regression coefficient estimates.
V : ndarray, shape (d, d)
Adjusted variance-covariance matrix for the bias-corrected estimator.
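To illustrate the problem this correction targets, the sketch below simulates a hypothetical setting in which a binary regressor is replaced by labels containing false positives, then compares plain OLS on the generated labels with OLS on the true regressor. It uses only NumPy and does not call the package. The sample size, coefficients, and error rates are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
beta0, beta1 = 1.0, 2.0  # hypothetical true intercept and slope

# True binary covariate (unobserved in practice).
x_true = rng.binomial(1, 0.3, size=n)

# Generated labels: every true positive is kept, and 20% of true
# negatives are flipped to 1 (false positives only, for simplicity).
flip = rng.binomial(1, 0.2, size=n)
labels = np.where(x_true == 1, 1, flip)

# Outcome depends on the true covariate, not on the label.
Y = beta0 + beta1 * x_true + rng.normal(0.0, 1.0, size=n)

# Design matrix with the generated labels in the first column,
# followed by an intercept column.
Xhat = np.column_stack([labels, np.ones(n)])
X_true = np.column_stack([x_true, np.ones(n)])

# Plain OLS on the generated labels versus OLS on the true covariate.
b_naive, *_ = np.linalg.lstsq(Xhat, Y, rcond=None)
b_oracle, *_ = np.linalg.lstsq(X_true, Y, rcond=None)

print("slope using generated labels:", b_naive[0])
print("slope using true covariate  :", b_oracle[0])
```

In this setup the slope estimated from the generated labels falls well below the value used to generate the data, while the regression on the true covariate recovers it. A correction based on the false-positive rate is meant to close that gap when the true covariate is unavailable.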
one_step_unlabeled
This method jointly estimates the upstream (measurement) and downstream (regression) models using only the unlabeled likelihood. Leveraging JAX for automatic differentiation and optimization, it minimizes the negative log-likelihood to obtain the regression coefficients. The variance is then approximated via the inverse Hessian at the optimum.
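The general pattern described above (minimize a negative log-likelihood with JAX, then invert the Hessian at the optimum) can be sketched on a much simpler model. The example below fits an ordinary linear regression with normal errors. It is not the package's joint measurement-and-regression likelihood, and the data, the `mean_nll` function, and the choice of `jax.scipy.optimize.minimize` with BFGS are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp
from jax.scipy.optimize import minimize
from jax.scipy.stats import norm

# Double precision keeps the optimizer and the Hessian numerically stable.
jax.config.update("jax_enable_x64", True)

# Hypothetical toy data: one regressor plus an intercept column.
key_x, key_e = jax.random.split(jax.random.PRNGKey(0))
n = 500
X = jnp.column_stack([jax.random.normal(key_x, (n,)), jnp.ones(n)])
Y = X @ jnp.array([2.0, 1.0]) + 0.5 * jax.random.normal(key_e, (n,))
d = X.shape[1]


def mean_nll(theta):
    """Average negative log-likelihood; theta stacks (b, log sigma)."""
    b, log_sigma = theta[:d], theta[d]
    return -jnp.mean(norm.logpdf(Y, loc=X @ b, scale=jnp.exp(log_sigma)))


# Minimize the objective; gradients come from automatic differentiation.
res = minimize(mean_nll, jnp.zeros(d + 1), method="BFGS")

# The Hessian of the summed NLL is n times the Hessian of the mean NLL.
H = n * jax.hessian(mean_nll)(res.x)

b = res.x[:d]                  # regression coefficients
V = jnp.linalg.inv(H)[:d, :d]  # their block of the inverse Hessian
```

Working with the mean rather than the sum of the log-likelihood keeps gradient magnitudes independent of the sample size, which tends to help the line search. The factor `n` restores the scale before inversion.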
Parameters
----------
Y : array_like, shape (n,)
Response variable vector.
Xhat : array_like, shape (n, d)
Design matrix constructed from AI/ML-generated regressors.
homoskedastic : bool, optional (default: False)
If True, assumes a common error variance; otherwise, separate error variances are estimated.
distribution : callable, optional (default: Normal(0, 1))
Distribution of the error terms. A custom distribution can be passed as any JAX-compatible PDF function that takes inputs (x, loc, scale).
Returns
-------
b : ndarray, shape (d,)
Estimated regression coefficients extracted from the optimized parameter vector.
V : ndarray, shape (d, d)
Estimated variance-covariance matrix for the regression coefficients, computed as the inverse
of the Hessian of the objective function.
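As an illustration of the `distribution` argument, the sketch below defines a PDF with the `(x, loc, scale)` signature described above. The function name `laplace_pdf` and the choice of a Laplace density are hypothetical. Any density written with `jax.numpy` operations so that it can be differentiated should follow the same shape.

```python
import jax.numpy as jnp


def laplace_pdf(x, loc, scale):
    """Laplace density evaluated at x, with location loc and scale scale."""
    return jnp.exp(-jnp.abs(x - loc) / scale) / (2.0 * scale)


# Quick sanity check: the density at the location is 1 / (2 * scale).
peak = laplace_pdf(0.0, 0.0, 1.0)
```

A function like this could then be supplied as the `distribution` argument in place of the default normal errors.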