Python Package for tuning-free Huber Regression
tfHuber
Tuning-Free Huber Estimation and Regression
Description
This package provides efficient implementations of the Huber mean estimator, the Huber covariance matrix estimator, and adaptive Huber regression. For all of these methods, the robustification parameter τ is calibrated via a tuning-free principle.
Specifically, for Huber regression, assume the observed data (Y, X) follow the linear model Y = θ0 + X θ + ε, where Y is an n-dimensional response vector, X is an n × d design matrix, and ε is an n-vector of noise variables whose distributions can be asymmetric and/or heavy-tailed. The package computes the standard Huber M-estimator when d < n and the Huber-Lasso estimator when d > n. The coefficient vector θ and the intercept θ0 are estimated successively via a two-step procedure. See Wang et al. (2020) for details of the two-step tuning-free framework.
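For intuition, all of these estimators are built on the Huber loss, which is quadratic for small residuals and linear for large ones; the robustification parameter τ marks the crossover between the two regimes. A minimal sketch (`huber_loss` is an illustration, not a function exported by this package):

```python
import numpy as np

def huber_loss(r, tau):
    # Quadratic for |r| <= tau, linear beyond: large tau approaches the
    # squared loss (efficiency), small tau approaches the absolute loss
    # (robustness to heavy tails and outliers).
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= tau,
                    0.5 * r**2,
                    tau * np.abs(r) - 0.5 * tau**2)
```

The tuning-free principle removes the need to pick τ by cross-validation: the package calibrates it from the data automatically.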
Functions
There are four functions in this package:
mean: Huber mean estimation.
    X: A list object.
    grad: Use gradient descent or weighted least squares for the optimization; default True.
    tol: Error tolerance; default 1e-5.
    max_iter: Maximum number of iterations; default 500.
cov: Huber covariance matrix estimation.
    X: A list object.
    type: If set to "element", apply element-wise adaptive Huber M-estimation; if set to "spectrum", apply spectrum-wise truncated estimation. Default "element".
    pairwise: Pairwise covariance or difference-based covariance. Default False.
    tol: Error tolerance; default 1e-5.
    max_iter: Maximum number of iterations; default 500.
one_step_reg / two_step_reg: One-step or two-step adaptive Huber regression.
    X, Y: List objects.
    grad: Use gradient descent or weighted least squares for the optimization; default True.
    tol: Error tolerance; default 1e-5.
    max_iter: Maximum number of iterations; default 500.
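To illustrate what `mean` computes, here is a self-contained NumPy sketch of a tuning-free Huber mean: τ is calibrated by a fixed-point iteration on a censored second-moment equation, and the location is updated by Huber-weighted least squares. This is a simplified illustration of the principle in Wang et al. (2020), not the package's exact algorithm:

```python
import numpy as np

def huber_mean(y, tol=1e-5, max_iter=500):
    # Illustrative tuning-free Huber mean (assumption: a simplified
    # sketch of the calibration idea, not the package's implementation).
    y = np.asarray(y, dtype=float)
    n = len(y)
    mu = np.mean(y)
    tau = np.std(y) * np.sqrt(n / np.log(n))
    for _ in range(max_iter):
        r = y - mu
        # Calibrate tau by fixed-point iteration on
        # (1/n) * sum(min(r_i^2, tau^2)) / tau^2 = log(n) / n
        for _ in range(100):
            tau_new = np.sqrt(np.mean(np.minimum(r**2, tau**2)) * n / np.log(n))
            if abs(tau_new - tau) < tol:
                break
            tau = tau_new
        # Huber weights: 1 inside [-tau, tau], tau/|r| outside
        w = np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12))
        mu_new = np.sum(w * y) / np.sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new, tau
        mu = mu_new
    return mu, tau
```

On heavy-tailed or contaminated data this stays close to the bulk of the sample, whereas the ordinary sample mean is dragged toward the outliers.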
Examples
We present an example of the adaptive Huber methods. We generate data from the linear model Y = θ0 + X θ + ε, where ε follows a normal distribution, and estimate the intercept and the coefficients by tuning-free Huber regression.
import numpy as np
import tfhuber as tf

intercept, beta = 1.0, np.ones(10)  # true parameters for the simulation
X = np.random.uniform(-1.5, 1.5, (10000, 10))
Y = intercept + np.dot(X, beta) + np.random.normal(0, 1, 10000)
X, Y = X.tolist(), Y.tolist()
mu, tau, iteration = tf.mean(Y, grad=True, tol=1e-5, max_iter=500)
cov = tf.cov(X, type="element", tol=1e-5, max_iter=500)
theta, tau, iteration = tf.one_step_reg(X, Y, grad=True, z=0, tol=1e-5, max_iter=500)
theta, tau, iteration = tf.two_step_reg(X, Y, grad=True, consTau=1.345, tol=1e-5, max_iter=500)
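For comparison, the regression step can be mimicked with NumPy alone via iteratively reweighted least squares (IRLS). In this sketch τ is set to consTau times a MAD-based noise scale at each iteration, a classical heuristic standing in for the package's tuning-free calibration; `huber_reg_irls` is a hypothetical helper, not a package function:

```python
import numpy as np

def huber_reg_irls(X, Y, consTau=1.345, tol=1e-5, max_iter=500):
    # Illustrative Huber regression via IRLS (assumption: MAD-scaled tau,
    # not the package's tuning-free calibration).
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    A = np.column_stack([np.ones(len(Y)), X])      # intercept column first
    theta = np.linalg.lstsq(A, Y, rcond=None)[0]   # OLS initializer
    for _ in range(max_iter):
        r = Y - A @ theta
        # Robust noise scale from the median absolute deviation
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        tau = consTau * scale
        # Square-root Huber weights, then solve the weighted LS problem
        w = np.sqrt(np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12)))
        theta_new = np.linalg.lstsq(A * w[:, None], Y * w, rcond=None)[0]
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```

The returned vector stacks the intercept first, matching the (θ0, θ) order of the model above.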
License
GPL (>= 3)
Author(s)
Yifan Dai yifandai@yeah.net, Qiang Sun qsun.ustc@gmail.com
References
Guennebaud, G., Jacob, B. et al. (2010). Eigen v3.
Ke, Y., Minsker, S., Ren, Z., Sun, Q. and Zhou, W.-X. (2019). User-friendly covariance estimation for heavy-tailed distributions. Statist. Sci. 34 454-471.
Pan, X., Sun, Q. and Zhou, W.-X. (2019). Nonconvex regularized robust regression with oracle properties in polynomial time. Preprint.
Sanderson, C. and Curtin, R. (2016). Armadillo: A template-based C++ library for linear algebra. J. Open Source Softw. 1 26.
Sun, Q., Zhou, W.-X. and Fan, J. (2020). Adaptive Huber regression. J. Amer. Statist. Assoc. 115 254-265.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58 267-288.
Wang, L., Zheng, C., Zhou, W. and Zhou, W.-X. (2020). A new principle for tuning-free Huber regression. Statist. Sinica, to appear.