Python Package for tuning-free Huber Regression

tfHuber

Python implementation of Tuning-Free Huber Estimation and Regression

Description

This package implements the Huber mean estimator, Huber covariance matrix estimation and adaptive Huber regression estimators efficiently. For all these methods, the robustification parameter τ is calibrated via a tuning-free principle.
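To make the role of τ concrete, here is a minimal NumPy sketch of the Huber loss and of a Huber mean computed for a fixed, user-supplied τ. It is an illustration only: the function names `huber_loss` and `huber_mean_fixed_tau` are hypothetical, the reweighting scheme is a standard textbook choice, and the tuning-free calibration of τ that this package performs is not reproduced here.

```python
import numpy as np

def huber_loss(r, tau):
    """Huber loss: quadratic for |r| <= tau, linear beyond that."""
    r = np.asarray(r, dtype=float)
    quad = 0.5 * r**2
    lin = tau * np.abs(r) - 0.5 * tau**2
    return np.where(np.abs(r) <= tau, quad, lin)

def huber_mean_fixed_tau(x, tau, tol=1e-5, max_iter=500):
    """Huber mean for a FIXED tau via iteratively reweighted averaging.

    Returns the estimate and the number of iterations used.
    """
    x = np.asarray(x, dtype=float)
    mu = np.median(x)  # robust starting point
    for it in range(1, max_iter + 1):
        r = x - mu
        # Weight 1 inside [-tau, tau], tau/|r| outside (down-weights outliers)
        w = np.where(np.abs(r) <= tau, 1.0, tau / np.maximum(np.abs(r), 1e-12))
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new, it
        mu = mu_new
    return mu, max_iter

# One gross outlier pulls the sample mean far more than the Huber mean
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 100.0])
mu, n_iter = huber_mean_fixed_tau(x, tau=1.345)
print(np.mean(x), mu, n_iter)
```

A small τ makes the estimator behave like a median-type estimator, while a large τ recovers the sample mean; choosing τ automatically is exactly what the tuning-free principle addresses.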

Specifically, for Huber regression, assume the observed data vectors (Y, X) follow a linear model Y = θ0 + X θ + ε, where Y is an n-dimensional response vector, X is an n × d design matrix, and ε is an n-vector of noise variables whose distributions can be asymmetric and/or heavy-tailed. The package computes the standard Huber's M-estimator when d < n and the Huber-Lasso estimator when d > n. The vector of coefficients θ and the intercept term θ0 are estimated successively via a two-step procedure. See Wang et al., 2020 for more details of the two-step tuning-free framework.
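As a rough illustration of Huber's M-estimator in the d < n case, the sketch below fits the linear model by iteratively reweighted least squares with a fixed τ. This is an assumption-laden stand-in, not the package's algorithm: the name `huber_regression_fixed_tau` is hypothetical, τ is applied on the raw residual scale, the intercept and coefficients are estimated jointly rather than by the two-step procedure, and τ is not calibrated by the tuning-free principle.

```python
import numpy as np

def huber_regression_fixed_tau(X, Y, tau=1.345, tol=1e-5, max_iter=500):
    """Huber regression with a FIXED tau via iteratively reweighted least squares.

    Returns (coef, iterations), where coef[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])            # prepend an intercept column
    coef = np.linalg.lstsq(Z, Y, rcond=None)[0]     # ordinary least squares start
    for it in range(1, max_iter + 1):
        r = Y - Z @ coef
        # Huber weights: 1 for small residuals, tau/|r| for large ones
        w = np.where(np.abs(r) <= tau, 1.0, tau / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        # Weighted least squares step, solved as OLS on rescaled data
        coef_new = np.linalg.lstsq(Z * sw[:, None], Y * sw, rcond=None)[0]
        if np.max(np.abs(coef_new - coef)) < tol:
            return coef_new, it
        coef = coef_new
    return coef, max_iter

# Illustrative data: normal noise plus a few gross outliers in the response
rng = np.random.default_rng(0)
X = rng.uniform(-1.5, 1.5, (500, 3))
Y = 2.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 1, 500)
Y[:10] += 50.0
coef, n_iter = huber_regression_fixed_tau(X, Y)
print(coef, n_iter)
```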

Requirements

numpy
setuptools
wheel

Functions

This package provides five functions (one_step_reg and two_step_reg are documented together):

  • mean(X, grad=True, tol=1e-5, max_iter=500): Huber mean estimation. Returns a tuple of the mean, $\tau$ and the number of iterations.
    X: A 1-d array.
    grad: If True, optimize the mean by gradient descent; otherwise use weighted least squares. Default True.
    tol: Tolerance of the error, default 1e-5.
    max_iter: Maximum number of iterations, default 500.
  • cov(X, type="element", pairwise=False, tol=1e-5, max_iter=500): Huber covariance matrix estimation. Returns a 2-d covariance matrix.
    X: A 2-d array.
    type: If set to "element", apply adaptive Huber M-estimation; if set to "spectrum", apply spectrum-wise truncated estimation. Default "element".
    pairwise: Pairwise covariance or difference-based covariance. Default False.
    tol: Tolerance of the error, default 1e-5.
    max_iter: Maximum number of iterations, default 500.
  • one_step_reg(X, Y, grad=True, tol=1e-5, max_iter=500), two_step_reg(X, Y, grad=True, tol=1e-5, constTau=1.345, max_iter=500):
    One-step or two-step adaptive Huber regression. Returns a tuple of the coefficients, $\tau$ and the number of iterations.
    X, Y: Arrays of data.
    grad: If True, optimize the coefficients by gradient descent; otherwise use weighted least squares. Default True.
    tol: Tolerance of the error, default 1e-5.
    constTau: Default 1.345. Used only in the two-step method.
    max_iter: Maximum number of iterations, default 500.
  • cvlasso(X, Y, lSeq=0, nlambda=30, constTau=2.5, phi0=0.001, gamma=1.5, tol=0.001, nfolds=3): K-fold cross-validated Huber-lasso regression. Returns a tuple of the coefficients, $\tau$, the number of iterations and the minimum of $\lambda$.
    X, Y: Arrays of data.
    lSeq: A list of Lasso parameters $\lambda$. If not set, a range of $\lambda$ values to be cross-validated is found automatically.
    nlambda: The number of $\lambda$ values used for validation, default 30.
    constTau, phi0, gamma: Algorithm parameters, with defaults 2.5, 0.001 and 1.5 respectively.
    tol: Tolerance of the error, default 0.001.
    nfolds: Number of cross-validation folds, default 3.
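The roles of lSeq and nfolds in cvlasso can be illustrated with a generic K-fold selection loop. The sketch below is not the package's implementation: `kfold_indices`, `cv_select` and `ridge_fit` are hypothetical names, the validation criterion is assumed to be mean squared error, and a closed-form ridge fit is swapped in for the Huber-lasso solver purely to keep the example self-contained.

```python
import numpy as np

def kfold_indices(n, nfolds, seed=0):
    """Randomly partition the indices 0..n-1 into nfolds disjoint folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), nfolds)

def cv_select(X, Y, lambdas, fit, nfolds=3, seed=0):
    """Return the lambda with the smallest summed validation error, plus all errors.

    fit(X_train, Y_train, lam) must return a coefficient vector.
    """
    folds = kfold_indices(len(Y), nfolds, seed)
    cv_err = np.zeros(len(lambdas))
    for k in range(nfolds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(nfolds) if j != k])
        for i, lam in enumerate(lambdas):
            coef = fit(X[train], Y[train], lam)
            # Mean squared prediction error on the held-out fold (an assumption)
            cv_err[i] += np.mean((Y[test] - X[test] @ coef) ** 2)
    return lambdas[int(np.argmin(cv_err))], cv_err

def ridge_fit(X, Y, lam):
    """Stand-in penalized fit (ridge, NOT Huber-lasso), no intercept."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(1)
X = rng.uniform(-1.5, 1.5, (200, 5))
Y = X @ np.arange(1.0, 6.0) + rng.normal(0, 0.1, 200)
best_lam, errors = cv_select(X, Y, [0.01, 1.0, 1000.0], ridge_fit, nfolds=3)
print(best_lam, errors)
```

Passing an explicit grid corresponds to setting lSeq; leaving it unset lets the package choose nlambda candidate values itself.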

Examples

We present an example of the adaptive Huber methods. Here we generate data from a linear model Y = θ0 + X θ + ε, where ε follows a normal distribution, and estimate the intercept and coefficients by tuning-free Huber regression.

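import numpy as np
import tfhuber as tf

# Illustrative true parameters (the original example does not specify them)
n, d = 10000, 10
intercept, beta = 2.0, np.ones(d)
X = np.random.uniform(-1.5, 1.5, (n, d))
Y = intercept + np.dot(X, beta) + np.random.normal(0, 1, n)

# Huber mean and covariance estimation (cov uses its default "element" type)
mu, tau, iteration = tf.mean(Y, grad=True, tol=1e-5, max_iter=500)
cov = tf.cov(X, tol=1e-5, max_iter=500)

# One-step and two-step adaptive Huber regression
theta, tau, iteration = tf.one_step_reg(X, Y, grad=True, tol=1e-5, max_iter=500)
theta, tau, iteration = tf.two_step_reg(X, Y, grad=True, tol=1e-5, constTau=1.345, max_iter=500)

# Cross-validated Huber-lasso regression
theta, tau, iteration, lam = tf.cvlasso(X, Y)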
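As a sanity check, one might compare the Huber regression estimates with an ordinary least squares fit on the same kind of data, since with normal noise the two should be close. The sketch below uses only NumPy; the values of intercept and beta are illustrative assumptions, and whether the package returns the intercept as the first entry of theta is not stated here, so the comparison itself is left to the reader.

```python
import numpy as np

# Simulate the same kind of data as in the example (illustrative parameters)
rng = np.random.default_rng(0)
n, d = 10000, 10
intercept, beta = 2.0, np.ones(d)
X = rng.uniform(-1.5, 1.5, (n, d))
Y = intercept + X @ beta + rng.normal(0, 1, n)

# Ordinary least squares baseline with an explicit intercept column
Z = np.column_stack([np.ones(n), X])
ols = np.linalg.lstsq(Z, Y, rcond=None)[0]

print("OLS intercept:", ols[0])
print("OLS coefficients:", ols[1:])
```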

Simulation results can be viewed in this Colab notebook.

License

GPL (>= 3)

Author(s)

Yifan Dai <yifandai@yeah.net>, Qiang Sun <qsun.ustc@gmail.com>

The description and algorithms follow Xiaoou Pan's page.

References

Guennebaud, G., Jacob, B. and others. (2010). Eigen v3.

Ke, Y., Minsker, S., Ren, Z., Sun, Q. and Zhou, W.-X. (2019). User-friendly covariance estimation for heavy-tailed distributions. Statist. Sci. 34 454-471.

Pan, X., Sun, Q. and Zhou, W.-X. (2019). Nonconvex regularized robust regression with oracle properties in polynomial time. Preprint.

Sanderson, C. and Curtin, R. (2016). Armadillo: A template-based C++ library for linear algebra. J. Open Source Softw. 1 26.

Sun, Q., Zhou, W.-X. and Fan, J. (2020). Adaptive Huber regression. J. Amer. Stat. Assoc. 115 254-265.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267-288.

Wang, L., Zheng, C., Zhou, W. and Zhou, W.-X. (2020). A new principle for tuning-free Huber regression. Stat. Sinica to appear.
