pyqreg

Fast implementation of the quantile regression with support for iid, robust, and cluster standard errors.

These details have not been verified by PyPI

Project links

Homepage

Project description

CircleCI PyPI PyPI - License PythonVersion Black

Pyqreg

Pyqreg implements the quantile regression algorithm with fast estimation method using the linear programming interior point method following the preprocessing procedure in Portnoy and Koenker (1997). It provides methods for estimating the asymptotic covariance matrix for i.i.d and heteroskedastic errors, as well as clustered errors following Parente and Silva (2013).

References

Stephen Portnoy. Roger Koenker. “The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators.” Statist. Sci. 12 (4) 279 - 300 (1997).
Koenker, R., Ng, P. A Frisch-Newton Algorithm for Sparse Quantile Regression. Acta Mathematicae Applicatae Sinica, English Series 21, 225–236 (2005).
Parente, Paulo and Santos Silva, João, (2013), Quantile regression with clustered data, No 1305, Discussion Papers, University of Exeter, Department of Economics.

Install

pyqreg pre-install requirements are:

Python >= 3.6
Numpy

You can install the latest release with:

pip3 install pyqreg

Example

We replicate the study in statsmodels quantile regression that looks at the relationship between income and expenditures on food for a sample of working class Belgian households in 1857 (the Engel data) using pyqreg.

import statsmodels.api as sm

data = sm.datasets.engel.load_pandas().data
data.head()

https://github.com/mozjay0619/pyqreg/blob/master/media/img1.png

Let’s first fit the quantile regression model using the statsmodels.

import statsmodels.formula.api as smf
mod = smf.quantreg("foodexp ~ income", data)
res = mod.fit(q=0.5)
res.summary()

https://github.com/mozjay0619/pyqreg/blob/master/media/img6.png

Now instead of using statsmodels quantile regression, we use that of pyqreg. Observe that the results are identical to that of statsmodels.

from pyqreg import quantreg
mod = quantreg("foodexp ~ income", data)
res = mod.fit(q=0.5)
res.summary()

https://github.com/mozjay0619/pyqreg/blob/master/media/img2.png

The remaining parts of the study can be found in this notebook.

Speed comparisons

Despite the identical regression results and similar APIs, pyqreg uses completely different optimization algorithm under the hood, making it anywhere between 10 to 30 times faster than the statsmodels quantile regression, depending on the data size, error distribution and quantile.

https://github.com/mozjay0619/pyqreg/blob/master/media/img5.png

The above plots the median time to convergence for each data size, which shows a large and growing difference in absolute speed with data size.

Cluster robust standard error

Unlike the statsmodels quantile regression, which only supports iid and heteroskedasticity robust standard errors, pyqreg also supports the cluster robust standard error estimation.

from pyqreg.utils import generate_clustered_data, rng_generator

pyqreg_params = []
pyqreg_ses = []

statsmodels_params = []
statsmodels_ses = []

for i in range(500):

    rng = rng_generator(i)

    # Generate fake clustered data, with 150 groups,
    # 500 data points in each group, using 15
    # as cross cluster variance (normal distribution).
    y, X, groups = generate_clustered_data(150, 500, 15, rng)

    from pyqreg import QuantReg
    mod = QuantReg(y, X)
    res = mod.fit(0.5, cov_type='cluster', cov_kwds={'groups': groups})

    pyqreg_params.append(res.params)
    pyqreg_ses.append(res.bse)

    from statsmodels.regression.quantile_regression import QuantReg
    mod = QuantReg(y, X)
    res = mod.fit(0.5)

    statsmodels_params.append(res.params)
    statsmodels_ses.append(res.bse)

The above code runs a simulation study, using fake generated clustered data. We will take a look at the simulated standard deviation of betas, and the two models’ estimated standard errors.

print(np.asarray(statsmodels_params).std(axis=0))
print(np.asarray(pyqreg_params).std(axis=0))

[1.81944934 2.52755859]
[1.81947597 2.52758232]

As expected, the standard deviation of the estimated betas of the two models are very similar to each other. However, we see a huge divergence in the estimations in standard errors. The heteroskedasticity robust standard error completely underestimates the standard deviation, where as pyqreg produces an estimate that is asymptotically much more accurate:

print(np.asarray(statsmodels_ses).mean(axis=0))
print(np.asarray(pyqreg_ses).mean(axis=0))

[0.14290666 0.20251073]
[1.75910926 2.49862904]

But of course, if we run the same simulation with 0 cross cluster variance, both models’ standard errors are consistent, which makes sense since all the off-diagonal terms in the covariance matrix will be close to 0, making the block diagonal matrix look more like the heteroskedasticity robust (or even iid) covariance diagonal matrix:

print(np.asarray(statsmodels_params).std(axis=0))
print(np.asarray(pyqreg_params).std(axis=0))

[0.09985114 0.14226425]
[0.09984286 0.14225007]

This time, both models produce the accurate standard errors:

print(np.asarray(statsmodels_ses).mean(axis=0))
print(np.asarray(pyqreg_ses).mean(axis=0))

[0.103299   0.14637724]
[0.10282833 0.14554498]

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.4.2

Mar 23, 2022

0.4.1

Dec 10, 2021

0.3.9

Dec 9, 2021

0.3.8

Oct 28, 2021

0.3.7

Oct 28, 2021

0.3.6

Oct 28, 2021

0.3.5

Oct 21, 2021

0.3.4

Sep 24, 2021

0.3.3

Sep 10, 2021

0.3.2

Sep 9, 2021

0.3.1

Sep 9, 2021

0.3.0

Sep 9, 2021

0.2.0

Sep 9, 2021

0.2b1 pre-release

Sep 9, 2021

0.1.0

Sep 8, 2021

0.0b13 pre-release

Sep 8, 2021

0.0b11 pre-release

Sep 7, 2021

0.0b10 pre-release

Sep 7, 2021

0.0b8 pre-release

Sep 7, 2021

0.0b4 pre-release

Sep 7, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyqreg-0.4.2.tar.gz (247.0 kB view details)

Uploaded Mar 23, 2022 Source

File details

Details for the file pyqreg-0.4.2.tar.gz.

File metadata

Download URL: pyqreg-0.4.2.tar.gz
Upload date: Mar 23, 2022
Size: 247.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.0 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.4 CPython/3.6.15

File hashes

Hashes for pyqreg-0.4.2.tar.gz
Algorithm	Hash digest
SHA256	`d979d9d317a8631791a437bf6326c8aba286687ceea326edfa9373dc6c736eeb`
MD5	`904e9b264c5cf8df5d69bbdb5dbb0cf0`
BLAKE2b-256	`e8f96b1072b3d599950318e9c380c6ed31bdb15da148fdb8e80cc76098a62c04`

See more details on using hashes here.

pyqreg 0.4.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Pyqreg

References

Install

Example

Speed comparisons

Cluster robust standard error

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes