
Quantile regression via linear programming using Google OR-Tools PDLP, with a scikit-learn compatible API and statistical summaries.


quantile-regression-pdlp

Optimization-based quantile regression built on Google OR-Tools. Scikit-learn API, statsmodels-style summaries, and features that go beyond what either package offers.

What makes this different from sklearn or statsmodels?

  • Fits multiple quantiles jointly with non-crossing constraints
  • Multi-output regression in a single model
  • SCAD, MCP, and elastic net penalties (not just L1)
  • Analytical, bootstrap, kernel, and cluster-robust standard errors
  • Conformalized quantile regression for calibrated prediction intervals
  • Evaluation metrics: pinball loss, coverage, interval score, crossing diagnostics
  • Calibration diagnostics: coverage by group/bin, nominal vs empirical, sharpness analysis
  • Crossing detection and rearrangement for any quantile model's predictions
  • Prediction intervals, quantile process plots, and pseudo R²
  • Censored quantile regression for survival data
  • SciPy sparse solver for large-scale problems
  • Validated against sklearn, statsmodels, and R's quantreg
| Feature | This package | sklearn | statsmodels |
| --- | --- | --- | --- |
| Multiple quantiles (joint) | Yes | No | No |
| Non-crossing constraints | Yes | No | No |
| Multi-output | Yes | No | No |
| Analytical SEs | Yes | No | Yes |
| Kernel (robust) SEs | Yes | No | Yes |
| Cluster-robust SEs | Yes | No | No |
| Bootstrap SEs | Yes | No | No |
| L1 / Elastic Net / SCAD / MCP | Yes | L1 only | No |
| Conformal calibration (CQR) | Yes | No | No |
| Evaluation metrics suite | Yes | Partial | No |
| Crossing detection + fix | Yes | No | No |
| Calibration diagnostics | Yes | No | No |
| Prediction intervals | Yes | No | No |
| Pseudo R² | Yes | No | Yes |
| Formula interface | Yes | No | Yes |
| Censored QR | Yes | No | No |
| Sklearn pipeline compatible | Yes | Yes | No |
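Crossing detection and repair only need the predicted quantiles. On a finite grid of quantile levels, monotone rearrangement reduces to sorting each row's predictions. The sketch below is a minimal NumPy illustration, assuming predictions arrive as an (n_samples, n_quantiles) array with columns in increasing tau order; `crossing_mask` and `rearrange` are hypothetical names, not this package's API.

```python
import numpy as np


def crossing_mask(pred):
    """Return True for each row whose predicted quantiles ever decrease.

    pred: array of shape (n_samples, n_quantiles), columns in increasing tau order.
    """
    pred = np.asarray(pred, dtype=float)
    return np.any(np.diff(pred, axis=1) < 0, axis=1)


def rearrange(pred):
    """Monotone rearrangement on a finite tau grid: sort each row's predictions."""
    return np.sort(np.asarray(pred, dtype=float), axis=1)


# Hypothetical predictions for tau = 0.1, 0.5, 0.9; the second row crosses (2.5 > 2.0).
pred = np.array([[1.0, 2.0, 3.0],
                 [2.5, 2.0, 4.0]])
flags = crossing_mask(pred)
fixed = rearrange(pred)
print("crossing rate:", flags.mean())
print(fixed)
```

Rows that are already monotone come back unchanged, so the fix only touches predictions that actually cross.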

Installation

pip install quantile-regression-pdlp

Optional extras:

pip install "quantile-regression-pdlp[all]"      # formula interface + plots
pip install "quantile-regression-pdlp[plot]"     # matplotlib only
pip install "quantile-regression-pdlp[formula]"  # patsy only

Quick Start

import numpy as np
from quantile_regression_pdlp import QuantileRegression

X = np.random.default_rng(0).normal(size=(200, 3))
y = X @ [2.0, -1.5, 0.8] + np.random.default_rng(1).normal(scale=0.5, size=200)

model = QuantileRegression(tau=[0.1, 0.5, 0.9], n_bootstrap=200, random_state=0)
model.fit(X, y)

# Summaries with coefficients, SEs, p-values, and 95% CIs
print(model.summary()[0.5]['y'])

# Prediction intervals
interval = model.predict_interval(X[:5], coverage=0.80)
print(interval['y']['lower'], interval['y']['upper'])

# Pseudo R²
print(model.pseudo_r_squared_)
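Raw quantile bands like the 0.1/0.9 interval above can under- or over-cover. Conformalized quantile regression (CQR, Romano, Patterson and Candès, 2019) fixes this with a held-out calibration set. Each calibration point gets the score max(lo − y, y − hi). The band is then widened by the ⌈(n+1)(1−α)⌉-th smallest score, which guarantees marginal coverage of at least 1 − α when calibration and test points are exchangeable. The sketch below is independent of this package: the function names are hypothetical, and `x ± 0.5` stands in for a fitted model's deliberately too-narrow quantile predictions.

```python
import numpy as np


def cqr_margin(lo_cal, hi_cal, y_cal, alpha):
    """Conformal margin from a held-out calibration set (split CQR).

    Each score is positive when y falls outside [lo, hi] and negative inside.
    """
    lo_cal, hi_cal, y_cal = (np.asarray(a, dtype=float) for a in (lo_cal, hi_cal, y_cal))
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the conformal quantile
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return np.sort(scores)[k - 1]


def cqr_interval(lo, hi, margin):
    """Widen (margin > 0) or shrink (margin < 0) the raw quantile band."""
    return np.asarray(lo) - margin, np.asarray(hi) + margin


rng = np.random.default_rng(0)
x_cal, x_test = rng.normal(size=500), rng.normal(size=2000)
y_cal = x_cal + rng.normal(size=500)
y_test = x_test + rng.normal(size=2000)

# Stand-in for fitted 0.1 / 0.9 quantile predictions that are too narrow (x +/- 0.5).
margin = cqr_margin(x_cal - 0.5, x_cal + 0.5, y_cal, alpha=0.2)
lo, hi = cqr_interval(x_test - 0.5, x_test + 0.5, margin)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
print(f"margin={margin:.3f}, empirical coverage={coverage:.3f}")
```

The calibration rows must not be used to fit the quantile model. If they are, the scores are optimistic and the margin comes out too small.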

Features at a Glance

Regularization

# L1 (Lasso)
QuantileRegression(tau=0.5, regularization='l1', alpha=0.1)

# Elastic net
QuantileRegression(tau=0.5, regularization='elasticnet', alpha=0.1, l1_ratio=0.5)

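# SCAD: smoothly clipped absolute deviation (less bias on large coefficients)
QuantileRegression(tau=0.5, regularization='scad', alpha=0.3)

# MCP: minimax concave penalty (also reduces bias on large coefficients)
QuantileRegression(tau=0.5, regularization='mcp', alpha=0.3)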
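SCAD and MCP match the L1 penalty near zero but level off for large |β|, which is why they shrink big coefficients less. The sketch below writes out the standard definitions in NumPy. The shape parameters a = 3.7 (SCAD) and γ = 3 (MCP) are common textbook defaults, and treating this package's `alpha` as the λ below is an assumption; the package's own parameterization may differ. Both penalties are nonconvex, so solvers typically handle them iteratively, for example by local linear approximation, which turns each step into a weighted L1 problem. Whether this package does that is also an assumption.

```python
import numpy as np


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001), elementwise; requires a > 2."""
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b                                                # |b| <= lam: same as L1
    quadratic = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |b| <= a*lam
    constant = lam**2 * (a + 1) / 2                                 # |b| > a*lam: flat
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))


def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty (Zhang, 2010), elementwise; requires gamma > 1."""
    b = np.abs(np.asarray(beta, dtype=float))
    return np.where(b <= gamma * lam, lam * b - b**2 / (2 * gamma), gamma * lam**2 / 2)


coefs = np.array([0.1, 1.0, 5.0])
lam = 0.3
print("L1  :", lam * np.abs(coefs))
print("SCAD:", scad_penalty(coefs, lam))
print("MCP :", mcp_penalty(coefs, lam))
```

For the small coefficient all three penalties are nearly equal. For the large one, L1 keeps growing while SCAD and MCP stay constant.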

Inference Options

# Fast analytical SEs (no bootstrapping needed)
model = QuantileRegression(tau=0.5, se_method='analytical')
model.fit(X, y)

# Heteroscedasticity-robust kernel sandwich SEs
model = QuantileRegression(tau=0.5, se_method='kernel')
model.fit(X, y)

# Cluster-robust SEs (group_labels: one cluster ID per row of X)
model = QuantileRegression(tau=0.5, se_method='analytical')
model.fit(X, y, clusters=group_labels)
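The kernel and cluster-robust options both use the sandwich form H⁻¹JH⁻¹, and they differ in how each piece is estimated. The bread H weights each row by a kernel estimate of the residual density at zero. The meat J is built from the quantile scores ψᵢxᵢ with ψᵢ = τ − 1{rᵢ < 0}. For cluster-robust SEs, the scores are summed within each cluster before forming J. The sketch below makes several assumptions: a Gaussian kernel, a caller-chosen bandwidth `h` (packages usually apply a rule such as Hall–Sheather), and a hypothetical function name. It also uses true coefficients in place of a fitted model. It is not this package's implementation.

```python
import numpy as np


def sandwich_cov(X, resid, tau, h, clusters=None):
    """Kernel-sandwich covariance H^-1 J H^-1 for quantile regression coefficients.

    X: (n, p) design including an intercept column; resid: y - X @ beta_hat;
    h: bandwidth on the residual scale (caller's choice; an assumption here).
    clusters: optional length-n labels; scores are summed within each cluster.
    """
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float)
    # Bread: Gaussian-kernel weights estimating the residual density at zero.
    k = np.exp(-0.5 * (resid / h) ** 2) / (np.sqrt(2 * np.pi) * h)
    H = (X * k[:, None]).T @ X
    # Meat: outer products of the scores psi_i * x_i, with psi_i = tau - 1{resid_i < 0}.
    scores = X * (tau - (resid < 0).astype(float))[:, None]
    if clusters is not None:
        labels = np.asarray(clusters)
        scores = np.vstack([scores[labels == g].sum(axis=0) for g in np.unique(labels)])
    J = scores.T @ scores
    H_inv = np.linalg.inv(H)
    return H_inv @ J @ H_inv


rng = np.random.default_rng(0)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
resid = y - X @ np.array([1.0, 2.0])  # stand-in for residuals of a fitted median regression
groups = np.repeat(np.arange(40), 10)  # 40 hypothetical clusters of 10 rows each

se_kernel = np.sqrt(np.diag(sandwich_cov(X, resid, tau=0.5, h=0.3)))
se_cluster = np.sqrt(np.diag(sandwich_cov(X, resid, tau=0.5, h=0.3, clusters=groups)))
print(se_kernel, se_cluster)
```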

Quantile Process Plot

model = QuantileRegression(
    tau=[0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95],
    se_method='analytical'
)
model.fit(X, y)
model.plot_quantile_process(feature='X1')

Formula Interface

# Requires the [formula] extra (patsy); df is a pandas DataFrame with columns y, x1, x2, region
model = QuantileRegression(tau=0.5, se_method='analytical')
model.fit_formula('y ~ x1 + x2 + C(region)', data=df)

Censored Quantile Regression

from quantile_regression_pdlp import CensoredQuantileRegression

model = CensoredQuantileRegression(tau=0.5, censoring='right', se_method='analytical')
model.fit(X, observed_time, event_indicator=delta)
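Under right censoring there are two latent times per subject: an event time T and a censoring time C. The data show only min(T, C), plus a flag saying whether the event itself was observed. The simulated sketch below builds those two inputs. The distributions are arbitrary, and the coding 1 = event observed, 0 = censored follows the usual survival-analysis convention; that `event_indicator` expects this coding is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
event_time = rng.exponential(scale=10.0, size=n)  # latent time until the event
censor_time = rng.uniform(0.0, 20.0, size=n)      # latent time until follow-up ends

# Right censoring: only the earlier of the two times is observed...
observed_time = np.minimum(event_time, censor_time)
# ...together with whether it was the event (1) or a censoring time (0).
delta = (event_time <= censor_time).astype(int)

print(f"censored share: {1 - delta.mean():.2f}")
```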

Solver Options

# GLOP simplex (faster on small/medium problems)
QuantileRegression(tau=0.5, solver_backend='GLOP')

# SciPy sparse solver (memory-efficient for large datasets)
QuantileRegression(tau=0.5, use_sparse=True)

# Solver tuning
QuantileRegression(tau=0.5, solver_tol=1e-8, solver_time_limit=60.0)
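Sparsity matters because the quantile regression LP has a very sparse constraint matrix, [X, I, −I], with only n(p + 2) nonzeros. The sketch below is an independent illustration of the textbook formulation, solved with SciPy's `scipy.optimize.linprog` and the HiGHS method. `fit_quantile_lp` is a hypothetical helper, and the code does not describe what `use_sparse=True` does internally.

```python
import numpy as np
from scipy import sparse
from scipy.optimize import linprog


def fit_quantile_lp(X, y, tau):
    """Solve min tau*1'u + (1-tau)*1'v  s.t.  X @ beta + u - v = y,  u, v >= 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Decision vector: [beta (free), u (positive residual parts), v (negative parts)].
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    eye = sparse.identity(n, format="csr")
    A_eq = sparse.hstack([sparse.csr_matrix(X), eye, -eye], format="csr")
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=300)
beta_median = fit_quantile_lp(X, y, tau=0.5)
beta_q25 = fit_quantile_lp(X, y, tau=0.25)
print(beta_median, beta_q25)
```

With an intercept in X, about a fraction τ of the fitted residuals are negative, which is a quick sanity check on any solver's output.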

Benchmarks

Validated against sklearn's QuantileRegressor and statsmodels' QuantReg across multiple dataset sizes and quantile configurations. All three minimize the same objective (the pinball loss of a linear model), so their pinball losses agree; the difference lies in what this package provides around the fit.

| n | Quantiles | Pinball loss (all equal) | Crossing rate | Extras in this package |
| --- | --- | --- | --- | --- |
| 500 | 3 | 0.2291 | 0% (all) | Joint fit, non-crossing guarantee, SEs, CQR |
| 2000 | 5 | 0.2238 | 0% (all) | + evaluation metrics, crossing tools |
| 5000 | 5 | 0.2347 | 0% (all) | + censored QR, SCAD/MCP/elastic net |

Full benchmark results, methodology, and reproduction instructions are on the Benchmarks page of the documentation.

# Reproduce benchmarks locally
pip install -e ".[benchmark]"
python benchmarks/run_linear_baselines.py
python benchmarks/report.py
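The pinball loss in the table, along with the pseudo R² and interval score listed among the features, can be computed from predictions alone. The sketch below is a minimal NumPy version, independent of this package, with hypothetical function names. The pseudo R² is Koenker and Machado's R¹: one minus the ratio of the model's pinball loss to that of the best constant. Whether `pseudo_r_squared_` uses exactly this definition is an assumption. The interval score follows Gneiting and Raftery (2007).

```python
import numpy as np


def pinball_loss(y, q, tau):
    """Mean check loss rho_tau(r) = r * (tau - 1{r < 0}), with r = y - q."""
    r = np.asarray(y, dtype=float) - np.asarray(q, dtype=float)
    return float(np.mean(r * (tau - (r < 0).astype(float))))


def pseudo_r2(y, q, tau):
    """Koenker-Machado R1: 1 - loss(model) / loss(unconditional sample tau-quantile)."""
    ys = np.sort(np.asarray(y, dtype=float))
    baseline = ys[max(int(np.ceil(ys.size * tau)), 1) - 1]  # minimizes the loss over constants
    return 1.0 - pinball_loss(y, q, tau) / pinball_loss(y, baseline, tau)


def interval_score(y, lower, upper, alpha):
    """Interval score of a central (1 - alpha) interval: width plus 2/alpha times misses."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    miss = (lower - y) * (y < lower) + (y - upper) * (y > upper)
    return float(np.mean(upper - lower + (2.0 / alpha) * miss))


rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y = 2 * x + rng.normal(size=500)
print(pinball_loss(y, 2 * x, 0.5), pseudo_r2(y, 2 * x, 0.5))
print(interval_score(y, 2 * x - 1.28, 2 * x + 1.28, alpha=0.2))
```

Lower is better for pinball loss and interval score. The interval score rewards narrow intervals but charges 2/α per unit of miss.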

Documentation

Full docs: joshvern.github.io/quantile_regression_pdlp

Why PDLP?

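Quantile regression is naturally a linear program: the pinball loss is piecewise linear, so it can be minimized with a linear objective and linear constraints. OR-Tools' PDLP is a first-order primal-dual solver that works mainly through matrix-vector products rather than matrix factorizations, which lets it scale to large, high-dimensional LPs. The package also supports GLOP (simplex), which tends to be faster on small and medium problems, and SciPy's HiGHS solver.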
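In the textbook Koenker and Bassett formulation, each residual is split into nonnegative parts, y_i − x_iᵀβ = u_i − v_i. This makes the pinball loss a linear objective. For levels τ_1 < ⋯ < τ_K, one common way to get a joint, non-crossing fit is to stack the K problems and require the fitted quantiles to be ordered at the observed rows. That this is exactly the package's formulation is an assumption.

```latex
\begin{aligned}
\text{single } \tau:\quad
&\min_{\beta \in \mathbb{R}^{p},\; u, v \in \mathbb{R}^{n}_{\ge 0}}
  \; \tau\,\mathbf{1}^{\top} u + (1-\tau)\,\mathbf{1}^{\top} v
  \quad \text{s.t.} \quad X\beta + u - v = y, \\[4pt]
\text{joint, non-crossing:}\quad
&\min_{\{\beta_k,\,u_k,\,v_k\}_{k=1}^{K}}
  \; \sum_{k=1}^{K} \left[ \tau_k\,\mathbf{1}^{\top} u_k + (1-\tau_k)\,\mathbf{1}^{\top} v_k \right] \\
&\quad \text{s.t.} \quad X\beta_k + u_k - v_k = y,\;\; u_k, v_k \ge 0,\;\; k = 1, \dots, K, \\
&\qquad\; X\beta_k \le X\beta_{k+1}, \;\; k = 1, \dots, K-1.
\end{aligned}
```

The ordering constraints are linear, so the joint problem is still an LP. Any of the supported solvers can therefore handle it.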

Dependencies

Required: ortools, numpy, pandas, scipy, tqdm, joblib, scikit-learn

Optional: matplotlib (plots), patsy (formulas)

Contributing

Contributions welcome! Open an issue or submit a pull request on GitHub.

License

MIT

Download files

Source Distribution

quantile_regression_pdlp-0.4.0.tar.gz (38.9 kB)

Uploaded Source

Built Distribution

quantile_regression_pdlp-0.4.0-py3-none-any.whl (26.5 kB)

Uploaded Python 3

File details

Details for the file quantile_regression_pdlp-0.4.0.tar.gz.

File metadata

  • Download URL: quantile_regression_pdlp-0.4.0.tar.gz
  • Size: 38.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.15

File hashes

Hashes for quantile_regression_pdlp-0.4.0.tar.gz
Algorithm Hash digest
SHA256 4bccfde3b399d949dfdfc8174053747ef85e2d29205b3860199e45b7ed6cc77a
MD5 cf8a43b2f263c7af5ef95a413245fcc5
BLAKE2b-256 abc807e621f7a91c6e9353008e8e8d31cdb4b5437dac7306f25f018df5d92930

File details

Details for the file quantile_regression_pdlp-0.4.0-py3-none-any.whl.

File hashes

Hashes for quantile_regression_pdlp-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 89de5655926ed0237791ecbec688b50ff0213433af97981f8a6247ac4e5ba413
MD5 01ec8244cc4d932ff67610f8e00a66ef
BLAKE2b-256 aaaff65c33e0288a76bd55377e893f179ac092a4684153d8d9a541f0d8bf2b87
