Quantile regression via linear programming using Google OR-Tools PDLP, with a scikit-learn compatible API and statistical summaries.

quantile-regression-pdlp

Optimization-based quantile regression built on Google OR-Tools, with a scikit-learn API, statsmodels-style summaries, and features that neither scikit-learn nor statsmodels offers.

What makes this different from sklearn or statsmodels?

  • Fits multiple quantiles jointly with non-crossing constraints
  • Multi-output regression in a single model
  • SCAD, MCP, and elastic net penalties (not just L1)
  • Analytical, bootstrap, kernel, and cluster-robust standard errors
  • Conformalized quantile regression for calibrated prediction intervals
  • Evaluation metrics: pinball loss, coverage, interval score, crossing diagnostics
  • Calibration diagnostics: coverage by group/bin, nominal vs empirical, sharpness analysis
  • Crossing detection and rearrangement for any quantile model's predictions
  • Prediction intervals, quantile process plots, and pseudo R²
  • Censored quantile regression for survival data
  • Scipy sparse solver for large-scale problems
  • Validated against sklearn, statsmodels, and R's quantreg

Feature                          This package   sklearn    statsmodels
Multiple quantiles (joint)       Yes            No         No
Non-crossing constraints         Yes            No         No
Multi-output                     Yes            No         No
Analytical SEs                   Yes            No         Yes
Kernel (robust) SEs              Yes            No         Yes
Cluster-robust SEs               Yes            No         No
Bootstrap SEs                    Yes            No         No
L1 / Elastic Net / SCAD / MCP    Yes            L1 only    No
Conformal calibration (CQR)      Yes            No         No
Evaluation metrics suite         Yes            Partial    No
Crossing detection + fix         Yes            No         No
Calibration diagnostics          Yes            No         No
Prediction intervals             Yes            No         No
Pseudo R²                        Yes            No         Yes
Formula interface                Yes            No         Yes
Censored QR                      Yes            No         No
Sklearn pipeline compatible      Yes            Yes        No
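
The crossing detection and rearrangement listed above can be illustrated without the package. The sketch below is a minimal NumPy version, assuming predictions are stored as an array with one column per quantile in increasing order of tau. The helper names crossing_rate and rearrange are hypothetical and are not this package's API.

```python
import numpy as np

def crossing_rate(preds):
    """Fraction of rows whose predictions decrease between adjacent quantiles.

    preds: shape (n_samples, n_quantiles), columns ordered by increasing tau.
    (Hypothetical helper for illustration, not part of the package.)
    """
    preds = np.asarray(preds, dtype=float)
    crossed = (np.diff(preds, axis=1) < 0).any(axis=1)
    return float(crossed.mean())

def rearrange(preds):
    """Sort each row so the predictions are monotone in tau."""
    return np.sort(np.asarray(preds, dtype=float), axis=1)

preds = np.array([
    [1.0, 2.0, 3.0],  # monotone: no crossing
    [1.5, 1.2, 2.0],  # lowest quantile above the middle one: a crossing
])
print(crossing_rate(preds))             # 0.5
print(crossing_rate(rearrange(preds)))  # 0.0
```

Sorting fixes crossings after the fact for any model's predictions, whereas the joint fit with non-crossing constraints prevents them during estimation.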

Installation

pip install quantile-regression-pdlp

Optional extras:

pip install "quantile-regression-pdlp[all]"      # formula interface + plots
pip install "quantile-regression-pdlp[plot]"     # matplotlib only
pip install "quantile-regression-pdlp[formula]"  # patsy only

Quick Start

import numpy as np
from quantile_regression_pdlp import QuantileRegression

X = np.random.default_rng(0).normal(size=(200, 3))
y = X @ [2.0, -1.5, 0.8] + np.random.default_rng(1).normal(scale=0.5, size=200)

model = QuantileRegression(tau=[0.1, 0.5, 0.9], n_bootstrap=200, random_state=0)
model.fit(X, y)

# Summaries with coefficients, SEs, p-values, and 95% CIs
print(model.summary()[0.5]['y'])

# Prediction intervals
interval = model.predict_interval(X[:5], coverage=0.80)
print(interval['y']['lower'], interval['y']['upper'])

# Pseudo R²
print(model.pseudo_r_squared_)
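
To sanity-check intervals such as those returned by predict_interval, two standard diagnostics are empirical coverage and the interval score. The sketch below computes both with plain NumPy on small hand-made arrays that stand in for the lower and upper bounds. The function names are hypothetical and are not the package's own metrics API.

```python
import numpy as np

def empirical_coverage(y, lower, upper):
    """Share of observations that fall inside [lower, upper]."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def interval_score(y, lower, upper, coverage):
    """Mean interval score: width plus a penalty of 2/alpha per unit of miss.

    alpha = 1 - coverage is the nominal miss rate. Lower scores are better.
    """
    alpha = 1.0 - coverage
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    width = upper - lower
    below = np.clip(lower - y, 0.0, None)  # amount y falls below the interval
    above = np.clip(y - upper, 0.0, None)  # amount y exceeds the interval
    return float(np.mean(width + (2.0 / alpha) * (below + above)))

# Toy data: the last observation lies above its interval.
y = np.array([0.0, 1.0, 2.0, 5.0])
lower = np.array([-1.0, 0.0, 1.0, 2.0])
upper = np.array([1.0, 2.0, 3.0, 4.0])

print(empirical_coverage(y, lower, upper))  # 0.75
print(interval_score(y, lower, upper, coverage=0.80))
```

An interval is well calibrated when empirical coverage is close to the nominal level, and sharper when the interval score is lower at the same coverage.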

Features at a Glance

Regularization

# L1 (Lasso)
QuantileRegression(tau=0.5, regularization='l1', alpha=0.1)

# Elastic net
QuantileRegression(tau=0.5, regularization='elasticnet', alpha=0.1, l1_ratio=0.5)

# SCAD (less bias on large coefficients)
QuantileRegression(tau=0.5, regularization='scad', alpha=0.3)

# MCP
QuantileRegression(tau=0.5, regularization='mcp', alpha=0.3)
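
The "less bias on large coefficients" remark refers to the shape of the penalties: L1 keeps growing with the coefficient, while SCAD and MCP level off. The sketch below evaluates the textbook penalty functions with NumPy. It assumes the package's alpha plays the role of lam here, and the shape parameters a=3.7 and gamma=3.0 are conventional illustrative values, not known package defaults.

```python
import numpy as np

def l1_penalty(t, lam):
    """Lasso penalty: grows linearly without bound."""
    return lam * np.abs(np.asarray(t, dtype=float))

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic blend, then constant."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    constant = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

def mcp_penalty(t, lam, gamma=3.0):
    """MCP penalty: concave up to gamma * lam, constant afterwards."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= gamma * lam, lam * t - t**2 / (2 * gamma), gamma * lam**2 / 2)

t = np.array([0.0, 0.5, 2.0, 10.0])
print(l1_penalty(t, lam=1.0))    # keeps increasing with |t|
print(scad_penalty(t, lam=1.0))  # flat once |t| > a * lam
print(mcp_penalty(t, lam=1.0))   # flat once |t| > gamma * lam
```

Because the SCAD and MCP penalties stop increasing for large coefficients, strong effects are shrunk less than under L1.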

Inference Options

# Fast analytical SEs (no bootstrapping needed)
model = QuantileRegression(tau=0.5, se_method='analytical')
model.fit(X, y)

# Heteroscedasticity-robust kernel sandwich SEs
model = QuantileRegression(tau=0.5, se_method='kernel')
model.fit(X, y)

# Cluster-robust SEs (group_labels holds one cluster ID per row of X)
model = QuantileRegression(tau=0.5, se_method='analytical')
model.fit(X, y, clusters=group_labels)

Quantile Process Plot

model = QuantileRegression(
    tau=[0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95],
    se_method='analytical'
)
model.fit(X, y)
model.plot_quantile_process(feature='X1')

Formula Interface

# df is a pandas DataFrame with columns y, x1, x2, and region
model = QuantileRegression(tau=0.5, se_method='analytical')
model.fit_formula('y ~ x1 + x2 + C(region)', data=df)

Censored Quantile Regression

from quantile_regression_pdlp import CensoredQuantileRegression

model = CensoredQuantileRegression(tau=0.5, censoring='right', se_method='analytical')
model.fit(X, observed_time, event_indicator=delta)

Solver Options

# GLOP simplex (faster on small/medium problems)
QuantileRegression(tau=0.5, solver_backend='GLOP')

# Scipy sparse solver (memory-efficient for large datasets)
QuantileRegression(tau=0.5, use_sparse=True)

# Solver tuning
QuantileRegression(tau=0.5, solver_tol=1e-8, solver_time_limit=60.0)

Benchmarks

Tested on heavy-tailed, heteroscedastic data (Student-t noise, 10-20 features, up to 13 quantiles). The key advantage is zero quantile crossings, where independently fitted quantile models show crossing rates of 4-30%.

n       Features   Quantiles   Crossing rate (this)   Crossing rate (sklearn)   Pinball loss (this)   Pinball loss (sklearn)
500     10         7           0%                     11.0%                     0.5148                0.5166
500     10         13          0%                     30.0%                     0.5095                0.5240
1,000   10         13          0%                     16.5%                     0.5048                0.5071
2,000   20         13          0%                     11.0%                     0.5599                0.5611

In every row above, the joint non-crossing formulation also achieves slightly lower pinball loss (by about 0.001 to 0.015), which suggests the constraints act as beneficial regularization.
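
For reference, the pinball loss reported above can be computed in a few lines. This is a minimal NumPy sketch with hypothetical function names. Averaging the loss over quantiles in mean_pinball_loss is an assumption about how a single number per row would be obtained, not a description of the benchmark scripts.

```python
import numpy as np

def pinball_loss(y, pred, tau):
    """Mean pinball (quantile) loss at level tau.

    Under-predictions are weighted by tau, over-predictions by 1 - tau.
    """
    r = np.asarray(y, dtype=float) - np.asarray(pred, dtype=float)
    return float(np.mean(np.maximum(tau * r, (tau - 1.0) * r)))

def mean_pinball_loss(y, preds, taus):
    """Average pinball loss over several quantiles.

    preds: shape (n_samples, n_quantiles), one column per entry of taus.
    """
    preds = np.asarray(preds, dtype=float)
    return float(np.mean([pinball_loss(y, preds[:, j], tau)
                          for j, tau in enumerate(taus)]))

y = np.array([1.0, 2.0, 3.0])
pred = np.array([2.0, 2.0, 2.0])

# At tau = 0.9 the under-prediction (y = 3) costs 0.9 per unit,
# while the over-prediction (y = 1) costs only 0.1 per unit.
print(pinball_loss(y, pred, tau=0.9))
```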

Full results and methodology: Benchmarks

pip install -e ".[benchmark]"
python benchmarks/run_linear_baselines.py
python benchmarks/report.py

Documentation

Full docs: joshvern.github.io/quantile_regression_pdlp

Why PDLP?

Quantile regression is naturally a linear program. OR-Tools' PDLP is a first-order solver designed for large-scale LPs, making it efficient for high-dimensional problems. For smaller problems, the package also supports GLOP (simplex) and scipy's HiGHS solver.
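
To make the linear-programming view concrete: split each residual into a positive part u and a negative part v, then minimize tau * sum(u) + (1 - tau) * sum(v) subject to X @ beta + u - v = y with u, v >= 0. The sketch below solves this LP with scipy.optimize.linprog (HiGHS) as a stand-in for illustration. It is not how the package builds its OR-Tools model, and quantile_regression_lp is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression_lp(X, y, tau):
    """Fit a single quantile by solving the standard LP formulation.

    Decision variables are stacked as [beta (free), u >= 0, v >= 0],
    where u and v are the positive and negative parts of the residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Objective: zero cost on beta, tau on u, (1 - tau) on v.
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    # Equality constraints: X @ beta + u - v = y.
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]

# Intercept-only model at tau = 0.5: the LP recovers the sample median,
# which is unaffected by the outlier at 100.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
X = np.ones((5, 1))
beta = quantile_regression_lp(X, y, tau=0.5)
print(beta)
```

The dense identity blocks make this sketch suitable only for small n. Joint fitting of several quantiles adds one such block per quantile plus linear constraints that keep the fitted quantiles ordered.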

Dependencies

Required: ortools, numpy, pandas, scipy, tqdm, joblib, scikit-learn

Optional: matplotlib (plots), patsy (formulas)

Contributing

Contributions welcome! Open an issue or submit a pull request on GitHub.

License

MIT
