Non-crossing quantile regression toolkit with joint multi-quantile fitting, inference, conformal calibration, and evaluation. Scikit-learn compatible.

quantile-regression-pdlp

Non-crossing quantile models with built-in inference, calibration, and evaluation.

A quantile modeling toolkit — not just a quantile regressor. Fits multiple quantiles jointly with monotonicity constraints that guarantee predictions never cross. Wraps the result in inference, conformal calibration, evaluation metrics, and crossing diagnostics.

Scikit-learn compatible. Validated against sklearn, statsmodels, and R's quantreg.

Why Not Just Fit Quantiles Independently?

When you fit quantiles one at a time (as sklearn and statsmodels do), nothing prevents the 90th percentile prediction from falling below the 10th. On real-world data with heavy tails, noise, or many quantile levels, this happens frequently:

| Samples (n) | Features | Quantile levels | Crossing rate (independent) | Crossing rate (this package) |
|---|---|---|---|---|
| 500 | 10 | 13 | 30.0% | 0% |
| 1,000 | 10 | 13 | 16.5% | 0% |
| 2,000 | 20 | 13 | 11.0% | 0% |
| 2,000 | 20 | 7 | 4.5% | 0% |

This package eliminates crossings by construction. In the benchmarks reported below, the joint formulation also matches or slightly improves on the pinball loss of independent fitting, which suggests the non-crossing constraints act as a mild regularizer.
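To make "crossing rate" concrete, here is a minimal sketch of one plausible definition: the fraction of samples for which any adjacent pair of quantile predictions is out of order. The function name `crossing_rate` and the sample numbers are illustrative assumptions, not this package's API, and the benchmark may use a different definition.

```python
def crossing_rate(predictions):
    """Fraction of rows with at least one out-of-order adjacent pair.

    `predictions` is a list of rows, one per sample. Each row holds the
    predicted quantiles ordered by increasing tau. (Hypothetical helper.)
    """
    if not predictions:
        return 0.0
    crossed = sum(
        any(lo > hi for lo, hi in zip(row, row[1:]))
        for row in predictions
    )
    return crossed / len(predictions)


# Three samples, quantiles at tau = 0.1, 0.5, 0.9 (made-up numbers).
preds = [
    [1.0, 2.0, 3.0],  # ordered
    [1.5, 1.2, 2.8],  # the 0.1 prediction exceeds the median: a crossing
    [0.4, 0.9, 0.9],  # ties are not counted as crossings here
]
print(crossing_rate(preds))
```

For predictions produced by this package, the package's own diagnostics are the better choice; the sketch only illustrates what is being counted.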

Full benchmark methodology and results: Benchmarks

What You Get

This is a toolkit, not a single estimator. It covers the workflow from raw quantile regression through calibrated prediction intervals:

| Workflow | What it does |
|---|---|
| Joint Quantile Regression | Fit multiple quantiles in one call with non-crossing guarantees |
| Conformalized Quantile Regression | Calibrate intervals for finite-sample coverage guarantees |
| Censored Quantile Regression | Handle right- or left-censored (survival) data |
| Evaluation & Metrics | Pinball loss, coverage, interval score, crossing diagnostics |
| Calibration Diagnostics | Coverage by group/bin, nominal vs empirical, sharpness analysis |
| Crossing Detection & Repair | Diagnose and fix crossings from any quantile model |
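As an illustration of what repairing crossings can look like, the sketch below applies monotone rearrangement: each sample's predicted quantiles are sorted so they are non-decreasing in tau. This is a well-known generic technique and an assumption on my part; the source does not say which repair method this package uses, and `repair_crossings` is a hypothetical name.

```python
def repair_crossings(predictions):
    """Sort each row so predictions are non-decreasing across quantile levels.

    `predictions` is a list of rows, one per sample, with columns ordered
    by increasing tau. Rows that are already ordered are left unchanged.
    """
    return [sorted(row) for row in predictions]


raw = [
    [1.0, 2.0, 3.0],  # already monotone
    [1.5, 1.2, 2.8],  # first two levels cross
]
fixed = repair_crossings(raw)
print(fixed)
```

Rearrangement only reorders the values that were already predicted, so it removes crossings after the fact rather than preventing them during fitting.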

Feature Comparison

| Feature | This package | sklearn | statsmodels |
|---|---|---|---|
| Multiple quantiles (joint fit) | Yes | No | No |
| Non-crossing guarantee | Yes | No | No |
| Multi-output regression | Yes | No | No |
| Analytical / kernel / cluster / bootstrap SEs | Yes | No | Partial |
| L1 / Elastic Net / SCAD / MCP | Yes | L1 only | No |
| Conformal calibration (CQR) | Yes | No | No |
| Calibration diagnostics | Yes | No | No |
| Evaluation metrics suite | Yes | Partial | No |
| Crossing detection + fix | Yes | No | No |
| Censored QR | Yes | No | No |
| Prediction intervals | Yes | No | No |
| Pseudo R² | Yes | No | Yes |
| Formula interface | Yes | No | Yes |
| Sklearn pipeline compatible | Yes | Yes | No |

Installation

pip install quantile-regression-pdlp

Optional extras:

pip install "quantile-regression-pdlp[all]"      # formula interface + plots
pip install "quantile-regression-pdlp[plot]"     # matplotlib only
pip install "quantile-regression-pdlp[formula]"  # patsy only

Quick Start

import numpy as np
from quantile_regression_pdlp import QuantileRegression

X = np.random.default_rng(0).normal(size=(200, 3))
y = X @ [2.0, -1.5, 0.8] + np.random.default_rng(1).normal(scale=0.5, size=200)

# Fit 3 quantiles jointly — guaranteed non-crossing
model = QuantileRegression(tau=[0.1, 0.5, 0.9], se_method='analytical')
model.fit(X, y)

# Summaries with coefficients, SEs, p-values, and 95% CIs
print(model.summary()[0.5]['y'])

# Prediction intervals (guaranteed monotone: lower <= median <= upper)
interval = model.predict_interval(X[:5], coverage=0.80)
print(interval['y']['lower'], interval['y']['upper'])
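The Quick Start pairs `coverage=0.80` with fitted quantiles at 0.1 and 0.9. A minimal sketch of that relationship, assuming a central (equal-tailed) interval, is shown here. The helper `central_interval_taus` is hypothetical, and how `predict_interval` actually selects quantiles is not stated in this README.

```python
def central_interval_taus(coverage):
    """Return the (lower, upper) quantile levels of an equal-tailed interval."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    tail = (1.0 - coverage) / 2.0  # probability mass left in each tail
    return tail, 1.0 - tail


lower_tau, upper_tau = central_interval_taus(0.80)
print(round(lower_tau, 3), round(upper_tau, 3))
```

Under this assumption, an 80% interval needs the 0.1 and 0.9 quantiles, and a 90% interval needs 0.05 and 0.95.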

Conformal Calibration

Turn raw quantile predictions into intervals with coverage guarantees:

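from sklearn.model_selection import train_test_split
from quantile_regression_pdlp import QuantileRegression
from quantile_regression_pdlp.conformal import ConformalQuantileRegression

# Hold out a test set (X and y as in the Quick Start)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

base = QuantileRegression(tau=[0.05, 0.5, 0.95], se_method='analytical')
cqr = ConformalQuantileRegression(base_estimator=base, coverage=0.90)
cqr.fit(X_train, y_train)

intervals = cqr.predict_interval(X_test)
# The conformal guarantee is marginal: expect roughly 0.90 or more, up to sampling noise
print(cqr.empirical_coverage(X_test, y_test))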
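For readers who want to see what conformal calibration does to an interval, here is a minimal sketch of the standard split-conformal CQR adjustment: compute a conformity score per calibration point, take a finite-sample-corrected quantile of those scores, and widen (or shrink) every interval by that margin. The function names and the calibration numbers are assumptions for illustration; the package's internals are not described in this README.

```python
import math


def cqr_margin(y_cal, lower_cal, upper_cal, coverage):
    """Margin to add on both sides of raw quantile intervals.

    Score per calibration point: max(lower - y, y - upper), which is
    positive when y falls outside the raw interval and negative inside.
    """
    n = len(y_cal)
    scores = sorted(
        max(lo - y, y - hi) for y, lo, hi in zip(y_cal, lower_cal, upper_cal)
    )
    rank = math.ceil((n + 1) * coverage)  # finite-sample correction
    if rank > n:
        return math.inf  # too few calibration points for this coverage
    return scores[rank - 1]


def calibrate(lower, upper, margin):
    """Apply the margin to raw lower/upper predictions."""
    return [lo - margin for lo in lower], [hi + margin for hi in upper]


# Made-up calibration data: nine targets with raw interval predictions.
y_cal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
lower_cal = [0.5, 1.8, 2.0, 4.2, 4.0, 5.5, 7.5, 7.0, 8.0]
upper_cal = [1.5, 2.5, 2.9, 5.0, 6.0, 6.5, 8.5, 9.0, 10.0]

margin = cqr_margin(y_cal, lower_cal, upper_cal, coverage=0.80)
new_lower, new_upper = calibrate([2.0], [3.0], margin)
print(margin, new_lower, new_upper)
```

A negative margin is possible and means the raw intervals were wider than needed on the calibration set.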

Censored Quantile Regression

For survival data with right- or left-censoring:

from quantile_regression_pdlp import CensoredQuantileRegression

model = CensoredQuantileRegression(tau=0.5, censoring='right', se_method='analytical')
model.fit(X, observed_time, event_indicator=delta)

Evaluate Any Quantile Model

The metrics and diagnostics modules work with predictions from any source — not just this package:

from quantile_regression_pdlp.metrics import quantile_evaluation_report
from quantile_regression_pdlp.postprocess import crossing_summary

# Evaluate predictions from XGBoost, LightGBM, or any other model
report = quantile_evaluation_report(y_true, predictions, taus)
crossings = crossing_summary(predictions, taus)
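Two of the interval metrics named in this README, coverage and interval score, have short standard definitions. The sketch below writes them out in plain Python so the quantities are unambiguous. The function names are illustrative assumptions and are not the signatures of `quantile_regression_pdlp.metrics`.

```python
def empirical_coverage(y_true, lower, upper):
    """Fraction of targets that fall inside [lower, upper]."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return inside / len(y_true)


def interval_score(y_true, lower, upper, coverage):
    """Mean interval (Winkler) score: width plus a penalty for misses.

    With alpha = 1 - coverage, a target below the interval adds
    (2 / alpha) * (lower - y); a target above adds (2 / alpha) * (y - upper).
    Lower is better.
    """
    alpha = 1.0 - coverage
    total = 0.0
    for y, lo, hi in zip(y_true, lower, upper):
        total += hi - lo
        if y < lo:
            total += (2.0 / alpha) * (lo - y)
        elif y > hi:
            total += (2.0 / alpha) * (y - hi)
    return total / len(y_true)


# Two made-up targets: the first is covered, the second is missed by 1.0.
y = [1.0, 5.0]
lo = [0.0, 2.0]
hi = [2.0, 4.0]
print(empirical_coverage(y, lo, hi))
print(interval_score(y, lo, hi, coverage=0.5))
```

Coverage alone rewards arbitrarily wide intervals, which is why it is usually read together with a sharpness-aware metric such as the interval score.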

Regularization

QuantileRegression(tau=0.5, regularization='l1', alpha=0.1)       # Lasso
QuantileRegression(tau=0.5, regularization='elasticnet', alpha=0.1, l1_ratio=0.5)
QuantileRegression(tau=0.5, regularization='scad', alpha=0.3)     # Less bias on large coefficients
QuantileRegression(tau=0.5, regularization='mcp', alpha=0.3)
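The comment about SCAD having less bias on large coefficients can be seen from the penalty functions themselves. The sketch below uses the textbook SCAD and MCP definitions for a single coefficient. How this package maps `alpha` to the penalty level `lam`, and which shape parameters (`a`, `gamma`) it uses, are not stated here, so treat those as assumptions.

```python
def l1_penalty(beta, lam):
    """Lasso penalty: grows linearly without bound."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty: linear near zero, then quadratic, then constant."""
    t = abs(beta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP penalty: tapers smoothly from the L1 slope to a constant."""
    t = abs(beta)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return gamma * lam * lam / 2.0


# Compare the three penalties on a small and a large coefficient.
for beta in (0.1, 5.0):
    print(beta, l1_penalty(beta, 0.3), scad_penalty(beta, 0.3), mcp_penalty(beta, 0.3))
```

For large coefficients the SCAD and MCP penalties are flat, so they stop shrinking strong signals, whereas the L1 penalty keeps growing.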

Inference Options

QuantileRegression(tau=0.5, se_method='analytical')   # Fast asymptotic SEs
QuantileRegression(tau=0.5, se_method='kernel')        # Heteroscedasticity-robust
QuantileRegression(tau=0.5, se_method='bootstrap', n_bootstrap=500)
# Cluster-robust SEs
model.fit(X, y, clusters=group_labels)
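As background for the bootstrap option, this is a minimal sketch of a nonparametric (pairs) bootstrap standard error: resample rows with replacement, recompute the estimate, and take the standard deviation across resamples. It uses the sample median as a stand-in estimator so the example stays self-contained. `bootstrap_se` is a hypothetical helper, and the package's own resampling scheme is not documented in this README.

```python
import random
import statistics


def bootstrap_se(estimator, rows, n_bootstrap=500, seed=0):
    """Standard deviation of `estimator` over resampled copies of `rows`."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_bootstrap):
        # Draw n rows with replacement from the original data.
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


data = [float(v) for v in range(1, 52)]
se = bootstrap_se(statistics.median, data, n_bootstrap=200, seed=42)
print(se)
```

For a regression model, each row would be an (x, y) pair and the estimator would refit the model and return a coefficient. A cluster-robust variant would typically resample whole groups instead of individual rows.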

Benchmarks

Tested on heavy-tailed heteroscedastic data (Student-t noise, 10-20 features, up to 13 quantiles):

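| Samples (n) | Features | Quantile levels | Crossing rate (this package) | Crossing rate (sklearn) | Pinball loss (this package) | Pinball loss (sklearn) |
|---|---|---|---|---|---|---|
| 500 | 10 | 7 | 0% | 11.0% | 0.5148 | 0.5166 |
| 500 | 10 | 13 | 0% | 30.0% | 0.5095 | 0.5240 |
| 1,000 | 10 | 13 | 0% | 16.5% | 0.5048 | 0.5071 |
| 2,000 | 20 | 13 | 0% | 11.0% | 0.5599 | 0.5611 |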

In every row above, the joint formulation also has slightly lower pinball loss than sklearn (by roughly 0.001 to 0.015), consistent with the non-crossing constraints acting as a mild regularizer.
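For reference, the pinball (quantile) loss for a single level tau has a short standard definition, sketched below. The function name is illustrative, and how the benchmark aggregates the loss across quantile levels is an open detail not covered by this sketch.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss of predictions `y_pred` at quantile level `tau`.

    Under-prediction (y above the prediction) is weighted by tau,
    over-prediction by (1 - tau).
    """
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# At tau = 0.5 the loss is half the mean absolute error.
print(pinball_loss([1.0, 3.0], [2.0, 2.0], tau=0.5))
```

At a high level such as tau = 0.9, predicting too low costs nine times as much as predicting too high by the same amount, which is what pushes the fitted line toward the upper part of the distribution.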

Speed tradeoff: This package solves a single joint LP with non-crossing constraints, which is slower than fitting each quantile independently. The value is in the guarantee and the richer downstream workflows. For single-quantile fits where speed matters most, sklearn or statsmodels may be more appropriate.

Full results: Benchmarks | Reproduce locally

When to Use This Package

Use this when you need:

  • Multiple quantile predictions that must not cross (production pipelines, interval forecasts)
  • Statistical inference on quantile coefficients (SEs, p-values, confidence intervals)
  • Calibrated prediction intervals (conformal quantile regression)
  • Censored/survival quantile models
  • A complete evaluation workflow for any quantile model's predictions

Use sklearn or statsmodels when:

  • You only need a single quantile (e.g., median regression)
  • Raw speed matters more than crossing guarantees
  • You don't need inference, calibration, or evaluation tooling

Documentation

Full docs: joshvern.github.io/quantile_regression_pdlp

Implementation

Quantile regression is naturally a linear program. This package solves joint multi-quantile LPs with non-crossing constraints using:

  • PDLP: first-order primal-dual solver (default, from Google OR-Tools)
  • GLOP: revised simplex (faster on small/medium problems)
  • HiGHS: via scipy's sparse LP interface (memory-efficient)

QuantileRegression(tau=0.5, solver_backend='GLOP')   # simplex
QuantileRegression(tau=0.5, use_sparse=True)          # scipy sparse
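To spell out why quantile regression is a linear program, here is the textbook joint formulation for quantile levels $\tau_1 < \dots < \tau_K$, with the residual of sample $i$ at level $k$ split into nonnegative parts $u_{ik}$ and $v_{ik}$. The non-crossing constraint shown is imposed at the training points, which is one common choice and an assumption here; the exact constraint set this package uses is not given in this README.

```latex
\begin{aligned}
\min_{\beta,\,u,\,v}\quad
  & \sum_{k=1}^{K}\sum_{i=1}^{n}
    \bigl(\tau_k\,u_{ik} + (1-\tau_k)\,v_{ik}\bigr) \\
\text{s.t.}\quad
  & y_i - x_i^{\top}\beta_k = u_{ik} - v_{ik},
    \qquad u_{ik} \ge 0,\; v_{ik} \ge 0,
    \qquad i = 1,\dots,n,\; k = 1,\dots,K, \\
  & x_i^{\top}\beta_k \le x_i^{\top}\beta_{k+1},
    \qquad i = 1,\dots,n,\; k = 1,\dots,K-1.
\end{aligned}
```

Without the last line the problem separates into $K$ independent quantile regressions. The ordering constraints couple the levels, which is why the joint problem is larger and slower to solve than fitting each quantile on its own.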

Dependencies

Required: numpy, pandas, scipy, scikit-learn, ortools, tqdm, joblib

Optional: matplotlib (plots), patsy (formulas), statsmodels (benchmarks)

Contributing

Contributions welcome! Open an issue or submit a pull request on GitHub.

License

MIT
