quantile-regression-pdlp

Quantile regression via linear programming using Google OR-Tools PDLP, with a scikit-learn compatible API and statistical summaries.

These details have not been verified by PyPI

Project links

Project description

TL;DR: quantile-regression-pdlp is an optimization-based quantile regression implementation built on Google OR-Tools' PDLP solver. It provides a familiar scikit-learn-style API and optional statistical summaries via bootstrapping.

Quantile Regression PDLP

A Python package for performing quantile regression using the PDLP solver from Google's OR-Tools, with an interface and summaries similar to statsmodels, and fully compatible with scikit-learn and pandas.

Installation
Usage
Quantile Regression as a Linear Programming Problem
- Mathematical Formulation
- LP Formulation
Features
Documentation
- Class: QuantileRegression
Examples
Dependencies
Contributing
License

Installation

Prerequisites

Python 3.9 or higher.
ortools package.

Install from PyPI

pip install quantile-regression-pdlp

Install from Source

Clone the repository:

git clone https://github.com/joshvern/quantile_regression_pdlp.git
cd quantile_regression_pdlp
pip install -U pip
pip install .

Why PDLP?

Quantile regression can be written as a linear program (LP) by introducing slack variables for positive/negative residuals.
OR-Tools' PDLP is a first-order method aimed at solving large LPs efficiently compared to many traditional simplex/interior-point approaches.

Usage

from quantile_regression_pdlp import QuantileRegression
import numpy as np


# Generate sample data
np.random.seed(42)
X = np.random.rand(100, 2)
y = X @ np.array([1.5, -2.0]) + np.random.randn(100) * 0.5


# Initialize and fit the model
model = QuantileRegression(tau=0.5, n_bootstrap=500, random_state=42)
model.fit(X, y)


# Print the summary
print(model.summary())


# Make predictions
X_new = np.array([[0.1, 0.2], [0.5, 0.8]])
y_pred = model.predict(X_new)
print('Predictions:', y_pred)

Quantile Regression as a Linear Programming Problem

Quantile regression aims to estimate the conditional quantiles of a response variable given certain predictor variables. Unlike ordinary least squares regression, which minimizes the sum of squared residuals, quantile regression minimizes a weighted sum of absolute residuals and can incorporate regularization. Estimating multiple quantiles simultaneously allows for a more comprehensive understanding of the conditional distribution and can enforce non-crossing constraints on the training data.

Mathematical Formulation

For multiple quantiles $\tau_1 < \tau_2 < \dots < \tau_K$ and multiple outputs $M$, the weighted quantile regression problem can be formulated as:

$$ \min_{\beta^{(k, m)}} \sum_{k=1}^{K} \sum_{m=1}^{M} \sum_{i=1}^{n} w_i \rho_{\tau_k}(y_i^{(m)} - x_i^{\top} \beta^{(k, m)}) $$

where:

$y_i^{(m)}$ is the response variable for the $m^{th}$ output.
$x_i$ is the vector of predictor variables.
$w_i$ is the weight assigned to the $i$-th observation.
$\beta^{(k, m)}$ is the vector of coefficients for quantile $\tau_k$ and output $m$.
$\rho_{\tau_k}(u)$ is the quantile loss function defined as:

$$ \rho_{\tau_k}(u) = u(\tau_k - \mathbb{I}(u < 0)) $$

To prevent quantile crossing, we impose additional constraints ensuring that the predicted quantile levels are ordered appropriately for each output:

$$ x_i^{\top} \beta^{(1, m)} \leq x_i^{\top} \beta^{(2, m)} \leq \dots \leq x_i^{\top} \beta^{(K, m)}, \quad \forall i = 1, \dots, n; \quad \forall m = 1, \dots, M $$

LP Formulation

By introducing auxiliary variables and incorporating regularization, the problem can be rewritten:

Variables

For each quantile $\tau_k$ and output $m$, introduce:
- Coefficients $\beta^{(k, m)}$.
- Non-negative slack variables $r_{i}^{+(k, m)}$ and $r_{i}^{-(k, m)}$ for each observation:

$$ y_i^{(m)} - x_i^{\top} \beta^{(k, m)} = r_{i}^{+(k, m)} - r_{i}^{-(k, m)}, \quad \forall i, \forall k, \forall m $$

For L1 regularization, introduce auxiliary variables $z_j^{(k, m)}$ for each coefficient $\beta_j^{(k, m)}$ (excluding the intercept):

$$ z_j^{(k, m)} \geq \beta_j^{(k, m)} \ z_j^{(k, m)} \geq -\beta_j^{(k, m)}, \quad \forall j, \forall k, \forall m $$

Objective Function

The objective function incorporates both the weighted quantile loss for all quantiles and outputs, and the L1 regularization term:

$$ \min_{\beta^{(k, m)}, r^{+(k, m)}, r^{-(k, m)}, z^{(k, m)}} \sum_{k=1}^{K} \sum_{m=1}^{M} \left( \sum_{i=1}^{n} \left( \tau_k w_i r_{i}^{+(k, m)} + (1 - \tau_k) w_i r_{i}^{-(k, m)} \right) + \lambda \sum_{j=1}^{p} z_j^{(k, m)} \right) $$

where:

$\lambda$ is the regularization strength parameter.
$p$ is the number of predictor variables (excluding the intercept).

Constraints

Subject to the constraints:

Residual Constraints:

$$ y_i^{(m)} - x_i^{\top} \beta^{(k, m)} = r_{i}^{+(k, m)} - r_{i}^{-(k, m)}, \quad \forall i = 1, \dots, n; \quad \forall k = 1, \dots, K; \quad \forall m = 1, \dots, M $$

Non-negativity Constraints:

$$ r_{i}^{+(k, m)} \geq 0, \quad r_{i}^{-(k, m)} \geq 0, \quad \forall i = 1, \dots, n; \quad \forall k = 1, \dots, K; \quad \forall m = 1, \dots, M $$

L1 Regularization Constraints:

$$ z_j^{(k, m)} \geq \beta_j^{(k, m)} \ z_j^{(k, m)} \geq -\beta_j^{(k, m)}, \quad \forall j = 1, \dots, p; \quad \forall k = 1, \dots, K; \quad \forall m = 1, \dots, M $$

Non-Crossing Constraints:

For each output $m$:

$$ x_i^{\top} \beta^{(k, m)} \leq x_i^{\top} \beta^{(k+1, m)}, \quad \forall i = 1, \dots, n; \quad \forall k = 1, \dots, K-1; \quad \forall m = 1, \dots, M $$

This LP formulation can be efficiently solved using the PDLP solver provided by Google's OR-Tools.

Features

Sklearn and Pandas Compliant: Seamlessly integrates with scikit-learn pipelines and accepts pandas DataFrame inputs.
Custom Quantiles: Supports estimation for any quantile $\tau \in (0, 1)$.
Multiple Quantiles: Estimates multiple quantiles simultaneously with non-crossing constraints enforced on the training data.
Multi-Output Regression: Supports regression tasks with multiple target variables, training them jointly.
Weighted Quantile Regression: Assigns different weights to observations, allowing differential influence on regression estimates.
L1 Regularization (Lasso): Promotes sparsity in the model by penalizing the absolute values of the coefficients.
Statistical Summaries: Provides bootstrap standard errors; reported t-values and p-values use a t reference distribution (approximate).
Bootstrap Estimation: Standard errors are estimated using bootstrap resampling.
Parallel Bootstrapping: Utilizes multiple CPU cores for bootstrapping to speed up computations.
Progress Indicators: Displays progress bars during bootstrapping to inform users of training progress.
Simple API: Designed to be similar to statsmodels and scikit-learn for ease of use.
Efficient Solver: Utilizes the PDLP solver for efficient computation, suitable for large datasets.

Documentation

Class: `QuantileRegression`

Initialization

QuantileRegression(
    tau=0.5,
    n_bootstrap=1000,
    random_state=None,
    regularization='none',
    alpha=0.0,
    n_jobs=1
)

Parameters:
- tau (float or list of floats, default=0.5): The quantile(s) to estimate, each must be between 0 and 1. Can be a single float or a list of floats.
- n_bootstrap (int, default=1000): Number of bootstrap samples for estimating standard errors.
- random_state (int, default=None): Seed for the random number generator.
- regularization (str, default='none'): Type of regularization to apply. Options are 'l1' for Lasso regularization or 'none' for no regularization.
- alpha (float, default=0.0): Regularization strength. Must be a non-negative float. Higher values imply stronger regularization.
- n_jobs (int, default=1): The number of jobs to run in parallel for bootstrapping. -1 means using all processors.

Attributes

coef_ (dict): Estimated coefficients for each quantile and output. Structure: {tau: {output: array of coefficients}}.
intercept_ (dict): Estimated intercept term for each quantile and output. Structure: {tau: {output: float}}.
stderr_ (dict): Standard errors of the coefficients for each quantile and output. Structure: {tau: {output: array of standard errors}}.
tvalues_ (dict): T-statistics of the coefficients for each quantile and output. Structure: {tau: {output: array of t-values}}.
pvalues_ (dict): P-values of the coefficients for each quantile and output. Structure: {tau: {output: array of p-values}}.
feature_names_ (list): List of feature names. If input X is a pandas DataFrame, the column names are used; otherwise, generic names are assigned.
output_names_ (list): List of output names. If input y is a pandas DataFrame, the column names are used; otherwise, generic names are assigned.

Methods

fit(X, y, weights=None): Fit the quantile regression model to the data.
- Parameters:
  - X: array-like of shape (n_samples, n_features). Training data. Can be a NumPy array or a pandas DataFrame.
  - y: array-like of shape (n_samples,) or (n_samples, n_outputs). Target values. Can be a NumPy array, pandas Series, or pandas DataFrame.
  - weights (ndarray, optional): Weights for each observation, shape (n_samples,). Default is None, which assigns equal weight to all observations.
- Returns: self
predict(X): Predict using the quantile regression model.
- Parameters:
  - X: array-like of shape (n_samples, n_features). Samples. Can be a NumPy array or a pandas DataFrame.
- Returns: y_pred (dict of dicts of ndarrays). Predicted values for each quantile and output. Structure: {tau: {output: array of predictions}}
summary(): Return a summary of the regression results.
- Returns: summary_dict (dict of dicts of pandas DataFrames). Summary tables for each quantile and output with coefficients, standard errors, t-values, and p-values. Structure: {tau: {output: DataFrame}}

Examples

Example 1: Basic Usage

from quantile_regression_pdlp import QuantileRegression
import numpy as np

# Data
np.random.seed(0)
X = np.random.rand(50, 1)
y = 2 * X.squeeze() + np.random.randn(50) * 0.5

# Model
model = QuantileRegression(tau=0.5, n_bootstrap=500, random_state=0)
model.fit(X, y)
print(model.summary()[0.5])

Output

Bootstrapping: 100%|████████████████████████████| 500/500 [00:02<00:00, 237.43it/s]
{'y':            Coefficient  Std. Error   t-value     P>|t|
Intercept     0.053282    0.274354  0.194209  0.846815
X1            1.718165    0.372070  4.617858  0.000028}

Example 2: Estimating Multiple Quantiles Simultaneously

from quantile_regression_pdlp import QuantileRegression
import numpy as np

# Data
np.random.seed(0)
X = np.random.rand(50, 1)
y = np.random.rand(50) * 2 + np.random.randn(50) * 0.5

# Model
quantiles = [0.25, 0.5, 0.75]
model = QuantileRegression(tau=quantiles, n_bootstrap=500, random_state=0)
model.fit(X, y)

# Accessing summaries
summaries = model.summary()
for tau, summary_df in summaries.items():
    print(f"\nQuantile: {tau}")
    print(summary_df)

# Making predictions
X_new = np.array([[0.1], [0.5], [0.9]])
predictions = model.predict(X_new)
for tau, y_pred in predictions.items():
    print(f"\nPredictions for quantile {tau}: {y_pred}")

Output

Bootstrapping: 100%|█████████████████████████████| 500/500 [00:09<00:00, 53.90it/s]

Quantile: 0.25
{'y':            Coefficient  Std. Error   t-value     P>|t|
Intercept     0.989803    0.232086  4.264809  0.000091
X1           -0.827212    0.359381 -2.301768  0.025639}

Quantile: 0.5
{'y':            Coefficient  Std. Error   t-value     P>|t|
Intercept     1.000221    0.231796  4.315086  0.000077
X1           -0.054280    0.531496 -0.102128  0.919072}

Quantile: 0.75
{'y':            Coefficient  Std. Error   t-value     P>|t|
Intercept     1.612909    0.383085  4.210315  0.000109
X1           -0.163154    0.670333 -0.243393  0.808717}

Predictions for quantile 0.25: {'y': array([0.90708228, 0.57619763, 0.24531299])}

Predictions for quantile 0.5: {'y': array([0.99479299, 0.97308079, 0.95136859])}

Predictions for quantile 0.75: {'y': array([1.59659361, 1.53133192, 1.46607023])}

Example 3: Multi-Output Quantile Regression with Weighted Regression and L1 Regularization

from quantile_regression_pdlp import QuantileRegression
import pandas as pd
import numpy as np

# Generate sample data
np.random.seed(42)
n_samples = 200
X = pd.DataFrame({
    'Feature1': np.random.rand(n_samples),
    'Feature2': np.random.rand(n_samples),
    'Feature3': np.random.rand(n_samples)
})

# Generate two target variables
y1 = 1.0 + 2.5 * X['Feature1'] - 1.5 * X['Feature2'] + 0.5 * X['Feature3'] + np.random.randn(n_samples) * 0.3
y2 = 2.0 - 1.5 * X['Feature1'] + 2.0 * X['Feature3'] + np.random.randn(n_samples) * 0.4

y = pd.DataFrame({
    'Target1': y1,
    'Target2': y2
})

# Assign weights: higher weights to observations with y1 > median of y1
weights = np.where(y['Target1'] > np.median(y['Target1']), 1.5, 1.0)

# Initialize the model with multiple quantiles and outputs
model = QuantileRegression(
    tau=[0.25, 0.5, 0.75],
    n_bootstrap=500,
    random_state=42,
    regularization='l1',
    alpha=0.1,
    n_jobs=1  
)

# Fit the model with weighted regression
model.fit(X, y, weights=weights)

# Print the summaries for each quantile and output
summaries = model.summary()
for tau, outputs in summaries.items():
    print(f"\nQuantile: {tau}")
    for output, summary_df in outputs.items():
        print(f"\nOutput: {output}")
        print(summary_df)

# Make predictions
X_new = pd.DataFrame({
    'Feature1': [0.2, 0.6],
    'Feature2': [0.3, 0.7],
    'Feature3': [0.4, 0.9]
})
predictions = model.predict(X_new)
for tau, outputs in predictions.items():
    print(f"\nPredictions for Quantile {tau}:")
    for output, y_pred in outputs.items():
        print(f"  {output}: {y_pred}")

Output

Bootstrapping: 100%|█████████████████████████████| 500/500 [03:25<00:00,  2.43it/s]

Quantile: 0.25

Output: Target1
           Coefficient  Std. Error    t-value     P>|t|
Intercept     0.876363    0.091465   9.581398  0.000000
Feature1      2.568977    0.096322  26.670669  0.000000
Feature2     -1.575274    0.111982 -14.067201  0.000000
Feature3      0.338673    0.114226   2.964943  0.003401

Output: Target2
            Coefficient  Std. Error       t-value  P>|t|
Intercept  1.775060e+00    0.137418  1.291719e+01    0.0
Feature1  -1.666380e+00    0.140054 -1.189817e+01    0.0
Feature2  -1.611076e-09    0.112536 -1.431609e-08    1.0
Feature3   2.085970e+00    0.142150  1.467440e+01    0.0

Quantile: 0.5

Output: Target1
           Coefficient  Std. Error    t-value         P>|t|
Intercept     0.933077    0.086188  10.826043  0.000000e+00
Feature1      2.540118    0.100953  25.161316  0.000000e+00
Feature2     -1.526466    0.104012 -14.675813  0.000000e+00
Feature3      0.612945    0.082269   7.450508  2.858602e-12

Output: Target2
           Coefficient  Std. Error    t-value     P>|t|
Intercept     2.027951    0.143698  14.112589  0.000000
Feature1     -1.595173    0.163475  -9.757886  0.000000
Feature2     -0.021603    0.144032  -0.149988  0.880928
Feature3      2.045145    0.134148  15.245390  0.000000

Quantile: 0.75

Output: Target1
           Coefficient  Std. Error    t-value         P>|t|
Intercept     1.243473    0.117297  10.601072  0.000000e+00
Feature1      2.355551    0.099152  23.756937  0.000000e+00
Feature2     -1.647682    0.128211 -12.851339  0.000000e+00
Feature3      0.665442    0.103556   6.425893  9.650365e-10

Output: Target2
           Coefficient  Std. Error    t-value     P>|t|
Intercept     2.386047    0.131412  18.156995  0.000000
Feature1     -1.676041    0.135823 -12.339850  0.000000
Feature2     -0.139892    0.147441  -0.948801  0.343884
Feature3      2.049981    0.138002  14.854732  0.000000

Predictions for Quantile 0.25:
  Target1: [1.05304581 1.61986352]
  Target2: [2.27617155 2.65260442]

Predictions for Quantile 0.5:
  Target1: [1.22833866 1.9402722 ]
  Target2: [2.52049365 2.89635584]

Predictions for Quantile 0.75:
  Target1: [1.48645538 2.10232417]
  Target2: [2.82886323 3.12748038]

Dependencies

ortools (Google's OR-Tools)
numpy
pandas
scipy
tqdm
joblib
scikit-learn

Install dependencies using:

pip install numpy pandas scipy tqdm ortools joblib scikit-learn

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

Fork the repository on GitHub.
Create a new branch for your feature or bugfix.
Commit your changes with clear messages.
Submit a pull request.

License

This project is licensed under the MIT License.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.4.2

Apr 9, 2026

0.4.1

Apr 9, 2026

0.4.0

Apr 8, 2026

0.3.0

Apr 8, 2026

0.2.0

Apr 7, 2026

0.1.1

Dec 22, 2025

This version

0.1.0

Dec 22, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

quantile_regression_pdlp-0.1.0.tar.gz (18.6 kB view details)

Uploaded Dec 22, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

quantile_regression_pdlp-0.1.0-py3-none-any.whl (13.2 kB view details)

Uploaded Dec 22, 2025 Python 3

File details

Details for the file quantile_regression_pdlp-0.1.0.tar.gz.

File metadata

Download URL: quantile_regression_pdlp-0.1.0.tar.gz
Upload date: Dec 22, 2025
Size: 18.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.15

File hashes

Hashes for quantile_regression_pdlp-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a7b827bb3c99566e9af082d3c99fec9f5a6abde0b1b678fca2f649bc96b2c84d`
MD5	`68f2ec10703a9f5661cb1132134942ae`
BLAKE2b-256	`7176133c42c1b296c081c177943043da9975142f431b0c8cf8f296ae5c3bb2e1`

See more details on using hashes here.

File details

Details for the file quantile_regression_pdlp-0.1.0-py3-none-any.whl.

File metadata

Download URL: quantile_regression_pdlp-0.1.0-py3-none-any.whl
Upload date: Dec 22, 2025
Size: 13.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.15

File hashes

Hashes for quantile_regression_pdlp-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`513abaa1d627c4de0474dcc6fba6904ad77ce5944d52fe35606c10c3bfeab3c7`
MD5	`e943bb678b723dab8e0a46f8257a4136`
BLAKE2b-256	`6c15b42b022bc40717a867645179ae294be530441b2e5f8da7ff6730ddc99a04`

See more details on using hashes here.

quantile-regression-pdlp 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Quantile Regression PDLP

Table of Contents

Installation

Prerequisites

Install from PyPI

Install from Source

Why PDLP?

Usage

Quantile Regression as a Linear Programming Problem

Mathematical Formulation

LP Formulation

Variables

Objective Function

Constraints

Features

Documentation

Class: QuantileRegression

Examples

Example 1: Basic Usage

Example 2: Estimating Multiple Quantiles Simultaneously

Example 3: Multi-Output Quantile Regression with Weighted Regression and L1 Regularization

Dependencies

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Class: `QuantileRegression`