
Quantile-on-Quantile Kernel-Based Regularized Least Squares — A Python implementation of KRLS (Hainmueller & Hazlett, 2014) and QQKRLS (Adebayo et al., 2024) with publication-quality MATLAB-style visualizations.


QQKRLS — Quantile-on-Quantile Kernel-Based Regularized Least Squares


A production-grade Python library implementing KRLS (Kernel Regularized Least Squares), Quantile-on-Quantile Regression, and the combined QQKRLS estimator with publication-quality MATLAB-style visualizations and comprehensive diagnostic tests for academic research.


Table of Contents

  1. Overview
  2. Theoretical Background
  3. Installation
  4. Quick Start
  5. API Reference
  6. Complete Workflow Guide
  7. Ported From
  8. References
  9. Author
  10. License

Overview

QQKRLS combines three powerful econometric methodologies into a single, unified Python toolkit:

| Component | Method | Purpose |
|---|---|---|
| KRLS | Kernel Regularized Least Squares | Nonparametric regression using Gaussian kernels with Tikhonov regularization; no linearity or additivity assumptions |
| QQ | Quantile-on-Quantile Regression | Distributional analysis examining how quantiles of X affect quantiles of Y |
| QQKRLS | Combined Estimator | Nonparametric marginal effects across all quantile pairs (θ, τ), capturing heterogeneous relationships at different distributional locations |

Why QQKRLS?

Traditional regression methods assume:

  • Linearity: the effect of X on Y is constant across all values.
  • Homogeneity: the effect is the same at the mean, the tails, and everywhere in between.

QQKRLS relaxes both assumptions simultaneously, allowing researchers to discover:

  • Nonlinear effects that vary across the distribution of X (θ-dimension).
  • Heterogeneous impacts at different quantiles of Y (τ-dimension).
  • Complex interaction patterns invisible to OLS, quantile regression, or standard KRLS alone.

Theoretical Background

1. Kernel Regularized Least Squares (KRLS)

Reference: Hainmueller, J. & Hazlett, C. (2014). Political Analysis, 22(2), 143-168.

KRLS is a machine-learning regression method that finds a function $f$ in a reproducing kernel Hilbert space (RKHS) that minimises the penalised loss:

$$\hat{f} = \arg\min_{f \in \mathcal{H}_K} \sum_{i=1}^{N} (y_i - f(x_i))^2 + \lambda \|f\|_{\mathcal{H}_K}^2$$

By the representer theorem, the solution has the form:

$$\hat{f}(x) = \sum_{i=1}^{N} c_i \cdot K(x, x_i)$$

where $K$ is a Gaussian kernel:

$$K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{\sigma}\right)$$

The coefficient vector $c^*$ is obtained via eigendecomposition:

$$c^* = V (\Lambda + \lambda I)^{-1} V^\top y$$

where $K = V \Lambda V^\top$ is the eigendecomposition of the kernel matrix.
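The closed-form solution above can be sketched in a few lines of NumPy. This is an illustration rather than the package's internals: the helper names `gaussian_kernel` and `krls_coefficients` are hypothetical, `lam` is fixed by hand instead of being chosen by cross-validation, and the predictors are assumed to be already on comparable scales.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """K[i, j] = exp(-||x_i - x_j||^2 / sigma) for the rows of X."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / sigma)

def krls_coefficients(K, y, lam):
    """c = V (Lambda + lam I)^{-1} V^T y, using the eigendecomposition of K."""
    eigvals, V = np.linalg.eigh(K)          # K is symmetric, so eigh applies
    return V @ ((V.T @ y) / (eigvals + lam))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(50)

K = gaussian_kernel(X, sigma=X.shape[1])    # bandwidth set to D, the number of predictors
c = krls_coefficients(K, y, lam=0.5)        # lam chosen by hand for illustration
fitted = K @ c                              # f_hat(x_j) = sum_i c_i K(x_j, x_i)

# The eigendecomposition route agrees with solving (K + lam I) c = y directly.
c_direct = np.linalg.solve(K + 0.5 * np.eye(len(y)), y)
print(np.allclose(c, c_direct))
```

The eigendecomposition is computed once and can be reused for every candidate $\lambda$, which keeps a search over $\lambda$ cheap.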

Marginal Effects

Pointwise marginal effects for predictor dimension $d$ are computed analytically:

$$\frac{\partial \hat{f}(x_j)}{\partial x_j^{(d)}} = \frac{-2}{\sigma} \sum_{i=1}^{N} c_i \cdot K(x_j, x_i) \cdot (x_j^{(d)} - x_i^{(d)})$$

The average marginal effect is:

$$\overline{\text{ME}}_d = \frac{1}{N} \sum_{j=1}^{N} \frac{\partial \hat{f}(x_j)}{\partial x_j^{(d)}}$$
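As a rough check on the analytic derivative, the sketch below differentiates $\hat{f}(x) = \sum_i c_i K(x, x_i)$ with the training points held fixed, so each kernel term contributes a factor $-2(x_j^{(d)} - x_i^{(d)})/\sigma$, and compares the result with central finite differences. The function names are hypothetical, and `sigma` and `lam` are fixed by hand.

```python
import numpy as np

def gaussian_kernel_between(A, B, sigma):
    """K[j, i] = exp(-||a_j - b_i||^2 / sigma)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma)

def predict(X_new, X_train, c, sigma):
    """f_hat evaluated at the rows of X_new, with training points held fixed."""
    return gaussian_kernel_between(X_new, X_train, sigma) @ c

def pointwise_derivatives(X, c, sigma):
    """Analytic d f_hat(x_j) / d x_j^(d) for every observation j and dimension d."""
    K = gaussian_kernel_between(X, X, sigma)
    N, D = X.shape
    derivs = np.empty((N, D))
    for d in range(D):
        diff = X[:, d][:, None] - X[:, d][None, :]   # diff[j, i] = x_j^(d) - x_i^(d)
        derivs[:, d] = (-2.0 / sigma) * (K * diff) @ c
    return derivs

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
sigma, lam = 2.0, 0.1                                # illustrative values
c = np.linalg.solve(gaussian_kernel_between(X, X, sigma) + lam * np.eye(40), y)

derivs = pointwise_derivatives(X, c, sigma)
avg_me = derivs.mean(axis=0)                         # average marginal effect per predictor

# Central finite differences in the first predictor
h = 1e-5
step = np.array([h, 0.0])
numeric = (predict(X + step, X, c, sigma) - predict(X - step, X, c, sigma)) / (2 * h)
print(np.allclose(derivs[:, 0], numeric, atol=1e-6))
```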

Bandwidth & Regularization

| Parameter | Symbol | Selection Method |
|---|---|---|
| Bandwidth | $\sigma$ | Default = number of predictors $D$ |
| Regularization | $\lambda$ | Leave-One-Out (LOO) cross-validation via golden-section search |

The LOO error is computed efficiently using the "shortcut" formula:

$$\text{LOO} = \sum_{i=1}^{N} \left(\frac{c_i}{G^{-1}_{ii}}\right)^2, \quad \text{where } G^{-1} = V (\Lambda + \lambda I)^{-1} V^\top$$
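A minimal sketch of the shortcut, checked against a brute-force refit that leaves out one observation at a time, followed by a plain golden-section search. The helper names are hypothetical, and searching over $\log\lambda$ on the interval $[-8, 3]$ is an assumption made for this illustration, not necessarily what the package does.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    sq_norms = np.sum(X**2, axis=1)
    sq = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq, 0.0) / sigma)

def loo_error(eigvals, V, y, lam):
    """Shortcut: sum_i (c_i / Ginv_ii)^2 with Ginv = V (Lambda + lam I)^{-1} V^T."""
    inv = 1.0 / (eigvals + lam)
    c = V @ (inv * (V.T @ y))
    ginv_diag = np.sum(V**2 * inv, axis=1)   # diagonal of Ginv without forming it
    return np.sum((c / ginv_diag) ** 2)

def loo_brute_force(K, y, lam):
    """Refit N times, each time predicting the held-out observation."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        c = np.linalg.solve(K[np.ix_(keep, keep)] + lam * np.eye(n - 1), y[keep])
        total += (y[i] - K[i, keep] @ c) ** 2
    return total

def golden_section_min(f, lo, hi, tol=1e-6):
    """Golden-section search for a minimizer of f on [lo, hi] (assumes unimodality)."""
    ratio = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - ratio * (b - a), a + ratio * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                          # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - ratio * (b - a)
            f1 = f(x1)
        else:                                # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + ratio * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 1))
y = np.sin(2.0 * X[:, 0]) + 0.2 * rng.standard_normal(30)
K = gaussian_kernel(X, sigma=1.0)
eigvals, V = np.linalg.eigh(K)

shortcut = loo_error(eigvals, V, y, 0.3)
brute = loo_brute_force(K, y, 0.3)
print(np.isclose(shortcut, brute))

# Assumption: search on the log scale over an arbitrary illustrative interval.
log_lam = golden_section_min(lambda t: loo_error(eigvals, V, y, np.exp(t)), -8.0, 3.0)
lam_hat = np.exp(log_lam)
```

The shortcut needs one eigendecomposition in total, whereas the brute-force version solves $N$ separate linear systems for every candidate $\lambda$.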

2. Quantile-on-Quantile Regression (QQ)

Reference: Sim, N. & Zhou, H. (2015). Journal of Banking & Finance, 55, 1-12.

Quantile-on-Quantile regression examines how the $\theta$-th quantile of the independent variable $X$ affects the $\tau$-th conditional quantile of the dependent variable $Y$. This creates a two-dimensional mapping $\beta(\theta, \tau)$ that captures distributional heterogeneity.

The QQ approach:

  1. For each $\theta \in (0, 1)$: compute $Q_X(\theta)$ and subset data where $X \leq Q_X(\theta)$.
  2. For each $\tau \in (0, 1)$: estimate the $\tau$-th conditional quantile of $Y$ given the subset.
  3. The coefficient $\beta(\theta, \tau)$ describes the marginal effect at quantile pair $(\theta, \tau)$.

3. QQKRLS: The Combined Estimator

Reference: Adebayo, T.S., Ozkan, O. & Eweade, B.S. (2024). Journal of Cleaner Production, 440, 140832.

QQKRLS synthesises KRLS and QQ regression:

Algorithm:

  1. For each X-quantile $\theta$:

    • Compute the $\theta$-th quantile threshold: $q_\theta = Q_X(\theta)$
    • Subset data: $\{(x_i, y_i) : x_i \leq q_\theta\}$
    • Fit KRLS on the subset to obtain pointwise marginal effects $\left\{\frac{\partial \hat{f}}{\partial x}\right\}$
  2. For each Y-quantile $\tau$:

    • The coefficient is the $\tau$-th quantile of the marginal effects distribution: $$\beta(\theta, \tau) = Q_\tau\!\left(\left\{\frac{\partial \hat{f}_{[\theta]}(x_j)}{\partial x_j}\right\}_{j=1}^{n_\theta}\right)$$
  3. Statistical inference via bootstrap:

    • Resample with replacement $B$ times within each subset.
    • Compute bootstrap standard errors, $t$-statistics, and $p$-values.

The result is a heatmap matrix $\beta(\theta, \tau)$ showing how the nonparametric marginal effect varies across both distributions.
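Steps 1 and 2 of the algorithm can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the function names are hypothetical, $\sigma$ and $\lambda$ are fixed instead of being selected by LOO cross-validation on each subset, and the bootstrap step is omitted.

```python
import numpy as np

def krls_derivatives_1d(x, y, sigma=1.0, lam=0.1):
    """Pointwise marginal effects of a one-predictor KRLS fit (sigma, lam fixed)."""
    diff = x[:, None] - x[None, :]                  # diff[j, i] = x_j - x_i
    K = np.exp(-diff**2 / sigma)
    c = np.linalg.solve(K + lam * np.eye(len(x)), y)
    return (-2.0 / sigma) * (K * diff) @ c

def qqkrls_sketch(y, x, thetas, taus, min_obs=15):
    """beta[t, q]: tau_t-quantile of the marginal effects on the subset x <= Q_x(theta_q)."""
    beta = np.full((len(taus), len(thetas)), np.nan)
    for q, theta in enumerate(thetas):
        mask = x <= np.quantile(x, theta)           # step 1: subset by X-quantile
        if mask.sum() < min_obs:
            continue                                # too few points: leave NaN
        derivs = krls_derivatives_1d(x[mask], y[mask])
        beta[:, q] = np.quantile(derivs, taus)      # step 2: quantiles of the effects
    return beta

rng = np.random.default_rng(3)
x = rng.standard_normal(200)
y = 0.5 * np.sin(x) + 0.3 * rng.standard_normal(200)
grid = np.linspace(0.05, 0.95, 19)
beta = qqkrls_sketch(y, x, thetas=grid, taus=grid)
print(beta.shape)
```

With 200 observations and `min_obs=15`, the $\theta = 0.05$ column stays NaN in this sketch because that subset holds only 10 points, which shows why low X-quantiles need a large enough sample.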


Installation

From PyPI

pip install qqkrls

With full dependencies (interactive plots, diagnostics)

pip install "qqkrls[full]"

From source (development)

git clone https://github.com/merwanroudane/qqkrls.git
cd qqkrls
pip install -e .

Dependencies

| Package | Minimum Version | Purpose |
|---|---|---|
| numpy | ≥ 1.20 | Core numerical computation |
| pandas | ≥ 1.3 | Data manipulation and results storage |
| scipy | ≥ 1.7 | Kernel distances, statistical tests |
| matplotlib | ≥ 3.5 | Publication-quality visualizations |

Optional:

| Package | Purpose |
|---|---|
| plotly | Interactive 3D surface plots |
| seaborn | Enhanced statistical plots |
| statsmodels | Extended diagnostic tests |

Quick Start

KRLS — Kernel Regularized Least Squares

import numpy as np
from qqkrls import krls, plot_krls_derivatives, plot_krls_fit

# Generate nonlinear data
np.random.seed(42)
n = 200
X = np.random.randn(n, 2)
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]**2 + np.random.randn(n) * 0.3

# Fit KRLS
fit = krls(X, y, col_names=["x1", "x2"])

# Print summary
fit.summary()

# Visualize marginal effects
plot_krls_derivatives(fit, var_idx=0, title="Marginal Effect of x1")
plot_krls_fit(fit)

QQKRLS — Quantile-on-Quantile KRLS

import numpy as np
from qqkrls import qqkrls, plot_qqkrls_heatmap, plot_qqkrls_3d

# Generate data
np.random.seed(42)
x = np.random.randn(200)
y = 0.5 * np.sin(x) + np.random.randn(200) * 0.3

# Run QQKRLS
result = qqkrls(y, x, n_boot=100)

# Print summary
result.summary()

# Publication-quality heatmap (Adebayo et al. style)
plot_qqkrls_heatmap(result, title="QQKRLS: X → Y", colorscale="paper")

# 3D surface (MATLAB-style)
plot_qqkrls_3d(result, title="QQKRLS Surface", colorscale="jet")

# Export LaTeX table
latex = result.export_latex(caption="QQKRLS Coefficients")
print(latex)

API Reference

Core Estimation

krls(X, y, ...)

Fit Kernel Regularized Least Squares.

from qqkrls import krls

fit = krls(
    X,                     # (N, D) predictor matrix
    y,                     # (N,) outcome vector
    kernel="gaussian",     # Kernel type: 'gaussian', 'linear', 'poly2', 'poly3', 'poly4'
    lambda_=None,          # Regularization parameter (None = LOO cross-validation)
    sigma=None,            # Gaussian kernel bandwidth (None = D)
    derivative=True,       # Compute pointwise marginal effects
    binary=True,           # Compute first differences for binary predictors
    vcov=True,             # Compute variance-covariance matrices
    eigtrunc=None,         # Eigenvalue truncation threshold (0, 1]
    col_names=None,        # List of predictor names
    verbose=1,             # 0=silent, 1=summary, 2+=detailed
)

Returns: KRLSResult with attributes:

| Attribute | Shape | Description |
|---|---|---|
| coeffs | (N, 1) | Solution vector $c^*$ |
| fitted | (N, 1) | Fitted values $\hat{y}$ |
| X, y | (N, D), (N, 1) | Original data |
| K | (N, N) | Kernel matrix |
| sigma, lambda_ | scalar | Bandwidth and regularization parameters |
| R2, Looe | scalar | R-squared and LOO error |
| derivatives | (N, D) | Pointwise marginal effects |
| avg_derivatives | (1, D) | Average marginal effects |
| var_avg_derivatives | (1, D) | Variance of average marginal effects |
| vcov_c | (N, N) | Variance-covariance of coefficients |
| vcov_fitted | (N, N) | Variance-covariance of fitted values |

Methods:

  • fit.summary() — Print publication-quality summary.
  • fit.predict(newdata) — Predict for new observations.

qqkrls(y, x, ...)

Run Quantile-on-Quantile Kernel-Based Regularized Least Squares.

from qqkrls import qqkrls

result = qqkrls(
    y,                     # Dependent variable
    x,                     # Independent variable
    y_quantiles=None,      # Quantiles of Y (τ). Default: 0.05, 0.10, ..., 0.95
    x_quantiles=None,      # Quantiles of X (θ). Default: same as y_quantiles
    sigma=None,            # Gaussian kernel bandwidth (None = auto)
    lambda_=None,          # Regularization parameter (None = LOO CV per subset)
    min_obs=15,            # Minimum observations in a subset to fit KRLS
    n_boot=200,            # Bootstrap replications for standard errors
    verbose=True,          # Print progress
)

Returns: QQKRLSResult with attributes:

| Attribute | Type | Description |
|---|---|---|
| results | DataFrame | Long-format table: y_quantile, x_quantile, coefficient, std_error, t_value, p_value, significance |
| y_quantiles | ndarray | Y-quantile grid |
| x_quantiles | ndarray | X-quantile grid |
| n_obs | int | Number of observations |
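To make the `std_error`, `t_value`, and `p_value` columns concrete, here is a hedged sketch of a pairs bootstrap for one cell. The function names are hypothetical, the statistic is the median marginal effect from a fixed-parameter KRLS fit, and the two-sided p-value from a standard normal reference distribution is an assumption of this sketch; the package may compute its p-values differently.

```python
import math
import numpy as np

def bootstrap_inference(stat_fn, x, y, n_boot=200, seed=0):
    """Pairs bootstrap: resample (x_i, y_i) with replacement and recompute stat_fn."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimate = stat_fn(x, y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)            # indices drawn with replacement
        draws[b] = stat_fn(x[idx], y[idx])
    std_error = draws.std(ddof=1)
    t_value = estimate / std_error
    # Assumption: two-sided p-value from a standard normal reference distribution.
    p_value = math.erfc(abs(t_value) / math.sqrt(2.0))
    return estimate, std_error, t_value, p_value

def median_marginal_effect(x, y, sigma=1.0, lam=0.1):
    """Median pointwise marginal effect of a one-predictor KRLS fit (fixed sigma, lam)."""
    diff = x[:, None] - x[None, :]
    K = np.exp(-diff**2 / sigma)
    c = np.linalg.solve(K + lam * np.eye(len(x)), y)
    return np.quantile((-2.0 / sigma) * (K * diff) @ c, 0.5)

rng = np.random.default_rng(4)
x = rng.standard_normal(80)
y = 0.5 * x + 0.3 * rng.standard_normal(80)
est, se, t, p = bootstrap_inference(median_marginal_effect, x, y, n_boot=200)
print(f"estimate={est:.3f}  se={se:.3f}  t={t:.2f}  p={p:.4f}")
```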

Methods:

  • result.summary() — Print summary statistics.
  • result.to_matrix(value) — Pivot to (τ × θ) matrix.
  • result.significance_matrix(alpha) — Binary significance matrix.
  • result.stars_matrix() — Matrix of significance stars.
  • result.to_dataframe() — Full results DataFrame.
  • result.export_csv(path) — Export to CSV.
  • result.export_latex(...) — Export to LaTeX table.
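For readers unfamiliar with the long-to-matrix reshaping that `to_matrix` performs, the pandas sketch below shows the idea on made-up numbers. It uses the column names listed for `results` above; whether the package uses `DataFrame.pivot` internally is not stated and is assumed here only for illustration.

```python
import pandas as pd

# Made-up long-format results using the documented column names.
grid = [0.25, 0.50, 0.75]
rows = [
    {"y_quantile": tau, "x_quantile": theta, "coefficient": round(tau - theta, 2)}
    for tau in grid
    for theta in grid
]
long_df = pd.DataFrame(rows)

# Rows are indexed by tau (Y-quantiles), columns by theta (X-quantiles).
matrix = long_df.pivot(index="y_quantile", columns="x_quantile", values="coefficient")
print(matrix)
```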

multi_qqkrls(y, X, ...)

Run QQKRLS for each column of X against y (multi-variable panel).

from qqkrls import multi_qqkrls

results_dict = multi_qqkrls(
    y, X,
    col_names=["GDP", "CO2", "Energy"],
    n_boot=200,
    verbose=True,
)
# Returns: dict mapping {var_name: QQKRLSResult}

Visualization Functions

All plots follow MATLAB-style aesthetics for publication in top academic journals.

| Function | Description | Key Parameters |
|---|---|---|
| plot_qqkrls_heatmap(result, ...) | Paper-style coefficient heatmap with significance stars | colorscale, show_stars, show_values, vmin, vmax |
| plot_qqkrls_3d(result, ...) | 3D surface of QQKRLS coefficients (MATLAB-style) | elev, azim, colorscale |
| plot_qqkrls_contour(result, ...) | Filled contour plot of coefficients | levels, colorscale |
| plot_qqkrls_pvalue(result, ...) | P-value heatmap with significance regions | alpha |
| plot_qqkrls_panel(results_dict, ...) | Multi-variable panel of heatmaps | dep_var, colorscale |
| plot_krls_derivatives(fit, ...) | Pointwise marginal effects scatter + density | var_idx |
| plot_krls_fit(fit, ...) | Actual vs fitted scatter (45° line) | |
| plot_krls_panel(fit, ...) | Panel of marginal-effect scatters for all predictors | |

Available Colorscales

| Name | Description |
|---|---|
| "paper" | Red–White–Green diverging (Adebayo et al. 2024 style); default |
| "paper_warm" | Red–White warm gradient |
| "jet" | Classic MATLAB Jet |
| "rdylgn" | Red–Yellow–Green diverging |
| "coolwarm" | Blue–Red diverging |
| "viridis" | Perceptually uniform |
| "plasma" | Perceptually uniform warm |

Diagnostic Functions

Pre-Estimation Diagnostics

from qqkrls import linearity_test_battery, print_diagnostics

# Run battery of tests to justify using KRLS/QQKRLS
diag_df = linearity_test_battery(y, X, col_names=["x1", "x2"])
print_diagnostics(diag_df)

Tests included:

  • Ramsey RESET — Functional form misspecification
  • Breusch-Pagan — Heteroskedasticity
  • Jarque-Bera — Non-normality of residuals
  • BDS — Nonlinear dependence in residuals

Post-Estimation KRLS Diagnostics

from qqkrls import krls_residual_diagnostics, print_krls_diagnostics

diag = krls_residual_diagnostics(fit)
print_krls_diagnostics(diag)

Reports: R², Adjusted R², Effective df, AIC, BIC, Durbin-Watson, MAE, RMSE, Jarque-Bera.
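Three of the reported quantities follow directly from the residuals $e_t = y_t - \hat{y}_t$. The sketch below uses the textbook definitions, including $\text{DW} = \sum_{t=2}^{N}(e_t - e_{t-1})^2 / \sum_{t=1}^{N} e_t^2$; the function name `residual_summary` is hypothetical, and the package's output format may differ.

```python
import numpy as np

def residual_summary(y, fitted):
    """MAE, RMSE, and Durbin-Watson statistic from actual and fitted values."""
    e = np.asarray(y, dtype=float) - np.asarray(fitted, dtype=float)
    return {
        "MAE": np.mean(np.abs(e)),
        "RMSE": np.sqrt(np.mean(e**2)),
        "Durbin-Watson": np.sum(np.diff(e) ** 2) / np.sum(e**2),
    }

# Toy residuals that alternate in sign, i.e. negatively autocorrelated.
stats = residual_summary([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
print(stats)
```

Durbin-Watson values near 2 suggest no first-order autocorrelation; the alternating toy residuals push the statistic above 2.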

Individual Diagnostic Tests

from qqkrls import bds_test, parameter_stability_test, jarque_bera

# BDS test for nonlinear dependence
bds = bds_test(series, max_dim=6)
# Returns: {'dimensions': [2..6], 'z_stats': [...], 'p_values': [...]}

# Andrews parameter stability test
stab = parameter_stability_test(series, trim=0.15)
# Returns: {'max_f': ..., 'exp_f': ..., 'ave_f': ...}

# Jarque-Bera normality test
jb = jarque_bera(series)
# Returns: {'statistic': ..., 'p_value': ...}

Table & Export Functions

from qqkrls import (
    qqkrls_coefficient_table,
    krls_summary_table,
    descriptive_statistics,
    export_results_csv,
)

# LaTeX coefficient table with significance stars
latex = qqkrls_coefficient_table(result, digits=3, show_stars=True)

# LaTeX table of KRLS average marginal effects
latex = krls_summary_table(fit, digits=4)

# Descriptive statistics table (Mean, Median, Std, Skew, Kurt, JB)
latex = descriptive_statistics(df, caption="Descriptive Statistics")

# CSV export
export_results_csv(result, "qqkrls_results.csv", digits=4)

Complete Workflow Guide

This section demonstrates the full research workflow from data loading to publication-ready output, mirroring the methodology in Adebayo et al. (2024).

Step 1: Data Preparation & Descriptive Statistics

import numpy as np
import pandas as pd
from qqkrls import descriptive_statistics

# Load your data
df = pd.read_csv("your_data.csv")
y = df["dependent_var"].values
X = df[["x1", "x2", "x3"]].values

# Generate descriptive statistics LaTeX table
latex = descriptive_statistics(df[["dependent_var", "x1", "x2", "x3"]])
print(latex)

Step 2: Pre-Estimation Diagnostics

Before applying KRLS/QQKRLS, test whether the data show the nonlinearity, heteroskedasticity, or non-normality that would justify a nonparametric, quantile-based approach:

from qqkrls import linearity_test_battery, print_diagnostics

diag = linearity_test_battery(y, X, col_names=["x1", "x2", "x3"])
print_diagnostics(diag)

# If RESET test rejects linearity → KRLS is justified
# If BDS test rejects i.i.d. → nonparametric approach is justified
# If Breusch-Pagan rejects homoskedasticity → quantile approach is justified

Step 3: KRLS Estimation

from qqkrls import krls, plot_krls_derivatives, plot_krls_fit, plot_krls_panel
from qqkrls import krls_residual_diagnostics, print_krls_diagnostics, krls_summary_table

# Fit KRLS (full nonparametric regression)
fit = krls(X, y, col_names=["x1", "x2", "x3"])

# Post-estimation diagnostics
diag = krls_residual_diagnostics(fit)
print_krls_diagnostics(diag)

# Visualize all marginal effects
plot_krls_panel(fit, save_path="krls_marginal_effects.png")

# Fitted vs actual
plot_krls_fit(fit, save_path="krls_fitted_actual.png")

# LaTeX table
print(krls_summary_table(fit))

Step 4: QQKRLS Estimation

from qqkrls import qqkrls, plot_qqkrls_heatmap, plot_qqkrls_3d
from qqkrls import plot_qqkrls_contour, plot_qqkrls_pvalue

# Define quantile grids
taus = np.linspace(0.05, 0.95, 19)   # 19 quantiles: 0.05, 0.10, ..., 0.95
thetas = np.linspace(0.05, 0.95, 19)

# Run QQKRLS for a single X variable
result = qqkrls(y, X[:, 0], y_quantiles=taus, x_quantiles=thetas,
                n_boot=200, verbose=True)

# Summary
result.summary()

# Heatmap (paper-style with significance stars)
plot_qqkrls_heatmap(result, title="QQKRLS: x1 → Y",
                    colorscale="paper", show_stars=True,
                    save_path="qqkrls_heatmap_x1.png")

# 3D surface
plot_qqkrls_3d(result, title="QQKRLS Surface: x1 → Y",
               save_path="qqkrls_3d_x1.png")

# Contour plot
plot_qqkrls_contour(result, save_path="qqkrls_contour_x1.png")

# P-value heatmap
plot_qqkrls_pvalue(result, save_path="qqkrls_pvalue_x1.png")

Step 5: Multi-Variable QQKRLS

from qqkrls import multi_qqkrls, plot_qqkrls_panel

# Run QQKRLS for all independent variables
results_dict = multi_qqkrls(
    y, X,
    col_names=["x1", "x2", "x3"],
    n_boot=200,
    verbose=True,
)

# Panel of heatmaps (one per variable)
plot_qqkrls_panel(results_dict, dep_var="Y",
                  save_path="qqkrls_panel.png")

Step 6: Visualization

# Customize any plot
fig, ax = plot_qqkrls_heatmap(
    result,
    title="Custom Title",
    colorscale="paper",
    show_stars=True,
    show_values=True,    # Show coefficient values in cells
    x_label=r"Quantiles of Energy ($\theta$)",
    y_label=r"Quantiles of CO$_2$ ($\tau$)",
    figsize=(14, 10),
    vmin=-0.5,
    vmax=0.5,
    star_fontsize=7,
    dpi=400,
    save_path="custom_heatmap.png",
)

Step 7: Export & Reporting

# CSV export
result.export_csv("results.csv", digits=4)

# LaTeX table with significance stars
latex = result.export_latex(
    value="coefficient",
    caption="QQKRLS Marginal Effects: Energy → CO₂",
    show_stars=True,
)
print(latex)

# Write the table to a .tex file that your LaTeX document can \input
with open("table_qqkrls.tex", "w", encoding="utf-8") as f:
    f.write(latex)

Ported From

This library is a faithful Python port of:

| Source | Version | Authors |
|---|---|---|
| R CRAN KRLS | v1.0-0 | Hainmueller & Hazlett |
| R CRAN QuantileOnQuantile | v1.0.3 | Roudane |
| Python wqr | v1.0.1 | Roudane |

References

  1. Hainmueller, J. & Hazlett, C. (2014). Kernel Regularized Least Squares. Political Analysis, 22(2), 143-168. doi:10.1093/pan/mpt024

  2. Sim, N. & Zhou, H. (2015). Oil Prices, US Stock Return, and the Dependence Between Their Quantiles. Journal of Banking & Finance, 55, 1-12. doi:10.1016/j.jbankfin.2015.01.013

  3. Adebayo, T.S., Ozkan, O. & Eweade, B.S. (2024). Do energy efficiency R&D investments and ICT promote environmental sustainability in Sweden? A QQKRLS investigation. Journal of Cleaner Production, 440, 140832. doi:10.1016/j.jclepro.2024.140832

  4. Adebayo, T.S., Meo, M.S., Eweade, B.S. & Ozkan, O. (2024). Analyzing the effects of solar energy innovations, digitalization, and economic globalization on environmental quality in the United States. Clean Technologies and Environmental Policy, 26, 4157-4176. doi:10.1007/s10098-024-02831-0


Author

Dr. Merwan Roudane


License

MIT License. See LICENSE for details.
