Quantile-on-Quantile Kernel-Based Regularized Least Squares — A Python implementation of KRLS (Hainmueller & Hazlett, 2014) and QQKRLS (Adebayo et al., 2024) with publication-quality MATLAB-style visualizations.

QQKRLS — Quantile-on-Quantile Kernel-Based Regularized Least Squares

A production-grade Python library implementing KRLS (Kernel Regularized Least Squares), Quantile-on-Quantile Regression, and the combined QQKRLS estimator with publication-quality MATLAB-style visualizations and comprehensive diagnostic tests for academic research.


Table of Contents

  1. Overview
  2. Theoretical Background
  3. Installation
  4. Quick Start
  5. API Reference
  6. Complete Workflow Guide
  7. Ported From
  8. References
  9. Author
  10. License

Overview

QQKRLS bundles two established econometric methods and their combination into a single Python toolkit:

| Component | Method | Purpose |
|---|---|---|
| KRLS | Kernel Regularized Least Squares | Nonparametric regression using Gaussian kernels with Tikhonov regularization; no linearity or additivity assumptions |
| QQ | Quantile-on-Quantile Regression | Distributional analysis examining how quantiles of X affect quantiles of Y |
| QQKRLS | Combined Estimator | Nonparametric marginal effects across all quantile pairs (θ, τ), capturing heterogeneous relationships at different distributional locations |

Why QQKRLS?

Traditional regression methods assume:

  • Linearity: the effect of X on Y is constant across all values.
  • Homogeneity: the effect is the same at the mean, the tails, and everywhere in between.

QQKRLS relaxes both assumptions simultaneously, allowing researchers to discover:

  • Nonlinear effects that vary across the distribution of X (θ-dimension).
  • Heterogeneous impacts at different quantiles of Y (τ-dimension).
  • Complex interaction patterns invisible to OLS, quantile regression, or standard KRLS alone.

Theoretical Background

1. Kernel Regularized Least Squares (KRLS)

Reference: Hainmueller, J. & Hazlett, C. (2014). Political Analysis, 22(2), 143-168.

KRLS is a machine-learning regression method that finds a function $f$ in a reproducing kernel Hilbert space (RKHS) that minimises the penalised loss:

$$\hat{f} = \arg\min_{f \in \mathcal{H}_K} \sum_{i=1}^{N} (y_i - f(x_i))^2 + \lambda \|f\|_{\mathcal{H}_K}^2$$

By the representer theorem, the solution has the form:

$$\hat{f}(x) = \sum_{i=1}^{N} c_i \cdot K(x, x_i)$$

where $K$ is a Gaussian kernel:

$$K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{\sigma}\right)$$
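
As a minimal sketch of the kernel above (not the library's internal code), the kernel matrix can be built with NumPy broadcasting. The function name `gaussian_kernel` is hypothetical, and the bandwidth is set to the number of predictors $D$, which this README gives as the default for `sigma`.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-||A[i] - B[j]||^2 / sigma) for the rows of A and B."""
    # Squared Euclidean distances via broadcasting: shape (len(A), len(B))
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))   # 5 observations, D = 2 predictors
sigma = X.shape[1]                # bandwidth set to D
K = gaussian_kernel(X, X, sigma)  # symmetric, with ones on the diagonal
```

Passing new observations as `A` and the training data as `B` gives the cross-kernel needed to evaluate $\hat{f}$ at points outside the sample.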

The coefficient vector $c^*$ is obtained via eigendecomposition:

$$c^* = V (\Lambda + \lambda I)^{-1} V^\top y$$

where $K = V \Lambda V^\top$ is the eigendecomposition of the kernel matrix.
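
The eigendecomposition route can be checked against a direct linear solve. The sketch below uses synthetic data and illustrative, untuned values for `sigma` and `lam`; it is a NumPy illustration of the formula, not the package's own code.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 40, 2
X = rng.standard_normal((N, D))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(N)

sigma, lam = float(D), 0.5   # illustrative values, not tuned

# Gaussian kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / sigma)
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-sq_dist / sigma)

# Eigendecomposition K = V diag(evals) V^T (K is symmetric, so eigh applies)
evals, V = np.linalg.eigh(K)

# c* = V (Lambda + lam I)^{-1} V^T y; the inverse is diagonal, so divide elementwise
c_eig = V @ ((V.T @ y) / (evals + lam))

# Same coefficients from a direct solve of (K + lam I) c = y
c_direct = np.linalg.solve(K + lam * np.eye(N), y)

fitted = K @ c_eig           # fitted values at the training points
```

The practical benefit of the eigendecomposition is that `evals` and `V` are computed once and reused for every candidate $\lambda$, which keeps the search over $\lambda$ cheap.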

Marginal Effects

Pointwise marginal effects for predictor dimension $d$ are computed analytically:

$$\frac{\partial \hat{f}(x_j)}{\partial x_j^{(d)}} = \frac{-2}{\sigma} \sum_{i=1}^{N} c_i \cdot K(x_j, x_i) \cdot (x_j^{(d)} - x_i^{(d)})$$

The average marginal effect is:

$$\overline{\text{ME}}_d = \frac{1}{N} \sum_{j=1}^{N} \frac{\partial \hat{f}(x_j)}{\partial x_j^{(d)}}$$
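
A quick way to sanity-check analytic marginal effects is to compare them with a central finite difference of the fitted function. The sketch below does this on synthetic data with fixed, illustrative `sigma` and `lam`. The helper names are hypothetical, and the sign of the difference term follows from differentiating the Gaussian kernel, as defined earlier in this section, with respect to the evaluation point.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma)

rng = np.random.default_rng(2)
N, D = 60, 2
X = rng.standard_normal((N, D))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
sigma, lam = float(D), 0.1   # illustrative values, not tuned

K = gaussian_kernel(X, X, sigma)
c = np.linalg.solve(K + lam * np.eye(N), y)

def predict(X_new):
    # f_hat(x) = sum_i c_i K(x, x_i)
    return gaussian_kernel(X_new, X, sigma) @ c

# diff[j, i, d] = x_j^(d) - x_i^(d)
diff = X[:, None, :] - X[None, :, :]
# derivs[j, d] = (-2 / sigma) * sum_i c_i K(x_j, x_i) (x_j^(d) - x_i^(d))
derivs = (-2.0 / sigma) * np.einsum("ji,jid->jd", K * c[None, :], diff)
avg_me = derivs.mean(axis=0)   # average marginal effect per predictor

# Central finite difference in the first predictor, for comparison
h = 1e-5
step = np.zeros(D)
step[0] = h
fd = (predict(X + step) - predict(X - step)) / (2 * h)
```

Here `derivs[:, 0]` and `fd` agree up to finite-difference error, which confirms that the analytic expression is the derivative of the fitted function.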

Bandwidth & Regularization

| Parameter | Symbol | Selection Method |
|---|---|---|
| Bandwidth | $\sigma$ | Default = number of predictors $D$ |
| Regularization | $\lambda$ | Leave-One-Out (LOO) cross-validation via golden-section search |

The LOO error is computed efficiently using the "shortcut" formula:

$$\text{LOO} = \sum_{i=1}^{N} \left(\frac{c_i}{G^{-1}_{ii}}\right)^2, \quad \text{where } G^{-1} = V (\Lambda + \lambda I)^{-1} V^\top$$
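
The shortcut can be verified numerically against a brute-force refit that drops one observation at a time. The sketch below does that and then runs a simple golden-section search over $\log_{10}\lambda$. The function names, the search bracket, and the tolerance are assumptions made for illustration; the search also assumes the LOO error is unimodal inside the bracket, and it is not the package's own routine.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 30
X = rng.standard_normal((N, 1))
y = np.sin(2 * X[:, 0]) + 0.2 * rng.standard_normal(N)
sigma = 1.0

K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) / sigma)
evals, V = np.linalg.eigh(K)   # computed once, reused for every lambda

def loo_shortcut(lam):
    """Sum of squared leave-one-out residuals via c_i / G^{-1}_ii."""
    G_inv = (V / (evals + lam)) @ V.T      # V (Lambda + lam I)^{-1} V^T
    c = G_inv @ y
    return float(np.sum((c / np.diag(G_inv)) ** 2))

def loo_brute_force(lam):
    """Refit N times, each time predicting the held-out point."""
    total = 0.0
    for i in range(N):
        keep = np.arange(N) != i
        c_i = np.linalg.solve(K[np.ix_(keep, keep)] + lam * np.eye(N - 1), y[keep])
        total += (y[i] - K[i, keep] @ c_i) ** 2
    return float(total)

def golden_section(f, lo, hi, tol=1e-4):
    """Minimise f on [lo, hi], assuming it is unimodal there.

    Simple version: both interior points are re-evaluated each iteration.
    """
    inv_phi = (np.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
        if f(x1) < f(x2):
            b = x2
        else:
            a = x1
    return (a + b) / 2

# Search on log10(lambda); the bracket [-4, 1] is an arbitrary illustrative choice
log_lam = golden_section(lambda t: loo_shortcut(10.0 ** t), -4.0, 1.0)
best_lam = 10.0 ** log_lam
```

`loo_shortcut(lam)` and `loo_brute_force(lam)` return the same value up to rounding, but the shortcut needs no refitting, so each step of the search costs only a few matrix products.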

2. Quantile-on-Quantile Regression (QQ)

Reference: Sim, N. & Zhou, H. (2015). Journal of Banking & Finance, 55, 1-12.

Quantile-on-Quantile regression examines how the $\theta$-th quantile of the independent variable $X$ affects the $\tau$-th conditional quantile of the dependent variable $Y$. This creates a two-dimensional mapping $\beta(\theta, \tau)$ that captures distributional heterogeneity.

The QQ approach:

  1. For each $\theta \in (0, 1)$: compute $Q_X(\theta)$ and subset data where $X \leq Q_X(\theta)$.
  2. For each $\tau \in (0, 1)$: estimate the $\tau$-th conditional quantile of $Y$ given the subset.
  3. The coefficient $\beta(\theta, \tau)$ describes the marginal effect at quantile pair $(\theta, \tau)$.

3. QQKRLS: The Combined Estimator

Reference: Adebayo, T.S., Ozkan, O. & Eweade, B.S. (2024). Journal of Cleaner Production, 440, 140832.

QQKRLS synthesises KRLS and QQ regression:

Algorithm:

  1. For each X-quantile $\theta$:

    • Compute the $\theta$-th quantile threshold: $q_\theta = Q_X(\theta)$
    • Subset data: $\{(x_i, y_i) : x_i \leq q_\theta\}$
    • Fit KRLS on the subset to obtain pointwise marginal effects $\left\{\frac{\partial \hat{f}}{\partial x}\right\}$
  2. For each Y-quantile $\tau$:

    • The coefficient is the $\tau$-th quantile of the marginal effects distribution: $$\beta(\theta, \tau) = Q_\tau\!\left(\left\{\frac{\partial \hat{f}_{[\theta]}(x_j)}{\partial x_j}\right\}_{j=1}^{n_\theta}\right)$$
  3. Statistical inference via bootstrap:

    • Resample with replacement $B$ times within each subset.
    • Compute bootstrap standard errors, $t$-statistics, and $p$-values.

The result is a heatmap matrix $\beta(\theta, \tau)$ showing how the nonparametric marginal effect varies across both distributions.
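
The three steps can be sketched in plain NumPy for a single predictor. This is an illustration of the algorithm as described above, not the package's implementation: `krls_marginal_effects` and `qqkrls_grid` are hypothetical names, `sigma` and `lam` are fixed rather than chosen by LOO cross-validation, and no standardisation of the data is applied.

```python
import numpy as np

def krls_marginal_effects(x, y, sigma=1.0, lam=0.1):
    """Pointwise d f_hat / dx for a one-predictor Gaussian-kernel fit.

    sigma and lam are fixed for brevity (an assumption of this sketch).
    """
    diff = x[:, None] - x[None, :]                  # diff[j, i] = x_j - x_i
    K = np.exp(-diff ** 2 / sigma)
    c = np.linalg.solve(K + lam * np.eye(len(x)), y)
    return (-2.0 / sigma) * (K * diff) @ c

def qqkrls_grid(x, y, thetas, taus, min_obs=15, n_boot=0, seed=0):
    """Return (beta, se), each of shape (len(taus), len(thetas))."""
    rng = np.random.default_rng(seed)
    beta = np.full((len(taus), len(thetas)), np.nan)
    se = np.full_like(beta, np.nan)
    for k, theta in enumerate(thetas):
        mask = x <= np.quantile(x, theta)           # step 1: x_i <= Q_X(theta)
        xs, ys = x[mask], y[mask]
        if len(xs) < min_obs:
            continue                                # too few points: leave NaN
        # step 2: tau-quantiles of the pointwise marginal effects
        beta[:, k] = np.quantile(krls_marginal_effects(xs, ys), taus)
        if n_boot > 0:                              # step 3: bootstrap within the subset
            draws = np.empty((n_boot, len(taus)))
            for b in range(n_boot):
                idx = rng.integers(0, len(xs), len(xs))
                draws[b] = np.quantile(krls_marginal_effects(xs[idx], ys[idx]), taus)
            se[:, k] = draws.std(axis=0, ddof=1)
    return beta, se

rng = np.random.default_rng(42)
x = rng.standard_normal(150)
y = 0.5 * np.sin(x) + 0.3 * rng.standard_normal(150)
grid = np.array([0.25, 0.50, 0.75, 0.95])
beta, se = qqkrls_grid(x, y, thetas=grid, taus=grid, n_boot=20)
```

Rows of `beta` index $\tau$ and columns index $\theta$. Within a column the entries are non-decreasing in $\tau$, because they are quantiles of the same set of marginal effects.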


Installation

From PyPI

pip install qqkrls

With full dependencies (interactive plots, diagnostics)

pip install qqkrls[full]

From source (development)

git clone https://github.com/merwanroudane/qqkrls.git
cd qqkrls
pip install -e .

Dependencies

| Package | Minimum Version | Purpose |
|---|---|---|
| numpy | ≥ 1.20 | Core numerical computation |
| pandas | ≥ 1.3 | Data manipulation and results storage |
| scipy | ≥ 1.7 | Kernel distances, statistical tests |
| matplotlib | ≥ 3.5 | Publication-quality visualizations |

Optional:

| Package | Purpose |
|---|---|
| plotly | Interactive 3D surface plots |
| seaborn | Enhanced statistical plots |
| statsmodels | Extended diagnostic tests |

Quick Start

KRLS — Kernel Regularized Least Squares

import numpy as np
from qqkrls import krls, plot_krls_derivatives, plot_krls_fit

# Generate nonlinear data
np.random.seed(42)
n = 200
X = np.random.randn(n, 2)
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]**2 + np.random.randn(n) * 0.3

# Fit KRLS
fit = krls(X, y, col_names=["x1", "x2"])

# Print summary
fit.summary()

# Visualize marginal effects
plot_krls_derivatives(fit, var_idx=0, title="Marginal Effect of x1")
plot_krls_fit(fit)

QQKRLS — Quantile-on-Quantile KRLS

import numpy as np
from qqkrls import qqkrls, plot_qqkrls_heatmap, plot_qqkrls_3d

# Generate data
np.random.seed(42)
x = np.random.randn(200)
y = 0.5 * np.sin(x) + np.random.randn(200) * 0.3

# Run QQKRLS
result = qqkrls(y, x, n_boot=100)

# Print summary
result.summary()

# Publication-quality heatmap (Adebayo et al. style)
plot_qqkrls_heatmap(result, title="QQKRLS: X → Y", colorscale="paper")

# 3D surface (MATLAB-style)
plot_qqkrls_3d(result, title="QQKRLS Surface", colorscale="jet")

# Export LaTeX table
latex = result.export_latex(caption="QQKRLS Coefficients")
print(latex)

API Reference

Core Estimation

krls(X, y, ...)

Fit Kernel Regularized Least Squares.

from qqkrls import krls

fit = krls(
    X,                     # (N, D) predictor matrix
    y,                     # (N,) outcome vector
    kernel="gaussian",     # Kernel type: 'gaussian', 'linear', 'poly2', 'poly3', 'poly4'
    lambda_=None,          # Regularization parameter (None = LOO cross-validation)
    sigma=None,            # Gaussian kernel bandwidth (None = D)
    derivative=True,       # Compute pointwise marginal effects
    binary=True,           # Compute first differences for binary predictors
    vcov=True,             # Compute variance-covariance matrices
    eigtrunc=None,         # Eigenvalue truncation threshold (0, 1]
    col_names=None,        # List of predictor names
    verbose=1,             # 0=silent, 1=summary, 2+=detailed
)

Returns: KRLSResult with attributes:

| Attribute | Shape | Description |
|---|---|---|
| coeffs | (N, 1) | Solution vector $c^*$ |
| fitted | (N, 1) | Fitted values $\hat{y}$ |
| X, y | (N, D), (N, 1) | Original data |
| K | (N, N) | Kernel matrix |
| sigma, lambda_ | scalar | Bandwidth and regularization parameters |
| R2, Looe | scalar | R-squared and LOO error |
| derivatives | (N, D) | Pointwise marginal effects |
| avg_derivatives | (1, D) | Average marginal effects |
| var_avg_derivatives | (1, D) | Variance of average marginal effects |
| vcov_c | (N, N) | Variance-covariance of coefficients |
| vcov_fitted | (N, N) | Variance-covariance of fitted values |

Methods:

  • fit.summary() — Print publication-quality summary.
  • fit.predict(newdata) — Predict for new observations.

qqkrls(y, x, ...)

Run Quantile-on-Quantile Kernel-Based Regularized Least Squares.

from qqkrls import qqkrls

result = qqkrls(
    y,                     # Dependent variable
    x,                     # Independent variable
    y_quantiles=None,      # Quantiles of Y (τ). Default: 0.05, 0.10, ..., 0.95
    x_quantiles=None,      # Quantiles of X (θ). Default: same as y_quantiles
    sigma=None,            # Gaussian kernel bandwidth (None = auto)
    lambda_=None,          # Regularization parameter (None = LOO CV per subset)
    min_obs=15,            # Minimum observations in a subset to fit KRLS
    n_boot=200,            # Bootstrap replications for standard errors
    verbose=True,          # Print progress
)

Returns: QQKRLSResult with attributes:

| Attribute | Type | Description |
|---|---|---|
| results | DataFrame | Long-format table: y_quantile, x_quantile, coefficient, std_error, t_value, p_value, significance |
| y_quantiles | ndarray | Y-quantile grid |
| x_quantiles | ndarray | X-quantile grid |
| n_obs | int | Number of observations |

Methods:

  • result.summary() — Print summary statistics.
  • result.to_matrix(value) — Pivot to (τ × θ) matrix.
  • result.significance_matrix(alpha) — Binary significance matrix.
  • result.stars_matrix() — Matrix of significance stars.
  • result.to_dataframe() — Full results DataFrame.
  • result.export_csv(path) — Export to CSV.
  • result.export_latex(...) — Export to LaTeX table.
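
To make the inference columns concrete, the sketch below shows one conventional way a coefficient and its bootstrap standard error map to a t-value, a two-sided p-value, and significance stars. The normal reference distribution and the 1% / 5% / 10% star thresholds are assumptions of this sketch; the README does not state which the package uses, and `cell_inference` is a hypothetical helper.

```python
import math

def cell_inference(coef, std_error):
    """t-value, two-sided p-value and stars for one (theta, tau) cell.

    Assumes a standard-normal reference distribution and the common
    1% / 5% / 10% star thresholds (assumptions of this sketch).
    """
    t_value = coef / std_error
    # Two-sided normal p-value: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
    p_value = math.erfc(abs(t_value) / math.sqrt(2))
    if p_value < 0.01:
        stars = "***"
    elif p_value < 0.05:
        stars = "**"
    elif p_value < 0.10:
        stars = "*"
    else:
        stars = ""
    return t_value, p_value, stars

t_value, p_value, stars = cell_inference(coef=0.42, std_error=0.20)
```

With these inputs the t-value is 2.1, so the cell is significant at the 5% level but not at the 1% level under the assumed thresholds.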

multi_qqkrls(y, X, ...)

Run QQKRLS for each column of X against y (multi-variable panel).

from qqkrls import multi_qqkrls

results_dict = multi_qqkrls(
    y, X,
    col_names=["GDP", "CO2", "Energy"],
    n_boot=200,
    verbose=True,
)
# Returns: dict mapping {var_name: QQKRLSResult}

Visualization Functions

All plots follow MATLAB-style aesthetics for publication in top academic journals.

| Function | Description | Key Parameters |
|---|---|---|
| plot_qqkrls_heatmap(result, ...) | Paper-style coefficient heatmap with significance stars | colorscale, show_stars, show_values, vmin, vmax |
| plot_qqkrls_3d(result, ...) | 3D surface of QQKRLS coefficients (MATLAB-style) | elev, azim, colorscale |
| plot_qqkrls_contour(result, ...) | Filled contour plot of coefficients | levels, colorscale |
| plot_qqkrls_pvalue(result, ...) | P-value heatmap with significance regions | alpha |
| plot_qqkrls_panel(results_dict, ...) | Multi-variable panel of heatmaps | dep_var, colorscale |
| plot_krls_derivatives(fit, ...) | Pointwise marginal effects scatter + density | var_idx |
| plot_krls_fit(fit, ...) | Actual vs fitted scatter (45° line) | |
| plot_krls_panel(fit, ...) | Panel of marginal-effect scatters for all predictors | |

Available Colorscales

| Name | Description |
|---|---|
| "paper" | Red–White–Green diverging (Adebayo et al. 2024 style); default |
| "paper_warm" | Red–White warm gradient |
| "jet" | Classic MATLAB Jet |
| "rdylgn" | Red–Yellow–Green diverging |
| "coolwarm" | Blue–Red diverging |
| "viridis" | Perceptually uniform |
| "plasma" | Perceptually uniform warm |

Diagnostic Functions

Pre-Estimation Diagnostics

from qqkrls import linearity_test_battery, print_diagnostics

# Run battery of tests to justify using KRLS/QQKRLS
diag_df = linearity_test_battery(y, X, col_names=["x1", "x2"])
print_diagnostics(diag_df)

Tests included:

  • Ramsey RESET — Functional form misspecification
  • Breusch-Pagan — Heteroskedasticity
  • Jarque-Bera — Non-normality of residuals
  • BDS — Nonlinear dependence in residuals

Post-Estimation KRLS Diagnostics

from qqkrls import krls_residual_diagnostics, print_krls_diagnostics

diag = krls_residual_diagnostics(fit)
print_krls_diagnostics(diag)

Reports: R², Adjusted R², Effective df, AIC, BIC, Durbin-Watson, MAE, RMSE, Jarque-Bera.
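
Several of these quantities have short closed forms. The sketch below computes RMSE, MAE, and the Durbin-Watson statistic from residuals, and takes the effective degrees of freedom as the trace of the smoother matrix $K(K + \lambda I)^{-1}$, which equals $\sum_i \lambda_i / (\lambda_i + \lambda)$ over the kernel eigenvalues $\lambda_i$. That definition of effective df is an assumption here, and `residual_summary` is a hypothetical helper rather than part of the package.

```python
import numpy as np

def residual_summary(y, fitted, kernel_eigenvalues=None, lam=None):
    """RMSE, MAE, Durbin-Watson and (optionally) effective df."""
    e = np.asarray(y, dtype=float) - np.asarray(fitted, dtype=float)
    out = {
        "rmse": float(np.sqrt(np.mean(e ** 2))),
        "mae": float(np.mean(np.abs(e))),
        # Durbin-Watson: values near 2 suggest little first-order autocorrelation
        "durbin_watson": float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2)),
    }
    if kernel_eigenvalues is not None and lam is not None:
        ev = np.asarray(kernel_eigenvalues, dtype=float)
        # Trace of the smoother matrix K (K + lam I)^{-1}
        out["effective_df"] = float(np.sum(ev / (ev + lam)))
    return out

y = np.array([1.0, 2.0, 3.0, 4.0])
fitted = np.array([1.5, 1.5, 3.5, 3.5])
stats = residual_summary(y, fitted, kernel_eigenvalues=[3.0, 1.0], lam=1.0)
```

In this toy example the residuals alternate in sign, which pushes the Durbin-Watson statistic above 2 and indicates negative first-order autocorrelation.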

Individual Diagnostic Tests

from qqkrls import bds_test, parameter_stability_test, jarque_bera

# BDS test for nonlinear dependence
bds = bds_test(series, max_dim=6)
# Returns: {'dimensions': [2..6], 'z_stats': [...], 'p_values': [...]}

# Andrews parameter stability test
stab = parameter_stability_test(series, trim=0.15)
# Returns: {'max_f': ..., 'exp_f': ..., 'ave_f': ...}

# Jarque-Bera normality test
jb = jarque_bera(series)
# Returns: {'statistic': ..., 'p_value': ...}

Table & Export Functions

from qqkrls import (
    qqkrls_coefficient_table,
    krls_summary_table,
    descriptive_statistics,
    export_results_csv,
)

# LaTeX coefficient table with significance stars
latex = qqkrls_coefficient_table(result, digits=3, show_stars=True)

# LaTeX table of KRLS average marginal effects
latex = krls_summary_table(fit, digits=4)

# Descriptive statistics table (Mean, Median, Std, Skew, Kurt, JB)
latex = descriptive_statistics(df, caption="Descriptive Statistics")

# CSV export
export_results_csv(result, "qqkrls_results.csv", digits=4)

Complete Workflow Guide

This section demonstrates the full research workflow from data loading to publication-ready output, mirroring the methodology in Adebayo et al. (2024).

Step 1: Data Preparation & Descriptive Statistics

import numpy as np
import pandas as pd
from qqkrls import descriptive_statistics

# Load your data
df = pd.read_csv("your_data.csv")
y = df["dependent_var"].values
X = df[["x1", "x2", "x3"]].values

# Generate descriptive statistics LaTeX table
latex = descriptive_statistics(df[["dependent_var", "x1", "x2", "x3"]])
print(latex)

Step 2: Pre-Estimation Diagnostics

Before applying KRLS/QQKRLS, test whether the data exhibit nonlinearity and heteroskedasticity:

from qqkrls import linearity_test_battery, print_diagnostics

diag = linearity_test_battery(y, X, col_names=["x1", "x2", "x3"])
print_diagnostics(diag)

# If RESET test rejects linearity → KRLS is justified
# If BDS test rejects i.i.d. → nonparametric approach is justified
# If Breusch-Pagan rejects homoskedasticity → quantile approach is justified

Step 3: KRLS Estimation

from qqkrls import krls, plot_krls_derivatives, plot_krls_fit, plot_krls_panel
from qqkrls import krls_residual_diagnostics, print_krls_diagnostics, krls_summary_table

# Fit KRLS (full nonparametric regression)
fit = krls(X, y, col_names=["x1", "x2", "x3"])

# Post-estimation diagnostics
diag = krls_residual_diagnostics(fit)
print_krls_diagnostics(diag)

# Visualize all marginal effects
plot_krls_panel(fit, save_path="krls_marginal_effects.png")

# Fitted vs actual
plot_krls_fit(fit, save_path="krls_fitted_actual.png")

# LaTeX table
print(krls_summary_table(fit))

Step 4: QQKRLS Estimation

from qqkrls import qqkrls, plot_qqkrls_heatmap, plot_qqkrls_3d
from qqkrls import plot_qqkrls_contour, plot_qqkrls_pvalue

# Define quantile grids
taus = np.linspace(0.05, 0.95, 19)     # 19 quantiles: 0.05, 0.10, ..., 0.95
thetas = np.linspace(0.05, 0.95, 19)   # linspace fixes the endpoint and count

# Run QQKRLS for a single X variable
result = qqkrls(y, X[:, 0], y_quantiles=taus, x_quantiles=thetas,
                n_boot=200, verbose=True)

# Summary
result.summary()

# Heatmap (paper-style with significance stars)
plot_qqkrls_heatmap(result, title="QQKRLS: x1 → Y",
                    colorscale="paper", show_stars=True,
                    save_path="qqkrls_heatmap_x1.png")

# 3D surface
plot_qqkrls_3d(result, title="QQKRLS Surface: x1 → Y",
               save_path="qqkrls_3d_x1.png")

# Contour plot
plot_qqkrls_contour(result, save_path="qqkrls_contour_x1.png")

# P-value heatmap
plot_qqkrls_pvalue(result, save_path="qqkrls_pvalue_x1.png")

Step 5: Multi-Variable QQKRLS

from qqkrls import multi_qqkrls, plot_qqkrls_panel

# Run QQKRLS for all independent variables
results_dict = multi_qqkrls(
    y, X,
    col_names=["x1", "x2", "x3"],
    n_boot=200,
    verbose=True,
)

# Panel of heatmaps (one per variable)
plot_qqkrls_panel(results_dict, dep_var="Y",
                  save_path="qqkrls_panel.png")

Step 6: Visualization

# Customize any plot
fig, ax = plot_qqkrls_heatmap(
    result,
    title="Custom Title",
    colorscale="paper",
    show_stars=True,
    show_values=True,    # Show coefficient values in cells
    x_label=r"Quantiles of Energy ($\theta$)",
    y_label=r"Quantiles of CO$_2$ ($\tau$)",
    figsize=(14, 10),
    vmin=-0.5,
    vmax=0.5,
    star_fontsize=7,
    dpi=400,
    save_path="custom_heatmap.png",
)

Step 7: Export & Reporting

# CSV export
result.export_csv("results.csv", digits=4)

# LaTeX table with significance stars
latex = result.export_latex(
    value="coefficient",
    caption="QQKRLS Marginal Effects: Energy → CO₂",
    show_stars=True,
)
print(latex)

# Save to a .tex file that your LaTeX document can include
with open("table_qqkrls.tex", "w") as f:
    f.write(latex)

Ported From

This library is a faithful Python port of:

| Source | Version | Authors |
|---|---|---|
| R CRAN KRLS | v1.0-0 | Hainmueller & Hazlett |
| R CRAN QuantileOnQuantile | v1.0.3 | Roudane |
| Python wqr | v1.0.1 | Roudane |

References

  1. Hainmueller, J. & Hazlett, C. (2014). Kernel Regularized Least Squares. Political Analysis, 22(2), 143-168. doi:10.1093/pan/mpt024

  2. Sim, N. & Zhou, H. (2015). Oil Prices, US Stock Return, and the Dependence Between Their Quantiles. Journal of Banking & Finance, 55, 1-12. doi:10.1016/j.jbankfin.2015.01.013

  3. Adebayo, T.S., Ozkan, O. & Eweade, B.S. (2024). Do energy efficiency R&D investments and ICT promote environmental sustainability in Sweden? A QQKRLS investigation. Journal of Cleaner Production, 440, 140832. doi:10.1016/j.jclepro.2024.140832

  4. Adebayo, T.S., Meo, M.S., Eweade, B.S. & Ozkan, O. (2024). Analyzing the effects of solar energy innovations, digitalization, and economic globalization on environmental quality in the United States. Clean Technologies and Environmental Policy, 26, 4157-4176. doi:10.1007/s10098-024-02831-0


Author

Dr. Merwan Roudane


License

MIT License. See LICENSE for details.
