DistributionRegressor
Nonparametric distributional regression using LightGBM. Predicts full probability distributions p(y|x) instead of just point estimates.
Documentation | PyPI | Examples
Overview
DistributionRegressor provides a robust way to predict complete probability distributions over continuous targets. Unlike standard regression that outputs a single value, this package allows you to:
- Predict full probability distributions (arbitrary shapes: multimodal, skewed, etc.)
- Quantify uncertainty with natural confidence intervals
- Obtain point predictions (mean, mode/peak, quantiles)
It uses a CDF-based approach:
- Discretizes the target space into a grid of threshold points.
- Learns the conditional CDF F(τ|x) = P(Y ≤ τ | X = x) using binary targets and logistic loss.
- Enforces monotonicity via LightGBM's monotone constraints on the threshold feature.
- Recovers the PMF by differencing the predicted CDF.
This approach is fast, stable, and requires minimal tuning.
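The CDF-to-PMF step can be illustrated with plain NumPy, independent of the package: given a monotone CDF evaluated on a threshold grid, differencing recovers the probability mass in each grid cell. Here a logistic curve stands in for the learned F(τ|x):

```python
import numpy as np

# Step 1: a threshold grid covering the target range
grid = np.linspace(-3.0, 3.0, 50)

# A toy monotone CDF standing in for the learned F(tau|x)
cdf = 1.0 / (1.0 + np.exp(-2.0 * grid))

# Final step: difference the CDF to recover probability mass per cell
pmf = np.diff(cdf, prepend=0.0)
```

Because the CDF is constrained to be monotone increasing, every differenced value is nonnegative, so the result is a valid probability mass function.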
Installation
pip install distribution-regressor
Quick Start
import numpy as np
from distribution_regressor import DistributionRegressor
# 1. Initialize
model = DistributionRegressor(
n_bins=50, # Resolution of the distribution grid
n_estimators=100, # Number of boosting trees
)
# 2. Train (synthetic data shown for illustration; substitute your own
# X: (n_samples, n_features), y: (n_samples,))
rng = np.random.default_rng(42)
X_train = rng.normal(size=(500, 4))
y_train = X_train[:, 0] + 0.5 * rng.normal(size=500)
X_test = rng.normal(size=(100, 4))
y_test = X_test[:, 0] + 0.5 * rng.normal(size=100)
model.fit(X_train, y_train)
# 3. Predict Points
y_mean = model.predict(X_test) # Mean (Expected Value)
y_mode = model.predict_mode(X_test) # Mode (Most likely value / Peak)
y_median = model.predict_quantile(X_test, 0.5)
# 4. Predict Intervals & Uncertainty
# 10th and 90th percentiles (an 80% prediction interval)
lower = model.predict_quantile(X_test, 0.1)
upper = model.predict_quantile(X_test, 0.9)
# 5. Predict Full Distribution
grids, dists, offsets = model.predict_distribution(X_test)
# grids: (n_samples, n_bins) - Per-sample grid points
# dists: (n_samples, n_bins) - Probability mass for each sample
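Statistics the API does not expose directly can be computed from the returned grid and mass arrays. A generic NumPy sketch, with a toy uniform distribution standing in for one row of grids/dists:

```python
import numpy as np

# Toy stand-ins for one row of `grids` / `dists`
grid = np.linspace(0.0, 10.0, 50)   # grid points for one sample
pmf = np.full(50, 1.0 / 50)         # uniform probability mass

mean = np.sum(grid * pmf)                          # expected value
std = np.sqrt(np.sum((grid - mean) ** 2 * pmf))    # predictive spread

# Quantile: invert the discrete CDF with searchsorted
cdf = np.cumsum(pmf)
q90 = grid[np.searchsorted(cdf, 0.9)]
```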
Key Parameters
DistributionRegressor(
n_bins=50, # Number of grid points (higher = more resolution, more RAM)
use_base_model=False, # If True, learns residual CDF around a base LGBM prediction
monte_carlo_training=False, # If True, sample grid points instead of full expansion
mc_samples=5, # MC sample points per observation (when MC enabled)
mc_resample_freq=100, # Resample grid points every N trees (lower = better coverage)
n_estimators=100, # LightGBM trees
learning_rate=0.1, # Learning rate
random_state=42, # Seed
**kwargs # Passed to LGBMRegressor (e.g., max_depth, num_leaves)
)
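To see what monte_carlo_training trades off: full expansion materializes one training row per (sample, threshold) pair, while Monte Carlo training draws only mc_samples thresholds per observation each round. A back-of-the-envelope sketch of the two dataset sizes (illustrative only, not the package's internal code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_bins, mc_samples = 1000, 50, 5

y = rng.normal(size=n_samples)
grid = np.linspace(y.min(), y.max(), n_bins)

# Full expansion: one row per (sample, threshold) pair
full_rows = n_samples * n_bins

# Monte Carlo training: a few random thresholds per sample
taus = rng.choice(grid, size=(n_samples, mc_samples))
mc_rows = taus.size   # 10x fewer rows in this configuration
```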
How It Works
The model learns the conditional CDF using binary classification:
- Grid Creation: A grid of n_bins threshold points is created covering the range of y.
- Binary Targets: For each training sample (x_i, y_i) and threshold τ_j, the target is z_ij = 1{y_i ≤ τ_j}, i.e. whether y_i falls at or below the threshold.
- Single Model: A single LightGBM model is trained with cross-entropy loss on (x_i, τ_j) → z_ij, with a monotone increasing constraint on τ_j to ensure a valid CDF.
- Prediction: At inference, the model predicts F(τ|x) at every grid point, then differences the CDF to recover the probability mass function.
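The expansion described in the first three steps can be sketched in a few lines of NumPy: append the threshold as an extra feature column and derive the binary target by comparison. This illustrates the construction, not the package's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))               # (n_samples, n_features)
y = rng.normal(size=200)
grid = np.linspace(y.min(), y.max(), 10)    # n_bins threshold points

n_bins = len(grid)
# Pair every sample with every threshold; tau becomes the last feature
X_rep = np.repeat(X, n_bins, axis=0)        # (n_samples * n_bins, 3)
tau = np.tile(grid, len(X))[:, None]        # (n_samples * n_bins, 1)
X_aug = np.hstack([X_rep, tau])             # (n_samples * n_bins, 4)

# Binary target z_ij = 1{y_i <= tau_j}
z = (np.repeat(y, n_bins) <= tau.ravel()).astype(int)

# A LightGBM classifier would now be fit on (X_aug, z) with a monotone
# increasing constraint on the tau column, e.g.
# monotone_constraints=[0, 0, 0, 1]
```

Because z is nondecreasing in τ for each sample, the monotone constraint on the threshold feature matches the structure of the labels.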
Example Visualization
import matplotlib.pyplot as plt
# Predict distribution for a single sample
grids, dists, offsets = model.predict_distribution(X_test[0:1])
plt.plot(grids[0], dists[0], label='Predicted PMF')
plt.axvline(y_test[0], color='r', linestyle='--', label='True Value')
plt.legend()
plt.show()
Citation
@software{distributionregressor2025,
title={DistributionRegressor: Nonparametric Distributional Regression},
author={Gabor Gulyas},
year={2025},
url={https://github.com/guyko81/DistributionRegressor}
}
License
MIT License
Download files
File details
Details for the file distribution_regressor-2.1.1.tar.gz.
File metadata
- Download URL: distribution_regressor-2.1.1.tar.gz
- Upload date:
- Size: 2.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bc0e3fe510978dc7930055d45032b61d72fc3cf197d7ff0e3346901b194ba009 |
| MD5 | 025fecf46b2d838cab7dd0fece150c69 |
| BLAKE2b-256 | cc38d3d4b7e08cab602ca3f74c05c91f8bea98beb5f24b39b0311e9f1a9c4e73 |
File details
Details for the file distribution_regressor-2.1.1-py3-none-any.whl.
File metadata
- Download URL: distribution_regressor-2.1.1-py3-none-any.whl
- Upload date:
- Size: 60.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97ff8f7c4d4c298a34f65c79c56ae62b205b35c66488ca08a26e052cd4a74c01 |
| MD5 | 2cc40dc7c698818fb28142fd1b14feae |
| BLAKE2b-256 | 7e020dc9c63358a9e30a9df5d01c6c03547be520f390c85e93e1203d5d54ba65 |