# skdr-eval

Offline policy evaluation for service-time minimization using Doubly Robust (DR) and Stabilized Doubly Robust (SNDR) estimators, with time-aware splits and calibration. Now with pairwise evaluation and autoscaling support.
## Features
- 🎯 Doubly Robust Estimation: Implements both DR and Stabilized DR (SNDR) estimators
- ⏰ Time-Aware Evaluation: Uses time-series splits and calibrated propensity scores
- 🔧 Sklearn Integration: Easy integration with scikit-learn models
- 📊 Comprehensive Diagnostics: ESS, match rates, propensity score analysis
- 🚀 Production Ready: Type-hinted, tested, and documented
- 📈 Bootstrap Confidence Intervals: Moving-block bootstrap for time-series data
- 🤝 Pairwise Evaluation: Client-operator pairwise evaluation with autoscaling strategies
- 🎛️ Autoscaling: Direct, stream, and stream_topk strategies with policy induction
- 🧮 Choice Models: Conditional logit models for propensity estimation
## Installation

```bash
pip install skdr-eval
```

### Optional Dependencies

For choice models (conditional logit):

```bash
pip install skdr-eval[choice]
```

For speed optimizations (PyArrow, Polars):

```bash
pip install skdr-eval[speed]
```

For development:

```bash
git clone https://github.com/dandrsantos/skdr-eval.git
cd skdr-eval
pip install -e .[dev]
```
## Quick Start

### Standard Evaluation

```python
import skdr_eval
from sklearn.ensemble import RandomForestRegressor, HistGradientBoostingRegressor

# 1. Generate synthetic service logs
logs, ops_all, true_q = skdr_eval.make_synth_logs(n=5000, n_ops=5, seed=42)

# 2. Define candidate models
models = {
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=42),
    "HistGradientBoosting": HistGradientBoostingRegressor(random_state=42),
}

# 3. Evaluate models using DR and SNDR
report, detailed_results = skdr_eval.evaluate_sklearn_models(
    logs=logs,
    models=models,
    fit_models=True,
    n_splits=3,
    random_state=42,
)

# 4. View results
print(report[['model', 'estimator', 'V_hat', 'ESS', 'match_rate']])
```
### Pairwise Evaluation

```python
import skdr_eval
from sklearn.ensemble import HistGradientBoostingRegressor, HistGradientBoostingClassifier

# 1. Generate synthetic pairwise data (client-operator pairs)
logs_df, op_daily_df = skdr_eval.make_pairwise_synth(
    n_days=5,
    n_clients_day=1000,
    n_ops=20,
    seed=42,
)

# 2. Define models for different tasks
models = {
    "ServiceTime": HistGradientBoostingRegressor(random_state=42),
    "Binary": HistGradientBoostingClassifier(random_state=42),
}

# 3. Run pairwise evaluation with autoscaling
results = skdr_eval.evaluate_pairwise_models(
    logs_df=logs_df,
    op_daily_df=op_daily_df,
    models=models,
    autoscale_strategies=["direct", "stream", "stream_topk"],
    n_splits=3,
    random_state=42,
)

# 4. View autoscaling results
for strategy, result in results.items():
    print(f"{strategy}: V_hat = {result['V_hat']:.4f}, ESS = {result['ESS']:.1f}")
```
## API Reference

### Core Functions
#### `make_synth_logs(n=5000, n_ops=5, seed=0)`

Generate synthetic service logs for evaluation.

Returns:

- `logs`: DataFrame with service logs
- `ops_all`: Index of all operator names
- `true_q`: Ground-truth service times
#### `build_design(logs, cli_pref='cli_', st_pref='st_')`

Build design matrices from logs.

Returns:

- `Design`: Dataclass with feature matrices and metadata
#### `evaluate_sklearn_models(logs, models, **kwargs)`

Evaluate sklearn models using DR and SNDR estimators.

Parameters:

- `logs`: Service log DataFrame
- `models`: Dict of model name to sklearn estimator
- `fit_models`: Whether to fit models (default: True)
- `n_splits`: Number of time-series splits (default: 3)
- `random_state`: Random seed for reproducibility
#### `evaluate_pairwise_models(logs_df, op_daily_df, models, **kwargs)`

Evaluate models using pairwise (client-operator) evaluation with autoscaling.

Parameters:

- `logs_df`: Pairwise decision log DataFrame
- `op_daily_df`: Daily operator availability DataFrame
- `models`: Dict of model name to sklearn estimator
- `autoscale_strategies`: List of strategies (`"direct"`, `"stream"`, `"stream_topk"`)
- `n_splits`: Number of time-series splits (default: 3)
- `random_state`: Random seed for reproducibility

Returns:

- Dict mapping strategy names to evaluation results
#### `make_pairwise_synth(n_days=5, n_clients_day=1000, n_ops=20, **kwargs)`

Generate synthetic pairwise (client-operator) data for evaluation.

Parameters:

- `n_days`: Number of days to simulate
- `n_clients_day`: Number of clients per day
- `n_ops`: Number of operators
- `seed`: Random seed for reproducibility
- `binary`: Whether to generate binary outcomes (default: False)

Returns:

- `logs_df`: DataFrame with pairwise decisions
- `op_daily_df`: DataFrame with daily operator data
### Advanced Functions

#### `fit_propensity_timecal(X_phi, A, n_splits=3, random_state=0)`

Fit a propensity model with time-aware cross-validation and isotonic calibration.

#### `fit_outcome_crossfit(X_obs, Y, n_splits=3, estimator='hgb', random_state=0)`

Fit an outcome model with cross-fitting. Supports `'hgb'`, `'ridge'`, `'rf'`, or custom estimators.

#### `dr_value_with_clip(propensities, policy_probs, Y, q_hat, A, elig, clip_grid=...)`

Compute DR and SNDR values with automatic clipping-threshold selection.

#### `block_bootstrap_ci(values_num, values_den, base_mean, n_boot=400, **kwargs)`

Compute confidence intervals using a moving-block bootstrap for time-series data.
Parameters:

- `values_num`: Numerator values for the bootstrap
- `values_den`: Denominator values for ratio estimation (optional)
- `base_mean`: Base mean for centering
- `n_boot`: Number of bootstrap samples (default: 400)
- `block_len`: Block length for time-series correlation (default: sqrt(n))
- `alpha`: Significance level (default: 0.05)
- `random_state`: Random seed for reproducibility

Returns:

- `ci_lower`: Lower confidence bound
- `ci_upper`: Upper confidence bound
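The core idea of a moving-block bootstrap can be sketched in a few lines. This is a minimal illustration for the mean of a time series, not `block_bootstrap_ci`'s exact implementation (the ratio-estimation path via `values_den` and the centering via `base_mean` are omitted):

```python
import numpy as np

def moving_block_bootstrap_ci(values, n_boot=400, block_len=None, alpha=0.05, seed=0):
    """Percentile CI for the mean of a time series via moving-block bootstrap."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    n = len(values)
    if block_len is None:
        block_len = max(1, int(np.sqrt(n)))  # default block length: sqrt(n)
    n_blocks = int(np.ceil(n / block_len))
    starts_max = n - block_len + 1           # valid block start positions
    means = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, starts_max, n_blocks)
        # concatenate contiguous blocks, then truncate to the original length
        sample = np.concatenate([values[s:s + block_len] for s in starts])[:n]
        means[b] = sample.mean()
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

Resampling contiguous blocks rather than individual observations is what preserves short-range autocorrelation in the bootstrap distribution.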
## Why DR and SNDR?

Doubly Robust (DR) estimation provides unbiased policy evaluation when either the propensity model or the outcome model is correctly specified. The estimator is:

V̂_DR = (1/n) Σ [q̂_π(x_i) + w_i · (y_i − q̂(x_i, a_i))]

Stabilized DR (SNDR) reduces variance by normalizing the importance weights in the correction term:

V̂_SNDR = (1/n) Σ q̂_π(x_i) + [Σ w_i · (y_i − q̂(x_i, a_i))] / [Σ w_i]

where:

- `q̂_π(x)` = expected outcome under the evaluation policy π
- `q̂(x, a)` = outcome-model prediction
- `w_i = π(a_i|x_i) / e(a_i|x_i)` = importance weight (clipped)
- `e(a_i|x_i)` = propensity score (calibrated)
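As a numeric illustration of the two formulas (the arrays below are synthetic placeholders standing in for logged outcomes, calibrated propensities, and fitted-model predictions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.normal(10.0, 2.0, n)             # observed service times
e = rng.uniform(0.2, 0.8, n)             # propensity of the logged action
pi = rng.uniform(0.2, 0.8, n)            # evaluation-policy prob of the logged action
q_hat = y + rng.normal(0.0, 1.0, n)      # outcome-model prediction for (x_i, a_i)
q_pi = q_hat + rng.normal(0.0, 0.5, n)   # expected outcome under the policy

w = np.clip(pi / e, None, 20.0)          # clipped importance weights

# DR: direct-method term plus weighted residual correction
v_dr = np.mean(q_pi + w * (y - q_hat))

# SNDR: the same correction, normalized by the sum of the weights
v_sndr = np.mean(q_pi) + np.sum(w * (y - q_hat)) / np.sum(w)
```

When the outcome model is accurate, the residuals `y - q_hat` are small and both estimators stay close to the direct-method term `mean(q_pi)`; SNDR additionally damps the influence of a few large weights.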
## Key Implementation Details

### Autoscaling Strategies

#### Direct

Uses the logging policy directly, without modification.

#### Stream

Induces a policy from the sklearn models and applies it to streaming decisions.

#### Stream Top-K

Like stream, but restricts choices to the top-K operators by predicted service time.
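A top-K restriction might look like the following sketch, assuming a matrix of predicted service times (rows = clients, columns = operators) and a boolean eligibility mask. This is illustrative only; the `topk_policy` helper is hypothetical and the library's `stream_topk` induction is more involved:

```python
import numpy as np

def topk_policy(pred_service_times, elig, k=3):
    """Uniform stochastic policy over the k eligible operators with the
    lowest predicted service times. Assumes each row has >= 1 eligible op."""
    pred = np.where(elig, pred_service_times, np.inf)  # mask ineligible ops
    order = np.argsort(pred, axis=1)                   # best (smallest) first
    policy = np.zeros_like(pred)
    for i, row in enumerate(order):
        # keep only the eligible operators among the first k
        chosen = [j for j in row[:k] if np.isfinite(pred[i, j])]
        policy[i, chosen] = 1.0 / len(chosen)
    return policy
```

Masking with `inf` before sorting guarantees ineligible operators can never enter the top-K, so the induced policy assigns them zero probability.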
### Time-Series Considerations

- Uses `TimeSeriesSplit` for all cross-validation
- Propensity scores include standardized timestamps
- Respects temporal ordering in data splits
- Pairwise evaluation maintains temporal consistency across client-operator pairs
### Propensity Score Calibration

- Per-fold isotonic calibration via `CalibratedClassifierCV`
- Fallback to uncalibrated scores if calibration fails
- Handles class imbalance gracefully
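The calibrate-with-fallback pattern can be sketched with scikit-learn directly (the base classifier and data here are placeholders; skdr-eval's internal choices may differ):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                              # features
a = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)  # logged action

try:
    # Isotonic calibration of the propensity model
    clf = CalibratedClassifierCV(LogisticRegression(), method="isotonic", cv=3)
    clf.fit(X, a)
except Exception:
    # Fallback: uncalibrated scores if calibration fails (e.g. too few samples)
    clf = LogisticRegression().fit(X, a)

propensity = clf.predict_proba(X)[:, 1]
```

Isotonic regression is a non-parametric monotone mapping from raw scores to probabilities, which matters here because DR divides by these scores: a poorly calibrated propensity directly biases the importance weights.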
### Clipping Threshold Selection

- DR: minimize an MSE proxy subject to an ESS floor (default: 2% of samples)
- SNDR: minimize |SNDR − DR| + the MSE proxy
- Automatic selection from the grid `(2, 5, 10, 20, 50, ∞)`
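A simplified version of this selection logic: take the largest clip (least bias) whose clipped weights keep ESS above the floor. This is a heuristic sketch, not the library's exact MSE-proxy rule described above:

```python
import numpy as np

def select_clip(raw_w, grid=(2, 5, 10, 20, 50, np.inf), ess_floor=0.02):
    """Largest clip threshold whose clipped weights keep ESS >= floor * n."""
    n = len(raw_w)
    for c in sorted(grid, reverse=True):     # prefer the least-biased (largest) clip
        w = np.minimum(raw_w, c)
        ess = w.sum() ** 2 / np.sum(w ** 2)  # effective sample size of the weights
        if ess >= ess_floor * n:
            return c
    return min(grid)                         # fall back to the most aggressive clip
```

The trade-off being navigated: smaller clips shrink extreme weights (raising ESS, lowering variance) at the cost of bias, so the search walks from the unbiased end of the grid toward heavier clipping only as far as the ESS floor requires.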
### Diagnostics and Quality Checks

- Effective Sample Size (ESS): `(Σw)² / Σw²`
- Match rate: fraction of samples with positive propensity scores
- Propensity quantiles: P01, P05, P10 for positivity assessment
- Tail mass: fraction of samples affected by clipping
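These diagnostics can all be computed directly from the raw weights, clip threshold, and propensity scores. The function and field names below are illustrative, not the library's API:

```python
import numpy as np

def diagnostics(raw_w, clip, propensity):
    """Weight and propensity diagnostics in the spirit of those reported
    by skdr-eval (names and exact definitions here are illustrative)."""
    w = np.minimum(raw_w, clip)
    ess = w.sum() ** 2 / np.sum(w ** 2)          # effective sample size
    match_rate = float(np.mean(propensity > 0))  # positivity / support
    p01, p05, p10 = np.quantile(propensity, [0.01, 0.05, 0.10])
    tail_mass = float(np.mean(raw_w > clip))     # fraction affected by clipping
    return {"ESS": ess, "match_rate": match_rate,
            "P01": p01, "P05": p05, "P10": p10, "tail_mass": tail_mass}
```

Low ESS or large tail mass signals that a few samples dominate the estimate, and near-zero propensity quantiles flag positivity violations; all are reasons to distrust the point estimate.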
## Bootstrap Confidence Intervals

For time-series data, confidence intervals are computed with a moving-block bootstrap:

```python
# Enable bootstrap CIs
report, _ = skdr_eval.evaluate_sklearn_models(
    logs=logs,
    models=models,
    ci_bootstrap=True,
    alpha=0.05,  # 95% confidence
)
print(report[['model', 'estimator', 'V_hat', 'ci_lower', 'ci_upper']])
```
Key features:

- Moving-block bootstrap: preserves the time-series correlation structure
- Inference based on the bootstrap distribution of the DR contributions
- Automatic fallback to a normal approximation if the bootstrap fails
- Configurable number of bootstrap samples, block length, and significance level
## Examples

See `examples/quickstart.py` for a complete example, or run:

```bash
python examples/quickstart.py
```
## Development

### Setup

```bash
git clone https://github.com/dandrsantos/skdr-eval.git
cd skdr-eval
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .[dev]
```

### Testing

```bash
pytest -v
```

### Linting and Formatting

```bash
ruff check src/ tests/ examples/
ruff format src/ tests/ examples/
mypy src/skdr_eval/
```

### Pre-commit Hooks

```bash
pre-commit install
pre-commit run --all-files
```

### Building

```bash
python -m build
```
## Publishing to PyPI

This package uses Trusted Publishing for secure PyPI releases.

### Automatic (Recommended)

1. Create a GitHub release with a version tag (e.g., `v0.1.0`)
2. The `release.yml` workflow will automatically build and publish

### Manual Fallback

If Trusted Publishing is not configured:

1. Set up a PyPI API token: https://pypi.org/manage/account/token/
2. Build the package: `python -m build`
3. Upload: `twine upload dist/*`

### Trusted Publishing Setup

1. Go to https://pypi.org/manage/project/skdr-eval/settings/publishing/
2. Add the GitHub repository as a trusted publisher:
   - Repository: `dandrsantos/skdr-eval`
   - Workflow: `release.yml`
   - Environment: `release`
## Citation

If you use this software in your research, please cite:

```bibtex
@software{santos2024skdr,
  title   = {skdr-eval: Offline Policy Evaluation for Service-Time Minimization},
  author  = {Santos, Diogo},
  year    = {2024},
  url     = {https://github.com/dandrsantos/skdr-eval},
  version = {0.1.0}
}
```
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Contributing

Contributions are welcome! Please feel free to submit a pull request. For major changes, open an issue first to discuss what you would like to change.
## Acknowledgments
- Built with scikit-learn for machine learning
- Uses pandas for data manipulation
- Follows PEP 621 for project metadata