
skdr-eval


Offline policy evaluation for service-time minimization using Doubly Robust (DR) and Stabilized Doubly Robust (SNDR) estimators with time-aware splits and calibration.

Features

  • 🎯 Doubly Robust Estimation: Implements both DR and Stabilized DR (SNDR) estimators
  • ⏱️ Time-Aware Evaluation: Uses time-series splits and calibrated propensity scores
  • 🔧 Sklearn Integration: Easy integration with scikit-learn models
  • 📊 Comprehensive Diagnostics: ESS, match rates, propensity score analysis
  • 🚀 Production Ready: Type-hinted, tested, and documented
  • 📈 Bootstrap Confidence Intervals: Moving-block bootstrap for time-series data

Installation

pip install skdr-eval

For development:

git clone https://github.com/dandrsantos/skdr-eval.git
cd skdr-eval
pip install -e ".[dev]"

Quick Start

import skdr_eval
from sklearn.ensemble import RandomForestRegressor, HistGradientBoostingRegressor

# 1. Generate synthetic service logs
logs, ops_all, true_q = skdr_eval.make_synth_logs(n=5000, n_ops=5, seed=42)

# 2. Define candidate models
models = {
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=42),
    "HistGradientBoosting": HistGradientBoostingRegressor(random_state=42),
}

# 3. Evaluate models using DR and SNDR
report, detailed_results = skdr_eval.evaluate_sklearn_models(
    logs=logs,
    models=models,
    fit_models=True,
    n_splits=3,
    random_state=42,
)

# 4. View results
print(report[['model', 'estimator', 'V_hat', 'ESS', 'match_rate']])

API Reference

Core Functions

make_synth_logs(n=5000, n_ops=5, seed=0)

Generate synthetic service logs for evaluation.

Returns:

  • logs: DataFrame with service logs
  • ops_all: Index of all operator names
  • true_q: Ground truth service times

build_design(logs, cli_pref='cli_', st_pref='st_')

Build design matrices from logs.

Returns:

  • Design: Dataclass with feature matrices and metadata

evaluate_sklearn_models(logs, models, **kwargs)

Evaluate sklearn models using DR and SNDR estimators.

Parameters:

  • logs: Service log DataFrame
  • models: Dict of model name -> model instance
  • fit_models: Whether to fit models (default: True)
  • n_splits: Number of CV splits (default: 3)
  • clip_grid: Clipping thresholds (default: (2, 5, 10, 20, 50, ∞))
  • ci_bootstrap: Compute bootstrap CIs (default: False)

Returns:

  • report: Summary DataFrame with metrics
  • detailed_results: Detailed results per model

Advanced Functions

fit_propensity_timecal(X_phi, A, n_splits=3, random_state=0)

Fit propensity model with time-aware cross-validation and isotonic calibration.

fit_outcome_crossfit(X_obs, Y, n_splits=3, estimator='hgb', random_state=0)

Fit outcome model with cross-fitting. Supports 'hgb', 'ridge', 'rf', or custom estimators.

dr_value_with_clip(propensities, policy_probs, Y, q_hat, A, elig, clip_grid=...)

Compute DR and SNDR values with automatic clipping threshold selection.

block_bootstrap_ci(values_num, values_den, base_mean, n_boot=400, **kwargs)

Compute confidence intervals using moving-block bootstrap for time-series data.
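To illustrate the idea behind the moving-block bootstrap (this is a minimal numpy sketch of the technique, not the package's `block_bootstrap_ci` implementation; the function name and defaults here are illustrative):

```python
import numpy as np

def moving_block_bootstrap_ci(x, block_len=20, n_boot=400, alpha=0.05, seed=0):
    """Percentile CI for the mean of a time series via moving-block bootstrap.

    Resampling contiguous blocks (rather than single points) preserves the
    short-range serial dependence of the series.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    max_start = n - block_len  # last valid block start
    means = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, max_start + 1, size=n_blocks)
        # Stitch the sampled blocks together and trim to the original length
        sample = np.concatenate([x[s:s + block_len] for s in starts])[:n]
        means[b] = sample.mean()
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Example on a serially correlated series
rng = np.random.default_rng(42)
x = 0.1 * np.cumsum(rng.normal(size=500)) + rng.normal(size=500)
lo, hi = moving_block_bootstrap_ci(x)
```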

Why DR and SNDR?

Doubly Robust (DR) estimation yields consistent policy-value estimates when either the propensity model or the outcome model is correctly specified. The estimator is:

V̂_DR = (1/n) Σ [q̂_π(x_i) + w_i * (y_i - q̂(x_i, a_i))]

Stabilized DR (SNDR) reduces variance by normalizing importance weights:

V̂_SNDR = (1/n) Σ q̂_π(x_i) + [Σ w_i * (y_i - q̂(x_i, a_i))] / [Σ w_i]

Where:

  • q̂_π(x) = expected outcome under evaluation policy π
  • q̂(x,a) = outcome model prediction
  • w_i = π(a_i|x_i) / e(a_i|x_i) = importance weight (clipped)
  • e(a_i|x_i) = propensity score (calibrated)
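The two estimators above can be computed directly from arrays. The sketch below uses synthetic inputs (all names and values are illustrative, not taken from the package) and mirrors the formulas term by term:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Illustrative ingredients
e = rng.uniform(0.1, 0.9, size=n)             # calibrated propensity e(a_i|x_i)
pi = rng.uniform(0.0, 1.0, size=n)            # evaluation-policy prob pi(a_i|x_i)
y = rng.normal(5.0, 1.0, size=n)              # observed service times
q_hat = y + rng.normal(0.0, 0.5, size=n)      # outcome model q_hat(x_i, a_i)
q_pi = q_hat + rng.normal(0.0, 0.2, size=n)   # expected outcome under pi

w = np.clip(pi / e, None, 10.0)               # clipped importance weights

# DR: model term plus importance-weighted residual correction
v_dr = np.mean(q_pi + w * (y - q_hat))

# SNDR: the residual correction is normalized by the weight sum
v_sndr = np.mean(q_pi) + np.sum(w * (y - q_hat)) / np.sum(w)
```

Normalizing by `np.sum(w)` is what stabilizes SNDR: a few very large weights can no longer dominate the correction term.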

Key Implementation Details

Time-Aware Evaluation

  • Uses TimeSeriesSplit for all cross-validation
  • Propensity scores include standardized timestamps
  • Respects temporal ordering in data splits
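The temporal-ordering guarantee of `TimeSeriesSplit` is easy to verify directly (a small standalone sklearn sketch, independent of this package):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # rows assumed to be in time order
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    # Every training index precedes every test index: no look-ahead leakage
    assert train_idx.max() < test_idx.min()
```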

Propensity Score Calibration

  • Per-fold isotonic calibration via CalibratedClassifierCV
  • Fallback to uncalibrated scores if calibration fails
  • Handles class imbalance gracefully
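A minimal sketch of isotonic calibration with `CalibratedClassifierCV` (the base classifier and dataset here are placeholders, not what the package uses internally):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

# Isotonic calibration fits a monotone map from raw scores to probabilities,
# cross-validated so calibration data never overlaps the classifier's fit data.
clf = CalibratedClassifierCV(LogisticRegression(max_iter=500),
                             method="isotonic", cv=3)
clf.fit(X, y)
proba = clf.predict_proba(X)[:, 1]
```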

Clipping Threshold Selection

  • DR: Minimize MSE proxy with ESS floor (default: 2% of samples)
  • SNDR: Minimize |SNDR - DR| + MSE proxy
  • Automatic selection from grid: (2, 5, 10, 20, 50, ∞)

Diagnostics and Quality Checks

  • Effective Sample Size (ESS): (Σw)² / Σw²
  • Match Rate: Fraction with positive propensity scores
  • Propensity Quantiles: P01, P05, P10 for positivity assessment
  • Tail Mass: Fraction of samples affected by clipping
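These diagnostics are simple functions of the importance weights. A numpy sketch on synthetic heavy-tailed weights (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
w_raw = rng.lognormal(mean=0.0, sigma=1.5, size=2000)  # heavy-tailed raw weights

clip = 10.0
w = np.minimum(w_raw, clip)

ess = w.sum() ** 2 / np.sum(w ** 2)  # effective sample size, (Σw)² / Σw²
tail_mass = np.mean(w_raw > clip)    # fraction of samples affected by clipping
match_rate = np.mean(w_raw > 0)      # fraction with positive propensity
```

An ESS far below the sample size signals that a handful of large weights dominate the estimate, which is exactly the regime where clipping and SNDR help.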

Bootstrap Confidence Intervals

For time-series data, use moving-block bootstrap:

# Enable bootstrap CIs
report, _ = skdr_eval.evaluate_sklearn_models(
    logs=logs,
    models=models,
    ci_bootstrap=True,
    alpha=0.05,  # 95% confidence
)

print(report[['model', 'estimator', 'V_hat', 'ci_lower', 'ci_upper']])

Examples

See examples/quickstart.py for a complete example, or run:

python examples/quickstart.py

Development

Setup

git clone https://github.com/dandrsantos/skdr-eval.git
cd skdr-eval
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e ".[dev]"

Testing

pytest -v

Linting and Formatting

ruff check src/ tests/ examples/
ruff format src/ tests/ examples/
mypy src/skdr_eval/

Pre-commit Hooks

pre-commit install
pre-commit run --all-files

Building

python -m build

Publishing to PyPI

This package uses PyPI Trusted Publishing (OpenID Connect-based) for secure releases.

Automatic (Recommended)

  1. Create a GitHub release with a version tag (e.g., v0.1.0)
  2. The release.yml workflow will automatically build and publish

Manual Fallback

If Trusted Publishing is not configured:

  1. Set up PyPI API token: https://pypi.org/manage/account/token/
  2. Build the package: python -m build
  3. Upload: twine upload dist/*

Trusted Publishing Setup

  1. Go to https://pypi.org/manage/project/skdr-eval/settings/publishing/
  2. Add GitHub repository as trusted publisher:
    • Repository: dandrsantos/skdr-eval
    • Workflow: release.yml
    • Environment: release

Citation

If you use this software in your research, please cite:

@software{santos2024skdr,
  title = {skdr-eval: Offline Policy Evaluation for Service-Time Minimization},
  author = {Santos, Diogo},
  year = {2024},
  url = {https://github.com/dandrsantos/skdr-eval},
  version = {0.1.0}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Acknowledgments
