skdr-eval

Offline policy evaluation for service-time minimization using Doubly Robust (DR) and Stabilized Doubly Robust (SNDR) estimators with time-aware splits and calibration. Now with pairwise evaluation and autoscaling support.

Features

  • 🎯 Doubly Robust Estimation: Implements both DR and Stabilized DR (SNDR) estimators
  • ⏱️ Time-Aware Evaluation: Uses time-series splits and calibrated propensity scores
  • 🔧 Sklearn Integration: Easy integration with scikit-learn models
  • 📊 Comprehensive Diagnostics: ESS, match rates, propensity score analysis
  • 🚀 Production Ready: Type-hinted, tested, and documented
  • 📈 Bootstrap Confidence Intervals: Moving-block bootstrap for time-series data
  • 🤝 Pairwise Evaluation: Client-operator pairwise evaluation with autoscaling strategies
  • 🎛️ Autoscaling: Direct, stream, and stream_topk strategies with policy induction
  • 🧮 Choice Models: Conditional logit models for propensity estimation

Installation

pip install skdr-eval

Optional Dependencies

For choice models (conditional logit):

pip install skdr-eval[choice]

For speed optimizations (PyArrow, Polars):

pip install skdr-eval[speed]

For development:

git clone https://github.com/dandrsantos/skdr-eval.git
cd skdr-eval
pip install -e .[dev]

Quick Start

Standard Evaluation

import skdr_eval
from sklearn.ensemble import RandomForestRegressor, HistGradientBoostingRegressor

# 1. Generate synthetic service logs
logs, ops_all, true_q = skdr_eval.make_synth_logs(n=5000, n_ops=5, seed=42)

# 2. Define candidate models
models = {
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=42),
    "HistGradientBoosting": HistGradientBoostingRegressor(random_state=42),
}

# 3. Evaluate models using DR and SNDR
report, detailed_results = skdr_eval.evaluate_sklearn_models(
    logs=logs,
    models=models,
    fit_models=True,
    n_splits=3,
    random_state=42,
)

# 4. View results
print(report[['model', 'estimator', 'V_hat', 'ESS', 'match_rate']])

Pairwise Evaluation

import skdr_eval
from sklearn.ensemble import HistGradientBoostingRegressor, HistGradientBoostingClassifier

# 1. Generate synthetic pairwise data (client-operator pairs)
logs_df, op_daily_df = skdr_eval.make_pairwise_synth(
    n_days=5,
    n_clients_day=1000,
    n_ops=20,
    seed=42
)

# 2. Define models for different tasks
models = {
    "ServiceTime": HistGradientBoostingRegressor(random_state=42),
    "Binary": HistGradientBoostingClassifier(random_state=42),
}

# 3. Run pairwise evaluation with autoscaling
results = skdr_eval.evaluate_pairwise_models(
    logs_df=logs_df,
    op_daily_df=op_daily_df,
    models=models,
    autoscale_strategies=["direct", "stream", "stream_topk"],
    n_splits=3,
    random_state=42
)

# 4. View autoscaling results
for strategy, result in results.items():
    print(f"{strategy}: V_hat = {result['V_hat']:.4f}, ESS = {result['ESS']:.1f}")

API Reference

Core Functions

make_synth_logs(n=5000, n_ops=5, seed=0)

Generate synthetic service logs for evaluation.

Returns:

  • logs: DataFrame with service logs
  • ops_all: Index of all operator names
  • true_q: Ground truth service times

build_design(logs, cli_pref='cli_', st_pref='st_')

Build design matrices from logs.

Returns:

  • Design: Dataclass with feature matrices and metadata
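
For example, on the synthetic logs from make_synth_logs (a minimal sketch; the attributes of the returned Design dataclass are not documented here, so inspect it directly):

import skdr_eval

logs, ops_all, true_q = skdr_eval.make_synth_logs(n=1000, n_ops=5, seed=0)
design = skdr_eval.build_design(logs)  # default 'cli_' / 'st_' column prefixes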

evaluate_sklearn_models(logs, models, **kwargs)

Evaluate sklearn models using DR and SNDR estimators.

Parameters:

  • logs: Service log DataFrame
  • models: Dict of model name to sklearn estimator
  • fit_models: Whether to fit models (default: True)
  • n_splits: Number of time-series splits (default: 3)
  • random_state: Random seed for reproducibility

evaluate_pairwise_models(logs_df, op_daily_df, models, **kwargs)

Evaluate models using pairwise (client-operator) evaluation with autoscaling.

Parameters:

  • logs_df: Pairwise decision log DataFrame
  • op_daily_df: Daily operator availability DataFrame
  • models: Dict of model name to sklearn estimator
  • autoscale_strategies: List of strategies ("direct", "stream", "stream_topk")
  • n_splits: Number of time-series splits (default: 3)
  • random_state: Random seed for reproducibility

Returns:

  • Dict mapping strategy names to evaluation results

make_pairwise_synth(n_days=5, n_clients_day=1000, n_ops=20, **kwargs)

Generate synthetic pairwise (client-operator) data for evaluation.

Parameters:

  • n_days: Number of days to simulate
  • n_clients_day: Number of clients per day
  • n_ops: Number of operators
  • seed: Random seed for reproducibility
  • binary: Whether to generate binary outcomes (default: False)

Returns:

  • logs_df: DataFrame with pairwise decisions
  • op_daily_df: DataFrame with daily operator data

Advanced Functions

fit_propensity_timecal(X_phi, A, n_splits=3, random_state=0)

Fit propensity model with time-aware cross-validation and isotonic calibration.

fit_outcome_crossfit(X_obs, Y, n_splits=3, estimator='hgb', random_state=0)

Fit outcome model with cross-fitting. Supports 'hgb', 'ridge', 'rf', or custom estimators.

dr_value_with_clip(propensities, policy_probs, Y, q_hat, A, elig, clip_grid=...)

Compute DR and SNDR values with automatic clipping threshold selection.

block_bootstrap_ci(values_num, values_den, base_mean, n_boot=400, **kwargs)

Compute confidence intervals using moving-block bootstrap for time-series data.

Why DR and SNDR?

Doubly Robust (DR) estimation yields a consistent policy value estimate when either the propensity model or the outcome model is correctly specified. The estimator is:

V̂_DR = (1/n) Σ [q̂_π(x_i) + w_i * (y_i - q̂(x_i, a_i))]

Stabilized DR (SNDR) reduces variance by normalizing importance weights:

V̂_SNDR = (1/n) Σ q̂_π(x_i) + [Σ w_i * (y_i - q̂(x_i, a_i))] / [Σ w_i]

Where:

  • q̂_π(x) = expected outcome under evaluation policy π
  • q̂(x,a) = outcome model prediction
  • w_i = π(a_i|x_i) / e(a_i|x_i) = importance weight (clipped)
  • e(a_i|x_i) = propensity score (calibrated)
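
To make the formulas concrete, here is a small NumPy sketch of both estimators on toy arrays (illustrative only; the data is synthetic and this is not the library's internal code):

import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.exponential(2.0, size=n)             # observed service times y_i
q_hat = y + rng.normal(0.0, 0.5, size=n)     # outcome model at the logged action, q̂(x_i, a_i)
q_pi = q_hat + rng.normal(0.0, 0.2, size=n)  # expected outcome under π, q̂_π(x_i)
w = np.minimum(rng.lognormal(0.0, 1.0, size=n), 10.0)  # clipped importance weights w_i

correction = w * (y - q_hat)
v_dr = np.mean(q_pi + correction)                    # DR
v_sndr = np.mean(q_pi) + correction.sum() / w.sum()  # SNDR: correction normalized by Σw
print(f"DR={v_dr:.3f}  SNDR={v_sndr:.3f}")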

Key Implementation Details

Autoscaling Strategies

Direct Strategy

Uses the logging policy directly without modification.

Stream Strategy

Induces a policy from sklearn models and applies it to streaming decisions.

Stream TopK Strategy

Similar to stream but restricts choices to top-K operators based on predicted service times.
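
As a toy illustration of the top-K restriction (the library's policy induction is more involved than this), selecting the K operators with the lowest predicted service times might look like:

import numpy as np

pred_service_time = np.array([4.2, 3.1, 5.0, 2.8, 3.9])  # model prediction per operator
k = 3
topk_ops = np.argsort(pred_service_time)[:k]  # indices of the K fastest operators
print(topk_ops)  # [3 1 4]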

Time-Series Considerations

  • Uses TimeSeriesSplit for all cross-validation (sketched after this list)
  • Propensity scores include standardized timestamps
  • Respects temporal ordering in data splits
  • Pairwise evaluation maintains temporal consistency across client-operator pairs
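
For reference, the split pattern follows sklearn's TimeSeriesSplit: training indices always precede test indices, so no future data leaks into training (a generic sketch, not the library's internals):

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # time-ordered rows
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    assert train_idx.max() < test_idx.min()  # temporal order respected
    print(train_idx, test_idx)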

Propensity Score Calibration

  • Per-fold isotonic calibration via CalibratedClassifierCV (see the sketch after this list)
  • Fallback to uncalibrated scores if calibration fails
  • Handles class imbalance gracefully
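
A generic sketch of isotonic propensity calibration with sklearn (the library's per-fold wiring differs, and the logistic base model here is just an example):

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))     # context features
A = rng.integers(0, 3, size=500)  # logged operator choices (3 operators)

# isotonic regression calibrates per-class scores on held-out folds
calib = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method="isotonic", cv=3)
calib.fit(X, A)
e_hat = calib.predict_proba(X)    # calibrated propensities e(a|x), shape (500, 3)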

Clipping Threshold Selection

  • DR: Minimize an MSE proxy subject to an ESS floor (default: 2% of samples); see the sketch after this list
  • SNDR: Minimize |SNDR - DR| + MSE proxy
  • Automatic selection from grid: (2, 5, 10, 20, 50, ∞)
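
A simplified NumPy sketch of the DR selection idea (the variance-plus-squared-bias proxy below is an assumption standing in for the library's actual MSE proxy):

import numpy as np

def select_clip(w_raw, clip_grid=(2, 5, 10, 20, 50, np.inf), ess_floor=0.02):
    """Pick the clip minimizing a crude MSE proxy, subject to an ESS floor."""
    n = len(w_raw)
    best_clip, best_proxy = None, np.inf
    for c in clip_grid:
        w = np.minimum(w_raw, c)
        ess = w.sum() ** 2 / np.square(w).sum()
        if ess < ess_floor * n:  # enforce the ESS floor (default: 2% of samples)
            continue
        proxy = w.var() / n + np.mean(w_raw - w) ** 2  # variance + squared clipping bias
        if proxy < best_proxy:
            best_clip, best_proxy = c, proxy
    return best_clip

w_raw = np.random.default_rng(0).lognormal(0.0, 1.5, size=2000)
print(select_clip(w_raw))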

Diagnostics and Quality Checks

  • Effective Sample Size (ESS): (Σw)² / Σw² (computed in the snippet after this list)
  • Match Rate: Fraction of samples with a positive propensity score
  • Propensity Quantiles: P01, P05, P10 for positivity assessment
  • Tail Mass: Fraction of samples affected by clipping
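
For reference, the ESS and match-rate diagnostics in code:

import numpy as np

w = np.array([1.0, 0.5, 4.0, 0.2, 1.3])       # importance weights
ess = w.sum() ** 2 / np.square(w).sum()       # (Σw)² / Σw² ≈ 2.58 here
e = np.array([0.30, 0.02, 0.45, 0.00, 0.23])  # propensity scores at logged actions
match_rate = np.mean(e > 0)                   # 0.8: one sample has zero propensity
p01, p05, p10 = np.quantile(e, [0.01, 0.05, 0.10])  # positivity diagnostics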

Bootstrap Confidence Intervals

For time-series data, use moving-block bootstrap:

# Enable bootstrap CIs
report, _ = skdr_eval.evaluate_sklearn_models(
    logs=logs,
    models=models,
    ci_bootstrap=True,
    alpha=0.05,  # 95% confidence
)

print(report[['model', 'estimator', 'V_hat', 'ci_lower', 'ci_upper']])
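
For intuition, a generic moving-block bootstrap in NumPy (not the library's implementation): resampling overlapping blocks, rather than individual rows, preserves short-range temporal dependence.

import numpy as np

def moving_block_ci(x, block_len=50, n_boot=400, alpha=0.05, seed=0):
    """CI for the mean of a time series via overlapping-block resampling."""
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    boots = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        sample = np.concatenate([x[s:s + block_len] for s in starts])[:n]
        boots[b] = sample.mean()
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

lo, hi = moving_block_ci(np.random.default_rng(1).normal(size=1000))
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")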

Examples

See examples/quickstart.py for a complete example, or run:

python examples/quickstart.py

Development

Setup

git clone https://github.com/dandrsantos/skdr-eval.git
cd skdr-eval
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .[dev]

Testing

pytest -v

Linting and Formatting

ruff check src/ tests/ examples/
ruff format src/ tests/ examples/
mypy src/skdr_eval/

Pre-commit Hooks

pre-commit install
pre-commit run --all-files

Building

python -m build

Publishing to PyPI

This package uses PyPI's Trusted Publishing for secure releases.

Automatic (Recommended)

  1. Create a GitHub release with a version tag (e.g., v0.1.0)
  2. The release.yml workflow will automatically build and publish

Manual Fallback

If Trusted Publishing is not configured:

  1. Set up PyPI API token: https://pypi.org/manage/account/token/
  2. Build the package: python -m build
  3. Upload: twine upload dist/*

Trusted Publishing Setup

  1. Go to https://pypi.org/manage/project/skdr-eval/settings/publishing/
  2. Add GitHub repository as trusted publisher:
    • Repository: dandrsantos/skdr-eval
    • Workflow: release.yml
    • Environment: release

Citation

If you use this software in your research, please cite:

@software{santos2024skdr,
  title = {skdr-eval: Offline Policy Evaluation for Service-Time Minimization},
  author = {Santos, Diogo},
  year = {2024},
  url = {https://github.com/dandrsantos/skdr-eval},
  version = {0.1.0}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
