skdr-eval

General-purpose offline policy evaluation (OPE) for sklearn-compatible models, with time-aware Doubly Robust (DR) and Stabilized Doubly Robust (SNDR) estimators, calibrated propensities, PSIS support-health, and a stakeholder evaluation card you can hand to a PM.

Try it in your browser — no install needed:

Notebook	Open in Colab
Quickstart (contextual-bandit OPE)
Pairwise / autoscaling quickstart
E-commerce ranking use case
Ad targeting use case
Healthcare CATE use case

What is this?

skdr-eval is a Python library for offline policy evaluation: estimating, from logged data alone, how well a candidate decision policy would have performed without deploying it. It implements Doubly Robust (DR) and Stabilized Doubly Robust (SNDR) estimators on top of scikit-learn-protocol models, with first-class support for time-correlated logs, calibrated propensities, and moving-block bootstrap confidence intervals. Results come back as a single bundled EvaluationArtifact that exposes per-decision diagnostics, clip-grid sensitivity, PSIS Pareto-k support-health, propensity calibration (ECE / Brier), and a renderable HTML stakeholder card.

It started life as an internal tool for call-routing / service-time minimization (and still ships a pairwise / autoscaling layer for that use case), but the underlying machinery is general-purpose contextual-bandit OPE.

When should I use this?

Reach for skdr-eval when all of the following are true:

You have logged data of the form (context x, action a, reward y) from a policy you no longer want to keep running unchanged.
You want to evaluate a candidate policy (a recommender, a ranker, a clinical decision rule, a routing model, an ad targeter) before A/B testing it, because A/B testing has a real cost (lost revenue, patient risk, SLA violations, operator overtime).
Your candidate policy is, or can be wrapped behind, a scikit-learn-protocol estimator — fit / predict (or predict_proba) is enough.
The logged decisions cover the actions the candidate policy would take with non-trivial probability (i.e., there is reasonable overlap / positivity). skdr-eval will warn you when overlap is thin via PSIS Pareto-k, ESS, and match-rate diagnostics.

Typical use cases:

Recommender / ranking systems — evaluate a new model against logged session data.
Ad targeting — score a candidate bidding policy on Criteo-style counterfactual logs.
Healthcare CATE — compare a treatment-assignment rule to standard-of-care on retrospective records.
Call routing / autoscaling — choose between client-operator assignment policies on historical traffic (the original motivating use case, still first-class via evaluate_pairwise_models).
Any contextual-bandit decision where re-running history would be too expensive or risky to do live.

If you need slate / top-K ranking estimators (Cascade-DR, Reward-Interaction IPS) or MIPS for very large action spaces, those are tracked on the roadmap (#75, #85) but not yet shipped.

Where to start

Just want to see it work? Click any "Open in Colab" badge above.
Have logs already? Skim Quick Start below; the standard / pairwise variants are both two screens long.
Comparing against another OPE library? See docs/methods.md for the positioning vs. Open Bandit Pipeline / SCOPE-RL / banditml.
Looking for end-to-end examples by domain? Browse examples/use_cases/ for runnable scripts (e-commerce ranking, ad targeting, healthcare CATE, call routing).

The skdr-eval CLI (pip install 'skdr-eval[cli]') makes the same evaluators reachable from a terminal — see Command-line interface. Run skdr-eval doctor logs.parquet before evaluation to catch schema and environment problems early.

Table of Contents

Features
Installation
Quick Start
- Standard Evaluation
- Pairwise Evaluation
API Reference
Theory
Implementation Details
Bootstrap Confidence Intervals
Examples
Development
Citation

Features

🎯 Doubly Robust Estimation: Implements both DR and Stabilized DR (SNDR) estimators
⏰ Time-Aware Evaluation: Uses time-series splits and calibrated propensity scores
🔧 Sklearn Integration: Easy integration with scikit-learn models
📊 Comprehensive Diagnostics: ESS, match rates, propensity score analysis
🚀 Production Ready: Type-hinted, tested, and documented
📈 Bootstrap Confidence Intervals: Moving-block bootstrap for time-series data
🤝 Pairwise Evaluation: Client-operator pairwise evaluation with autoscaling strategies
🎛️ Autoscaling: Direct, stream, and stream_topk strategies with policy induction
🧮 Choice Models: Conditional logit models for propensity estimation

Installation

pip install skdr-eval

Optional Dependencies

For choice models (conditional logit):

pip install 'skdr-eval[choice]'

For speed optimizations (PyArrow, Polars):

pip install 'skdr-eval[speed]'

For development:

git clone https://github.com/dgenio/skdr-eval.git
cd skdr-eval
pip install -e '.[dev]'

To run the Colab quickstart notebooks locally:

pip install 'skdr-eval[notebooks]'
jupyter notebook examples/notebooks/

Quick Start

Preflight

Before a real evaluation, confirm your environment + schema in one shot:

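import skdr_eval

# Synthetic inputs so this snippet runs on its own; substitute your own DataFrames.
logs, ops_all, true_q = skdr_eval.make_synth_logs(n=5000, n_ops=5, seed=42)
logs_df, op_daily_df = skdr_eval.make_pairwise_synth(n_days=3, n_clients_day=500, n_ops=10, seed=42)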

# Which optional extras are installed?
print(skdr_eval.get_capabilities())
# {'viz': True, 'speed': False, 'missing_extras': ['speed']}

# Validate your logs match the schema evaluate_sklearn_models expects.
skdr_eval.validate_logs(logs, strict=True)

# For the pairwise API:
skdr_eval.validate_pairwise_inputs(
    logs_df, op_daily_df, metric_col="service_time", strict=True,
)

See examples/preflight.py for a runnable script — wire it into CI to catch schema or extras drift before the long-running evaluation kicks off.

Standard Evaluation

import skdr_eval
from sklearn.ensemble import RandomForestRegressor, HistGradientBoostingRegressor

# 1. Generate synthetic service logs
logs, ops_all, true_q = skdr_eval.make_synth_logs(n=5000, n_ops=5, seed=42)

# 2. Define candidate models
models = {
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=42),
    "HistGradientBoosting": HistGradientBoostingRegressor(random_state=42),
}

# 3. Evaluate models using DR and SNDR
artifact = skdr_eval.evaluate_sklearn_models(
    logs=logs,
    models=models,
    fit_models=True,
    n_splits=3,
    random_state=42,
)

# 4. View results
print(artifact.report[['model', 'estimator', 'V_hat', 'ESS', 'match_rate']])

# 5. Trust signals (issue #22 / #23)
print(artifact.warnings)        # per-(model, estimator) support_health + codes
print(artifact.sensitivity)     # clip-grid value range and stability flag
print(artifact.diagnostics)     # propensity overlap / calibration / discrimination

# 6. Export (issue #28) and stakeholder card (issue #30)
artifact.export("artifacts/run", formats=["json", "html"])
artifact.save_card("artifacts/run_card.html", "RandomForest")

# 7. Per-decision contributions (issue #92) — opt in with keep_contributions=True
artifact = skdr_eval.evaluate_sklearn_models(
    logs=logs,
    models=models,
    fit_models=True,
    n_splits=3,
    random_state=42,
    keep_contributions=True,  # attach per-decision DR/SNDR pseudo-outcomes
)
contribs = artifact.contributions("RandomForest", estimator="DR", top_k=5)
print(contribs)  # decision_id, q_pi, q_hat, weight, reward, contribution_to_V
#  contribution_to_V.mean() == V_hat by construction (float64 precision)

Breaking change in 0.6.0: evaluate_sklearn_models and evaluate_pairwise_models now return a single EvaluationArtifact instead of the legacy (report, detailed) tuple. Unpack artifact.report / artifact.detailed to migrate.
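For code written against the old tuple return, a thin adapter can restore that shape while you migrate. The sketch below relies only on the .report and .detailed attributes named above. The helper name as_legacy_tuple is hypothetical and not part of skdr-eval, and a SimpleNamespace stands in for a real EvaluationArtifact so the snippet runs without the library installed.

```python
from types import SimpleNamespace


def as_legacy_tuple(artifact):
    """Return the pre-0.6.0 (report, detailed) pair from an artifact-like object."""
    return artifact.report, artifact.detailed


# Stand-in object for illustration only; a real run would pass the
# EvaluationArtifact returned by evaluate_sklearn_models.
fake_artifact = SimpleNamespace(
    report="summary table",
    detailed={"RandomForest": "DRResult"},
)

report, detailed = as_legacy_tuple(fake_artifact)
print(report)
print(detailed)
```

Call sites that unpacked the tuple can be switched one at a time to read artifact.report and artifact.detailed directly, after which the adapter can be deleted.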

Pairwise Evaluation

import skdr_eval
from sklearn.ensemble import HistGradientBoostingRegressor

# 1. Generate synthetic pairwise data (client-operator pairs)
logs_df, op_daily_df = skdr_eval.make_pairwise_synth(n_days=3, n_clients_day=500, n_ops=10, seed=42)

# 2. Train model on observed data
feature_cols = [c for c in logs_df.columns if c.startswith(("cli_", "op_"))]
model = HistGradientBoostingRegressor(random_state=42)
model.fit(logs_df[feature_cols].values, logs_df["service_time"].values)

# 3. Run pairwise evaluation
artifact = skdr_eval.evaluate_pairwise_models(
    logs_df=logs_df,
    op_daily_df=op_daily_df,
    models={"HGB": model},
    metric_col="service_time",
    task_type="regression",
    direction="min",
    strategy="auto",
    n_splits=3,
    random_state=42,
)

# 4. View results
print(artifact.report[["model", "estimator", "V_hat", "ESS", "match_rate"]])
print(artifact.warnings)

API Reference

Core Functions

`make_synth_logs(n=5000, n_ops=5, seed=0)`

Generate synthetic service logs for evaluation.

Returns:

logs: DataFrame with service logs
ops_all: Index of all operator names
true_q: Ground truth service times

`build_design(logs, cli_pref='cli_', st_pref='st_')`

Build design matrices from logs.

Returns:

Design: Dataclass with feature matrices and metadata

`evaluate_sklearn_models(logs, models, **kwargs)`

Evaluate sklearn models using DR and SNDR estimators.

Parameters:

logs: Service log DataFrame
models: Dict of model name to sklearn estimator
fit_models: Whether to fit models (default: True)
n_splits: Number of time-series splits (default: 3)
random_state: Random seed for reproducibility

Temporal split controls (keyword-only):

gap: Samples skipped between train and test in each CV fold (default: 1, conservative adjacent-row leakage guard; 0 for sklearn's unbuffered behavior).
test_size: Per-fold test-window size in samples (default: None, defers to sklearn's automatic sizing).
max_train_size: Cap on training-fold size in samples (default: None, expanding window). Set this to switch to a sliding-window CV — useful when early data is no longer representative.

The same trio is accepted by evaluate_pairwise_models, fit_propensity_timecal, fit_outcome_crossfit, and estimate_propensity_pairwise.
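To see what the three controls do to fold boundaries, the sketch below re-implements the forward-chaining split rule in plain Python. It is meant to mirror the behavior of scikit-learn's TimeSeriesSplit, whose parameters share these names, but it is a simplified illustration and not skdr-eval's code; the function name time_series_folds is hypothetical.

```python
def time_series_folds(n_samples, n_splits=3, gap=1, test_size=None, max_train_size=None):
    """Yield (train_indices, test_indices) for forward-chaining CV folds."""
    if test_size is None:
        test_size = n_samples // (n_splits + 1)
    if test_size < 1 or n_samples - gap - n_splits * test_size <= 0:
        raise ValueError("not enough samples for the requested folds")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_end = test_start - gap  # skip `gap` rows right before the test window
        train_start = 0               # expanding window by default
        if max_train_size is not None and max_train_size < train_end:
            train_start = train_end - max_train_size  # sliding window
        train = list(range(train_start, train_end))
        test = list(range(test_start, test_start + test_size))
        yield train, test


expanding = list(time_series_folds(12, n_splits=3, gap=1))
sliding = list(time_series_folds(12, n_splits=3, gap=1, max_train_size=4))

for train, test in expanding:
    print("expanding", train, test)
for train, test in sliding:
    print("sliding  ", train, test)
```

With gap=1 the row immediately before each test window is left out of training, and with max_train_size=4 the later folds drop their oldest rows instead of growing.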

`evaluate_pairwise_models(logs_df, op_daily_df, models, metric_col, task_type, direction, **kwargs)`

Evaluate models using pairwise (client-operator) evaluation with autoscaling.

Parameters:

Required:

logs_df: Pairwise decision log DataFrame
op_daily_df: Daily operator availability DataFrame
models: Dict of model name to fitted sklearn estimator
metric_col: Target metric column name
task_type: Type of prediction task ("regression" or "binary")
direction: Whether to minimize or maximize the metric ("min" or "max")

Optional:

n_splits: Number of time-series cross-validation splits (default: 3)
strategy: Policy induction strategy ("auto", "direct", "stream", or "stream_topk"; default: "auto")
propensity: Propensity estimation method ("auto", "condlogit", or "multinomial"; default: "auto"). "auto" lets skdr-eval choose an appropriate method based on the evaluation setup.
topk: Top-K operators for stream_topk strategy (default: 20)
neg_per_pos: Negative samples per positive for conditional logit (default: 5)
chunk_pairs: Chunk size for streaming pair generation (default: 2,000,000)
min_ess_frac: Minimum ESS fraction for clipping threshold selection (default: 0.02)
clip_grid: Tuple of clipping thresholds (default: (2, 5, 10, 20, 50, float("inf")))
ci_bootstrap: Whether to compute bootstrap confidence intervals (default: False)
alpha: Significance level for confidence intervals (default: 0.05)
outcome_estimator: Outcome model (depends on task_type): for "regression": "hgb", "ridge", "rf"; for "binary": "hgb", "logistic"; or a callable (default: "hgb")
day_col: Day column name (default: "arrival_day")
client_id_col: Client ID column name (default: "client_id")
operator_id_col: Operator ID column name (default: "operator_id")
elig_col: Eligibility mask column name (default: "elig_mask")
random_state: Random seed for reproducibility (default: 0)

Returns:

EvaluationArtifact: bundled result. Use .report for the summary DataFrame, .detailed for per-model DRResults, .warnings for support-health warnings, .sensitivity for clip-grid stability, .diagnostics for propensity diagnostics, and .to_json / .to_html / .card / .export for stakeholder artifacts.

`make_pairwise_synth(n_days=14, n_clients_day=2000, n_ops=200, **kwargs)`

Generate synthetic pairwise (client-operator) data for evaluation.

Parameters:

n_days: Number of days to simulate
n_clients_day: Number of clients per day
n_ops: Number of operators
seed: Random seed for reproducibility
binary: Whether to generate binary outcomes (default: False)

Returns:

logs_df: DataFrame with pairwise decisions
op_daily_df: DataFrame with daily operator data

Advanced Functions

`fit_propensity_timecal(X_phi, A, n_splits=3, random_state=0)`

Fit propensity model with time-aware cross-validation and isotonic calibration.
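Calibration quality is reported as ECE and Brier score (see the overview). As a rough guide to what those numbers mean, the sketch below computes both for a binary "was this action taken" view, using the textbook definitions with equal-width bins. The binning and multi-action handling used inside skdr-eval are not stated here, so treat the details (and the function names) as assumptions.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=5):
    """Equal-width-bin ECE: weighted mean of |observed rate - mean prediction|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += (len(members) / len(probs)) * abs(observed - mean_pred)
    return ece


# Toy data: predicted probability that an action was logged, and whether it was.
probs = [0.1, 0.1, 0.9, 0.9]
outcomes = [0, 0, 1, 1]

brier = brier_score(probs, outcomes)
ece = expected_calibration_error(probs, outcomes)
print(round(brier, 4), round(ece, 4))
```

Lower is better for both. A propensity model that is confidently wrong inflates importance weights, which is why calibration is checked before the DR estimate is trusted.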

`fit_outcome_crossfit(X_obs, Y, n_splits=3, estimator='hgb', random_state=0)`

Fit outcome model with cross-fitting. Supports 'hgb', 'ridge', 'rf', or custom estimators.

`dr_value_with_clip(propensities, policy_probs, Y, q_hat, A, elig, clip_grid=...)`

Compute DR and SNDR values with automatic clipping threshold selection.

`block_bootstrap_ci(values_num, values_den, base_mean, n_boot=400, **kwargs)`

Compute confidence intervals using moving-block bootstrap for time-series data.

Theory

Why DR and SNDR?

Doubly Robust (DR) estimation gives a consistent estimate of policy value when either the propensity model or the outcome model is correctly specified; only one of the two needs to be right, hence "doubly robust". The estimator is:

V̂_DR = (1/n) Σ [q̂_π(x_i) + w_i * (y_i - q̂(x_i, a_i))]

Stabilized DR (SNDR) reduces variance by normalizing importance weights:

V̂_SNDR = (1/n) Σ q̂_π(x_i) + [Σ w_i * (y_i - q̂(x_i, a_i))] / [Σ w_i]

Where:

q̂_π(x) = expected outcome under evaluation policy π
q̂(x,a) = outcome model prediction
w_i = π(a_i|x_i) / e(a_i|x_i) = importance weight (clipped)
e(a_i|x_i) = propensity score (calibrated)
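The two formulas can be checked by hand on a few rows. The sketch below is a minimal plain-Python transcription of them, assuming per-decision lists for each symbol and an optional upper clip on the weights. It is not the library's dr_value_with_clip; the function name dr_and_sndr and the toy numbers are made up for illustration.

```python
def dr_and_sndr(q_pi, q_hat, y, pi_prob, propensity, clip=float("inf")):
    """Return (V_DR, V_SNDR) computed term by term from the formulas above."""
    n = len(y)
    # w_i = pi(a_i|x_i) / e(a_i|x_i), clipped from above
    weights = [min(p / e, clip) for p, e in zip(pi_prob, propensity)]
    # w_i * (y_i - q_hat(x_i, a_i))
    corrections = [w * (yi - qi) for w, yi, qi in zip(weights, y, q_hat)]
    baseline = sum(q_pi) / n  # (1/n) * sum of q_pi(x_i)
    v_dr = baseline + sum(corrections) / n
    weight_sum = sum(weights)
    # SNDR normalizes the correction by the weight sum instead of n
    v_sndr = baseline + (sum(corrections) / weight_sum if weight_sum > 0 else 0.0)
    return v_dr, v_sndr


# Toy inputs (hypothetical numbers, four logged decisions).
q_pi = [1.0, 2.0, 3.0, 4.0]          # q_pi(x_i)
q_hat = [1.0, 2.0, 2.0, 5.0]         # q_hat(x_i, a_i)
y = [2.0, 2.0, 4.0, 3.0]             # observed rewards
pi_prob = [0.5, 0.0, 1.0, 0.5]       # pi(a_i | x_i)
propensity = [0.5, 0.5, 0.25, 0.5]   # e(a_i | x_i)

v_dr, v_sndr = dr_and_sndr(q_pi, q_hat, y, pi_prob, propensity)
v_dr_clip, v_sndr_clip = dr_and_sndr(q_pi, q_hat, y, pi_prob, propensity, clip=2.0)
print(v_dr, v_sndr)
print(v_dr_clip, v_sndr_clip)
```

The third row has weight 4 and dominates the unclipped DR correction. Clipping at 2 halves its influence, and SNDR damps it further by dividing by the weight sum rather than n.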

Implementation Details

Autoscaling Strategies

Direct: Uses the logging policy directly without modification
Stream: Induces a policy from sklearn models and applies it to streaming decisions
Stream TopK: Similar to stream but restricts choices to top-K operators based on predicted service times
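As an illustration of the top-K restriction, the sketch below ranks eligible operators by predicted service time and keeps the K fastest, then picks the best of them. It assumes direction="min" and a greedy choice; how skdr-eval actually turns model scores into a policy distribution is not described here, and both function names are hypothetical.

```python
def topk_candidates(predicted_times, eligible, k):
    """Indices of the k eligible operators with the lowest predicted service time."""
    ranked = sorted(
        (i for i, ok in enumerate(eligible) if ok),
        key=lambda i: predicted_times[i],
    )
    return ranked[:k]


def induced_choice(predicted_times, eligible, k):
    """Greedy induced policy: the best operator inside the top-k candidate set."""
    candidates = topk_candidates(predicted_times, eligible, k)
    return candidates[0] if candidates else None


# One client, five operators; operator 4 is fastest but not eligible.
predicted_times = [30.0, 12.5, 45.0, 18.0, 9.0]
eligible = [True, True, True, True, False]

candidates = topk_candidates(predicted_times, eligible, k=2)
choice = induced_choice(predicted_times, eligible, k=2)
print(candidates, choice)
```

Restricting to K candidates keeps the number of client-operator pairs that must be scored bounded when the operator pool is large.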

Key Features

Time-Series Aware: Uses TimeSeriesSplit for all cross-validation with temporal ordering
Calibrated Propensities: Per-fold isotonic calibration via CalibratedClassifierCV
Automatic Clipping: Picks an importance-weight clipping threshold from clip_grid to reduce variance while keeping ESS above a minimum fraction (min_ess_frac)
Comprehensive Diagnostics: ESS, match rates, propensity quantiles, and tail mass analysis
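To make the ESS diagnostic and its interaction with clipping concrete, the sketch below computes the Kish effective sample size for a set of importance weights and applies one plausible selection rule over a clip grid. The rule shown (largest threshold whose ESS fraction meets the floor) is an assumption for illustration and may differ from what skdr-eval does; the names pick_clip and effective_sample_size are hypothetical, while clip_grid and min_ess_frac echo the parameters listed in the API Reference.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    total = sum(weights)
    squares = sum(w * w for w in weights)
    return (total * total) / squares if squares > 0 else 0.0


def pick_clip(raw_weights, clip_grid, min_ess_frac):
    """Hypothetical rule: largest threshold whose clipped ESS / n meets the floor."""
    n = len(raw_weights)
    admissible = []
    for clip in clip_grid:
        clipped = [min(w, clip) for w in raw_weights]
        if effective_sample_size(clipped) / n >= min_ess_frac:
            admissible.append(clip)
    # Fall back to the tightest clip if nothing meets the floor.
    return max(admissible) if admissible else min(clip_grid)


# Nine well-supported decisions and one with a very large weight.
raw_weights = [1.0] * 9 + [30.0]
clip_grid = (2, 5, 10, 20, 50, float("inf"))

for clip in clip_grid:
    clipped = [min(w, clip) for w in raw_weights]
    print(clip, round(effective_sample_size(clipped) / len(raw_weights), 3))

strict_choice = pick_clip(raw_weights, clip_grid, min_ess_frac=0.5)
loose_choice = pick_clip(raw_weights, clip_grid, min_ess_frac=0.02)
print(strict_choice, loose_choice)
```

A single heavy weight can cut the effective sample size to a small fraction of n. Tighter clipping restores ESS at the cost of bias, which is the trade-off the clip-grid sensitivity report is meant to expose.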

Bootstrap Confidence Intervals

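For time-correlated logs, pass ci_bootstrap=True to get moving-block bootstrap confidence intervals, which resample contiguous blocks of decisions rather than individual rows:

# Enable bootstrap CIs (reuses logs and models from Standard Evaluation)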
artifact = skdr_eval.evaluate_sklearn_models(
    logs=logs,
    models=models,
    ci_bootstrap=True,
    alpha=0.05,  # 95% confidence
)

print(artifact.report[['model', 'estimator', 'V_hat', 'ci_lower', 'ci_upper']])

Key Features:

Moving-block bootstrap: Preserves time-series correlation structure
Proper statistical inference: Uses bootstrap distribution of DR contributions
Automatic fallback: Falls back to normal approximation if bootstrap fails
Configurable parameters: Control bootstrap samples, block length, and significance level
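The idea behind the moving-block bootstrap can be sketched in a few lines of plain Python. The version below resamples overlapping blocks of a single time-ordered series and returns a percentile interval for its mean. It is a simplified illustration and not the library's block_bootstrap_ci (which takes values_num, values_den, and base_mean); the function name, the default block length, and the toy series are assumptions.

```python
import random


def moving_block_bootstrap_ci(values, block_len=5, n_boot=400, alpha=0.05, seed=0):
    """Percentile CI for the mean of a time-ordered series using overlapping blocks."""
    n = len(values)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(values)")
    rng = random.Random(seed)
    n_blocks = -(-n // block_len)  # ceil(n / block_len)
    last_start = n - block_len
    means = []
    for _ in range(n_boot):
        sample = []
        for _ in range(n_blocks):
            start = rng.randint(0, last_start)  # inclusive on both ends
            sample.extend(values[start:start + block_len])
        sample = sample[:n]  # trim to the original length
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * (n_boot - 1))]
    upper = means[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


# Toy per-decision contributions with a repeating (autocorrelated) pattern.
contributions = [float(i % 7) for i in range(100)]
point_estimate = sum(contributions) / len(contributions)

lower, upper = moving_block_bootstrap_ci(contributions, block_len=5, n_boot=400, alpha=0.05)
print(point_estimate, lower, upper)
```

Resampling whole blocks keeps neighboring decisions together, so short-range correlation in the logs is carried into each bootstrap replicate instead of being destroyed by row-level shuffling.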

Command-line interface

The skdr-eval CLI ships behind the [cli] extra and exposes the same evaluation surface to teams that don't want to write Python.

pip install 'skdr-eval[cli]'

# Quick environment + schema probe before evaluation.
skdr-eval doctor logs.parquet
skdr-eval doctor logs.parquet --json | jq .

# Validate logs against the schema (exit code 1 on failure — useful in CI).
skdr-eval validate-schema logs.parquet --strict
skdr-eval validate-schema pw_logs.parquet --kind pairwise \
    --op-daily pw_op.parquet --metric-col service_time

# Run a full evaluation from disk.
skdr-eval evaluate logs.parquet \
    --model HGB=model.joblib \
    --policy-train pre_split \
    --n-splits 3 \
    --out ./run \
    --tracker-dir ./tracker_runs/2026-05-20

# Re-render a card directly from a saved artifact.json.
skdr-eval card ./run/artifact.json --model HGB --estimator DR \
    --out ./run/card.yaml --format yaml

# Stable exit codes (good for CI gates):
#   0 — success
#   1 — data / schema error
#   2 — environment / import error
#   3 — at least one model row's recommendation verdict is 'do_not_deploy'
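A CI wrapper can branch on these exit codes. The sketch below maps each documented code to its meaning and runs a command through subprocess. A child Python process that exits with 3 stands in for skdr-eval evaluate so the snippet runs without the CLI installed; the helper name run_and_explain is hypothetical.

```python
import subprocess
import sys

# Meanings copied from the exit-code list above.
EXIT_MEANINGS = {
    0: "success",
    1: "data / schema error",
    2: "environment / import error",
    3: "at least one model row has verdict 'do_not_deploy'",
}


def run_and_explain(command):
    """Run a command and return (exit_code, human-readable meaning)."""
    completed = subprocess.run(command, check=False)
    meaning = EXIT_MEANINGS.get(completed.returncode, "unrecognized exit code")
    return completed.returncode, meaning


# Stand-in for `skdr-eval evaluate ...`: a child process that exits with 3.
code, meaning = run_and_explain([sys.executable, "-c", "import sys; sys.exit(3)"])
print(code, meaning)
```

In a real pipeline the command list would hold the skdr-eval invocation, and the job could fail hard on 1 or 2 while treating 3 as a blocked deployment that needs review.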

Preflight diagnostics: `skdr_eval.doctor`

skdr_eval.doctor(logs, *, kind='standard'|'pairwise', op_daily_df=None, metric_col='service_time', n_splits=3, strict=False) returns a non-raising DoctorReport that surfaces environment + schema + statistical sanity failures with actionable fix hints.

import skdr_eval

logs, _, _ = skdr_eval.make_synth_logs(n=5000, n_ops=3, seed=0)
report = skdr_eval.doctor(logs)
report.print()            # text table with status glyphs
report.to_markdown()      # copy-pasteable Markdown
report.to_dict()          # JSON-serializable
assert report.ok          # True iff no Check has status='fail'

Machine-readable cards: `EvaluationCard`

EvaluationArtifact.card_schema(model_name, estimator='SNDR') builds an EvaluationCard — the typed sibling of the HTML stakeholder card. The card is YAML/JSON round-trippable, exposes a stable json_schema() for downstream tooling, and is ideal for CI gates and Git-pinned snapshots of an evaluation.

artifact = skdr_eval.evaluate_sklearn_models(logs=logs, models=models, fit_models=True)
card = artifact.card_schema("RandomForest", estimator="DR")

card.to_yaml("artifacts/rf.card.yaml")
card.to_json("artifacts/rf.card.json")

# Round-trip
loaded = skdr_eval.EvaluationCard.from_yaml("artifacts/rf.card.yaml")
assert loaded == card

# CI gate
if card.trust.recommendation and card.trust.recommendation["verdict"] == "do_not_deploy":
    raise SystemExit(1)

Experiment tracker

evaluate_sklearn_models and evaluate_pairwise_models both accept a tracker= kwarg. The default NullTracker is a no-op (so the evaluator is unchanged when omitted). FileTracker writes a deterministic run directory to disk; external adapters (MLflowTracker, WandbTracker, AimTracker) ship as stubs behind the [mlflow] / [wandb] / [aim] extras and are filled in under umbrella issue #73.

import skdr_eval
from skdr_eval import FileTracker

with FileTracker(root="runs/2026-05-20") as tracker:
    artifact = skdr_eval.evaluate_sklearn_models(
        logs=logs, models=models, fit_models=True, tracker=tracker,
    )
# Writes:
#   runs/2026-05-20/metrics.jsonl          (one row per logged metric)
#   runs/2026-05-20/tags.json
#   runs/2026-05-20/artifacts/...
#   runs/2026-05-20/cards/<model>_<estimator>.card.yaml
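Because metrics.jsonl holds one JSON object per line, it can be read back with the standard library alone. The sketch below is a generic JSON Lines reader exercised against a throwaway file. The keys in the demo rows are invented for illustration and are not the documented FileTracker schema, so inspect a real run directory before relying on specific field names.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


# Demo against a temporary file; these keys are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "metrics.jsonl"
    demo.write_text(
        '{"name": "V_hat", "value": 1.5}\n\n{"name": "ESS", "value": 120.0}\n',
        encoding="utf-8",
    )
    rows = read_jsonl(demo)

print(len(rows))
```

Pointing read_jsonl at a real runs/<date>/metrics.jsonl gives a list that can be diffed between runs or loaded into a DataFrame.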

Examples

examples/ ships three kinds of runnable artifacts — pick the one that matches how you want to consume them:

Path	Format	Use when
`examples/quickstart.py`	`.py`	Headless / CI / no Jupyter installed.
`examples/quickstart_pairwise.py`	`.py`	Same, for the pairwise / autoscaling API.
`examples/preflight.py`	`.py`	One-shot capability + schema check before a long evaluation.
`examples/notebooks/`	`.ipynb` × 5	Colab-runnable; click the badges at the top of this README.
`examples/use_cases/`	`.py` × 4	Self-contained domain walk-throughs (e-commerce ranking, ad targeting, healthcare CATE, call routing).

CI exercises examples/preflight.py, examples/quickstart.py, every notebook in examples/notebooks/, and every script in examples/use_cases/ on every PR — they cannot silently rot.

To run a domain example locally:

python examples/use_cases/01_ecommerce_ranking.py
python examples/use_cases/02_ad_targeting.py
python examples/use_cases/03_healthcare_cate.py
python examples/use_cases/04_call_routing.py

To open the notebooks locally:

pip install 'skdr-eval[notebooks]'
jupyter notebook examples/notebooks/

Development

Setup

git clone https://github.com/dgenio/skdr-eval.git
cd skdr-eval
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e '.[dev]'

Testing

pytest -v

Linting and Formatting

ruff check src/ tests/ examples/
ruff format src/ tests/ examples/
mypy src/skdr_eval/

Pre-commit Hooks

pre-commit install
pre-commit run --all-files

Building

python -m build

Publishing to PyPI

This package uses PyPI Trusted Publishing (OIDC-based, with PEP 740 attestations) for secure PyPI releases.

Automatic (Recommended)

Create a GitHub release with a version tag (e.g., v0.1.0)
The release.yml workflow will automatically build and publish

Manual Fallback

If Trusted Publishing is not configured:

Set up PyPI API token: https://pypi.org/manage/account/token/
Build the package: python -m build
Upload: twine upload dist/*

Trusted Publishing Setup

Go to https://pypi.org/manage/project/skdr-eval/settings/publishing/
Add GitHub repository as trusted publisher:
- Repository: dgenio/skdr-eval
- Workflow: release.yml
- Environment: release

Citation

If you use this software in your research, please cite:

@software{santos2026skdreval,
  title   = {{skdr-eval}: Offline Policy Evaluation for {sklearn}-Compatible Models with Time-Aware Doubly Robust Estimators},
  author  = {Santos, Diogo},
  year    = {2026},
  url     = {https://github.com/dgenio/skdr-eval},
  version = {0.7.0},
  license = {MIT}
}

A Zenodo concept DOI will be minted with the next tagged release (see docs/zenodo.md); after that, CITATION.cff will carry the canonical DOI, which will replace the URL field above.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Acknowledgments

Built with scikit-learn for machine learning
Uses pandas for data manipulation
Follows PEP 621 for project metadata

Release history

Version	Released
0.10.0	May 29, 2026
0.9.0 (this version)	May 22, 2026
0.8.0	May 20, 2026
0.7.0	May 17, 2026
0.5.0	Apr 8, 2026
0.4.2	Sep 14, 2025
0.4.1	Sep 13, 2025
0.3.7	Sep 13, 2025
0.1.1	Aug 12, 2025
