DriftPipe

An ML pipeline framework with data and model drift monitoring.

Installation

pip install driftpipe
# or, from a local source checkout:
pip install .

Quick Start

import numpy as np
from driftpipe import (
    BaselineStorage,
    EvaluateStage,
    IngestStage,
    Monitor,
    MonitorReport,
    Pipeline,
    PreprocessStage,
    TrainStage,
)


class Ingest(IngestStage):
    def run(self):
        raw_data = np.random.randn(500, 4)
        labels = (raw_data[:, 0] + raw_data[:, 1] > 0).astype(int)
        return {
            "raw_data": raw_data,
            "labels": labels,
            "feature_names": ["f0", "f1", "f2", "f3"],
        }


class Preprocess(PreprocessStage):
    def run(self, raw_data):
        return {"processed_data": raw_data}


class Train(TrainStage):
    def run(self, processed_data, labels):
        threshold = float(np.mean(processed_data[:, 0]))
        return {
            "model": {"threshold": threshold},
            "labels": labels,
        }


class Evaluate(EvaluateStage):
    def run(self, model, raw_data, labels, feature_names, baseline_storage):
        # Score the threshold "model" on the first feature.
        predictions = (raw_data[:, 0] > model["threshold"]).astype(int)
        accuracy = float(np.mean(predictions == labels))
        metrics = {"accuracy": accuracy}

        # Save baseline feature data and metrics for later monitoring runs.
        baseline_storage.compute_and_save_features(raw_data, feature_names)
        baseline_storage.save_metrics(
            baseline_storage.metrics_baseline(metrics, n_samples=len(labels))
        )

        return {"metrics": metrics}


pipeline = Pipeline("weather_demo")

pipeline.baseline_storage = BaselineStorage("weather_demo")

pipeline.add_stage(Ingest)
pipeline.add_stage(Preprocess)
pipeline.add_stage(Train)
pipeline.add_stage(Evaluate)

# Run once to establish the baseline
result = pipeline.run()
assert result.success
pipeline.baseline_metrics = pipeline.baseline_storage.load_metrics()

# Push a new batch through the monitor
monitor = Monitor(pipeline)
new_raw_data = np.random.randn(500, 4) + 0.75
new_labels = (new_raw_data[:, 0] + new_raw_data[:, 1] > 0).astype(int)
monitor_context = {
    "raw_data": new_raw_data,
    "labels": new_labels,
    "feature_names": ["f0", "f1", "f2", "f3"],
}

monitor_result = monitor.run(monitor_context)
assert monitor_result.success

# Generate a drift report from the same monitoring batch
MonitorReport(monitor).generate(
    output_path="weather_drift_report.html",
    monitor_result=monitor_result,
)

# Save pipeline config
pipeline.to_config(
    path="pipeline_demo.json",
    metadata={"dataset": "demo"},
)

Monitoring And Reports

Monitor compares current metrics against pipeline.baseline_metrics and, when raw_data, feature_names, and baseline feature data are available, automatically:

  • stores distributional_data in the pipeline run context
  • stores distributional_metrics in the pipeline run context
  • runs KS and PSI checks for each feature
  • generates histograms comparing the baseline and current distributions of each feature
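
For intuition about the per-feature KS and PSI checks listed above, here is a minimal, dependency-free sketch of both statistics. It is not DriftPipe's implementation: the function names `ks_statistic` and `psi`, the equal-width binning over the baseline range, and the `eps` floor for empty bins are all assumptions made for illustration.

```python
import bisect
import math
import random


def ks_statistic(baseline, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    base_sorted = sorted(baseline)
    cur_sorted = sorted(current)
    max_gap = 0.0
    for x in base_sorted + cur_sorted:
        # Fraction of each sample that is <= x.
        base_cdf = bisect.bisect_right(base_sorted, x) / len(base_sorted)
        cur_cdf = bisect.bisect_right(cur_sorted, x) / len(cur_sorted)
        max_gap = max(max_gap, abs(base_cdf - cur_cdf))
    return max_gap


def psi(baseline, current, n_bins=10, eps=1e-6):
    """Population stability index over equal-width bins (assumed binning)."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp values outside the baseline range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    base_p = proportions(baseline)
    cur_p = proportions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base_p, cur_p))


# Hypothetical single feature: a baseline sample and a mean-shifted batch.
rng = random.Random(0)
baseline_feature = [rng.gauss(0.0, 1.0) for _ in range(500)]
shifted_feature = [rng.gauss(0.75, 1.0) for _ in range(500)]

print("KS  (shifted):", ks_statistic(baseline_feature, shifted_feature))
print("PSI (shifted):", psi(baseline_feature, shifted_feature))
```

Both statistics are zero when a sample is compared with itself and grow as the current batch moves away from the baseline. The thresholds at which DriftPipe flags drift are not stated here, so none are assumed.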

MonitorReport is the high-level reporting utility. It accepts a Monitor and writes an HTML report.
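
The metric comparison against pipeline.baseline_metrics described at the start of this section can be pictured with a small sketch. The helper `metric_deltas`, its `tolerance` parameter, and the "higher is better" convention are hypothetical and are not part of the DriftPipe API.

```python
def metric_deltas(baseline_metrics, current_metrics, tolerance=0.05):
    """Compare current metrics with a baseline, assuming higher values are better."""
    report = {}
    for name, base_value in baseline_metrics.items():
        if name not in current_metrics:
            continue  # skip metrics that were not computed for this batch
        delta = current_metrics[name] - base_value
        report[name] = {
            "baseline": base_value,
            "current": current_metrics[name],
            "delta": delta,
            # Flag a metric only when it drops by more than the tolerance.
            "degraded": delta < -tolerance,
        }
    return report


# Hypothetical values for illustration only.
report = metric_deltas({"accuracy": 0.90}, {"accuracy": 0.80})
print(report["accuracy"]["degraded"])
```

A fixed absolute tolerance is the simplest choice. A relative tolerance or a per-metric direction flag would be natural extensions if some metrics, such as loss, improve as they decrease.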

See examples/ for a full walkthrough.
