Microsoft Azure Machine Learning Python SDK v2 for collecting model data during operationalization

Microsoft Azure Machine Learning Data Collection SDK v2 for model monitoring

The azureml-ai-monitoring package provides an SDK that enables the Model Data Collector (MDC) for custom logging, which lets customers collect data at arbitrary points in their data pre-processing pipeline. Customers can use the SDK in score.py to log data to the desired sink before, during, and after any data transformations.

Quickstart

Start by importing the azureml-ai-monitoring package in score.py:

import pandas as pd
import json
from azureml.ai.monitoring import Collector

def init():
  global inputs_collector, outputs_collector

  # instantiate collectors; the names must match the collections in the deployment spec
  inputs_collector = Collector(name='model_inputs')
  outputs_collector = Collector(name='model_outputs')

def run(data): 
  # json data: { "data" : {  "col1": [1,2,3], "col2": [2,3,4] } }
  pdf_data = preprocess(json.loads(data))
  
  # tabular data: {  "col1": [1,2,3], "col2": [2,3,4] }
  input_df = pd.DataFrame(pdf_data)

  # collect inputs data, store correlation_context
  context = inputs_collector.collect(input_df)

  # perform scoring with a pandas DataFrame; the return value is also a pandas DataFrame
  output_df = predict(input_df)

  # collect outputs data, pass in correlation_context so inputs and outputs data can be correlated later
  outputs_collector.collect(output_df, context)
  
  return output_df.to_dict()
  
def preprocess(json_data):
  # preprocess the payload to ensure it can be converted to pandas DataFrame
  return json_data["data"]

def predict(input_df):
  # process input and return with outputs
  ...
  
  return output_df
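
The correlation pattern in run() can be tried locally without a deployment. The sketch below uses a hypothetical stand-in class, FakeCollector, that is not part of azureml-ai-monitoring and only mimics the collect(data, context) call shape from the quickstart. It takes plain dicts rather than pandas DataFrames, and it makes no claim about how the real Collector stores data or represents its context.

```python
import uuid


class FakeCollector:
    """Hypothetical stand-in for Collector; NOT part of azureml-ai-monitoring."""

    def __init__(self, name):
        self.name = name
        self.records = []  # list of (correlation_id, data) tuples

    def collect(self, data, context=None):
        # Reuse the caller's context if given, otherwise start a new one.
        if context is None:
            context = str(uuid.uuid4())
        self.records.append((context, data))
        return context


inputs_collector = FakeCollector("model_inputs")
outputs_collector = FakeCollector("model_outputs")

# Same call pattern as run() in the quickstart, with plain dicts as data.
context = inputs_collector.collect({"col1": [1, 2, 3], "col2": [2, 3, 4]})
outputs_collector.collect({"prediction": [0, 1, 0]}, context)

# Join inputs and outputs on the shared correlation id.
outputs_by_id = dict(outputs_collector.records)
pairs = [(data, outputs_by_id[cid]) for cid, data in inputs_collector.records]
```

Because both collectors receive the same context, each input record can be matched to the output produced for it, which is the purpose of passing context to the second collect call.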

Create an environment with the base image mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04 and the conda dependencies below, then build the environment.

channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip=22.3.1
  - pip:
      - azureml-defaults==1.38.0
      - azureml-ai-monitoring
name: model-env

Create a deployment with custom logging enabled (here model_inputs and model_outputs are enabled) using the environment you just built. Update the YAML to match your scenario.

#source ../configs/model-data-collector/data-storage-basic-OnlineDeployment.YAML
$schema: http://azureml/sdk-2-0/OnlineDeployment.json

endpoint_name: my_endpoint #unchanged
name: blue #unchanged
model: azureml:my-model-m1:1 #azureml:models/<name>:<version> #unchanged
environment: azureml:custom-logging-env@latest #unchanged
data_collector:
  collections:
    model_inputs:
      enabled: 'True'
    model_outputs:
      enabled: 'True'
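
Since the collector names in score.py have to match the collections in the deployment spec, a small check can catch mismatches before deploying. The following is a minimal sketch, assuming the data_collector section has already been loaded into a Python dict. The helper names enabled_collections and missing_collections are hypothetical and not part of the SDK.

```python
def enabled_collections(deployment_spec):
    """Return names of collections whose 'enabled' flag is true."""
    collections = deployment_spec.get("data_collector", {}).get("collections", {})
    return {
        name
        for name, settings in collections.items()
        if str(settings.get("enabled", "")).lower() == "true"
    }


def missing_collections(collector_names, deployment_spec):
    """Return collector names used in score.py but not enabled in the spec."""
    return sorted(set(collector_names) - enabled_collections(deployment_spec))


# Dict form of the data_collector section shown above.
spec = {
    "data_collector": {
        "collections": {
            "model_inputs": {"enabled": "True"},
            "model_outputs": {"enabled": "True"},
        }
    }
}

print(missing_collections(["model_inputs", "model_outputs"], spec))  # []
print(missing_collections(["model_inputs", "inputs"], spec))  # ['inputs']
```

The comparison goes through str() because the YAML above writes the flag as the quoted string 'True' rather than a boolean.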

Configurable error handler

By default, the collector raises an exception on unexpected behavior (for example, custom logging is not enabled, the collection is not enabled, or the data type is not supported). To handle such errors yourself, pass an on_error callback:

import logging
collector = Collector(name="inputs", on_error=lambda e: logging.info("ex:{}".format(e)))
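
A named callback can be easier to test than an inline lambda. The sketch below builds one with a hypothetical factory, make_on_error, which is not part of the SDK. The only assumption about the SDK is the one the line above shows: on_error is called with the exception as its single argument. The handler logs at WARNING because Python's root logger drops INFO messages unless logging has been configured to show them.

```python
import logging

logger = logging.getLogger("score")


def make_on_error(log, failures):
    """Build an on_error callback that logs and records exceptions."""

    def on_error(exc):
        # Log at WARNING so the message is visible with default logging settings.
        log.warning("data collection failed: %s", exc)
        failures.append(exc)

    return on_error


failures = []
handler = make_on_error(logger, failures)

# In score.py the handler would be passed as:
#   Collector(name="model_inputs", on_error=handler)
# Here it is called directly to show what it does with an exception.
handler(ValueError("unsupported data type"))
print(len(failures))  # 1
```

Keeping the exceptions in a list is one possible design. In a long-running deployment a counter or metric would avoid unbounded growth.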

Change Log

v1.0.0 (2024.4.25)

Announcement

  • Publish official version v1.0.0.

v0.1.0b4 (2023.8.21)

Improvements

  • Improve the error message shown when the queue is full.
  • Increase the message queue to handle more requests.

v0.1.0b3 (2023.5.15)

Improvements

  • fix install_requires
  • fix classifiers
  • fix README

v0.1.0b2 (2023.5.9)

New Features

  • Support local capture

v0.1.0b1 (2023.4.25)

New Features

  • Support model data collection for pandas DataFrame.
