Skip to main content

Python package to easily validate properties of a SageMaker Pipeline.

Project description

Test&Build CodeCov Contributions Welcome Documentation

sagemaker-rightline

This repository contains the source code for sagemaker-rightline, a Python package that eases validation of properties of a SageMaker Pipeline object.

Note that at present this package is in an early stage of development and is not yet ready for production use. We welcome contributions!

README Content

Features

⚙️ Configuration

The Configuration class is responsible for running the Validations against the Pipeline object and returning a Report. The Configuration class is instantiated with a

  • sagemaker.workflow.pipeline.Pipeline object, and
  • a list of Validations.

✔️ Validations

A Validation is a class that inherits from the Validation base class. It is responsible for validating a single property of the Pipeline object. We differentiate between Validations that check the Pipeline object itself (class names beginning with "Pipeline") and Validations that check the Pipeline object's Step objects (class name starting with "Step"). Depending on the specific Validation, a different set of StepTypEnums may be supported.

For example, the StepImagesExist supports Processing and Training steps. It's a validation checks that all ImageURI that Steps of the named types of the Pipeline object reference indeed exist on the target ECR.

The following Validations are currently implemented:

  • PipelineParametersAsExpected
  • StepImagesExist
  • StepKmsKeyIdAsExpected
  • StepNetworkConfigAsExpected
  • StepLambdaFunctionExists
  • StepRoleNameExists
  • StepRoleNameAsExpected
  • StepTagsAsExpected
  • StepInputsAsExpected
  • StepOutputsAsExpected
  • StepOutputsMatchInputsAsExpected
  • StepCallbackSqsQueueExists
  • PipelineProcessingStepsIONamesUnique

In most cases, a Validation subclass requires passing a Rule object to its constructor.

📜 Rules

A Rule is a class that inherits from the Rule base class. It is responsible for defining the rule that a Validation checks for. For example, passing the list of expected KMSKeyIDs and the Rule Equals to StepKmsKeyIdAsExpected will check that all Step objects of the Pipeline object have a KmsKeyId property that matches the passed KMSKeyIDs.

Note that not all Validations require a Rule object, e.g. StepImagesExist.

The following Rules are currently implemented:

  • Equals
  • Contains

All rules support the negative parameter (default: False), which allows for inverting the rule.

📝 Report

A Report is a class whose instance is returned by the Configuration class (optionally a pandas.DataFrame instead). It contains the results of the Validations that were run against the Pipeline object as well as additional information to allow for further analysis.

Usage

from sagemaker.processing import NetworkConfig, ProcessingInput, ProcessingOutput
from sagemaker.workflow.parameters import ParameterString
from sagemaker_rightline.model import Configuration
from sagemaker_rightline.rules import Contains, Equals
from sagemaker_rightline.validations import (
    PipelineParametersAsExpected,
    StepImagesExist,
    StepKmsKeyIdAsExpected,
    StepNetworkConfigAsExpected,
    StepLambdaFunctionExists,
    StepRoleNameExists,
    StepRoleNameAsExpected,
    StepTagsAsExpected,
    StepInputsAsExpected,
    StepOutputsAsExpected,
    StepOutputsMatchInputsAsExpected,
    StepCallbackSqsQueueExists,
)

# Import a dummy pipeline
from tests.fixtures.pipeline import get_sagemaker_pipeline, DUMMY_BUCKET

sm_pipeline = get_sagemaker_pipeline()

# Define Validations
validations = [
    StepImagesExist(),
    PipelineParametersAsExpected(
        parameters_expected=[
            ParameterString(
                name="parameter-1",
                default_value="some-value",
            ),
        ],
        rule=Contains(),
    ),
    StepKmsKeyIdAsExpected(
        kms_key_id_expected="some/kms-key-alias",
        step_name="sm_training_step_sklearn",  # optional: if not set, will check all steps
        rule=Equals(),
    ),
    StepNetworkConfigAsExpected(
        network_config_expected=NetworkConfig(
            enable_network_isolation=False,
            security_group_ids=["sg-1234567890"],
            subnets=["subnet-1234567890"],
        ),
        rule=Equals(negative=True),
    ),
    StepLambdaFunctionExists(),
    StepRoleNameExists(),
    StepRoleNameAsExpected(
        role_name_expected="some-role-name",
        step_name="sm_training_step_sklearn",  # optional: if not set, will check all steps
        rule=Equals(),
    ),
    StepTagsAsExpected(
        tags_expected=[{
            "some-key": "some-value",
        }],
        step_name="sm_training_step_sklearn",  # optional: if not set, will check all steps
        rule=Equals(),
    ),
    StepInputsAsExpected(
        inputs_expected=[
            ProcessingInput(
                source=f"s3://{DUMMY_BUCKET}/input-1",
                destination="/opt/ml/processing/input",
                input_name="input-2",
            )
        ],
        step_type="Processing",  # either step_type or step_name must be set to filter
        rule=Contains(),
    ),
    StepOutputsAsExpected(
        outputs_expected=[
            ProcessingOutput(
                source="/opt/ml/processing/output",
                destination=f"s3://{DUMMY_BUCKET}/output-1",
                output_name="output-1",
            )
        ],
        step_name="sm_processing_step_spark",  # optional
        rule=Contains(),
    ),
    StepOutputsMatchInputsAsExpected(
        inputs_outputs_expected=[
            {
                "input": {
                    "step_name": "sm_processing_step_sklearn",
                    "input_name": "input-1",
                },
                "output": {
                    "step_name": "sm_processing_step_sklearn",
                    "output_name": "output-1",
                },
            }
        ]
    ),
    StepCallbackSqsQueueExists(),
]

# Add Validations and SageMaker Pipeline to Configuration
cm = Configuration(
    validations=validations,
    sagemaker_pipeline=sm_pipeline,
)

# Run the full Configuration
df = cm.run()

# Show the report
df

img.png

Release

Publishing a new version to PyPI is done via the Release functionality. This will trigger the publish.yml workflow, creating a new release with the version from the tag and publish the package to PyPI.

Contributing

Contributions welcome! We'll add a guide shortly. To get an overview of the structure of the project, have a look at the class diagram, which is auto-generated at build time together with the corresponding PUML file.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sagemaker-rightline-0.3.7.tar.gz (21.4 kB view hashes)

Uploaded Source

Built Distribution

sagemaker_rightline-0.3.7-py3-none-any.whl (14.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page