
Python package to easily validate properties of a SageMaker Pipeline.

sagemaker-rightline

This repository contains the source code for sagemaker-rightline, a Python package that eases validation of properties of a SageMaker Pipeline object.

Note that at present this package is in an early stage of development and is not yet ready for production use. We welcome contributions!

Features

⚙️ Configuration

The Configuration class is responsible for running the Validations against the Pipeline object and returning a Report. The Configuration class is instantiated with:

  • a sagemaker.workflow.pipeline.Pipeline object, and
  • a list of Validations.

✔️ Validations

A Validation is a class that inherits from the Validation base class. It is responsible for validating a single property of the Pipeline object. We differentiate between Validations that check the Pipeline object itself (class names beginning with "Pipeline") and Validations that check the Pipeline object's Step objects (class names beginning with "Step"). Depending on the specific Validation, a different set of StepTypeEnum values may be supported.

For example, StepImagesExist supports Processing and Training steps. It checks that every ImageURI referenced by Steps of these types in the Pipeline object actually exists in the target ECR.
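To make the image check more concrete, here is a minimal sketch of one preliminary step such a check would plausibly need: splitting an ECR image URI into its parts. This is not the package's implementation, which is not shown here. The names `EcrImageRef` and `parse_ecr_image_uri` are hypothetical, and the pattern assumes the common `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` form only.

```python
import re
from typing import NamedTuple, Optional

# Assumed URI shape: 12-digit account, region, repository, optional tag/digest.
# Other partitions (e.g. domains ending in .com.cn) are not handled here.
_ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:@]+)"
    r"(?::(?P<tag>[^@]+))?"
    r"(?:@(?P<digest>sha256:[0-9a-f]{64}))?$"
)


class EcrImageRef(NamedTuple):
    """Hypothetical container for the parts of an ECR image URI."""

    account: str
    region: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_ecr_image_uri(uri: str) -> EcrImageRef:
    """Split an ECR image URI into its parts, or raise ValueError."""
    match = _ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not a recognised ECR image URI: {uri!r}")
    # Optional groups that did not match come back as None.
    return EcrImageRef(**match.groupdict())


ref = parse_ecr_image_uri(
    "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-repo:latest"
)
print(ref.repository, ref.tag)
```

The resulting repository name and tag could then be handed to an ECR lookup, for instance the boto3 ECR client's `describe_images` call, to decide whether the image exists.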

The following Validations are currently implemented:

  • PipelineParametersAsExpected
  • StepImagesExist
  • StepKmsKeyIdAsExpected
  • StepNetworkConfigAsExpected
  • StepLambdaFunctionExists
  • StepRoleNameExists
  • StepRoleNameAsExpected
  • StepTagsAsExpected
  • StepInputsAsExpected
  • StepOutputsAsExpected
  • StepOutputsMatchInputsAsExpected
  • StepCallbackSqsQueueExists
  • PipelineProcessingStepsIONamesUnique

In most cases, a Validation subclass requires passing a Rule object to its constructor.

📜 Rules

A Rule is a class that inherits from the Rule base class. It is responsible for defining the rule that a Validation checks for. For example, passing the list of expected KMSKeyIDs and the Rule Equals to StepKmsKeyIdAsExpected will check that all Step objects of the Pipeline object have a KmsKeyId property that matches the passed KMSKeyIDs.

Note that not all Validations require a Rule object, e.g. StepImagesExist.

The following Rules are currently implemented:

  • Equals
  • Contains

All rules support the negative parameter (default: False), which allows for inverting the rule.
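The effect of the negative parameter can be illustrated with a small, self-contained sketch. The classes below are simplified stand-ins written for this illustration, not the package's Equals and Contains classes. Their `run` method and its exact comparison semantics (strict list equality, and "all expected items are present") are assumptions.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class EqualsSketch:
    """Stand-in for an Equals-style rule (assumed semantics)."""

    negative: bool = False

    def run(self, observed: List[Any], expected: List[Any]) -> bool:
        passed = observed == expected
        # Invert the outcome when negative=True.
        return passed != self.negative


@dataclass
class ContainsSketch:
    """Stand-in for a Contains-style rule (assumed semantics)."""

    negative: bool = False

    def run(self, observed: List[Any], expected: List[Any]) -> bool:
        # Passes when every expected item appears among the observed ones.
        passed = all(item in observed for item in expected)
        return passed != self.negative


observed_keys = ["key-a", "key-b"]
print(EqualsSketch().run(observed_keys, ["key-a", "key-b"]))
print(ContainsSketch(negative=True).run(observed_keys, ["key-c"]))
```

With negative=True, a rule passes exactly when the plain rule would fail, which is useful for asserting that a forbidden value is not configured.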

📝 Report

Running a Configuration returns an instance of the Report class (or, optionally, a pandas.DataFrame instead). The Report contains the results of the Validations that were run against the Pipeline object, as well as additional information to allow for further analysis.
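How a Configuration, its Validations and the Report relate can be sketched with plain Python. Everything below is a hypothetical simplification for illustration: `ResultSketch`, `ReportSketch` and `run_validations` are not part of the package, the pipeline is a plain dictionary rather than a SageMaker Pipeline, and each validation is a function returning a pass flag and a message.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# A check takes the pipeline and returns (success, message).
Check = Callable[[Dict[str, Any]], Tuple[bool, str]]


@dataclass
class ResultSketch:
    validation_name: str
    success: bool
    message: str = ""


@dataclass
class ReportSketch:
    results: List[ResultSketch] = field(default_factory=list)

    def to_rows(self) -> List[Dict[str, Any]]:
        # One dictionary per result, ready for tabular display.
        return [asdict(result) for result in self.results]


def run_validations(
    pipeline: Dict[str, Any], validations: List[Tuple[str, Check]]
) -> ReportSketch:
    """Run every check against the pipeline and collect the outcomes."""
    report = ReportSketch()
    for name, check in validations:
        success, message = check(pipeline)
        report.results.append(ResultSketch(name, success, message))
    return report


pipeline = {"parameters": ["parameter-1"], "steps": ["sm_training_step_sklearn"]}
validations = [
    ("HasParameter", lambda p: ("parameter-1" in p["parameters"], "looked for parameter-1")),
    ("HasThreeSteps", lambda p: (len(p["steps"]) == 3, f"found {len(p['steps'])} step(s)")),
]
rows = run_validations(pipeline, validations).to_rows()
for row in rows:
    print(row)
```

The rows produced by `to_rows` are a list of dictionaries, a shape that is straightforward to turn into a table for further analysis.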

Usage

from sagemaker.processing import NetworkConfig, ProcessingInput, ProcessingOutput
from sagemaker.workflow.parameters import ParameterString
from sagemaker_rightline.model import Configuration
from sagemaker_rightline.rules import Contains, Equals
from sagemaker_rightline.validations import (
    PipelineParametersAsExpected,
    StepImagesExist,
    StepKmsKeyIdAsExpected,
    StepNetworkConfigAsExpected,
    StepLambdaFunctionExists,
    StepRoleNameExists,
    StepRoleNameAsExpected,
    StepTagsAsExpected,
    StepInputsAsExpected,
    StepOutputsAsExpected,
    StepOutputsMatchInputsAsExpected,
    StepCallbackSqsQueueExists,
)

# Import a dummy pipeline
from tests.fixtures.pipeline import get_sagemaker_pipeline, DUMMY_BUCKET

sm_pipeline = get_sagemaker_pipeline()

# Define Validations
validations = [
    StepImagesExist(),
    PipelineParametersAsExpected(
        parameters_expected=[
            ParameterString(
                name="parameter-1",
                default_value="some-value",
            ),
        ],
        rule=Contains(),
    ),
    StepKmsKeyIdAsExpected(
        kms_key_id_expected="some/kms-key-alias",
        step_name="sm_training_step_sklearn",  # optional: if not set, will check all steps
        rule=Equals(),
    ),
    StepNetworkConfigAsExpected(
        network_config_expected=NetworkConfig(
            enable_network_isolation=False,
            security_group_ids=["sg-1234567890"],
            subnets=["subnet-1234567890"],
        ),
        rule=Equals(negative=True),
    ),
    StepLambdaFunctionExists(),
    StepRoleNameExists(),
    StepRoleNameAsExpected(
        role_name_expected="some-role-name",
        step_name="sm_training_step_sklearn",  # optional: if not set, will check all steps
        rule=Equals(),
    ),
    StepTagsAsExpected(
        tags_expected=[{
            "some-key": "some-value",
        }],
        step_name="sm_training_step_sklearn",  # optional: if not set, will check all steps
        rule=Equals(),
    ),
    StepInputsAsExpected(
        inputs_expected=[
            ProcessingInput(
                source=f"s3://{DUMMY_BUCKET}/input-1",
                destination="/opt/ml/processing/input",
                input_name="input-2",
            )
        ],
        step_type="Processing",  # either step_type or step_name must be set to filter
        rule=Contains(),
    ),
    StepOutputsAsExpected(
        outputs_expected=[
            ProcessingOutput(
                source="/opt/ml/processing/output",
                destination=f"s3://{DUMMY_BUCKET}/output-1",
                output_name="output-1",
            )
        ],
        step_name="sm_processing_step_spark",  # optional
        rule=Contains(),
    ),
    StepOutputsMatchInputsAsExpected(
        inputs_outputs_expected=[
            {
                "input": {
                    "step_name": "sm_processing_step_sklearn",
                    "input_name": "input-1",
                },
                "output": {
                    "step_name": "sm_processing_step_sklearn",
                    "output_name": "output-1",
                },
            }
        ]
    ),
    StepCallbackSqsQueueExists(),
]

# Add Validations and SageMaker Pipeline to Configuration
cm = Configuration(
    validations=validations,
    sagemaker_pipeline=sm_pipeline,
)

# Run the full Configuration
df = cm.run()

# Show the report
df

Release

Publishing a new version to PyPI is done via the Release functionality. This triggers the publish.yml workflow, which creates a new release with the version from the tag and publishes the package to PyPI.

Contributing

Contributions welcome! We'll add a guide shortly. To get an overview of the structure of the project, have a look at the class diagram, which is auto-generated at build time together with the corresponding PUML file.
