A Python wrapper library for SageMaker SDK v3 with configuration-driven defaults
Project description
[!WARNING] This mlp_sdk_v3 example demonstrates how to develop an ML Platform SDK wrapper library, providing a way to simplify infrastructure configuration management and standardize ML workflows across teams. It is intended as a reference guide for customers to help them create their own customized SDK wrappers. Note: This library is provided for illustrative purposes only and should not be used directly in production environments.
mlp_sdk_v3
A Python wrapper library for SageMaker SDK v3 with configuration-driven defaults.
Overview
The mlp_sdk_v3 simplifies SageMaker operations by providing a session-based interface with configuration-driven defaults. Built on top of the SageMaker Python SDK v3, it abstracts infrastructure complexity while maintaining full compatibility with the underlying SDK.
Key Features
- Configuration-driven defaults: Define AWS resources (VPCs, security groups, S3 buckets) in YAML configuration files
- Simple session interface: Single entry point for all SageMaker operations
- Runtime parameter override: Override any default configuration at runtime
- Full SageMaker SDK compatibility: Access underlying SageMaker SDK objects for advanced use cases
- Comprehensive error handling: Clear error messages with actionable guidance
- Encryption support: AES-256-GCM encryption for sensitive configuration values
- Audit trail: Track all operations for debugging and compliance
Installation
pip install mlp_sdk_v3
Quick Start
Generate Configuration
First, generate your configuration file:
# Interactive mode (recommended)
python examples/generate_admin_config.py --interactive
# Or use defaults
python examples/generate_admin_config.py --output /home/sagemaker-user/.config/admin-config.yaml
See examples/QUICKSTART.md for a complete quick start guide.
Basic Usage
from mlp_sdk_v3 import MLP_Session
# Initialize session with default configuration
session = MLP_Session()
# Create a feature group
feature_group = session.create_feature_group(
feature_group_name="customer-features",
record_identifier_name="customer_id",
event_time_feature_name="event_time",
feature_definitions=[
{"FeatureName": "customer_id", "FeatureType": "String"},
{"FeatureName": "age", "FeatureType": "Integral"},
{"FeatureName": "income", "FeatureType": "Fractional"},
{"FeatureName": "event_time", "FeatureType": "String"}
]
)
# Run a processing job
processor = session.run_processing_job(
job_name="data-preprocessing",
processing_script="preprocess.py",
inputs=[{"source": "s3://my-bucket/raw-data/", "destination": "/opt/ml/processing/input"}],
outputs=[{"source": "/opt/ml/processing/output", "destination": "s3://my-bucket/processed-data/"}]
)
# Run a training job
trainer = session.run_training_job(
job_name="model-training",
training_image="763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.0.0-cpu-py310",
source_code_dir="training-scripts",
entry_script="train.py",
inputs={"train": "s3://my-bucket/processed-data/"}
)
# Create a pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
pipeline = session.create_pipeline(
pipeline_name="ml-workflow",
steps=[
ProcessingStep(name="preprocess", processor=processor),
TrainingStep(name="train", estimator=trainer)
]
)
Configuration
Configuration File Location
By default, mlp_sdk_v3 loads configuration from:
/home/sagemaker-user/.config/admin-config.yaml
You can specify a custom configuration path:
session = MLP_Session(config_path="/path/to/custom-config.yaml")
Configuration File Format
Create a YAML configuration file with the following structure:
defaults:
# S3 Configuration
s3:
default_bucket: "my-sagemaker-bucket"
input_prefix: "input/"
output_prefix: "output/"
model_prefix: "models/"
# Networking Configuration
networking:
vpc_id: "vpc-12345678"
security_group_ids: ["sg-12345678"]
subnets: ["subnet-12345678", "subnet-87654321"]
# Compute Configuration
compute:
processing_instance_type: "ml.m5.large"
training_instance_type: "ml.m5.xlarge"
processing_instance_count: 1
training_instance_count: 1
# Feature Store Configuration
feature_store:
offline_store_s3_uri: "s3://my-sagemaker-bucket/feature-store/"
enable_online_store: false
# IAM Configuration
iam:
execution_role: "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
# KMS Configuration (optional)
kms:
key_id: "arn:aws:kms:REGION:ACCOUNT-ID:key/KEY-ID"
Configuration Precedence
Configuration values are applied in the following order (later values override earlier ones):
- SageMaker SDK defaults - Built-in defaults from the SageMaker SDK
- YAML configuration - Values from your configuration file
- Runtime parameters - Values passed directly to method calls
Example:
# This will use the training_instance_type from config (ml.m5.xlarge)
trainer = session.run_training_job(job_name="my-job", ...)
# This will override the config and use ml.p3.2xlarge
trainer = session.run_training_job(
job_name="my-job",
instance_type="ml.p3.2xlarge", # Runtime override
...
)
Encryption Setup
mlp_sdk_v3 supports AES-256-GCM encryption for sensitive configuration values.
Generating an Encryption Key
from mlp_sdk_v3.config import ConfigurationManager
# Generate a new encryption key
key = ConfigurationManager.generate_key()
print(f"Encryption key: {key}")
# Save this key securely!
Loading Encryption Keys
From Environment Variable
import os
from mlp_sdk_v3.config import ConfigurationManager
# Set environment variable
os.environ['MLP_SDK_ENCRYPTION_KEY'] = 'your-base64-encoded-key'
# Load key from environment
key = ConfigurationManager.load_key_from_env()
session = MLP_Session(config_path="encrypted-config.yaml")
From File
from mlp_sdk_v3.config import ConfigurationManager
# Load key from file
key = ConfigurationManager.load_key_from_file("/path/to/keyfile")
config_manager = ConfigurationManager(
config_path="encrypted-config.yaml",
encryption_key=key
)
From AWS KMS
from mlp_sdk_v3.config import ConfigurationManager
# Load key from KMS
key = ConfigurationManager.load_key_from_kms(
key_id="arn:aws:kms:REGION:ACCOUNT-ID:key/KEY-ID",
region="us-west-2"
)
config_manager = ConfigurationManager(
config_path="encrypted-config.yaml",
encryption_key=key
)
Encrypting Configuration Files
from mlp_sdk_v3.config import ConfigurationManager
# Generate or load encryption key
key = ConfigurationManager.generate_key()
# Create configuration manager
config_manager = ConfigurationManager(encryption_key=key)
# Encrypt specific fields in configuration file
config_manager.encrypt_config_file(
input_path="plain-config.yaml",
output_path="encrypted-config.yaml",
fields_to_encrypt=[
"defaults.iam.execution_role",
"defaults.kms.key_id"
]
)
Decrypting Configuration Files
from mlp_sdk_v3.config import ConfigurationManager
# Load encryption key
key = ConfigurationManager.load_key_from_env()
# Create configuration manager
config_manager = ConfigurationManager(encryption_key=key)
# Decrypt specific fields
config_manager.decrypt_config_file(
input_path="encrypted-config.yaml",
output_path="decrypted-config.yaml",
fields_to_decrypt=[
"defaults.iam.execution_role",
"defaults.kms.key_id"
]
)
Advanced Usage
Accessing Underlying SageMaker SDK Objects
session = MLP_Session()
# Access SageMaker session
sagemaker_session = session.sagemaker_session
# Access boto3 clients
s3_client = session.boto_session.client('s3')
sagemaker_client = session.sagemaker_client
runtime_client = session.sagemaker_runtime_client
# Get session properties
print(f"Region: {session.region_name}")
print(f"Account ID: {session.account_id}")
print(f"Default bucket: {session.default_bucket}")
Audit Trail
Track all operations for debugging and compliance:
# Initialize session with audit trail enabled (default)
session = MLP_Session(enable_audit_trail=True)
# Perform operations
session.create_feature_group(...)
session.run_processing_job(...)
# Get audit trail entries
entries = session.get_audit_trail(operation="create_feature_group")
print(f"Found {len(entries)} feature group operations")
# Get audit trail summary
summary = session.get_audit_trail_summary()
print(f"Total operations: {summary['total_entries']}")
print(f"Failed operations: {len(summary['failed_operations'])}")
# Export audit trail
session.export_audit_trail("audit-trail.json", format="json")
session.export_audit_trail("audit-trail.csv", format="csv")
Logging Configuration
import logging
# Initialize with custom log level
session = MLP_Session(log_level=logging.DEBUG)
# Change log level at runtime
session.set_log_level(logging.WARNING)
Runtime Configuration Updates
session = MLP_Session()
# Update session configuration at runtime
session.update_session_config(default_bucket="new-bucket-name")
# Get current configuration
config = session.get_config()
print(config)
Error Handling
mlp_sdk_v3 provides detailed error messages with AWS error details:
from mlp_sdk_v3 import MLP_Session, ValidationError, AWSServiceError, ConfigurationError
try:
session = MLP_Session()
feature_group = session.create_feature_group(
feature_group_name="", # Invalid: empty name
...
)
except ValidationError as e:
print(f"Validation error: {e}")
except AWSServiceError as e:
print(f"AWS error: {e}")
print(f"Error code: {e.error_code}")
print(f"Request ID: {e.request_id}")
print(f"Details: {e.get_error_details()}")
except ConfigurationError as e:
print(f"Configuration error: {e}")
API Reference
MLP_Session
Main interface for all mlp_sdk_v3 operations.
Methods
__init__(config_path=None, log_level=logging.INFO, enable_audit_trail=True, **kwargs)- Initialize sessioncreate_feature_group(feature_group_name, record_identifier_name, event_time_feature_name, feature_definitions, **kwargs)- Create feature grouprun_processing_job(job_name, processing_script=None, inputs=None, outputs=None, **kwargs)- Execute processing jobrun_training_job(job_name, training_image, source_code_dir=None, entry_script=None, requirements=None, inputs=None, **kwargs)- Execute training jobcreate_pipeline(pipeline_name, steps, parameters=None, **kwargs)- Create pipelineupsert_pipeline(pipeline, **kwargs)- Create or update pipelinestart_pipeline_execution(pipeline_name, **kwargs)- Start pipeline executionget_config()- Get current configurationget_execution_role()- Get IAM execution roleset_log_level(level)- Set logging levelget_audit_trail(operation=None, status=None, limit=None)- Get audit trail entriesexport_audit_trail(file_path, format='json')- Export audit trail
Properties
sagemaker_session- Underlying SageMaker sessionboto_session- Underlying boto3 sessionsagemaker_client- SageMaker boto3 clientsagemaker_runtime_client- SageMaker Runtime boto3 clientregion_name- AWS region namedefault_bucket- Default S3 bucketaccount_id- AWS account ID
ConfigurationManager
Handles configuration loading and encryption.
Methods
__init__(config_path=None, encryption_key=None)- Initialize configuration managerget_default(key, fallback=None)- Get configuration valuemerge_with_runtime(runtime_config)- Merge runtime parameters with defaultsencrypt_value(plaintext, key=None)- Encrypt a valuedecrypt_value(encrypted, key=None)- Decrypt a valueencrypt_config_file(input_path, output_path, fields_to_encrypt, key=None)- Encrypt configuration filedecrypt_config_file(input_path, output_path, fields_to_decrypt, key=None)- Decrypt configuration file
Static Methods
generate_key()- Generate new encryption keyload_key_from_env(env_var='MLP_SDK_ENCRYPTION_KEY')- Load key from environmentload_key_from_file(file_path)- Load key from fileload_key_from_kms(key_id, region=None)- Load key from AWS KMS
Development
Setup
# Clone the repository
git clone https://github.com/example/mlp_sdk_v3.git
cd mlp_sdk_v3
# Install in development mode with test dependencies
pip install -e ".[dev]"
Testing
# Run all tests
pytest
# Run unit tests only
pytest tests/unit/
# Run property-based tests only
pytest tests/property/
# Run with coverage
pytest --cov=mlp_sdk_v3
# Run specific test file
pytest tests/unit/test_session.py
Code Quality
# Format code
black mlp_sdk_v3 tests
# Sort imports
isort mlp_sdk_v3 tests
# Lint code
flake8 mlp_sdk_v3 tests
# Type checking
mypy mlp_sdk_v3
Requirements
- Python >= 3.8
- sagemaker >= 3.0.0
- boto3 >= 1.26.0
- pyyaml >= 6.0
- pydantic >= 2.0.0
- cryptography >= 41.0.0
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please read our contributing guidelines and submit pull requests to our GitHub repository.
Support
For issues, questions, or contributions, please visit our GitHub repository.
Examples
The examples/ directory contains helpful scripts and guides:
- generate_admin_config.py - Generate configuration files
- basic_usage.py - Basic SDK usage examples
- sagemaker_operations.py - SageMaker operations examples
- xgboost_training_example.ipynb - XGBoost training notebook ⭐
- xgboost_training_script.py - XGBoost training script
- QUICKSTART.md - 5-minute quick start guide
- TRAINING_EXAMPLES.md - Detailed training guide
- README.md - Examples documentation
Run examples:
# Generate config
python examples/generate_admin_config.py --interactive
# Run basic examples
python examples/basic_usage.py
# Run SageMaker operations examples
python examples/sagemaker_operations.py
# Run XGBoost training (script)
python examples/xgboost_training_script.py --wait
# Run XGBoost training (notebook)
jupyter notebook examples/xgboost_training_example.ipynb
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sagemaker_mlp_sdk-0.1.1.tar.gz.
File metadata
- Download URL: sagemaker_mlp_sdk-0.1.1.tar.gz
- Upload date:
- Size: 113.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8a09a31647017a464c57ce2209d33a4280319389615c538decc1521aa522b5be
|
|
| MD5 |
2ba725431566596c6799be812103a46a
|
|
| BLAKE2b-256 |
6943aebf1f249104a30bff5c8f16fe10d4040a005f8ffd02d062b6f691bbac7f
|
File details
Details for the file sagemaker_mlp_sdk-0.1.1-py3-none-any.whl.
File metadata
- Download URL: sagemaker_mlp_sdk-0.1.1-py3-none-any.whl
- Upload date:
- Size: 43.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fd425ea82db5755569e1bcf26e64e3b25c979c529ae27d031509d9d30695c53e
|
|
| MD5 |
2200e849fda8c1624c825708242b0d95
|
|
| BLAKE2b-256 |
7fcfd8efbc11744d750ea333ea21a23c87a61009557e17b865f0a7e0875fe81e
|