Skip to main content

No project description provided

Project description

aws-sagemaker-remote

Remotely run and track ML research using AWS SageMaker.

  • Standardized command line flags

  • Remotely run scripts with minimal changes

  • Automatically manage AWS resources

  • All code, inputs, outputs, arguments, and settings are tracked in one place

  • Reproducible batch processing jobs to prepare datasets

  • Reproducible training jobs that track hyperparameters and metrics

Track three types of objects in a standard way:

  • Processing jobs consume file inputs and produce file outputs. Useful for data conversion, extraction, etc.

  • Training jobs train models while tracking metrics and hyperparameters.

  • Inference models provide predictions and can be deployed on endpoints. Can be automatically created from and linked to training jobs for tracking purposes or can deploy externally-created models.

Installation

Release

pip install aws-sagemaker-remote

Development

git clone https://github.com/bstriner/aws-sagemaker-remote
cd aws-sagemaker-remote
python setup.py develop

Processing

Processing jobs accept a set of one or more input file paths and write to a set of one or more output file paths. Ideal for file conversion or other data preparation tasks.

  • Running locally, standard command line arguments for inputs and outputs are used as usual

  • Running remotely, data is uploaded and downloaded using S3 for tracking

Basic usage

Write a script with a main function that calls sagemaker_processing_main.

from aws_sagemaker_remote.processing import sagemaker_processing_main

def main(args):
    # your code here
    pass

if __name__ == '__main__':
    sagemaker_processing_main(
        script=__file__,
        main=main,
        # ...
    )

Pass function argument run=True or command line argument --sagemaker-run=True to run script remotely on SageMaker.

Processing Job Tracking

Use the SageMaker console to view a list of all processing jobs. For each job, SageMaker tracks:

  • Processing time

  • Container used

  • Link to CloudWatch logs

  • Path on S3 for each of:

    • Script file

    • Each input channel

    • Each output channel

    • Requirements file (if used)

    • Configuration script (if used)

    • Supporting code (if used)

Configuration

Many command line options are added by this command.

Option --sagemaker-run controls local or remote execution.

  • Set --sagemaker-run to a falsy value (no,false,0), the script will call your main function as usual and run locally.

  • Set --sagemaker-run to a truthy value (yes,true,1), the script will upload itself and any requirements or inputs to S3, execute remotely on SageMaker, and save outputs to S3, logging results to the terminal.

Set --sagemaker-wait truthy to tail logs and wait for completion or falsy to complete when the job starts.

Defaults are set through code. Defaults can be overwritten on the command line. For example:

  • Use the function argument image to set the default container image for your script

  • Use the command line argument --sagemaker-image to override the container image on a particular run

See functions and commands (todo: links)

Environment Customization

The environment can be customized in multiple ways.

  • Instance

    • Function argument instance

    • Command line argument --sagemaker-instance

    • Select instance type of machine running the container

  • Image

    • Function argument image

    • Command line argument --sagemaker-image

    • Accepts URI of Docker container image on ECR to run

    • Build a custom Docker image for major customizations

  • Configuration script

    • Function argument configuration_script

    • Command line argument --sagemaker-configuration-script

    • Accepts path to a text file. Will upload text file to S3 and run source [file].

    • Batch script for minor customization, e.g., export MYVAR=value or yum install -y mypackage

  • Requirements file

  • Module uploads

    • Function argument modules

      • Dictionary of [key]->[value]

      • Each key will create command line argument --key that defaults to value

    • Each value is a directory containing a Python module that will be uploaded to S3, downloaded to SageMaker, and put on the PYTHONPATH

    • For example, if directory mymodule contains the files __init__.py and myfile.py and myfile.py contains def myfunction():..., pass modules={'mymodule':'path/to/mymodule'} to sagemaker_processing_main and then use from mymodule.myfile import myfunction in your script.

    • Use module uploads for supporting code that is not being installed from packages.

Additional arguments

Any arguments passed to your script locally on the command line are passed to your script remotely and tracked by SageMaker. Internally, sagemaker_processing_main uses argparse. To add additional command-line flags:

  • Pass a list of kwargs dictionaries to additional_arguments

    sagemaker_processing_main(
      #...
      additional_arguments = [
        {
          'dest': '--filter-width',
          'default':32,
          'help':'Filter width'
        },
        {
          'dest':'--filter-height',
          'default':32,
          'help':'Filter height'
        }
      ]
    )
  • Pass a callback to argparse_callback

    from argparse import ArgumentParser
    def argparse_callback(parser:ArgumentParser):
      parser.add_argument(
      '--filter-width',
      default=32,
      help='Filter width')
      parser.add_argument(
      '--filter-height',
      default=32,
      help='Filter height')

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aws-sagemaker-remote-0.0.1.tar.gz (13.0 kB view details)

Uploaded Source

File details

Details for the file aws-sagemaker-remote-0.0.1.tar.gz.

File metadata

  • Download URL: aws-sagemaker-remote-0.0.1.tar.gz
  • Upload date:
  • Size: 13.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for aws-sagemaker-remote-0.0.1.tar.gz
Algorithm Hash digest
SHA256 a8a4648da63020b1632b6bd4e0eda140e47fc3bf2d751e19e90f99f17c1c5616
MD5 8902e0307ffe6b2527ed8e80a33fa8d0
BLAKE2b-256 aca637253a08b3d9798437ecf7780efbb7333fc99f97dd5362dab9d3823bb478

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page