
A SageMaker-compatible BundledScriptProcessor for running tar-bundled source dirs.


Bundled Script Processor

An extension of the Amazon SageMaker ScriptProcessor that adds support for bundling a local source_dir (and optional dependencies) into a tarball, uploading it to S3, and running it inside SageMaker Processing jobs. This makes it easier to organize your code into directories and run it in SageMaker without manually managing uploads.


✨ Features

  • Extends ScriptProcessor with source_dir support (instead of just a single script)
  • Supports bundling dependencies (additional local folders) alongside source_dir
  • Automatically generates a lightweight entrypoint script (runproc.sh)
  • Cleans up temporary artifacts after execution

🔍 How it works under the hood

BundledScriptProcessor extends the normal ScriptProcessor flow by injecting an extra packaging step before execution.

  1. Bundle creation – It takes your source_dir (and any extra dependencies) and compresses them into a sourcedir.tar.gz.

  2. Upload to S3 – This tarball is uploaded to your SageMaker default bucket and mounted in the container as a ProcessingInput named "code".

  3. Custom entrypoint – A small runproc.sh script is generated and uploaded as a second ProcessingInput named "entrypoint". This script:

    • Unpacks sourcedir.tar.gz

    • Cleans up the archive

    • Executes your Python entrypoint (main.py by default) with the specified command (e.g. ["python3"]) and any additional arguments.

  4. Entrypoint override – Finally, it overrides the default ScriptProcessor entrypoint to point to this generated shell script, so SageMaker runs it automatically when the job starts.
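The bundle-creation step can be sketched with the standard library alone. This is a minimal illustration, not the package's actual implementation: the helper names make_bundle and list_bundle are hypothetical, and the archive layout (contents of source_dir at the root, each dependency folder under its own name) is an assumption based on the description above.

```python
import tarfile
from pathlib import Path


def make_bundle(source_dir, dependencies=(), out_path="sourcedir.tar.gz"):
    """Pack source_dir's contents at the archive root and each dependency
    folder under its own name (assumed layout, for illustration only)."""
    source = Path(source_dir)
    with tarfile.open(str(out_path), "w:gz") as tar:
        # Files from source_dir land at the top level of the archive.
        for item in sorted(source.iterdir()):
            tar.add(str(item), arcname=item.name)
        # Each dependency keeps its folder name, e.g. common/lib.py.
        for dep in dependencies:
            dep_path = Path(dep)
            tar.add(str(dep_path), arcname=dep_path.name)
    return Path(out_path)


def list_bundle(path):
    """Return the sorted member names of a gzip-compressed tarball."""
    with tarfile.open(str(path), "r:gz") as tar:
        return sorted(tar.getnames())
```

With this layout, a script at the archive root can import a dependency folder by name once the archive is unpacked next to it.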

This design handles the upload, extract, and execute steps for you while still relying on SageMaker’s standard ProcessingJob mechanics. It also reuses the existing SageMaker ScriptProcessor API for tasks like compressing and uploading code to S3.
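To make step 3 concrete, here is a rough sketch of how a runproc.sh-style script could be rendered. The function render_entrypoint is hypothetical, and the exact contents of the script the package generates are not shown in this description, so treat the shell lines as an assumption that follows the unpack, clean up, execute order listed above.

```python
import shlex


def render_entrypoint(command, code="main.py", arguments=(),
                      code_dir="/opt/ml/processing/input/code"):
    """Build the text of a small shell script that unpacks the bundle,
    removes the archive, and runs the chosen Python file."""
    # Quote every token so arguments with spaces survive the shell.
    run_line = " ".join(shlex.quote(part) for part in [*command, code, *arguments])
    return "\n".join([
        "#!/bin/bash",
        "set -e",                          # stop on the first failing step
        f"cd {shlex.quote(code_dir)}",
        "tar -xzf sourcedir.tar.gz",       # unpack the uploaded bundle
        "rm sourcedir.tar.gz",             # clean up the archive
        run_line,                          # e.g. python3 callable.py --hello world
        "",
    ])
```

For example, render_entrypoint(["python3"], "callable.py", ["--hello", "world"]) yields a script whose last command is python3 callable.py --hello world.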

📦 Installation

pip install bundled-script-processor

🚀 Usage

Example directory layout

demo_bundled_script_processor/
├─ main.py
├─ task/
│  ├─ callable.py
│  └─ helper.py
└─ common/
   └─ lib.py

main.py

from bundled_script_processor import BundledScriptProcessor
from sagemaker import Session, get_execution_role

sm_session = Session()
role = get_execution_role(sagemaker_session=sm_session)

script = 'callable.py'
source_dir = '/home/pmaslov/demo_bundled_script_processor/task'  # full path
dep1 = '/home/pmaslov/demo_bundled_script_processor/common'      # full path


processor = BundledScriptProcessor(
    role=role,
    image_uri="123456789012.dkr.ecr.eu-central-1.amazonaws.com/my-image:latest",
    instance_type="ml.m4.xlarge"
)

# Run with a full source directory
processor.run(
    source_dir=source_dir,                # must contain callable.py; its contents are copied into /opt/ml/processing/input/code/
    code=script,                          # name of the Python file inside source_dir to execute
    dependencies=[dep1],                  # optional; each folder is copied into /opt/ml/processing/input/code/
    arguments=["--hello", "world"]        # optional CLI args passed to the script
)

task/callable.py

from helper import helloworld
from common.lib import common_helloworld

if __name__ == '__main__':
    print(helloworld())
    print(common_helloworld())

task/helper.py

def helloworld():
    return 'Hello World!'

common/lib.py

def common_helloworld():
    return 'Common Hello World!'
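The run call above passes arguments=["--hello", "world"], but callable.py as shown does not read them. A minimal sketch of how the script could pick them up with argparse follows; the helper names parse_args and greeting and the default value are illustrative assumptions, not part of the package.

```python
import argparse


def parse_args(argv=None):
    """Parse the CLI args forwarded by the processor (argv=None reads sys.argv)."""
    parser = argparse.ArgumentParser(description="Demo processing entrypoint")
    parser.add_argument("--hello", default="nobody", help="who to greet")
    # parse_known_args tolerates extra arguments the script does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


def greeting(args):
    """Format a greeting from the parsed arguments."""
    return f"Hello {args.hello}!"
```

Inside callable.py, calling print(greeting(parse_args())) under the existing __main__ guard would then reflect the value passed through arguments.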
