A SageMaker-compatible BundledScriptProcessor for running tar-bundled source dirs.

Bundled Script Processor

An extension of the Amazon SageMaker ScriptProcessor that adds support for bundling a local source_dir (and optional dependencies) into a tarball, uploading it to S3, and running it inside SageMaker Processing jobs. This makes it easier to organize your code into directories and run it in SageMaker without manually managing uploads.


✨ Features

  • Extends ScriptProcessor with source_dir support (instead of just a single script)
  • Bundles optional dependencies (additional local folders) alongside your code
  • Automatically generates a lightweight entrypoint script (runproc.sh)
  • Cleans up temporary artifacts after execution

🔍 How it works under the hood

BundledScriptProcessor extends the normal ScriptProcessor flow by injecting an extra packaging step before execution.

  1. Bundle creation – It takes your source_dir (and any extra dependencies) and compresses them into a sourcedir.tar.gz.

  2. Upload to S3 – This tarball is uploaded to your SageMaker default bucket and mounted in the container as a ProcessingInput named "code".

  3. Custom entrypoint – A small runproc.sh script is generated and uploaded as a second ProcessingInput named "entrypoint". This script:

    • Unpacks sourcedir.tar.gz

    • Cleans up the archive

    • Executes your Python entrypoint (main.py by default) with the specified command (e.g. ["python3"]) and any additional arguments.

  4. Entrypoint override – Finally, it overrides the default ScriptProcessor entrypoint to point to this generated shell script, so SageMaker runs it automatically when the job starts.
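
The bundle-creation step above can be illustrated with a minimal sketch using only the Python standard library. This is not the package's actual implementation: the function name make_bundle and the archive layout (source_dir files at the archive root, each dependency kept under its own folder name) are assumptions inferred from the description and the usage example.

```python
import tarfile
import tempfile
from pathlib import Path


def make_bundle(source_dir, dependencies=(), out_path="sourcedir.tar.gz"):
    """Hypothetical helper: pack source_dir and dependency folders into a tarball."""
    source = Path(source_dir)
    with tarfile.open(str(out_path), "w:gz") as tar:
        # Files from source_dir land at the archive root (e.g. callable.py).
        for item in sorted(source.iterdir()):
            tar.add(str(item), arcname=item.name)
        # Each dependency keeps its folder name (e.g. common/lib.py),
        # so "from common.lib import ..." resolves after extraction.
        for dep in dependencies:
            dep_path = Path(dep)
            tar.add(str(dep_path), arcname=dep_path.name)
    return Path(out_path)


if __name__ == "__main__":
    # Build a throwaway layout and list what ends up in the archive.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "task").mkdir()
        (root / "task" / "callable.py").write_text("print('hi')\n")
        (root / "common").mkdir()
        (root / "common" / "lib.py").write_text("X = 1\n")
        bundle = make_bundle(root / "task", [root / "common"], root / "sourcedir.tar.gz")
        with tarfile.open(str(bundle)) as tar:
            print(sorted(tar.getnames()))
```

Placing the source files at the archive root is what lets the entrypoint script and its sibling modules sit side by side once the archive is unpacked in the container.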

This design keeps the upload/extract/execute logic transparent to you, while still relying on SageMaker’s standard ProcessingJob mechanics. Additionally, it builds on the existing SageMaker ScriptProcessor API for tasks like compressing and uploading code to S3.
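
To make the extract/execute part concrete, here is a rough sketch of how the text of a runproc.sh-style script could be assembled. The real script generated by the package may differ. The function name build_runproc and the exact shell lines are illustrative assumptions; the code directory is the one named in the usage comments.

```python
import shlex


def build_runproc(command, code="main.py", arguments=(),
                  code_dir="/opt/ml/processing/input/code"):
    """Hypothetical helper: return the text of a runproc.sh-style script."""
    # Quote every part so arguments containing spaces survive the shell.
    run_cmd = " ".join(shlex.quote(part) for part in [*command, code, *arguments])
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        f"cd {shlex.quote(code_dir)}",
        "tar -xzf sourcedir.tar.gz   # unpack the bundle",
        "rm sourcedir.tar.gz         # clean up the archive",
        f"exec {run_cmd}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_runproc(["python3"], "callable.py", ["--hello", "world"]))
```

Using exec in the last line replaces the shell with the Python process, so the job's exit code is the exit code of your script.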

📦 Installation

pip install bundled-script-processor

🚀 Usage

Example directory layout

demo_bundled_script_processor/
├─ main.py
├─ task/
│  ├─ callable.py
│  └─ helper.py
└─ common/
   └─ lib.py

main.py

from bundled_script_processor import BundledScriptProcessor
from sagemaker import Session, get_execution_role

sm_session = Session()
role = get_execution_role(sagemaker_session=sm_session)

script = 'callable.py'
source_dir = '/home/pmaslov/demo_bundled_script_processor/task'  # full path
dep1 = '/home/pmaslov/demo_bundled_script_processor/common'      # full path


processor = BundledScriptProcessor(
    role=role,
    image_uri="123456789012.dkr.ecr.eu-central-1.amazonaws.com/my-image:latest",
    instance_type="ml.m4.xlarge"
)

# Run with a full source directory
processor.run(
    source_dir=source_dir,                # source_dir must contain callable.py (will be copied into /opt/ml/processing/input/code/)
    code=script,                          # python callable (python file name) to be executed inside ScriptProcessor
    dependencies=[dep1],                  # optional dependency (folder will be copied into /opt/ml/processing/input/code/)
    arguments=["--hello", "world"]        # optional CLI args
)

task/callable.py

from helper import helloworld
from common.lib import common_helloworld

if __name__ == '__main__':
    print(helloworld())
    print(common_helloworld())
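
The callable.py above does not read the arguments=["--hello", "world"] passed in the run() call. A minimal sketch of how a script could consume them with argparse follows, assuming the arguments reach the script as ordinary command-line arguments; the default value and the greeting text are purely illustrative.

```python
import argparse


def parse_args(argv=None):
    """Parse the CLI arguments forwarded to the entrypoint script."""
    parser = argparse.ArgumentParser(description="Example processing entrypoint")
    parser.add_argument("--hello", default="nobody", help="who to greet")
    # parse_known_args tolerates extra arguments instead of exiting.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"Hello, {args.hello}!")
```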

task/helper.py

def helloworld():
    return 'Hello World!'

common/lib.py

def common_helloworld():
    return 'Common Hello World!'
