A SageMaker-compatible BundledScriptProcessor for running tar-bundled source dirs.

Project description

Bundled Script Processor

An extension of the Amazon SageMaker ScriptProcessor that adds support for bundling a local source_dir (and optional dependencies) into a tarball, uploading it to S3, and running it inside SageMaker Processing jobs. This makes it easier to organize your code into directories and run it in SageMaker without manually managing uploads.


✨ Features

  • Extends ScriptProcessor with source_dir support, so you can pass a whole source directory instead of a single script
  • Bundles extra local folders (dependencies) alongside the source directory
  • Automatically generates a lightweight entrypoint script (runproc.sh)
  • Cleans up temporary artifacts after execution

🔍 How it works under the hood

BundledScriptProcessor extends the normal ScriptProcessor flow by injecting an extra packaging step before execution.

  1. Bundle creation – It takes your source_dir (and any extra dependencies) and compresses them into a sourcedir.tar.gz.
  2. Upload to S3 – This tarball is uploaded to your SageMaker default bucket and mounted in the container as a ProcessingInput named "code".
  3. Custom entrypoint – A small runproc.sh script is generated and uploaded as a second ProcessingInput named "entrypoint". This script:
     • unpacks sourcedir.tar.gz,
     • removes the archive, and
     • runs your Python entrypoint (main.py by default) with the specified command (e.g. ["python3"]) and any additional arguments.
  4. Entrypoint override – Finally, it overrides the default ScriptProcessor entrypoint to point to this generated shell script, so SageMaker runs it automatically when the job starts.
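
To make step 1 concrete, here is a minimal stdlib sketch of the bundling step. It is an illustration only, not the package's implementation: the function name `build_bundle` is hypothetical, and the archive layout (source_dir contents at the archive root, each dependency folder kept under its own name) is assumed from the comments in the usage example below.

```python
import tarfile
from pathlib import Path


def build_bundle(source_dir, dependencies, out_path):
    """Hypothetical helper: pack source_dir and dependency folders into a .tar.gz."""
    source = Path(source_dir)
    with tarfile.open(str(out_path), "w:gz") as tar:
        # source_dir's children go to the archive root, e.g. task/callable.py -> callable.py
        for child in sorted(source.iterdir()):
            tar.add(str(child), arcname=child.name)
        # each dependency keeps its folder name, e.g. common/lib.py -> common/lib.py
        for dep in dependencies:
            dep_path = Path(dep)
            tar.add(str(dep_path), arcname=dep_path.name)
    return out_path
```

For the demo layout, the archive would contain callable.py, helper.py and common/lib.py side by side, which is what lets `from common.lib import ...` resolve once the bundle is unpacked.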

This design hides the upload/extract/execute plumbing from you while still relying on SageMaker’s standard ProcessingJob mechanics. It also reuses the existing SageMaker ScriptProcessor API for tasks such as compressing code and uploading it to S3.
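
The generated runproc.sh from step 3 can be pictured with the following sketch, which renders a comparable script as a string. The real script's exact contents are not documented here, so the function name `render_runproc`, the `set -e` line, and the assumption that the bundle sits in /opt/ml/processing/input/code (taken from the usage comments) are all illustrative.

```python
import shlex


def render_runproc(command, code, arguments=(), code_dir="/opt/ml/processing/input/code"):
    """Hypothetical helper: build a bash script that unpacks, cleans up, then runs the entry script."""
    # shlex.join quotes each token, so arguments containing spaces survive intact
    run_line = shlex.join([*command, code, *arguments])
    return "\n".join([
        "#!/bin/bash",
        "set -e",                          # stop at the first failing step
        f"cd {shlex.quote(code_dir)}",     # assumed location of sourcedir.tar.gz
        "tar -xzf sourcedir.tar.gz",       # unpack the bundle in place
        "rm sourcedir.tar.gz",             # remove the archive
        f"exec {run_line}",                # replace the shell with the user's process
        "",
    ])
```

For the demo call, the last line becomes `exec python3 callable.py --hello world`. Using `exec` is one reasonable design choice: the Python process replaces the shell, so its exit code becomes the job's exit code. Step 4 would then amount to pointing the container entrypoint at this script.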

📦 Installation

pip install bundled-script-processor

🚀 Usage

Example directory layout

demo_bundled_script_processor/
├─ main.py
├─ task/
│  ├─ callable.py
│  └─ helper.py
└─ common/
   └─ lib.py

main.py

from pathlib import Path

from bundled_script_processor import BundledScriptProcessor
from sagemaker import Session, get_execution_role

sm_session = Session()
role = get_execution_role(sagemaker_session=sm_session)

# Resolve paths relative to this file instead of hard-coding a home directory
project_root = Path(__file__).resolve().parent
script = 'callable.py'
source_dir = str(project_root / 'task')
dep1 = str(project_root / 'common')

processor = BundledScriptProcessor(
    role=role,
    image_uri="123456789012.dkr.ecr.eu-central-1.amazonaws.com/my-image:latest",
    instance_type="ml.m4.xlarge"
)

# Run with a full source directory
processor.run(
    source_dir=source_dir,                # must contain callable.py; its contents are copied into /opt/ml/processing/input/code/
    code=script,                          # entry script (file name relative to source_dir) executed inside the container
    dependencies=[dep1],                  # optional extra folders; each folder is copied into /opt/ml/processing/input/code/
    arguments=["--hello", "world"]        # optional CLI args
)

task/callable.py

from helper import helloworld               # helper.py sits next to callable.py in source_dir
from common.lib import common_helloworld    # common/ is shipped via dependencies=[dep1]

if __name__ == '__main__':
    print(helloworld())
    print(common_helloworld())

task/helper.py

def helloworld():
    return 'Hello World!'

common/lib.py

def common_helloworld():
    return 'Common Hello World!'
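
Before launching a real job, it can help to check that the imports above resolve in the flattened layout. The sketch below is a hypothetical local dry run (`dry_run` is not part of the package): it copies source_dir's contents and each dependency folder into a temporary directory, mirroring the /opt/ml/processing/input/code/ layout described in the run() comments, and runs the entry script with the current Python interpreter.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def dry_run(source_dir, dependencies, code, arguments=()):
    """Hypothetical helper: mimic the flattened code directory locally and run the entry script."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        # source_dir contents land at the root, like /opt/ml/processing/input/code/
        shutil.copytree(source_dir, work, dirs_exist_ok=True)
        # each dependency folder is copied next to them under its own name (e.g. common/)
        for dep in dependencies:
            shutil.copytree(dep, work / Path(dep).name)
        # the script's directory is put on sys.path, so helper and common.lib both import;
        # check=True raises CalledProcessError if the script exits with an error
        result = subprocess.run(
            [sys.executable, code, *arguments],
            cwd=work, capture_output=True, text=True, check=True,
        )
        return result.stdout
```

From the demo_bundled_script_processor/ directory, `dry_run('task', ['common'], 'callable.py', ['--hello', 'world'])` should return the two greetings printed by callable.py. It does not exercise the container image, so packages missing from the image would still only surface in the real job.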

Download files

Source Distribution

bundled_script_processor-0.2.1.tar.gz (5.8 kB)

Uploaded Source

Built Distribution

bundled_script_processor-0.2.1-py3-none-any.whl (7.0 kB)

Uploaded Python 3

File details

Details for the file bundled_script_processor-0.2.1.tar.gz.

File hashes

Hashes for bundled_script_processor-0.2.1.tar.gz
Algorithm Hash digest
SHA256 8cbbfaddcd34e3c781d7bd37c5fc4fddd091a997f50105014c3f91442d3ad76a
MD5 3cc8f4d1990c9a8b543dada10a4892fa
BLAKE2b-256 5992d798f0781017af40c4df67734052acbe1a4599e4957b1902f612651ef136

File details

Details for the file bundled_script_processor-0.2.1-py3-none-any.whl.

File hashes

Hashes for bundled_script_processor-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1f1f4fdfac84e5bf22d90ee330fae81469200b829cff77db89ee8533277659f7
MD5 994fda4dcd422ef03572166b6a94c434
BLAKE2b-256 988061b98598690f89b21f567ffb645d80a99a32502abde289d146e61f6a9d80
