
A SageMaker-compatible BundledScriptProcessor for running tar-bundled source dirs.


Bundled Script Processor

An extension of the Amazon SageMaker ScriptProcessor that adds support for bundling a local source_dir (and optional dependencies) into a tarball, uploading it to S3, and running it inside SageMaker Processing jobs. This makes it easier to organize your code into directories and run it in SageMaker without manually managing uploads.

🔍 How it works under the hood

BundledScriptProcessor extends the normal ScriptProcessor flow by injecting an extra packaging step before execution.

  1. Bundle creation – It takes your source_dir (and any extra dependencies) and compresses them into a sourcedir.tar.gz.
  2. Upload to S3 – This tarball is uploaded to your SageMaker default bucket and mounted in the container as a ProcessingInput named "code".
  3. Custom entrypoint – A small runproc.sh script is generated and uploaded as a second ProcessingInput named "entrypoint". This script:
     • unpacks sourcedir.tar.gz,
     • removes the archive, and
     • executes your Python entrypoint (main.py by default) with the specified command (e.g. ["python3"]) and any additional arguments.
  4. Entrypoint override – Finally, it overrides the default ScriptProcessor entrypoint to point to this generated shell script, so SageMaker runs it automatically when the job starts.
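Step 1 can be approximated with the standard library's tarfile module. The following is a minimal sketch. It assumes the layout implied by the usage comments further down: the contents of source_dir sit at the archive root, and each dependency folder is stored under its own name. build_sourcedir_tarball is a hypothetical name, not part of the package's API.

```python
import tarfile
from pathlib import Path


def build_sourcedir_tarball(source_dir, dependencies, out_path):
    """Hypothetical helper: bundle source_dir plus dependency folders into a .tar.gz.

    Assumed layout: the *contents* of source_dir go to the archive root, and each
    dependency folder is added as a top-level subdirectory under its own name.
    """
    with tarfile.open(str(out_path), "w:gz") as tar:
        # e.g. task/callable.py -> callable.py, task/helper.py -> helper.py
        for child in sorted(Path(source_dir).iterdir()):
            tar.add(str(child), arcname=child.name)
        # e.g. common/lib.py -> common/lib.py (tar.add recurses into directories)
        for dep in dependencies:
            dep = Path(dep)
            tar.add(str(dep), arcname=dep.name)
    return out_path
```

Flattening source_dir keeps the entrypoint at the top of the unpacked directory. Keeping dependency folder names intact is what lets package-style imports such as common.lib resolve.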

This design hides the upload, extract, and execute logic from you while still relying on SageMaker's standard Processing job mechanics. It also builds on the existing ScriptProcessor API for compressing and uploading code to S3.


✨ Features

  • Extends ScriptProcessor with source_dir support
  • Accepts a source directory instead of just a single script
  • Supports bundling dependencies / local folders
  • Automatically generates a lightweight entrypoint script, i.e. runproc.sh
  • Cleans up temporary artifacts after execution
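The generated entrypoint is only a few lines of shell. The following is a minimal sketch of how such a runproc.sh could be rendered. It assumes the "code" input is mounted at /opt/ml/processing/input/code and that the job's arguments reach the script as "$@". render_runproc and the exact script body are hypothetical, not the package's actual template.

```python
import shlex


def render_runproc(code="main.py", command=("python3",),
                   code_dir="/opt/ml/processing/input/code"):
    """Hypothetical helper: return the text of a runproc.sh-style entrypoint."""
    cmd = " ".join(shlex.quote(part) for part in command)
    lines = [
        "#!/bin/bash",
        "set -e",                                # fail the job on the first error
        f"cd {shlex.quote(code_dir)}",           # assumed mount point of the "code" input
        "tar -xzf sourcedir.tar.gz",             # unpack the bundle in place
        "rm sourcedir.tar.gz",                   # clean up the archive
        f'exec {cmd} {shlex.quote(code)} "$@"',  # run the entrypoint with the job's arguments
    ]
    return "\n".join(lines) + "\n"
```

Calling render_runproc(code="callable.py") yields a script that ends in exec python3 callable.py "$@". Quoting every interpolated value with shlex.quote keeps unusual file names from breaking the shell line.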

📦 Installation

pip install bundled-script-processor

🚀 Usage

Example directory layout

demo_bundled_script_processor/
├─ main.py
├─ task/
│  ├─ callable.py
│  └─ helper.py
├─ common/
│  └─ lib.py

main.py

from pathlib import Path

from bundled_script_processor import BundledScriptProcessor
from sagemaker import Session, get_execution_role

sm_session = Session()
role = get_execution_role(sagemaker_session=sm_session)

# Resolve paths relative to this file (main.py sits at the project root)
project_root = Path(__file__).resolve().parent

script = 'callable.py'
source_dir = str(project_root / 'task')
dep1 = str(project_root / 'common')


processor = BundledScriptProcessor(
    role=role,
    image_uri="123456789012.dkr.ecr.eu-central-1.amazonaws.com/my-image:latest",
    command=["python3"],                  # interpreter used to run the entrypoint
    instance_count=1,
    instance_type="ml.m4.xlarge",
    sagemaker_session=sm_session,
)

# Run with a full source directory
processor.run(
    source_dir=source_dir,                # contents (incl. callable.py) are unpacked into /opt/ml/processing/input/code/
    code=script,                          # entrypoint file name, relative to source_dir
    dependencies=[dep1],                  # optional: each folder is copied into /opt/ml/processing/input/code/<folder name>/
    arguments=["--hello", "world"],       # optional CLI args forwarded to the entrypoint
)
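The arguments list is forwarded to the entrypoint on its command line (step 3 of "How it works"). callable.py in this example ignores it. If your entrypoint needs the value, the following is a minimal argparse sketch. The --hello flag is just the demo's dummy argument, and parse_args is a hypothetical helper name.

```python
import argparse


def parse_args(argv=None):
    """Parse CLI arguments forwarded by processor.run(arguments=[...]).

    argv=None makes argparse read sys.argv[1:], which is what you want
    inside the Processing container.
    """
    parser = argparse.ArgumentParser(description="Example entrypoint")
    # "--hello" mirrors the demo's arguments=["--hello", "world"]
    parser.add_argument("--hello", default="world")
    return parser.parse_args(argv)
```

Inside callable.py you would call parse_args() with no argument under the __main__ guard. For a local check, parse_args(["--hello", "there"]).hello gives "there".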

task/callable.py

from helper import helloworld
from common.lib import common_helloworld

if __name__ == '__main__':
    print(helloworld())
    print(common_helloworld())

task/helper.py

def helloworld():
    return 'Hello World!'

common/lib.py

def common_helloworld():
    return 'Common Hello World!'
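The contents of task/ and the common/ folder end up side by side in one code directory, so both imports in callable.py resolve when it runs as a script. The following is a minimal local simulation of that layout, with no SageMaker calls. A temporary directory stands in for /opt/ml/processing/input/code, and simulate_job is a hypothetical helper.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# The three example files from this README, keyed by their project-relative path
FILES = {
    "task/callable.py": (
        "from helper import helloworld\n"
        "from common.lib import common_helloworld\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    print(helloworld())\n"
        "    print(common_helloworld())\n"
    ),
    "task/helper.py": "def helloworld():\n    return 'Hello World!'\n",
    "common/lib.py": "def common_helloworld():\n    return 'Common Hello World!'\n",
}


def simulate_job(files=FILES):
    """Recreate the assumed flat code directory locally and run callable.py in it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel, text in files.items():
            path = root / "project" / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        code_dir = root / "code"  # stands in for /opt/ml/processing/input/code
        # source_dir contents land at the top level...
        shutil.copytree(root / "project" / "task", code_dir)
        # ...and each dependency keeps its own folder name
        shutil.copytree(root / "project" / "common", code_dir / "common")
        result = subprocess.run(
            [sys.executable, "callable.py", "--hello", "world"],
            cwd=code_dir, capture_output=True, text=True, check=True,
        )
        return result.stdout.splitlines()
```

Running the script puts its own directory on sys.path. That is why helper resolves directly and common.lib resolves as a namespace package without an __init__.py.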



Download files


Source Distribution

bundled_script_processor-0.2.0.tar.gz (5.8 kB)

Uploaded Source

Built Distribution


bundled_script_processor-0.2.0-py3-none-any.whl (7.0 kB)

Uploaded Python 3

File details

Details for the file bundled_script_processor-0.2.0.tar.gz.


File hashes

Hashes for bundled_script_processor-0.2.0.tar.gz
Algorithm Hash digest
SHA256 5b91d0b8cc35189258a364f7c2ad9e560a7006a567129d4e5a9430b861b2174e
MD5 749f11165a4f283de718eec3000c712b
BLAKE2b-256 a46b2010eae3c2de0664e59025d449d0324c9aa4ec621e4cf26fefc87d62ff94


File details

Details for the file bundled_script_processor-0.2.0-py3-none-any.whl.


File hashes

Hashes for bundled_script_processor-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a7748608b11104d48455826025fa7a232d19d99feb078dc158d423388c89bc66
MD5 41c2fd3b559678e04e455b0fc6c37316
BLAKE2b-256 d9b0024821d3f082a603aa8365410dd1aab4b36bb63ddabb39f0916a7c019cd5

