
Tools and runners for deploying and executing Kedro projects on SLURM


Kedro SLURM

kedro-slurm is a library that integrates Kedro pipelines with SLURM so that pipeline nodes can run as jobs on high-performance computing (HPC) clusters. It provides tools for defining, submitting, and monitoring SLURM jobs, relying on SLURM's job scheduler while keeping Kedro's pipeline structure.

Installation

pip install kedro-slurm

How do I use Kedro SLURM?

To define a SLURM-enabled node, use the kedro_slurm.pipeline.node function. This allows you to specify SLURM resource requirements and job configurations for each node in your pipeline.

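from kedro_slurm import slurm
from kedro_slurm.pipeline import node


def function(input_data):
    # Your node logic here; this placeholder passes the input through unchanged
    processed_data = input_data
    return processed_data


# Use a new name so the imported `node` factory is not shadowed
my_slurm_node = node(
    func=function,
    inputs="input_data",
    outputs="processed_data",
    name="my_slurm_node",
    resources=slurm.Resources(cpus=4, memory=16, gpus=1),
    configuration=slurm.Configuration(time_limit="2:00:00", partition_name="gpu"),
)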
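How kedro-slurm turns these settings into a job submission is not described here. As a rough illustration, the sketch below maps a resource request onto standard sbatch options (--cpus-per-task, --mem, --time, --gres, --partition). The Resources and Configuration dataclasses are hypothetical stand-ins that only mirror the field names used above, and treating memory as gigabytes is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resources:
    # Hypothetical stand-in for slurm.Resources; memory is ASSUMED to be in GB
    cpus: int
    memory: int
    gpus: int = 0


@dataclass(frozen=True)
class Configuration:
    # Hypothetical stand-in for slurm.Configuration
    time_limit: str
    partition_name: Optional[str] = None


def to_sbatch_flags(resources: Resources, configuration: Configuration) -> List[str]:
    """Translate a resource request into standard sbatch command-line options."""
    flags = [
        f"--cpus-per-task={resources.cpus}",
        f"--mem={resources.memory}G",  # G suffix: gigabytes (assumed unit)
        f"--time={configuration.time_limit}",
    ]
    if resources.gpus:
        # GPUs are requested as a generic resource (GRES)
        flags.append(f"--gres=gpu:{resources.gpus}")
    if configuration.partition_name:
        flags.append(f"--partition={configuration.partition_name}")
    return flags


print(to_sbatch_flags(Resources(cpus=4, memory=16, gpus=1),
                      Configuration(time_limit="2:00:00", partition_name="gpu")))
```

Optional settings (GPUs, partition) are emitted only when set, so a CPU-only request falls back to the cluster's default partition.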

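Define your pipeline by combining SLURM nodes with standard Kedro nodes. Standard Kedro nodes carry no SLURM settings of their own, so they run with the library's default resources and configuration. As in any Kedro pipeline, each output dataset must be produced by exactly one node.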

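from kedro.pipeline import Pipeline, node
from kedro_slurm import slurm
from kedro_slurm.pipeline import node as slurm_node


def function_1(input_data):
    # Heavy step: runs as its own SLURM job with the resources requested below
    return input_data


def function_2(processed_data):
    # Light step: runs with the library's default resources
    return processed_data


pipeline = Pipeline([
    slurm_node(
        func=function_1,
        inputs="input_data",
        outputs="processed_data",
        name="slurm_node_1",
        resources=slurm.Resources(cpus=8, memory=32),
        configuration=slurm.Configuration(time_limit="4:00:00"),
    ),
    node(
        func=function_2,
        inputs="processed_data",
        outputs="final_data",  # outputs must be unique across the pipeline
        name="node_1",
    ),
    # Add more nodes here
])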
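Kedro infers execution order from dataset names: a node can start only after the nodes producing its inputs have finished, and a SLURM runner has to respect the same ordering when deciding which jobs to submit. The sketch below uses the standard library's graphlib to group a hypothetical four-node pipeline into "waves" of nodes whose inputs are all available, so nodes in the same wave could in principle run as concurrent jobs. It illustrates the dependency logic only and is not kedro-slurm's scheduling code.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical pipeline description: node name -> (input datasets, output datasets)
pipeline_nodes: Dict[str, Tuple[Set[str], Set[str]]] = {
    "split": ({"raw_data"}, {"train_set"}),
    "train_a": ({"train_set"}, {"model_a"}),
    "train_b": ({"train_set"}, {"model_b"}),
    "evaluate": ({"model_a", "model_b"}, {"metrics"}),
}


def submission_waves(nodes: Dict[str, Tuple[Set[str], Set[str]]]) -> List[List[str]]:
    """Group nodes so every node's inputs are produced by an earlier wave."""
    # Map each dataset to the node that produces it (Kedro requires uniqueness)
    producers = {out: name for name, (_, outs) in nodes.items() for out in outs}
    # Predecessors of a node are the producers of its inputs; datasets with no
    # producer (e.g. raw catalog entries) impose no dependency
    graph = {
        name: {producers[i] for i in ins if i in producers}
        for name, (ins, _) in nodes.items()
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished before the next
    return waves


print(submission_waves(pipeline_nodes))
```

Here train_a and train_b land in the same wave because they depend only on split, which is why unique dataset names matter: they are what the dependency graph is built from.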

To run your pipeline on SLURM, use the custom SLURMRunner by executing the following shell command:

kedro run --async --runner=kedro_slurm.runner.SLURMRunner  
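One common way to run a single Kedro node as a SLURM job is to wrap a filtered kedro run in sbatch. The sketch below builds such a command; it assumes Kedro 0.19's --nodes option alongside the sbatch options --job-name, --output, and --wrap, but the command SLURMRunner actually generates may differ. build_submission is a hypothetical helper, and it only builds the argument list rather than calling sbatch.

```python
import shlex
from typing import List


def build_submission(node_name: str, flags: List[str], log_path: str) -> List[str]:
    """Hypothetical: build an sbatch call that runs one Kedro node."""
    # Quote the inner command so unusual node names survive the shell
    inner = shlex.join(["kedro", "run", f"--nodes={node_name}"])
    return [
        "sbatch",
        f"--job-name={node_name}",
        f"--output={log_path}",  # sbatch expands %j to the job ID
        *flags,
        "--wrap", inner,
    ]


cmd = build_submission(
    "slurm_node_1",
    ["--cpus-per-task=8", "--mem=32G", "--time=4:00:00"],
    "./logs/%j",
)
print(shlex.join(cmd))
```

Keeping the command as a list (and quoting only the wrapped string) avoids shell-injection and quoting problems if the list is later passed to subprocess.run.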

Monitoring SLURM Jobs

The library also offers lower-level abstractions for submitting and monitoring individual SLURM jobs: create a kedro_slurm.slurm.Job, call its submit method, and track the job through the kedro_slurm.slurm.Future that submit returns.

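import time

from kedro_slurm import slurm


resources = slurm.Resources(cpus=8, memory=32, gpus=2)
configuration = slurm.Configuration(time_limit="4:00:00", partition_name="gpu")

job = slurm.Job(
    resources=resources,
    configuration=configuration,
    name="example_job",
    command="python train_model.py",
    path="./logs/%j",  # %j is SLURM's filename pattern for the job ID
)

# Submit the job, then poll its state every 5 seconds until it finishes
future = job.submit()
while not future.done:
    future.update()  # refresh the job state from SLURM

    # _state is a private attribute (leading underscore) and may change between releases
    print(f"Job status: {future._state}")

    time.sleep(5)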
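The loop above polls until the job finishes, with no upper bound on how long it waits. A minimal sketch of a bounded variant follows. It relies only on the update() method and done attribute used in the example; the timeout is an addition of this sketch, not a kedro-slurm feature, and wait_until_done is a hypothetical helper.

```python
import time


def wait_until_done(future, interval: float = 5.0, timeout: float = 3600.0) -> None:
    """Poll a Future-like object until `future.done` is true or `timeout` elapses.

    Only assumes an `update()` method and a `done` attribute, as in the example above.
    """
    deadline = time.monotonic() + timeout  # monotonic clock ignores wall-clock changes
    future.update()
    while not future.done:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        time.sleep(interval)
        future.update()
```

With a submitted job this would be called as wait_until_done(job.submit(), timeout=4 * 3600). A timeout on the client side does not cancel the SLURM job; the job keeps running until its own time limit.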

A Future can transition through the following job states:

  • RUNNING
  • COMPLETED
  • PENDING
  • FAILED
  • CANCELLED
  • PREEMPTED
  • SUSPENDED
  • STOPPED
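These names match SLURM's own job state codes. In lifecycle order, a job starts PENDING, becomes RUNNING, may be paused (SUSPENDED releases its CPUs to other jobs, STOPPED keeps them), and ends COMPLETED, FAILED, CANCELLED, or PREEMPTED. The sketch below models the states as an enum and parses the state column printed by squeue -h -o %T or sacct -n -X -o State. JobState and parse_state are hypothetical; which states kedro-slurm's Future treats as finished is an assumption here, and a preempted job may be requeued depending on the cluster's preemption settings.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical enum mirroring the states listed above
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"
    STOPPED = "STOPPED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    PREEMPTED = "PREEMPTED"

    @property
    def is_terminal(self) -> bool:
        # ASSUMPTION: follows SLURM's meaning of each state; kedro-slurm's rule may differ
        return self in {JobState.COMPLETED, JobState.FAILED,
                        JobState.CANCELLED, JobState.PREEMPTED}


def parse_state(raw: str) -> JobState:
    """Parse one state as printed by squeue or sacct.

    sacct may add detail ("CANCELLED by 1234") or truncate a column with "+",
    so only the first word is kept and a trailing "+" is dropped.
    """
    return JobState(raw.strip().split()[0].rstrip("+"))
```

Unknown states raise ValueError from the enum lookup, which surfaces new or unexpected SLURM states instead of silently treating them as running.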

Default SLURM Resource Configuration

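Standard Kedro nodes, which carry no SLURM settings of their own, run with these defaults:

from kedro_slurm import slurm

_DEFAULT_RESOURCES = slurm.Resources(cpus=4, memory=10)
_DEFAULT_CONFIGURATION = slurm.Configuration(time_limit="1:00:00")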
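Assuming time_limit is passed through to sbatch's --time option, it accepts SLURM's formats: minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, and days-hours:minutes:seconds. The hypothetical helper below converts such a string to seconds, for example to check a node's request against the one-hour default.

```python
def time_limit_seconds(value: str) -> int:
    """Convert a SLURM --time string to seconds (hypothetical helper)."""
    days = 0
    if "-" in value:
        # Forms with days: D-H, D-H:M, D-H:M:S
        day_part, rest = value.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if len(parts) > 3:
            raise ValueError(f"invalid time limit: {value!r}")
        parts += [0] * (3 - len(parts))  # missing minutes/seconds default to 0
        hours, minutes, seconds = parts
    else:
        # Forms without days: M, M:S, H:M:S
        parts = [int(p) for p in value.split(":")]
        if len(parts) == 1:
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"invalid time limit: {value!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

For example, time_limit_seconds("1:00:00") gives 3600, and "1-2:30" (one day, two hours, thirty minutes) gives 95400. Note that a bare number means minutes, not hours.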



Download files


Source Distribution

kedro_slurm-0.1.5.tar.gz (5.7 kB)


Built Distribution

kedro_slurm-0.1.5-py3-none-any.whl (7.6 kB)


File details

Details for the file kedro_slurm-0.1.5.tar.gz.

File metadata

  • Download URL: kedro_slurm-0.1.5.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.22.3 CPython/3.12.9 Darwin/23.6.0

File hashes

Hashes for kedro_slurm-0.1.5.tar.gz
Algorithm Hash digest
SHA256 e103894bb027f14808358eac1eb134d9247be5531148ab24983ba71825759a6b
MD5 fc2810791d26bf1adb12f6649f99a571
BLAKE2b-256 a0784ce051fc13de9db20236ba1635a12da4a7687fc0683f766059cb8bdd403d


File details

Details for the file kedro_slurm-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: kedro_slurm-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.22.3 CPython/3.12.9 Darwin/23.6.0

File hashes

Hashes for kedro_slurm-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 e29887f20c54ff52a196ad89cb73adb3344742aaa4e6c990e2a7731d522cbfb3
MD5 9a302bb481b526d624cccd872239a532
BLAKE2b-256 1f6f67f463dcec36cd039b3998010cd0fa8828a995fc962ef08919ac756782e3
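To check that a downloaded file matches the published digests, compute its SHA256 hash locally and compare. A minimal sketch with Python's hashlib follows; sha256_of and verify are hypothetical helper names. pip can also enforce this itself: in hash-checking mode, a requirements line such as kedro-slurm==0.1.5 --hash=sha256:<digest> makes pip reject any download whose hash differs.

```python
import hashlib

# Published SHA256 digest of kedro_slurm-0.1.5-py3-none-any.whl (from the table above)
EXPECTED_WHEEL_SHA256 = "e29887f20c54ff52a196ad89cb73adb3344742aaa4e6c990e2a7731d522cbfb3"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> None:
    """Raise ValueError if the file's SHA256 digest differs from `expected`."""
    actual = sha256_of(path)
    if actual != expected.lower():
        raise ValueError(f"hash mismatch for {path}: got {actual}")


# Usage: verify("kedro_slurm-0.1.5-py3-none-any.whl", EXPECTED_WHEEL_SHA256)
```

Reading in chunks keeps memory use constant regardless of file size.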

