Tools and runners for deploying and executing Kedro projects on SLURM
Project description
Kedro SLURM
kedro-slurm is a library that integrates Kedro pipelines with SLURM to enable distributed execution of tasks on high-performance computing (HPC) clusters. This library provides seamless integration for defining, submitting, and monitoring jobs on SLURM, leveraging SLURM's job scheduling capabilities while adhering to Kedro's pipeline structure.
Installation
pip install kedro-slurm
How do I use Kedro SLURM?
To define a SLURM-enabled node, use the kedro_slurm.pipeline.node function. It lets you specify SLURM resource requirements and job configuration for each node in your pipeline.
from kedro_slurm import slurm
from kedro_slurm.pipeline import node


def function(input_data):
    # Your node logic here
    return processed_data


my_node = node(
    func=function,
    inputs="input_data",
    outputs="processed_data",
    name="my_slurm_node",
    resources=slurm.Resources(cpus=4, memory=16, gpus=1),
    configuration=slurm.Configuration(time_limit="2:00:00", partition_name="gpu"),
)
Define your pipeline by combining SLURM nodes with standard Kedro nodes. Kedro nodes will run using the library's default resource settings.
from kedro.pipeline import Pipeline, node
from kedro_slurm import slurm
from kedro_slurm.pipeline import node as slurm_node


pipeline = Pipeline([
    slurm_node(
        func=function_1,
        inputs="input_data",
        outputs="processed_data",
        name="slurm_node_1",
        resources=slurm.Resources(cpus=8, memory=32),
        configuration=slurm.Configuration(time_limit="4:00:00"),
    ),
    node(  # a standard Kedro node; it runs with the library's default resources
        func=function_2,
        inputs="processed_data",
        outputs="final_data",
        name="node_1",
    ),
    # Add more nodes here
])
To run your pipeline on SLURM, use the custom SLURMRunner by executing the following shell command:
kedro run --async --runner=kedro_slurm.runner.SLURMRunner
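The runner flag combines with the usual kedro run options; for example (the pipeline name below is a placeholder for one registered in your project):
kedro run --pipeline=data_processing --async --runner=kedro_slurm.runner.SLURMRunner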
Monitoring SLURM Jobs
The library also offers abstractions for submitting and monitoring jobs on SLURM directly. You submit a job with the kedro_slurm.slurm.Job class and its submit method, and monitor it through the kedro_slurm.slurm.Future it returns.
import time

from kedro_slurm import slurm


resources = slurm.Resources(cpus=8, memory=32, gpus=2)
configuration = slurm.Configuration(time_limit="4:00:00", partition_name="gpu")

job = slurm.Job(
    resources=resources,
    configuration=configuration,
    name="example_job",
    command="python train_model.py",
    path="./logs/%j",
)

future = job.submit()

while not future.done:
    future.update()
    print(f"Job status: {future._state}")
    time.sleep(5)
A Future can transition through the following job states:
- RUNNING
- COMPLETED
- PENDING
- FAILED
- CANCELLED
- PREEMPTED
- SUSPENDED
- STOPPED
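As a minimal sketch of acting on the final state (reusing job and the Future attributes from the example above, and assuming _state exposes the state names listed here as strings; its exact type is not documented):

import time

# Assumes `job` was constructed as in the previous example.
future = job.submit()

while not future.done:   # done becomes True once the job reaches a terminal state
    future.update()      # refresh the job's state from SLURM
    time.sleep(5)

# Branch on the terminal state; comparing via str() is an assumption about
# how _state is represented, not a documented guarantee.
state = str(future._state)
if state == "COMPLETED":
    print("Job finished successfully")
elif state in {"FAILED", "CANCELLED", "PREEMPTED", "STOPPED"}:
    print(f"Job ended without completing: {state}")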
Default SLURM Resource Configuration
Standard Kedro nodes run with the library's default resources and configuration:
_DEFAULT_RESOURCES = slurm.Resources(cpus=4, memory=10)
_DEFAULT_CONFIGURATION = slurm.Configuration(time_limit="1:00:00")
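For illustration only (this spells out the defaults above; treating a plain Kedro node as exactly equivalent to a SLURM node declared with these values is an assumption, not a documented guarantee), node_1 from the earlier pipeline is effectively submitted as:

from kedro_slurm import slurm
from kedro_slurm.pipeline import node as slurm_node

# Hypothetical explicit equivalent of the plain `node_1` above, using the
# library defaults: cpus=4, memory=10, time_limit="1:00:00".
equivalent_node = slurm_node(
    func=function_2,
    inputs="processed_data",
    outputs="final_data",
    name="node_1",
    resources=slurm.Resources(cpus=4, memory=10),
    configuration=slurm.Configuration(time_limit="1:00:00"),
)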
File details
Details for the file kedro_slurm-0.1.5.tar.gz.
File metadata
- Download URL: kedro_slurm-0.1.5.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.22.3 CPython/3.12.9 Darwin/23.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | e103894bb027f14808358eac1eb134d9247be5531148ab24983ba71825759a6b
MD5 | fc2810791d26bf1adb12f6649f99a571
BLAKE2b-256 | a0784ce051fc13de9db20236ba1635a12da4a7687fc0683f766059cb8bdd403d
File details
Details for the file kedro_slurm-0.1.5-py3-none-any.whl.
File metadata
- Download URL: kedro_slurm-0.1.5-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.22.3 CPython/3.12.9 Darwin/23.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | e29887f20c54ff52a196ad89cb73adb3344742aaa4e6c990e2a7731d522cbfb3
MD5 | 9a302bb481b526d624cccd872239a532
BLAKE2b-256 | 1f6f67f463dcec36cd039b3998010cd0fa8828a995fc962ef08919ac756782e3