ml_scheduler
ML Scheduler is a lightweight machine learning experiment scheduler that automates resource management (e.g., GPUs and models) and runs experiments in batches with just a few lines of Python code.
Quick Start
- Install ml-scheduler:
    pip install ml-scheduler
Or install from the GitHub repository:
    git clone https://github.com/huyiwen/ml_scheduler
    cd ml_scheduler
    pip install -e .
- Create a Python script (run.py in this example):
    import ml_scheduler

    # Resource pools shared by all experiments
    cuda = ml_scheduler.pools.CUDAPool([0, 2], 90)
    disk = ml_scheduler.pools.DiskPool('/one-fs')

    @ml_scheduler.exp_func
    async def mmlu(exp: ml_scheduler.Exp, model, checkpoint):
        source_dir = f"/another-fs/model/{model}/checkpoint-{checkpoint}"
        target_dir = f"/one-fs/model/{model}-{checkpoint}"

        # Resources will be cleaned up after exiting the function
        disk_resource = await exp.get(
            disk.copy_folder,
            source_dir,
            target_dir,
            cleanup_target=True,
        )
        cuda_resource = await exp.get(cuda.allocate, 1)

        # Run inference
        args = [
            "python", "inference.py", "--model", target_dir,
            "--dataset", "mmlu", "--cuda", str(cuda_resource[0]),
        ]
        stdout = await exp.run(args=args)
        # Report a mapping from metric name to value (the original used a set literal)
        await exp.report({'Accuracy': stdout})

    # Run one experiment per row of experiments.csv and record the Accuracy metric
    mmlu.run_csv("experiments.csv", ['Accuracy'])
Mark the function with `@ml_scheduler.exp_func` and `async` to make it an experiment function. The function should take an `exp` argument as its first argument.
Then use `await exp.get` to get resources (non-blocking) and `await exp.run` to run the experiment (also non-blocking). Non-blocking means that while one experiment waits for a resource or a subprocess, other experiments can proceed, so multiple experiments run concurrently.
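To illustrate what non-blocking means here, the following is a minimal sketch in plain asyncio. It is not ml_scheduler's implementation: `TinyGPUPool`, `fake_experiment`, and `run_all` are hypothetical names, and the sleep stands in for a real inference run. Each experiment awaits a free device, so with two devices the third and fourth experiments simply wait their turn while the first two run.

```python
import asyncio


class TinyGPUPool:
    """Hypothetical stand-in for a resource pool that hands out device IDs."""

    def __init__(self, device_ids):
        self._free = asyncio.Queue()
        for device_id in device_ids:
            self._free.put_nowait(device_id)

    async def allocate(self):
        # Suspends only this coroutine until a device is free;
        # other experiments keep running in the meantime.
        return await self._free.get()

    def release(self, device_id):
        self._free.put_nowait(device_id)


async def fake_experiment(pool, name, seconds=0.05):
    device_id = await pool.allocate()
    try:
        # Placeholder for launching an inference subprocess.
        await asyncio.sleep(seconds)
        return name, device_id
    finally:
        # Always hand the device back, even if the experiment fails.
        pool.release(device_id)


async def run_all(names, device_ids):
    pool = TinyGPUPool(device_ids)
    # gather() starts all experiments at once and keeps results in input order.
    return await asyncio.gather(*(fake_experiment(pool, n) for n in names))


if __name__ == "__main__":
    for name, device_id in asyncio.run(run_all(["a", "b", "c", "d"], [0, 2])):
        print(f"experiment {name} ran on device {device_id}")
```

The `try`/`finally` release mirrors the idea that resources are cleaned up when the experiment function exits.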
- Create a CSV file `experiments.csv` with your arguments (`model` and `checkpoint` in this case):
    model,checkpoint
    alpacaflan-packing,200
    alpacaflan-packing,400
    alpacaflan-qlora,200-merged
    alpacaflan-qlora,400-merged
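As a rough illustration of how each CSV row can map onto the experiment function's arguments, the sketch below reads rows with the standard `csv` module and passes each one as keyword arguments. This shows the general pattern only and is an assumption, not a description of how `run_csv` works internally; `rows_as_kwargs` and `source_dir_for` are hypothetical helpers. Note that every value arrives as a string, which is why a checkpoint such as `200-merged` works.

```python
import csv
import io

# A shortened copy of the experiments.csv content, inlined for the sketch.
SAMPLE = """model,checkpoint
alpacaflan-packing,200
alpacaflan-qlora,200-merged
"""


def rows_as_kwargs(csv_file):
    """Yield one dict per data row, keyed by the column names in the header."""
    yield from csv.DictReader(csv_file)


def source_dir_for(model, checkpoint):
    # Same path pattern as source_dir in the experiment function.
    return f"/another-fs/model/{model}/checkpoint-{checkpoint}"


if __name__ == "__main__":
    for kwargs in rows_as_kwargs(io.StringIO(SAMPLE)):
        # Header names must match the function's parameter names.
        print(source_dir_for(**kwargs))
```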
- Run the script:
    python run.py
The results (`Accuracy` in this case) and some other information will be saved in `results.csv`.
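Once the run finishes, the results file can be inspected with the standard library. The sketch below assumes that `results.csv` has a header row containing an `Accuracy` column whose values parse as numbers; the other columns are not specified here, so the code does not rely on them. `best_row` is a hypothetical helper, not part of ml_scheduler.

```python
import csv
import os


def best_row(csv_file, metric="Accuracy"):
    """Return the row with the highest numeric value in the metric column."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        raise ValueError("no result rows found")
    # float() tolerates surrounding whitespace, e.g. a trailing newline from stdout.
    return max(rows, key=lambda row: float(row[metric]))


if __name__ == "__main__":
    # Only read the file if a previous run has produced it.
    if os.path.exists("results.csv"):
        with open("results.csv", newline="") as f:
            print(best_row(f))
```

If the inference script prints anything besides the accuracy value, `float()` will raise a `ValueError`, which is a useful early signal that stdout needs parsing before it is reported.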