
GridSearcher simplifies running grid searches for machine learning projects in Python, emphasizing parallel execution and GPU scheduling without dependencies on SLURM or other workload managers.

Project description

GridSearcher 𖣯🔍


GridSearcher is a pure Python project designed to simplify the process of running grid searches for Machine Learning projects. It serves as a robust alternative to traditional bash scripts, providing a more flexible and user-friendly way to manage and execute multiple programs in parallel.

⚠️ It is designed for systems where users have direct SSH access to machines and can run their Python scripts right away.

Features ✨

  • Grid Search Made Easy: Define parameter grids effortlessly; GridSearcher computes the Cartesian product of your hyper-parameters and runs an instance of your script for every possible combination (see the sketch after this list).
  • Parallel Execution: Run multiple programs concurrently, maximizing your computational resources.
  • GPU Scheduling: Built-in GPU allocation ensures efficient use of available GPUs. Specify the number of GPUs and the number of jobs per GPU, and GridSearcher will handle the rest.
  • Flexible Configuration: Easily control the number of parallel jobs and GPU assignments through a scheduling dictionary.
  • Pure Python: No more dealing with complex bash scripts. GridSearcher is written entirely in Python, making it easy to integrate into your existing Python workflows.
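Under the hood, the grid expansion is simply the Cartesian product of the parameter lists. As a rough illustration only (not GridSearcher's internal code), the same expansion can be written with the standard library:

# Illustrative sketch of the grid-expansion idea: take a dict of
# hyper-parameter lists, form the Cartesian product, and build one
# command line per combination.
import itertools

params = {
    'lr': ['1e-2', '1e-3'],
    'wd': ['1e-2', '1e-3'],
    'seed': [1, 2, 3],
}

keys = list(params)
for i, combo in enumerate(itertools.product(*params.values()), start=1):
    args = ' '.join(f'--{k} {v}' for k, v in zip(keys, combo))
    print(f'command {i}: python3 myscript.py {args}')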

Why GridSearcher? 🤔

  • User-Friendly: Simplifies the setup and execution of grid searches, allowing you to focus on your Machine Learning models.
  • Efficient Resource Management: Optimize the use of your GPUs and computational resources.
  • Pythonic Approach: Seamlessly integrates with your Python projects and leverages Python's rich ecosystem.
  • Direct SSH Access: Ideal for systems where users have direct SSH access to machines; setup and execution are straightforward, with no need for SLURM or other workload managers.

Installation 🛠️

Install GridSearcher via pip:

pip install gridsearcher

How to use GridSearcher?


We provide a minimal working example in the file example.py. Just replace debug=True with debug=False in the run method call to run on GPUs. The output of example.py is the following:

GridSearcher PID: 8940
command 1: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=1_2024-06-19_23-04-23 --seed 1 --lr 1e-2 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=1_2024-06-19_23-04-23
command 2: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=1_2024-06-19_23-04-23 --seed 1 --lr 1e-2 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=1_2024-06-19_23-04-23
command 3: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=1_2024-06-19_23-04-23 --seed 1 --lr 1e-3 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=1_2024-06-19_23-04-23
command 4: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=1_2024-06-19_23-04-23 --seed 1 --lr 1e-3 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=1_2024-06-19_23-04-23
command 5: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=2_2024-06-19_23-04-23 --seed 2 --lr 1e-2 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=2_2024-06-19_23-04-23
command 6: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=2_2024-06-19_23-04-23 --seed 2 --lr 1e-2 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=2_2024-06-19_23-04-23
command 7: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=2_2024-06-19_23-04-23 --seed 2 --lr 1e-3 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=2_2024-06-19_23-04-23
command 8: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=2_2024-06-19_23-04-23 --seed 2 --lr 1e-3 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=2_2024-06-19_23-04-23
command 9: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=3_2024-06-19_23-04-23 --seed 3 --lr 1e-2 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=3_2024-06-19_23-04-23
command 10: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=3_2024-06-19_23-04-23 --seed 3 --lr 1e-2 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=3_2024-06-19_23-04-23
command 11: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=3_2024-06-19_23-04-23 --seed 3 --lr 1e-3 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=3_2024-06-19_23-04-23
command 12: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=3_2024-06-19_23-04-23 --seed 3 --lr 1e-3 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=3_2024-06-19_23-04-23
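The listing above was produced in debug mode, so the commands are only printed. With debug=False, GridSearcher launches them in parallel and assigns each job to a GPU according to the scheduling dictionary. As a hypothetical sketch of that kind of dispatch (illustrative only, not GridSearcher's actual scheduler; the names dispatch, gpus and jobs_per_gpu are assumptions), the same effect can be approximated with the standard library:

import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def dispatch(commands, gpus=2, jobs_per_gpu=2):
    # Run shell commands in parallel, at most gpus * jobs_per_gpu at a time,
    # pinning each job to a GPU (round-robin) via CUDA_VISIBLE_DEVICES.
    def run(indexed_cmd):
        index, cmd = indexed_cmd
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(index % gpus))
        return subprocess.run(cmd, shell=True, env=env).returncode

    with ThreadPoolExecutor(max_workers=gpus * jobs_per_gpu) as pool:
        return list(pool.map(run, enumerate(commands)))

# e.g. dispatch(['python3 myscript.py --lr 1e-2', 'python3 myscript.py --lr 1e-3'])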

SBATCH wrapper for SLURM (NEW in version 1.0.4)

We also added SBATCH, a wrapper around SLURM's sbatch command that lets you submit SLURM jobs directly from Python!

from gridsearcher import SBATCH

SBATCH(
    script='h100-eval.sh',
    env_vars=dict(
        var1='value1',  # placeholder environment variables passed to the job
        var2='value2',
    ),
    job_name='job-name-here',
    nodelist='big-machine', # or None if you don't want to specify --nodelist
    out_err_folder='slurm_output', # the folder where the output and error files will be saved
    ntasks=1,
    cpus_per_task=32,
    time='1:00:00', # change according to your needs
    mem='100G', # change according to your needs
    partition='gpu100', # change according to your needs
    gres='gpu:H100:1' # change according to your needs
).run()
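The keyword arguments mirror the standard sbatch options of the same names (--job-name, --nodelist, --ntasks, --cpus-per-task, --time, --mem, --partition, --gres); out_err_folder is where the SLURM output and error files are written, and env_vars is presumably exported into the job's environment.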

Contribute 🤝


We welcome contributions! If you have suggestions for new features or improvements, feel free to open an issue or submit a pull request.

Version history:

  • 1.0.4: added the SBATCH class, which can be used completely independently of GridSearcher to run SLURM jobs from Python
  • 1.0.3: no longer check whether the script ends with the .py extension
  • 1.0.2: check the return code of os.system and create the file state.finished only if the return code is 0
  • 1.0.1: added an assert statement to make sure that all values in scheduling["params_values"] are of type list
  • 1.0.0: initial release

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gridsearcher-1.0.4.tar.gz (23.1 kB)

Uploaded Source

Built Distribution

gridsearcher-1.0.4-py3-none-any.whl (21.4 kB)

Uploaded Python 3

File details

Details for the file gridsearcher-1.0.4.tar.gz.

File metadata

  • Download URL: gridsearcher-1.0.4.tar.gz
  • Upload date:
  • Size: 23.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.8.18

File hashes

Hashes for gridsearcher-1.0.4.tar.gz:

  • SHA256: 4140dfaee6d607ae2fc49ddcc87427b0f24abc25e32ccf5b01481fdb3e01d79c
  • MD5: 8f3d9f224f324da3a53484fb967422ac
  • BLAKE2b-256: e4c5c5d46c4bcb173bc31be210724f02ab7478efc1f55009252712dd8b7ee179

See more details on using hashes here.

File details

Details for the file gridsearcher-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: gridsearcher-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 21.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.8.18

File hashes

Hashes for gridsearcher-1.0.4-py3-none-any.whl:

  • SHA256: cef0ddcaebc699826c3084874f87cb73e58ede607ea4f58b79deaf4f1f154463
  • MD5: f8dac9b34888c0d3da127fdad97f1feb
  • BLAKE2b-256: 7261211b5a4eb50800b9bfc0e58973104efdf99c492bae4ba06741a0d99a9f95

See more details on using hashes here.
