Skip to main content

GridSearcher simplifies running grid searches for machine learning projects in Python, emphasizing parallel execution and GPU scheduling without dependencies on SLURM or other workload managers.

Project description

GridSearcher 𖣯🔍


GridSearcher is a pure Python project designed to simplify the process of running grid searches for Machine Learning projects. It serves as a robust alternative to traditional bash scripts, providing a more flexible and user-friendly way to manage and execute multiple programs in parallel.

⚠️ It is designed for systems where users have direct SSH access to machines and can run their python scripts right away.

Features ✨󠁇󠁇󠁇

  • Grid Search Made Easy: Define parameter grids effortlessly and the cartesian product of your hyper-parameters will be computed automatically and an instance of your script will be run for all possible combinations.
  • Parallel Execution: Run multiple programs concurrently, maximizing your computational resources.
  • GPU Scheduling: Built-in GPU allocation ensures efficient use of available GPUs. Specify the number of GPUs and jobs per GPU, and GridSearcher will handle the rest
  • Flexible Configuration: Easily control the number of parallel jobs and GPU assignments through a scheduling dictionary.
  • Pure Python: No more dealing with complex bash scripts. GridSearcher is written entirely in Python, making it easy to integrate into your existing Python workflows.

Why GridSearcher? 🤔

  • User-Friendly: Simplifies the setup and execution of grid searches, allowing you to focus on your Machine Learning models.
  • Efficient Resource Management: Optimize the use of your GPUs and computational resources.
  • Pythonic Approach: Seamlessly integrates with your Python projects and leverages Python's rich ecosystem.
  • Direct SSH Access: Ideal for systems where users have direct SSH access to machines, providing a straightforward setup and execution process without the need for SLURM or other workload managers, ensuring a smooth and efficient operation.

Installation 🛠️

Install GridSearcher via pip:

pip install gridsearcher

How to use GridSearcher?


We provide a minimal working example in the file example.py. Just set debug=True with debug=False in the run method call to run on GPUs. The output of example.py is the following:

GridSearcher PID: 8940
command 1: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=1_2024-06-19_23-04-23 --seed 1 --lr 1e-2 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=1_2024-06-19_23-04-23
command 2: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=1_2024-06-19_23-04-23 --seed 1 --lr 1e-2 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=1_2024-06-19_23-04-23
command 3: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=1_2024-06-19_23-04-23 --seed 1 --lr 1e-3 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=1_2024-06-19_23-04-23
command 4: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=1_2024-06-19_23-04-23 --seed 1 --lr 1e-3 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=1_2024-06-19_23-04-23
command 5: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=2_2024-06-19_23-04-23 --seed 2 --lr 1e-2 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=2_2024-06-19_23-04-23
command 6: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=2_2024-06-19_23-04-23 --seed 2 --lr 1e-2 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=2_2024-06-19_23-04-23
command 7: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=2_2024-06-19_23-04-23 --seed 2 --lr 1e-3 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=2_2024-06-19_23-04-23
command 8: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=2_2024-06-19_23-04-23 --seed 2 --lr 1e-3 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=2_2024-06-19_23-04-23
command 9: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=3_2024-06-19_23-04-23 --seed 3 --lr 1e-2 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=3_2024-06-19_23-04-23
command 10: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=3_2024-06-19_23-04-23 --seed 3 --lr 1e-2 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-2_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=3_2024-06-19_23-04-23
command 11: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=3_2024-06-19_23-04-23 --seed 3 --lr 1e-3 --wd 1e-2 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-2_beta1=0.9_beta2=0.999_eps=1e-8/seed=3_2024-06-19_23-04-23
command 12: python3 myscript.py --batch_size 128 --epochs 100 --lr_decay_at 82 123 --wandb_project cifar10-training --wandb_group cifar10_rn18_adamw_E=100_bs=128 --wandb_job_type lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8 --wandb_name seed=3_2024-06-19_23-04-23 --seed 3 --lr 1e-3 --wd 1e-3 --beta1 0.9 --beta2 0.999 --eps 1e-8 --root_folder ./results/cifar10-training/cifar10_rn18_adamw_E=100_bs=128/lr=1e-3_wd=1e-3_beta1=0.9_beta2=0.999_eps=1e-8/seed=3_2024-06-19_23-04-23

SBATCH wrapper for SLURM (NEW in version 1.0.4)

We also added a wrapper for SBATCH that allows running SLURM jobs directly from Python!

from gridsearcher import SBATCH

SBATCH(
    script='h100-eval.sh',
    env_vars=dict(
        var1=val1,
        var2=val2,
    ),
    sbatch_args=dict(
        job_name=f'job-name-here',
        nodelist='big-machine', # or None if you don't want to specify --nodelist
        out_err_folder='slurm_output', # the folder where the files output and error will be saved
        ntasks=1,
        cpus_per_task=32,
        time='1:00:00', # change according to your needs
        mem='100G', # change according to your needs
        partition='gpu100', # change according to your needs
        gres='gpu:H100:1' # change according to your needs
    )
).run()

Contribute 🤝


We welcome contributions! If you have suggestions for new features or improvements, feel free to open an issue or submit a pull request.

Versions history:

  • 1.1.0: removed specific arguments and replaced them with dictionaries to offer flexibility to use any SBATCH params
  • 1.0.4: added SBATCH class, which can be used in a completely separated manner from GridSearcher, allowing running slurm jobs from python
  • 1.0.3: do not check whether the script ends with .py extension anymore
  • 1.0.2: checking the return code of os.system and create file state.finished only if code == 0
  • 1.0.1: added assert statement to make sure that all values in the scheduling["params_values"] are of type list
  • 1.0.0: added initial project

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gridsearcher-1.1.0.tar.gz (23.5 kB view details)

Uploaded Source

Built Distribution

gridsearcher-1.1.0-py3-none-any.whl (21.7 kB view details)

Uploaded Python 3

File details

Details for the file gridsearcher-1.1.0.tar.gz.

File metadata

  • Download URL: gridsearcher-1.1.0.tar.gz
  • Upload date:
  • Size: 23.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for gridsearcher-1.1.0.tar.gz
Algorithm Hash digest
SHA256 84b63d6850bc30badd9182ef812717448335fbd2d090ddba156c47cd38b91ecf
MD5 27e9f959284f6219783ee04fa21027a7
BLAKE2b-256 f6b9d15a515e486d7c4399b1042daaf537cc8c320990ae8ddb8fb787b35d7c20

See more details on using hashes here.

File details

Details for the file gridsearcher-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: gridsearcher-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 21.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for gridsearcher-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c41a9bd962a13495188132560bf47b09732094593d70ee9192bb6c60fa96c48d
MD5 df666d19bf5ddae90f4e038adf1d820c
BLAKE2b-256 311227593efccb6e829ee032e8f42b2a27ecfaf22ab71d527fc831dc04cea915

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page