Python classes for organizing (HPC) simulations

py-modelrunner

This package provides python classes for handling and running physical simulations. The main aim is to easily wrap simulation code and deal with input and output automatically. The package also facilitates submitting simulations to high performance computing environments and it provides functions for running parameter scans.

Installation

The package can simply be cloned from GitHub, but it is also available on PyPI and conda:

pip install py-modelrunner

Usage

This package has multiple purposes that are described in more detail below. Additional examples can be found in the examples folder.

Minimal example

Assume you have written a Python simulation in the form of a simple script that defines a function with several arguments, like so

def main(a: float = 1, b: int = 2, negate: bool = False):
    res = a ** b
    if negate:
        res *= -1
    return res
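Called directly, this is an ordinary Python function; the definition is repeated below so the snippet is self-contained:

```python
def main(a: float = 1, b: int = 2, negate: bool = False):
    res = a ** b
    if negate:
        res *= -1
    return res

# calling the function directly, without any wrapper
print(main(2.0, 3, negate=True))  # → -8.0
```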

The modelrunner package now allows you to wrap a convenient command line interface around this simple function. Assuming the script is saved in a file called script.py, calling python -m modelrunner script.py -h shows the following help

usage: script.py [-h] [--a VALUE] [--b VALUE] [--negate] [--json JSON] [-o PATH]

optional arguments:
  -h, --help            show this help message and exit
  --json JSON           JSON-encoded parameter values. Overwrites other parameters. (default: None)
  -o PATH, --output PATH
                        Path to output file. If omitted, no output file is created. (default: None)

  --a VALUE             Parameter `a` (default: 1)
  --b VALUE             Parameter `b` (default: 2)
  --negate              Parameter `negate` (default: False)

Consequently, the function can be called using python -m modelrunner script.py --a 2 --b 3 --negate -o result.yaml, which produces a file result.yaml with the following content:

info:
  time: # TIMESTAMP
model:
  class: main
  description: null
  name: main
  parameters:
    a: 2.0
    b: 3
    negate: true
result: -8.0

This file not only contains the result, but also meta-information, including the parameters of the simulation and the time when it was started.

Creating models

The package introduces a base class ModelBase that describes the bare structure all models are supposed to have. Custom models can be created by inheriting from ModelBase and defining suitable parameters:

from modelrunner import ModelBase

class MyModel(ModelBase):  # define custom model

    # defines supported parameters with default values
    parameters_default = {"a": 1, "b": 2}

    def __call__(self):
        """calculate the actual model"""
        return self.parameters["a"] * self.parameters["b"]


model = MyModel({"a" : 3})

The last line actually creates an instance of the model with custom parameters.

Alternatively, a model can also be defined from a simple function:

from modelrunner import make_model

@make_model
def multiply(a=1, b=2):
    return a * b

model = multiply(a=3)

The main aim of defining models like this is to provide a unified interface for running models for the subsequent sections.
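The core pattern behind this interface — merging user-supplied parameters into class-level defaults — can be sketched in plain Python. This is an illustration of the idea only, not the package's actual implementation (the class name SketchModelBase is made up):

```python
class SketchModelBase:
    """Minimal sketch of the parameter-handling pattern (not modelrunner's code)."""

    parameters_default: dict = {}

    def __init__(self, parameters=None):
        # start from the class defaults and overwrite with user-supplied values
        self.parameters = dict(self.parameters_default)
        if parameters is not None:
            self.parameters.update(parameters)


class MyModel(SketchModelBase):
    parameters_default = {"a": 1, "b": 2}

    def __call__(self):
        return self.parameters["a"] * self.parameters["b"]


model = MyModel({"a": 3})  # "a" is overridden, "b" keeps its default
print(model())  # → 6
```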

Run models from command line

Models can be run with different parameters. In both examples above, the model can be run from within Python by simply calling the model instance: model(). In these cases, the call simply returns 6.

In typical numerical simulations, models need to be evaluated for many different parameters. The package facilitates this by providing a special interface to set arguments from the command line. To show this, either one of the model definitions given above can be saved as a Python file model.py. Using the special call python -m modelrunner model.py provides a command line interface for adjusting model parameters. The supported parameters can be obtained with the following command

$ python -m modelrunner model.py --help

usage: model.py [-h] [--a VALUE] [--b VALUE] [-o PATH] [--json JSON]

optional arguments:
  -h, --help            show this help message and exit
  -o PATH, --output PATH
                        Path to output file. If omitted, no output file is created. (default: None)
  --json JSON           JSON-encoded parameter values. Overwrites other parameters. (default: None)

  --a VALUE             Parameter `a` (default: 1)
  --b VALUE             Parameter `b` (default: 2)

This can be helpful to call a model automatically and save the result. For instance, by calling python -m modelrunner model.py --a 3 -o result.yaml, we obtain a file result.yaml that looks something like this:

model:
  class: multiply
  name: multiply
  parameters:
    a: 3
    b: 2
result: 6

Other supported output formats include JSON (extension .json) and HDF (extension .hdf).
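A result written in JSON format can be inspected with standard tools; assuming the JSON layout mirrors the YAML structure shown above (an assumption, using an inline string in place of an actual file), reading it back might look like this:

```python
import json

# JSON analogue of the YAML result shown above (inlined for illustration)
raw = '{"model": {"class": "multiply", "name": "multiply", "parameters": {"a": 3, "b": 2}}, "result": 6}'
data = json.loads(raw)

print(data["model"]["parameters"])  # parameters the model was run with
print(data["result"])               # → 6
```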

Submit models to an HPC queue

The package also provides methods to submit scripts to a high-performance computing (HPC) system. A minimal complete script demonstrating this reads

from modelrunner import make_model, submit_job

@make_model
def multiply(a=1, b=2):
    return a * b

if __name__ == "__main__":
    submit_job(__file__, parameters={'a': 2}, output="data.hdf5", method="local")

Here, the output argument specifies a file to which the results are written, while method chooses how the script is submitted.

In particular, this method allows submitting the same script with multiple different parameters to conduct a parameter study:

from modelrunner import make_model, submit_job

@make_model
def multiply(a=1, b=2):
    return a * b

if __name__ == "__main__":
    for a in range(5):
        submit_job(__file__, parameters={'a': a}, output=f"data_{a}.hdf5", method="local")

Note that the safe-guard if __name__ == "__main__" is absolutely crucial to ensure that jobs are only submitted during the initial run and not when the file is imported again when the actual jobs start. It is also important to choose unique file names for the output flag since otherwise different jobs overwrite each other's data.
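For scans over several parameters, unique output names can be generated systematically, e.g. with itertools.product. The sketch below only builds the (parameters, output) pairs; each pair would then be passed to the submit_job call shown above (the file-name scheme here is just one possible convention):

```python
from itertools import product

# build (parameters, output) pairs for a two-parameter scan
jobs = []
for a, b in product(range(3), [2, 4]):
    jobs.append(({"a": a, "b": b}, f"data_a{a}_b{b}.hdf5"))
    # each job would be submitted via:
    # submit_job(__file__, parameters=jobs[-1][0], output=jobs[-1][1], method="local")

outputs = [out for _, out in jobs]
assert len(set(outputs)) == len(outputs)  # file names are unique
print(outputs)
```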

The package also supports submitting all jobs of a parameter study at once:

from modelrunner import make_model, submit_jobs

@make_model
def multiply(a=1, b=2):
    return a * b

if __name__ == "__main__":
    submit_jobs(__file__, parameters={'a': range(5)}, output_folder="data", method="local")

Finally, the package also offers a method to submit a model script to the cluster using a simple command: python3 -m modelrunner.run script.py. This command offers multiple options that can be adjusted using command line arguments:

usage: python -m modelrunner.run [-h] [-n NAME] [-p JSON] [-o PATH] [-f] [-m METHOD] [-t PATH] script

Run a script as a job

positional arguments:
  script                The script that should be run

optional arguments:
  -h, --help            show this help message and exit
  -n NAME, --name NAME  Name of job
  -p JSON, --parameters JSON
                        JSON-encoded dictionary of parameters for the model
  -o PATH, --output PATH
                        Path to output file
  -f, --force           Overwrite data if it already exists
  -m METHOD, --method METHOD
                        Method for job submission
  -t PATH, --template PATH
                        Path to template file for submission script

Collating results

Finally, the package also provides some rudimentary support for collecting results from many different simulations that have been run in parallel. In particular, the class ResultCollection provides a class method from_folder to scan a folder for result files. For instance, the data from the multiple jobs run above can be collected using

from modelrunner import ResultCollection

results = ResultCollection.from_folder(".", pattern="data_*.hdf5")
print(results.dataframe)

This example prints all results as a pandas DataFrame, where each row corresponds to a separate simulation.
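Conceptually, the collation step amounts to globbing for result files and assembling their contents into rows. The sketch below illustrates the idea with JSON files and plain dictionaries instead of the package's ResultCollection (the helper collect_results and the assumed file layout are hypothetical):

```python
import glob
import json


def collect_results(folder=".", pattern="data_*.json"):
    """Sketch: gather one row (parameters + result) per result file."""
    rows = []
    for path in sorted(glob.glob(f"{folder}/{pattern}")):
        with open(path) as fh:
            data = json.load(fh)
        row = dict(data["model"]["parameters"])  # one column per parameter
        row["result"] = data["result"]           # plus the result itself
        rows.append(row)
    return rows
```

The returned list of dictionaries could then be turned into a table, e.g. via pandas.DataFrame(rows).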

Development

The package is in an early phase and breaking changes are thus likely.
