Skip to main content

loop like a pro, make parameter studies fun

Project description

About

This is a package with simple helpers to set up and run parameter studies.

Getting started

Loop over two parameters ‘a’ and ‘b’:

#!/usr/bin/env python3

import random
from itertools import product
from psweep import psweep as ps


def func(pset):
    return {'result': random.random() * pset['a'] * pset['b']}


if __name__ == '__main__':
    a = ps.seq2dicts('a', [1,2,3,4])
    b = ps.seq2dicts('b', [8,9])
    params = ps.loops2params(product(a,b))
    df = ps.run(func, params)
    print(df)

This produces a list of parameter sets to loop over (params):

[{'a': 1, 'b': 8},
 {'a': 1, 'b': 9},
 {'a': 2, 'b': 8},
 {'a': 2, 'b': 9},
 {'a': 3, 'b': 8},
 {'a': 3, 'b': 9},
 {'a': 4, 'b': 8},
 {'a': 4, 'b': 9}]

and a database of results (pandas DataFrame df, pickled file calc/results.pk by default):

                           _calc_dir                              _pset_id  \
2018-07-22 20:06:07.401398      calc  99a0f636-10b3-438c-ab43-c583fda806e8
2018-07-22 20:06:07.406902      calc  6ec59d2b-7562-4262-b8d6-8f898a95f521
2018-07-22 20:06:07.410227      calc  d3c22d7d-bc6d-4297-afc3-285482e624b5
2018-07-22 20:06:07.412210      calc  f2b2269b-86e3-4b15-aeb7-92848ae25f7b
2018-07-22 20:06:07.414637      calc  8e1db575-1be2-4561-a835-c88739dc0440
2018-07-22 20:06:07.416465      calc  674f8a2c-bc21-40f4-b01f-3702e0338ae8
2018-07-22 20:06:07.418866      calc  b4d3d11b-0f22-4c73-a895-7363c635c0c6
2018-07-22 20:06:07.420706      calc  a265ca2f-3a9f-4323-b494-4b6763c46929

                                                         _run_id  \
2018-07-22 20:06:07.401398  3e09daf8-c3a7-49cb-8aa3-f2c040c70e8f
2018-07-22 20:06:07.406902  3e09daf8-c3a7-49cb-8aa3-f2c040c70e8f
2018-07-22 20:06:07.410227  3e09daf8-c3a7-49cb-8aa3-f2c040c70e8f
2018-07-22 20:06:07.412210  3e09daf8-c3a7-49cb-8aa3-f2c040c70e8f
2018-07-22 20:06:07.414637  3e09daf8-c3a7-49cb-8aa3-f2c040c70e8f
2018-07-22 20:06:07.416465  3e09daf8-c3a7-49cb-8aa3-f2c040c70e8f
2018-07-22 20:06:07.418866  3e09daf8-c3a7-49cb-8aa3-f2c040c70e8f
2018-07-22 20:06:07.420706  3e09daf8-c3a7-49cb-8aa3-f2c040c70e8f

                                            _time_utc  a  b     result
2018-07-22 20:06:07.401398 2018-07-22 20:06:07.401398  1  8   2.288036
2018-07-22 20:06:07.406902 2018-07-22 20:06:07.406902  1  9   7.944922
2018-07-22 20:06:07.410227 2018-07-22 20:06:07.410227  2  8  14.480190
2018-07-22 20:06:07.412210 2018-07-22 20:06:07.412210  2  9   3.532110
2018-07-22 20:06:07.414637 2018-07-22 20:06:07.414637  3  8   9.019944
2018-07-22 20:06:07.416465 2018-07-22 20:06:07.416465  3  9   4.382123
2018-07-22 20:06:07.418866 2018-07-22 20:06:07.418866  4  8   2.713900
2018-07-22 20:06:07.420706 2018-07-22 20:06:07.420706  4  9  27.358240

You see a number of reserved fields for book-keeping such as

_run_id
_pset_id
_calc_dir
_time_utc

and a timestamped index. See the examples dir for more.

Tests

# apt-get install python3-nose
$ nosetests3

Concepts

The basic data structure for a param study is a list params of dicts (called “parameter sets” or short pset).

params = [{'a': 1, 'b': 'lala'},  # pset 1
          {'a': 2, 'b': 'zzz'},   # pset 2
          ...                     # ...
         ]

Each pset contains values of parameters (‘a’ and ‘b’) which are varied during the parameter study.

You need to define a callback function func, which takes exactly one pset such as:

{'a': 1, 'b': 'lala'}

and runs the workload for that pset. func must return a dict, for example:

{'result': 1.234}

or an updated pset:

{'a': 1, 'b': 'lala', 'result': 1.234}

We always merge (dict.update) the result of func with the pset, which gives you flexibility in what to return from func.

The psets form the rows of a pandas DataFrame, which we use to store the pset and the result from each run.

The idea is now to run func in a loop over all psets in params. You can do this using the ps.run helper function. The function adds some special columns such as _run_id (once per ps.run call) or _pset_id (once per pset). Using ps.run(... poolsize=...) runs func in parallel on params using multiprocessing.Pool.

This package offers some very simple helper functions which assist in creating params. Basically, we define the to-be-varied parameters (‘a’ and ‘b’) and then use something like itertools.product to loop over them to create params, which is passed to ps.run to actually perform the loop over all psets.

>>> from itertools import product
>>> from psweep import psweep as ps
>>> x=ps.seq2dicts('x', [1,2,3])
>>> y=ps.seq2dicts('y', ['xx','yy','zz'])
>>> x
[{'x': 1}, {'x': 2}, {'x': 3}]
>>> y
[{'y': 'xx'}, {'y': 'yy'}, {'y': 'zz'}]
>>> ps.loops2params(product(x,y))
[{'x': 1, 'y': 'xx'},
 {'x': 1, 'y': 'yy'},
 {'x': 1, 'y': 'zz'},
 {'x': 2, 'y': 'xx'},
 {'x': 2, 'y': 'yy'},
 {'x': 2, 'y': 'zz'},
 {'x': 3, 'y': 'xx'},
 {'x': 3, 'y': 'yy'},
 {'x': 3, 'y': 'zz'}]

The logic of the param study is entirely contained in the creation of params. E.g., if parameters shall be varied together (say x and y), then instead of

>>> product(x,y,z)

use

>>> product(zip(x,y), z)

The nestings from zip() are flattened in loops2params().

>>> z=ps.seq2dicts('z', [None, 1.2, 'X'])
>>> ps.loops2params(product(zip(x,y),z))
[{'x': 1, 'y': 'xx', 'z': None},
 {'x': 1, 'y': 'xx', 'z': 1.2},
 {'x': 1, 'y': 'xx', 'z': 'X'},
 {'x': 2, 'y': 'yy', 'z': None},
 {'x': 2, 'y': 'yy', 'z': 1.2},
 {'x': 2, 'y': 'yy', 'z': 'X'},
 {'x': 3, 'y': 'zz', 'z': None},
 {'x': 3, 'y': 'zz', 'z': 1.2},
 {'x': 3, 'y': 'zz', 'z': 'X'}]

If you want a parameter which is constant, use a list of length one:

>>> c=ps.seq2dicts('c', ['const'])
>>> ps.loops2params(product(zip(x,y),z,c))
[{'a': 1, 'c': 'const', 'y': 'xx', 'z': None},
 {'a': 1, 'c': 'const', 'y': 'xx', 'z': 1.2},
 {'a': 1, 'c': 'const', 'y': 'xx', 'z': 'X'},
 {'a': 2, 'c': 'const', 'y': 'yy', 'z': None},
 {'a': 2, 'c': 'const', 'y': 'yy', 'z': 1.2},
 {'a': 2, 'c': 'const', 'y': 'yy', 'z': 'X'},
 {'a': 3, 'c': 'const', 'y': 'zz', 'z': None},
 {'a': 3, 'c': 'const', 'y': 'zz', 'z': 1.2},
 {'a': 3, 'c': 'const', 'y': 'zz', 'z': 'X'}]

So, as you can see, the general idea is that we do all the loops before running any workload, i.e. we assemble the parameter grid to be sampled before the actual calculations. This has proven to be very practical as it helps detecting errors early.

We are aware of the fact that the data structures and functions used here are so simple that it is almost not worth a package at all, but it is helpful to have the ideas and the workflow packaged up in a central place.

Install

$ pip3 install psweep

Dev install of this repo:

$ pip3 install -e .

See also https://github.com/elcorto/samplepkg.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

psweep-0.2.1.tar.gz (7.1 kB view hashes)

Uploaded Source

Built Distribution

psweep-0.2.1-py3-none-any.whl (5.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page