A toolkit for doing parameter surveys
paramsurvey
paramsurvey is a set of tools for creating and executing parameter surveys.
paramsurvey has a pluggable parallel backend. The supported backends at present
are Python's multiprocessing module and the cluster computing software ray. An mpi
backend is planned.
Example
import time
import paramsurvey


# The worker receives one pset and returns a dict of results.
def sleep_worker(pset, system_kwargs, user_kwargs):
    time.sleep(pset['duration'])
    return {'slept': pset['duration']}


paramsurvey.init(backend='multiprocessing')  # or 'ray', if you installed it

# Five identical parameter sets, so the worker is called five times.
psets = [{'duration': 0.3}] * 5

results = paramsurvey.map(sleep_worker, psets, verbose=2)

# Iterate over the results as tuples...
for r in results.itertuples():
    print(r.duration, r.slept)

# ...or as dicts.
for r in results.iterdicts():
    print(r['duration'], r['slept'])
This prints, in addition to some debugging output, a result from each of the 5 sleep_worker calls.
Here are a few more examples:
- The above example, with a few notes
- An example of a multi-stage computation, running several map() functions in a row
- An example of greedy optimization, selecting the best alternative from each map() result
- An example that runs a command-line program for each pset
These examples are installed with the package, so you can run them like this:
$ paramsurvey-readme-example.py
$ paramsurvey-multistage-example.py
$ paramsurvey-greedy-example.py
Philosophy
A parameter survey begins by initializing the software and specifying a backend ('multiprocessing' or 'ray').
The user supplies a worker function, which takes a dict of parameters (pset) and returns a dict of results.
The user also supplies a list of parameter sets (psets), perhaps
constructed using the helper function paramsurvey.params.product().
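The signature of paramsurvey.params.product() is not shown here, so the following is only a plain-Python sketch of the same idea: building a list of psets as the Cartesian product of a few parameter ranges with the standard library's itertools.product. The helper name make_psets and the parameter names (duration, retries) are made up for illustration.

```python
import itertools


def make_psets(**ranges):
    """Hypothetical helper: one dict per combination of the given values."""
    names = list(ranges)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(ranges[n] for n in names))]


# Illustrative parameter names, not taken from paramsurvey itself.
psets = make_psets(duration=[0.1, 0.3], retries=[1, 2, 3])
print(len(psets))  # 2 * 3 combinations
print(psets[0])
```

Each element of psets is an ordinary dict, which is the shape the worker function expects.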
Calling paramsurvey.map() executes the worker function once for
each pset. It returns a MapResults object containing the results,
performance statistics, and information about any failures.
You can call paramsurvey.map() more than once.
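To make the contract concrete, here is a serial, standard-library stand-in for what a map over psets does conceptually. It is not paramsurvey's implementation: serial_map is a hypothetical name, and the empty dicts passed as system_kwargs and user_kwargs are placeholders. It only shows the worker being called once per pset and its returned dict being combined with the pset's own values.

```python
def double_worker(pset, system_kwargs, user_kwargs):
    # Same three-argument shape as the worker in the example above.
    return {'doubled': 2 * pset['x']}


def serial_map(worker, psets):
    """Hypothetical stand-in: call worker once per pset, merge pset and result."""
    rows = []
    for pset in psets:
        result = worker(pset, {}, {})  # placeholder kwargs
        rows.append({**pset, **result})
    return rows


rows = serial_map(double_worker, [{'x': 1}, {'x': 2}, {'x': 3}])
for row in rows:
    print(row['x'], row['doubled'])
```

Because each call produces ordinary rows, a later stage can build its psets from an earlier stage's output, which is the idea behind calling map() more than once.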
Keyword arguments to init() and map()
The paramsurvey code has a set of keyword arguments (and corresponding environment
variables) to aid debugging and testing. They are:
- backend="multiprocessing" -- which backend to use, currently "multiprocessing" (default) or "ray"
- verbose=1 -- print information about the progress of the computation:
  - 0 = print nothing
  - 1 = print something every 30 seconds (default)
  - 2 = print something every second
  - 3 = print something for every action
- vstats=1 -- controls the verbosity of the performance statistics system, with values similar to verbose
- limit=0 -- limits the number of psets actually computed to this number (0 meaning "all")
- ncores=-1 -- limits the number of cores used, in this case 1 less than the number available (multiprocessing only)
- max_tasks_per_child=3 -- the number of tasks a child will do before restarting. Useful to limit memory leaks. Default: infinite
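As a rough illustration of what limit and ncores mean, the sketch below applies both rules in plain Python. It is a reading of the descriptions above, not paramsurvey's code: apply_limit and effective_cores are hypothetical names, and treating every non-positive ncores value as "that many fewer than available" is an assumption generalized from the single ncores=-1 case.

```python
import os


def apply_limit(psets, limit=0):
    """Hypothetical: keep only the first `limit` psets; 0 means keep all."""
    return list(psets) if limit == 0 else list(psets)[:limit]


def effective_cores(ncores, available=None):
    """Hypothetical: positive values are used as-is; -1 means one fewer
    than available. Extending this to other negative values is an assumption.
    """
    if available is None:
        available = os.cpu_count() or 1
    if ncores > 0:
        return ncores
    return max(1, available + ncores)


psets = [{'duration': 0.3}] * 5
print(len(apply_limit(psets, limit=0)))
print(len(apply_limit(psets, limit=2)))
print(effective_cores(-1, available=8))
```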
Each of these has a corresponding environment variable,
e.g. PARAMSURVEY_BACKEND, PARAMSURVEY_VERBOSE. If the environment
variable is set, it overrides the values set in the source code. If a
kwarg is set for a map() call, that value overrides any value
specified for the init() call.
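The precedence described above (environment variable, then map() kwarg, then init() kwarg, then the built-in default) can be sketched as a small lookup function. resolve_option and DEFAULTS are hypothetical names for this illustration; only the PARAMSURVEY_ prefix and the ordering come from the text, and values read from the environment are left as strings here.

```python
import os

# Defaults as listed above; this table is an illustration, not paramsurvey's own.
DEFAULTS = {'backend': 'multiprocessing', 'verbose': 1, 'limit': 0}


def resolve_option(name, init_kwargs, map_kwargs, environ=os.environ):
    """Hypothetical: the first source that sets the option wins."""
    env_name = 'PARAMSURVEY_' + name.upper()
    if env_name in environ:
        return environ[env_name]   # environment beats the source code
    if name in map_kwargs:
        return map_kwargs[name]    # map() beats init()
    if name in init_kwargs:
        return init_kwargs[name]
    return DEFAULTS[name]


init_kwargs = {'verbose': 1, 'limit': 100}
map_kwargs = {'verbose': 2}
fake_env = {'PARAMSURVEY_LIMIT': '10'}  # stands in for os.environ

print(resolve_option('verbose', init_kwargs, map_kwargs, fake_env))
print(resolve_option('limit', init_kwargs, map_kwargs, fake_env))
print(resolve_option('backend', init_kwargs, map_kwargs, fake_env))
```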
For example, if you wish to debug a large computation by running a small subset of it on a single node, the environment variables allow you to do this without editing your source code:
$ PARAMSURVEY_BACKEND=multiprocessing PARAMSURVEY_VERBOSE=3 PARAMSURVEY_LIMIT=10 ./myprogram.py
For retrospective debugging, i.e. your run crashes and you are sad
that you specified a lower verbosity than you desire post-crash,
paramsurvey creates a hidden logfile in the current directory for
every run, named .paramsurvey-DATE-TIME.log.
Backend-specific arguments
Both init() and map() take a backend-specific keyword argument, named for the backend and
ignored by other backends. For example, to pass an argument only used by the ray backend:
paramsurvey.map(..., ray={'num_gpus': 1})
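One way to picture this is a backend that looks only at the kwarg carrying its own name. The sketch below is a hypothetical illustration (backend_options is not a paramsurvey function); the num_gpus value mirrors the example above.

```python
def backend_options(backend, **kwargs):
    """Hypothetical: return only the options addressed to `backend`."""
    return dict(kwargs.get(backend, {}))


call_kwargs = {'verbose': 2, 'ray': {'num_gpus': 1}}

print(backend_options('ray', **call_kwargs))
print(backend_options('multiprocessing', **call_kwargs))
```

With this reading, the same call can be left unchanged when switching backends, since the options for the unused backend are simply never looked at.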
The MapResults object
The MapResults object has several properties:
- results is a Pandas DataFrame containing the values of the pset and the keys returned by the worker function. Iterating over these results is documented above, as either dicts or tuples.
- missing is a DataFrame of psets that did not generate results, plus extra '_exception' and '_traceback' columns if an exception was raised in the worker function.
- progress is a MapProgress object with properties containing the details of pset execution: total, active, finished, failures, exceptions.
- stats is a PerfStats object containing performance statistics.
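The split between results and missing can be illustrated without pandas. The sketch below is a serial stand-in, not paramsurvey's implementation: run_and_collect is a hypothetical name, plain lists of dicts stand in for the DataFrames, and the progress dict only mimics some of the MapProgress property names listed above (how those counters relate to each other is a guess).

```python
import traceback


def reciprocal_worker(pset, system_kwargs, user_kwargs):
    return {'inverse': 1.0 / pset['x']}  # raises ZeroDivisionError for x == 0


def run_and_collect(worker, psets):
    """Hypothetical: separate successful rows from psets that raised."""
    results, missing = [], []
    progress = {'total': len(psets), 'finished': 0, 'failures': 0, 'exceptions': 0}
    for pset in psets:
        try:
            row = {**pset, **worker(pset, {}, {})}
        except Exception as exc:
            # Keep the pset and record what went wrong alongside it.
            missing.append({**pset,
                            '_exception': repr(exc),
                            '_traceback': traceback.format_exc()})
            progress['failures'] += 1
            progress['exceptions'] += 1
        else:
            results.append(row)
            progress['finished'] += 1
    return results, missing, progress


results, missing, progress = run_and_collect(
    reciprocal_worker, [{'x': 2}, {'x': 0}, {'x': 4}])
print(results)
print(missing[0]['_exception'])
print(progress)
```

Keeping the failed psets together with their exception text makes it possible to rerun only the psets that did not produce results.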
Worker function limitations
The worker function runs in a different address space and possibly on a different server. It shouldn't access any global variables.
For hard-to-explain Python reasons, define the worker function before
calling paramsurvey.init(). The worker function should not be nested
inside another function. On Windows, the main program file should have
an if __name__ == '__main__' guard similar to the examples at the top
of the Python multiprocessing documentation.
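The file layout these rules suggest can be shown with the standard library's multiprocessing module alone. This is a sketch of the structure, not paramsurvey usage: square_worker and main are illustrative names, and Pool.starmap stands in for the parallel map.

```python
import multiprocessing


# Defined at module level (not nested), before any worker processes start,
# so child processes can find it by name.
def square_worker(pset, system_kwargs, user_kwargs):
    return {'squared': pset['x'] ** 2}


def main():
    psets = [{'x': 1}, {'x': 2}, {'x': 3}]
    with multiprocessing.Pool(processes=2) as pool:
        # Each tuple is unpacked into the worker's three arguments.
        out = pool.starmap(square_worker, [(p, {}, {}) for p in psets])
    print(out)


# Without this guard, platforms that start workers by re-importing the main
# file (such as Windows) would try to run main() again as each child starts.
if __name__ == '__main__':
    main()
```

The worker reads everything it needs from its arguments rather than from globals, which matches the advice above about separate address spaces.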
Installing
$ pip install paramsurvey
$ pip install paramsurvey[ray]