Skip to main content

A toolkit for doing parameter surveys

Project description

paramsurvey

Build Status Coverage Status Apache License 2.0

paramsurvey is a set of tools for creating and executing parameter surveys.

paramsurvey has a pluggable parallel backend. The supported backends at present are python's multiprocessing module, and computing cluster software ray. An mpi backend is planned.

Example

import time
import paramsurvey


def sleep_worker(pset, system_kwargs, user_kwargs, raw_stats):
    time.sleep(pset['duration'])
    return {'slept': pset['duration']}


paramsurvey.init(backend='multiprocessing')  # or 'ray', if you installed it

psets = [{'duration': 0.3}] * 5

results = paramsurvey.map(sleep_worker, psets, verbose=2)

for r in results:
    print(r.duration, r.slept)

prints, in addition to some debugging output, a result from each of the 5 sleep_worker calls.

Here are a few more examples:

Philosophy

A parameter survey runs begins by initializing the software, specifying a backend ('multiprocessing' or 'ray').

The user supplies a worker function, which takes a dict of parameters (pset) and returns a dict of results.

The user also supplies a list of parameter sets (psets), perhaps constructed using the helper function paramsurvey.params.product().

Calling pararamsurvey.map() executes the worker function once for each pset. It returns a MapResults object, containing the results, performance statistics, and information about any failures.

You can call paramsurvey.map() more than once.

Controlling verbosity

The paramsurvey code has a set of keyword arguments (and corresponding environment variables) to aid debugging and testing. They are:

  • backend="multiprocessing" -- which backend to use, currently "multiprocessing" (default) or "ray"
  • verbose=1 -- print information about the progress of the computation:
    • 0 = print nothing
    • 1 = print something every 30 seconds (default)
    • 2 = print something every second
    • 3 = print something for every action
  • vstats=1 -- controls the verbosity of the performance statistics system, with similar values as verbose
  • limit=-1 -- limits the number of psets actually computed to this number (-1 meaning "all")

Each of these has a corresponding environment variable, e.g. PARAMSURVEY_BACKEND, PARAMSURVEY_VERBOSE. If the environment variable is set, it overrides the values set in the source code. If a kwarg is set for a map() call, that value overrides any value specified for the init() call.

For retrospective debugging, i.e. your run crashes and you are sad that you specified a lower verbosity than you desire post-crash, paramsurvey creates a hidden logfile in the current directory for every run, named .paramusurvey-DATE-TIME.log.

The MapResults object

The MapResults object has several properties:

  • results is a Pandas DataFrame containing the values of the pset and the keys returned by the worker function. If you prefer to deal with dictionaries and are not worried about memory usage, results.to_listdict returns a list of dictionaries.
  • failed is a list of failed psets dictionaries, plus an extra '_exception' key if an exception was raised in the worker function.
  • progress is a MapProgress object with properties containing the details of pset execution: total, active, finished, failures, exceptions.
  • stats is a PerfStats object containing performance statistics.

Worker function limitations

The worker function runs in a different address space and possibly on a different server. It shouldn't access any global variables.

For hard-to-explain Python reasons, define the worker function before calling paramsurvey.init(). The worker function should not be nested inside another function. On Windows, the main program file should have a if __name == '__main__' guard similar to the examples at the top of the Python multprocessing documentation.

Installing

$ pip install paramsurvey
$ pip install paramsurvey[ray]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

paramsurvey-0.4.6.tar.gz (34.2 kB view details)

Uploaded Source

File details

Details for the file paramsurvey-0.4.6.tar.gz.

File metadata

  • Download URL: paramsurvey-0.4.6.tar.gz
  • Upload date:
  • Size: 34.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.4

File hashes

Hashes for paramsurvey-0.4.6.tar.gz
Algorithm Hash digest
SHA256 a802463948883433185d4a6f881d939d8717e536a032d090e37a4776fabd9010
MD5 1d8ac88aa71b07e925d9e1e4011cd841
BLAKE2b-256 6909a3a6b8a09af0c0d9c0db3eb349293928633166b385ed0fd558b7afda51e7

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page