Parameterize Jobs

parameterize_jobs is a lightweight, pure-python toolkit for concisely and clearly creating large, parameterized, mapped job specifications.

Features

  • Expand a job’s dimensionality by multiplying ComponentSet or Constant objects

  • Extend the number of jobs by adding ComponentSet or Constant objects

  • Jobs are provided to functions as dictionaries of parameters

  • The helper decorator @expand_kwargs turns these kwarg dictionaries into named argument calls

  • Works seamlessly with many task-running frameworks, including dask’s client.map and profiling tools

TODOs

View and submit issues on the [issues page](https://github.com/delgadom/parameterize_jobs/issues).

Quickstart

ComponentSet objects are the base objects, and can be defined with any number of named iterables:

>>> import parameterize_jobs as pjs

>>> a = pjs.ComponentSet(a=range(5))
>>> a
<ComponentSet {'a': 5}>

These objects have defined lengths (if the provided iterable has a defined length), and can be indexed and iterated over:

>>> len(a)
5

>>> a[0]
{'a': 0}

>>> list(a)
[{'a': 0},
 {'a': 1},
 {'a': 2},
 {'a': 3},
 {'a': 4}]

Adding two ComponentSet objects concatenates their jobs, extending the total job count:

>>> a2 = pjs.ComponentSet(a=range(3))

>>> a+a2
<MultiComponentSet [{'a': 5}, {'a': 3}]>

>>> len(a+a2)
8

>>> list(a+a2)
[{'a': 0},
 {'a': 1},
 {'a': 2},
 {'a': 3},
 {'a': 4},
 {'a': 0},
 {'a': 1},
 {'a': 2}]
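
The concatenation shown above can be pictured with the standard library alone. This is a minimal sketch of the idea, not the library's implementation; the plain lists `first` and `second` are stand-ins for the two ComponentSet objects.

```python
import itertools

# Stand-ins for ComponentSet(a=range(5)) and ComponentSet(a=range(3))
first = [{'a': i} for i in range(5)]
second = [{'a': i} for i in range(3)]

# Addition behaves like chaining the two job lists end to end
combined = list(itertools.chain(first, second))

print(len(combined))  # 8
print(combined[5])    # {'a': 0}
```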

Multiplying two ComponentSet objects expands their dimensionality:

>>> b = pjs.ComponentSet(b=range(3))

>>> a*b
<ComponentSet {'a': 5, 'b': 3}>

>>> len(a*b)
15

>>> (a*b)[-1]
{'a': 4, 'b': 2}

>>> list(a*b)
[{'a': 0, 'b': 0},
 {'a': 0, 'b': 1},
 {'a': 0, 'b': 2},
 {'a': 1, 'b': 0},
 {'a': 1, 'b': 1},
 {'a': 1, 'b': 2},
 {'a': 2, 'b': 0},
 {'a': 2, 'b': 1},
 {'a': 2, 'b': 2},
 {'a': 3, 'b': 0},
 {'a': 3, 'b': 1},
 {'a': 3, 'b': 2},
 {'a': 4, 'b': 0},
 {'a': 4, 'b': 1},
 {'a': 4, 'b': 2}]
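
The job ordering listed above matches what a Cartesian product produces. The following sketch reproduces it with itertools.product; it assumes nothing about how the library builds the jobs internally and is only meant to illustrate the expansion.

```python
import itertools

a_values = range(5)
b_values = range(3)

# One dict per (a, b) pair; the last-listed component varies fastest
jobs = [{'a': a, 'b': b} for a, b in itertools.product(a_values, b_values)]

print(len(jobs))  # 15
print(jobs[-1])   # {'a': 4, 'b': 2}
```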

These parameterized job specifications can be used in mappable jobs. The helper decorator expand_kwargs modifies a function so that it accepts a single dictionary and expands it into keyword arguments:

>>> @pjs.expand_kwargs
... def multiply(a, b):
...     return a * b

>>> list(map(multiply, a*b))
[0, 0, 0, 0, 1, 2, 0, 2, 4, 0, 3, 6, 0, 4, 8]
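
To make the decorator's role concrete, here is one way such a wrapper could be written. This is a sketch under the assumption that the decorator simply unpacks the job dictionary; `expand_kwargs_sketch` is a hypothetical name and is not the library's own code.

```python
import functools

def expand_kwargs_sketch(func):
    """Hypothetical stand-in: call func(**job) when handed a single dict."""
    @functools.wraps(func)
    def wrapper(job):
        return func(**job)
    return wrapper

@expand_kwargs_sketch
def multiply(a, b):
    return a * b

# Each job dict is unpacked into the named arguments a and b
results = list(map(multiply, [{'a': 2, 'b': 3}, {'a': 4, 'b': 5}]))
print(results)  # [6, 20]
```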

Jobs do not have to be the combinatorial product of all components:

>>> ab1 = pjs.ComponentSet(a=[0, 1], b=[0, 1])
>>> ab2 = pjs.ComponentSet(a=[10, 11], b=[-1, 1])

>>> list(map(multiply, ab1 + ab2))
[0, 0, 0, 1, -10, -11, 10, 11]

A Constant object is a ComponentSet whose keyword arguments are single values rather than iterables. Multiplying by a Constant adds its keys to every job, so the mapped function must accept them too:

>>> c = pjs.Constant(c=5)

>>> @pjs.expand_kwargs
... def multiply(a, b, c):
...     return a * b * c

>>> list(map(multiply, (ab1 + ab2) * c))
[0, 0, 0, 5, -50, -55, 50, 55]
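
Multiplying by a constant leaves the number of jobs unchanged and only widens each job. A plain-dict sketch of that effect, with `jobs` and `constant` as illustrative stand-ins rather than library objects:

```python
jobs = [{'a': 0, 'b': 1}, {'a': 10, 'b': -1}]
constant = {'c': 5}

# Every job gains the constant's keys; the job count stays the same
with_constant = [{**job, **constant} for job in jobs]

print(with_constant)
# [{'a': 0, 'b': 1, 'c': 5}, {'a': 10, 'b': -1, 'c': 5}]
```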

Arbitrarily complex combinations of ComponentSets can be created:

>>> c1 = pjs.Constant(c=1)
>>> c2 = pjs.Constant(c=2)

>>> list(map(multiply, (ab1 + ab2) * c1 + (ab1 + ab2) * c2))
[0, 0, 0, 1, -10, -11, 10, 11, 0, 0, 0, 2, -20, -22, 20, 22]

Anything can be inside a ComponentSet iterable, including data, functions, or other objects:

>>> transforms = (
...     pjs.Constant(transform=lambda x: x, transform_name='linear')
...     + pjs.Constant(transform=lambda x: x**2, transform_name='quadratic'))
...

>>> fps = pjs.Constant(
...     read_pattern='source/my-fun-data_{year}.csv',
...     write_pattern='transformed/my-fun-data_{transform_name}_{year}.csv')

>>> years = pjs.ComponentSet(year=range(1980, 2018))

>>> import pandas as pd

>>> @pjs.expand_kwargs
... def process_data(read_pattern, write_pattern, transform, transform_name, year):
...
...     df = pd.read_csv(read_pattern.format(year=year))
...
...     transformed = transform(df)
...
...     transformed.to_csv(
...         write_pattern.format(
...             transform_name=transform_name,
...             year=year))
...

>>> _ = list(map(process_data, transforms * fps * years))
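
Each job in this example carries its own path templates, which the function fills in from the other job parameters. The sketch below shows that step for one hand-written job dictionary, without touching any files; the dictionary is an assumed example of what a single job from `transforms * fps * years` would contain (minus the transform function itself).

```python
job = {
    'read_pattern': 'source/my-fun-data_{year}.csv',
    'write_pattern': 'transformed/my-fun-data_{transform_name}_{year}.csv',
    'transform_name': 'quadratic',
    'year': 1980,
}

# Fill the templates from the job's own parameters
read_path = job['read_pattern'].format(year=job['year'])
write_path = job['write_pattern'].format(
    transform_name=job['transform_name'], year=job['year'])

print(read_path)   # source/my-fun-data_1980.csv
print(write_path)  # transformed/my-fun-data_quadratic_1980.csv
```

With two transforms, one set of path patterns, and the 38 years in range(1980, 2018), the full product would contain 2 × 1 × 38 = 76 jobs.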

This works seamlessly with dask’s client.map to provide intuitive job parameterization:

>>> import dask.distributed as dd
>>> client = dd.Client()  # with no arguments, starts a local cluster
>>> futures = client.map(multiply, (ab1 + ab2) * c1 + (ab1 + ab2) * c2)
>>> dd.progress(futures)
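
The same pattern should carry over to any executor with a map-style interface, because each job is just a dictionary. As a sketch, here is the standard library's ThreadPoolExecutor mapping over a hand-built list of job dictionaries; the list stands in for a ComponentSet expression, and `multiply_job` is a hypothetical function that reads the dictionary directly.

```python
from concurrent.futures import ThreadPoolExecutor

def multiply_job(job):
    # Takes the job dict directly, so no decorator is needed in this sketch
    return job['a'] * job['b'] * job['c']

# Stand-in for a small product of components with a constant c=2
jobs = [{'a': a, 'b': b, 'c': 2} for a in (0, 1) for b in (0, 1)]

# Executor.map returns results in the same order as the input jobs
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(multiply_job, jobs))

print(results)  # [0, 0, 0, 2]
```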

History

0.1.0 (2018-11-30)

  • First release on PyPI.
