concisely and clearly create large, parameterized, mapped job specifications
Parameterize Jobs
parameterize_jobs is a lightweight, pure-python toolkit for concisely and clearly creating large, parameterized, mapped job specifications.
Free software: MIT license
Documentation: https://parameterize-jobs.readthedocs.io
Features
Expand a job’s dimensionality by multiplying ComponentSet or Constant objects
Extend the number of jobs by adding ComponentSet or Constant objects
Jobs are provided to functions as dictionaries of parameters
The helper decorator @expand_kwargs turns these kwarg dictionaries into named argument calls
Works with map-style task-running frameworks, including dask’s client.map, and with profiling tools
TODOs
View and submit issues on the [issues page](https://github.com/delgadom/parameterize_jobs/issues).
Quickstart
ComponentSet objects are the base objects, and can be defined with any number of named iterables:
>>> import parameterize_jobs as pjs
>>> a = pjs.ComponentSet(a=range(5))
>>> a
<ComponentSet {'a': 5}>
These objects have defined lengths (if the provided iterable has a defined length), and can be indexed and iterated over:
>>> len(a)
5
>>> a[0]
{'a': 0}
>>> list(a)
[{'a': 0},
{'a': 1},
{'a': 2},
{'a': 3},
{'a': 4}]
Adding two ComponentSet objects extends the total job length:
>>> a2 = pjs.ComponentSet(a=range(3))
>>> a+a2
<MultiComponentSet [{'a': 5}, {'a': 3}]>
>>> len(a+a2)
8
>>> list(a+a2)
[{'a': 0},
{'a': 1},
{'a': 2},
{'a': 3},
{'a': 4},
{'a': 0},
{'a': 1},
{'a': 2}]
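Conceptually, addition concatenates the two job lists. The following is a minimal standard-library sketch of the equivalent result, not the library's implementation; `jobs_a` and `jobs_a2` are illustrative names standing in for the two ComponentSet objects above.

```python
from itertools import chain

# Plain-Python stand-ins for ComponentSet(a=range(5)) and ComponentSet(a=range(3)).
jobs_a = [{'a': i} for i in range(5)]
jobs_a2 = [{'a': i} for i in range(3)]

# Addition behaves like concatenation: every job from the first set,
# followed by every job from the second.
combined = list(chain(jobs_a, jobs_a2))

print(len(combined))  # 8
print(combined[5])    # {'a': 0}, the first job of the second set
```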
Multiplying two ComponentSet objects expands their dimensionality:
>>> b = pjs.ComponentSet(b=range(3))
>>> a*b
<ComponentSet {'a': 5, 'b': 3}>
>>> len(a*b)
15
>>> (a*b)[-1]
{'a': 4, 'b': 2}
>>> list(a*b)
[{'a': 0, 'b': 0},
{'a': 0, 'b': 1},
{'a': 0, 'b': 2},
{'a': 1, 'b': 0},
{'a': 1, 'b': 1},
{'a': 1, 'b': 2},
{'a': 2, 'b': 0},
{'a': 2, 'b': 1},
{'a': 2, 'b': 2},
{'a': 3, 'b': 0},
{'a': 3, 'b': 1},
{'a': 3, 'b': 2},
{'a': 4, 'b': 0},
{'a': 4, 'b': 1},
{'a': 4, 'b': 2}]
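The ordering shown above matches a Cartesian product in which the first-named component varies slowest. As a rough sketch of that behavior using only the standard library (the helpers `expand` and `job_at` are hypothetical and are not part of parameterize_jobs), a single job can also be looked up by position without building the full list:

```python
from itertools import product


def expand(**components):
    """Return one dict per combination of the named iterables (hypothetical helper)."""
    names = list(components)
    return [dict(zip(names, values)) for values in product(*components.values())]


def job_at(index, **components):
    """Look up one job by position; each component must support len() and indexing."""
    total = 1
    for values in components.values():
        total *= len(values)
    if not -total <= index < total:
        raise IndexError('job index out of range')
    index %= total  # map negative indices such as -1 onto 0..total-1

    # Peel off the fastest-varying (last-named) component first.
    job = {}
    for name, values in reversed(list(components.items())):
        index, offset = divmod(index, len(values))
        job[name] = values[offset]
    # Restore the original key order.
    return {name: job[name] for name in components}


jobs = expand(a=range(5), b=range(3))
print(len(jobs))                                # 15
print(job_at(-1, a=range(5), b=range(3)))       # {'a': 4, 'b': 2}
```

Positional lookup of this kind is one way an indexable product can avoid materializing every combination; whether the library does it this way is an assumption.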
These parameterized job specifications can be used in mappable jobs. The helper decorator expand_kwargs modifies a function to accept a single dictionary and expand it into keyword arguments:
>>> @pjs.expand_kwargs
... def multiply(a, b):
...     return a * b
>>> list(map(multiply, a*b))
[0, 0, 0, 0, 1, 2, 0, 2, 4, 0, 3, 6, 0, 4, 8]
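To make the decorator's effect concrete, here is a minimal sketch of a decorator with the behavior described above. It is an assumption about how such a wrapper could be written, not the library's source, and `expand_kwargs_sketch` is a hypothetical name.

```python
import functools


def expand_kwargs_sketch(func):
    """Wrap func so it takes one dict and unpacks it into keyword arguments."""
    @functools.wraps(func)
    def wrapper(kwargs):
        return func(**kwargs)
    return wrapper


@expand_kwargs_sketch
def multiply(a, b):
    return a * b


# Plain dicts standing in for the jobs produced by a*b (a varies slowest).
jobs = [{'a': a, 'b': b} for a in range(5) for b in range(3)]
results = list(map(multiply, jobs))
print(results)
```

Because the wrapped function takes exactly one positional argument, it can be handed directly to any map-style interface.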
Jobs do not have to be the combinatorial product of all components:
>>> ab1 = pjs.ComponentSet(a=[0, 1], b=[0, 1])
>>> ab2 = pjs.ComponentSet(a=[10, 11], b=[-1, 1])
>>> list(map(multiply, ab1 + ab2))
[0, 0, 0, 1, -10, 10, -11, 11]
A Constant object is a ComponentSet defined with single values, rather than iterables, passed as keyword arguments. Because multiply as defined above accepts only a and b, the examples below redefine it to also accept c:
>>> @pjs.expand_kwargs
... def multiply(a, b, c=1):
...     return a * b * c
>>> c = pjs.Constant(c=5)
>>> list(map(multiply, (ab1 + ab2) * c))
[0, 0, 0, 5, -50, 50, -55, 55]
Arbitrarily complex combinations of ComponentSets can be created:
>>> c1 = pjs.Constant(c=1)
>>> c2 = pjs.Constant(c=2)
>>> list(map(multiply, (ab1 + ab2) * c1 + (ab1 + ab2) * c2))
[0, 0, 0, 1, -10, 10, -11, 11, 0, 0, 0, 2, -20, 20, -22, 22]
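For readers who want to see which jobs such a combined expression describes, the following standard-library sketch builds the same kind of job list by hand. The helpers `expand` and `multiply_sets` are hypothetical stand-ins, and plain list concatenation plays the role of `+`.

```python
from itertools import product


def expand(**components):
    """One dict per combination of the named iterables (hypothetical helper)."""
    names = list(components)
    return [dict(zip(names, values)) for values in product(*components.values())]


def multiply_sets(left, right):
    """Cartesian product of two job lists, merging each pair of dicts."""
    return [{**l, **r} for l, r in product(left, right)]


ab1 = expand(a=[0, 1], b=[0, 1])
ab2 = expand(a=[10, 11], b=[-1, 1])

# (ab1 + ab2) * c1 + (ab1 + ab2) * c2, written out with plain lists.
jobs = (multiply_sets(ab1 + ab2, [{'c': 1}])
        + multiply_sets(ab1 + ab2, [{'c': 2}]))

print(len(jobs))   # 16
print(jobs[3])     # {'a': 1, 'b': 1, 'c': 1}
print(jobs[-1])    # {'a': 11, 'b': 1, 'c': 2}
```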
Anything can be inside a ComponentSet iterable, including data, functions, or other objects:
>>> import pandas as pd
>>> transforms = (
...     pjs.Constant(transform=lambda x: x, transform_name='linear')
...     + pjs.Constant(transform=lambda x: x**2, transform_name='quadratic'))
>>> fps = pjs.Constant(
...     read_pattern='source/my-fun-data_{year}.csv',
...     write_pattern='transformed/my-fun-data_{transform_name}_{year}.csv')
>>> years = pjs.ComponentSet(year=range(1980, 2018))
>>> @pjs.expand_kwargs
... def process_data(read_pattern, write_pattern, transform, transform_name, year):
...
...     df = pd.read_csv(read_pattern.format(year=year))
...
...     transformed = transform(df)
...
...     transformed.to_csv(
...         write_pattern.format(
...             transform_name=transform_name,
...             year=year))
...
>>> _ = list(map(process_data, transforms * fps * years))
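Before running a job list like this one against real files, it can help to preview the paths each job would touch. The sketch below rebuilds an equivalent job list with plain dictionaries (an assumption about the resulting job shape, not library code) and formats the read and write paths without performing any I/O; `planned_paths` is a hypothetical helper.

```python
from itertools import product

transforms = [
    {'transform': lambda x: x, 'transform_name': 'linear'},
    {'transform': lambda x: x ** 2, 'transform_name': 'quadratic'},
]
patterns = {
    'read_pattern': 'source/my-fun-data_{year}.csv',
    'write_pattern': 'transformed/my-fun-data_{transform_name}_{year}.csv',
}
years = range(1980, 2018)

# One job per (transform, year) pair, each carrying the shared path patterns.
jobs = [{**t, **patterns, 'year': y} for t, y in product(transforms, years)]


def planned_paths(job):
    """Return the (read, write) paths a job would use, without touching disk."""
    read_path = job['read_pattern'].format(year=job['year'])
    write_path = job['write_pattern'].format(
        transform_name=job['transform_name'], year=job['year'])
    return read_path, write_path


print(len(jobs))               # 76 jobs: 2 transforms x 38 years
print(planned_paths(jobs[0]))
```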
This works seamlessly with dask’s client.map to provide intuitive job parameterization:
>>> import dask.distributed as dd
>>> client = dd.Client()  # starts a local cluster by default
>>> futures = client.map(multiply, (ab1 + ab2) * c1 + (ab1 + ab2) * c2)
>>> dd.progress(futures)
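The same pattern applies to other map-style runners. As an illustration that needs no cluster, the sketch below feeds a hand-built list of job dictionaries to the standard library's ThreadPoolExecutor; the job list and the `multiply` function here are plain-Python stand-ins, not objects from parameterize_jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply(job):
    """Take one job dictionary and multiply its three parameters."""
    return job['a'] * job['b'] * job['c']


# Stand-in job list: every (a, b) pair, once with c=1 and once with c=2.
jobs = [{'a': a, 'b': b, 'c': c}
        for c in (1, 2) for a in (0, 1) for b in (0, 1)]

# Executor.map returns results in the same order as the input jobs.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(multiply, jobs))

print(results)
```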
History
0.1.0 (2018-11-30)
First release on PyPI.