Function dependencies resolution and execution

pyungo

pyungo is a lightweight library to link a set of dependent functions together, and execute them in an ordered manner.

pyungo is built around Graphs and Nodes used in a DAG (Directed Acyclic Graph). A Node represent a function being run with a defined set of inputs and returning one or several outputs. A Graph is a collection of Nodes where data can flow in an logical manner, the output of one node serving as input of another.

installation

>> pip install pyungo

simple example

graph = Graph()

@graph.register(inputs=['d', 'a'], outputs=['e'])
def f_my_function_2(d, a):
return d - a

@graph.register(inputs=['c'], outputs=['d'])
def f_my_function_1(c):
return c / 10.

@graph.register(inputs=['a', 'b'], outputs=['c'])
def f_my_function_3(a, b):
return a + b

res = graph.calculate(data={'a': 2, 'b': 3})
print(res)

pyungo is registering the functions at import time. It then resolve the DAG and figure out the sequence at which the functions have to be run per their inputs / outputs. In this case, it will be function 3 then 1 and finally 2.

The ordered Graph is run with calculate, with the given data. It returns the output of the last function being run (e), but all intermediate results are also available in the graph instance.

The result will be (a + b) / 10 - a = -1.5

parallelism

When resolving the dag, pyungo figure out nodes that can be run in parallel. When creating a graph, we can specify the option parallel=True for running calculations concurently when possible, using multiprocess module. This package is not automatically installed with pyungo, and will need to be installed manually if parallelism is used. We can specify the pool size when instantiating the Graph. This will set the maximum number of processes that will be launched. If 3 nodes can run in parallel and just 2 processes are used, pyungo will run calculation on the first 2 nodes first and will run the last one as soon as a process will be free.

Instantiating a Graph with a pool of 5 processes for running calculations in parralel:

graph = Graph(parallel=True, pool_size=5)

Note: Running functions in parallel has a cost. Python will spend time creating / deleting new processes. Parallelism is recommended when at least 2 concurrent nodes have heavy calculations which takes a significant amount of time.

sanity check

pyungo will raise an error in the following situations:

1. Circular dependencies: The Graph need to be finite and cannot form a loop.

2. All inputs needed to run a graph are not provided.

3. Input collision: An input name provided as data in the graph has a conflict with at least of the output name.

4. Duplicated outputs: Several nodes are giving output(s) that have the same name.

testing

>> pytest

Project details

Source Distribution

pyungo-0.6.tar.gz (5.1 kB view hashes)

Uploaded source

Built Distribution

pyungo-0.6-py2.py3-none-any.whl (7.4 kB view hashes)

Uploaded py2 py3