Function dependencies resolution and execution

pyungo

pyungo is a lightweight library to link a set of dependent functions together, and execute them in an ordered manner.

pyungo is built around Graphs and Nodes arranged in a DAG (Directed Acyclic Graph). A Node represents a function that runs with a defined set of inputs and returns one or several outputs. A Graph is a collection of Nodes through which data flows in a logical manner, the output of one node serving as the input of another.

installation

>> pip install pyungo

simple example

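# Import path assumed from the package layout; adjust to your installed version.
from pyungo.core import Graph

graph = Graph()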

@graph.register(inputs=['d', 'a'], outputs=['e'])
def f_my_function_2(d, a):
    return d - a

@graph.register(inputs=['c'], outputs=['d'])
def f_my_function_1(c):
    return c / 10.

@graph.register(inputs=['a', 'b'], outputs=['c'])
def f_my_function_3(a, b):
    return a + b

res = graph.calculate(data={'a': 2, 'b': 3})
print(res)

pyungo registers the functions at import time. It then resolves the DAG and figures out the sequence in which the functions have to be run, based on their inputs / outputs. In this case, the order is function 3, then 1, and finally 2.
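To make the ordering step concrete, here is a minimal sketch of how such a sequence can be derived from input / output names alone. It does not use pyungo and is not its implementation: the `nodes` dictionary and the `resolve_order` helper are hypothetical names introduced for illustration, and the sort relies on `graphlib` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plain-data description of the three functions above:
# function name -> (input names, output names).
nodes = {
    "f_my_function_2": (["d", "a"], ["e"]),
    "f_my_function_1": (["c"], ["d"]),
    "f_my_function_3": (["a", "b"], ["c"]),
}


def resolve_order(nodes):
    """Return node names so that every producer comes before its consumers."""
    # Map each output name to the node that produces it.
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    # A node depends on the producers of its inputs. Inputs without a
    # producer ('a' and 'b' here) have to be supplied as data.
    deps = {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in nodes.items()
    }
    return list(TopologicalSorter(deps).static_order())


order = resolve_order(nodes)
print(order)
# ['f_my_function_3', 'f_my_function_1', 'f_my_function_2']
```

The registration order (2, 1, 3) does not matter here: only the names listed in inputs and outputs decide the sequence.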

The ordered Graph is run with calculate, with the given data. It returns the output of the last function being run (e), but all intermediate results are also available in the graph instance.

The result will be (a + b) / 10 - a = (2 + 3) / 10 - 2 = -1.5
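The same arithmetic can be checked by hand with plain Python, running the three function bodies in the resolved order. This sketch keeps intermediate values in an ordinary dictionary named `data`, which is only an illustration and not how pyungo stores them.

```python
# Plain-Python equivalent of the example graph, run in the resolved order.
data = {"a": 2, "b": 3}
data["c"] = data["a"] + data["b"]  # f_my_function_3: 2 + 3 = 5
data["d"] = data["c"] / 10.        # f_my_function_1: 5 / 10 = 0.5
data["e"] = data["d"] - data["a"]  # f_my_function_2: 0.5 - 2 = -1.5
print(data["e"])
# -1.5
```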

parallelism

When resolving the DAG, pyungo figures out which nodes can be run in parallel. When creating a graph, we can specify the option parallel=True to run calculations concurrently when possible, using the Python multiprocessing module. We can also specify the pool size when instantiating the Graph. This sets the maximum number of processes that will be launched. If 3 nodes can run in parallel and just 2 processes are used, pyungo will run the calculations for the first 2 nodes and will run the last one as soon as a process is free.

Instantiating a Graph with a pool of 5 processes for running calculations in parallel:

graph = Graph(parallel=True, pool_size=5)

Note: Running functions in parallel has a cost. Python will spend time creating / deleting new processes. Parallelism is recommended when at least 2 concurrent nodes have heavy calculations that take a significant amount of time.
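As an illustration of how nodes that can run in parallel may be identified, the sketch below groups ready nodes into waves capped at a pool size. It is a simplification and not pyungo's scheduler: a real process pool starts the next node as soon as any worker is free, whereas this version only reports fixed waves. The `parallel_batches` helper and the node names `n1` to `n4` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_batches(deps, pool_size):
    """Group nodes into waves of at most pool_size nodes that could run together.

    deps maps each node name to the set of node names it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Every node whose dependencies are all done is ready to run.
        ready = sorted(sorter.get_ready())
        # Split the ready nodes into chunks no larger than the pool.
        for start in range(0, len(ready), pool_size):
            batches.append(ready[start:start + pool_size])
        sorter.done(*ready)
    return batches


# Three independent nodes feeding a fourth one, with a pool of 2.
deps = {"n1": set(), "n2": set(), "n3": set(), "n4": {"n1", "n2", "n3"}}
batches = parallel_batches(deps, pool_size=2)
print(batches)
# [['n1', 'n2'], ['n3'], ['n4']]
```

With 3 concurrent nodes and a pool of 2, the first two nodes go first and the third follows, which matches the situation described above.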

sanity check

pyungo will raise an error in the following situations:

  1. Circular dependencies: The Graph needs to be finite and cannot form a loop.

  2. Missing inputs: Not all inputs needed to run the graph are provided.

  3. Input collision: An input name provided as data to the graph conflicts with at least one of the output names.

  4. Duplicated outputs: Several nodes are giving output(s) that have the same name.
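The four checks above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not pyungo's code: the `check_graph` helper, the `nodes` layout (name mapped to input and output names) and the use of `ValueError` are all hypothetical, since the text above only says that an error is raised.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def check_graph(nodes, data):
    """Raise ValueError if the node description or the provided data is inconsistent.

    nodes maps a node name to (input names, output names).
    data holds the values supplied by the caller, keyed by name.
    """
    # 4. Duplicated outputs: two nodes producing the same name.
    producer = {}
    for name, (_, outs) in nodes.items():
        for out in outs:
            if out in producer:
                raise ValueError(f"duplicated output: {out!r}")
            producer[out] = name

    # 3. Input collision: provided data shadows a computed output.
    collisions = set(data) & set(producer)
    if collisions:
        raise ValueError(f"input collision: {sorted(collisions)}")

    # 2. Missing inputs: needed names that are neither computed nor provided.
    needed = {i for ins, _ in nodes.values() for i in ins}
    missing = needed - set(producer) - set(data)
    if missing:
        raise ValueError(f"missing inputs: {sorted(missing)}")

    # 1. Circular dependencies: the node graph must be acyclic.
    deps = {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in nodes.items()
    }
    try:
        list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        raise ValueError("circular dependency") from exc


# The example graph passes when 'a' and 'b' are provided.
example = {
    "f_my_function_2": (["d", "a"], ["e"]),
    "f_my_function_1": (["c"], ["d"]),
    "f_my_function_3": (["a", "b"], ["c"]),
}
check_graph(example, {"a": 2, "b": 3})
```

Calling `check_graph(example, {"a": 2})` would fail with a missing-inputs error, because nothing provides or computes `b`.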

testing

>> pytest
