Minimal tool for connecting your existing models in a composite model allowing for asynchronous multi-processed execution

hubit - a calculation hub

At a glance

Hubit is an event-driven orchestration hub for your existing calculation tools. It allows you to

  • execute calculation tools as one Hubit composite model with a loose coupling between the model components,
  • centrally configure the interfaces between calculation tools rather than coding them. This allows true separation of responsibility between different teams,
  • easily run your existing calculation tools asynchronously in multiple processes,
  • query the Hubit model for specific results thus avoiding explicitly coding (fixed) call graphs and running superfluous calculations,
  • make parameter sweeps,
  • feed previously calculated results into new calculations thus augmenting old results,
  • store results incrementally during execution and restart from previously stored results (model caching),
  • reuse results if calculations are executed multiple times with the same input (component caching),
  • visualize your Hubit composite model i.e. visualize your existing tools and the attributes that flow between them.

Motivation

Many workplaces have developed a rich ecosystem of stand-alone tools. These tools may be developed and maintained by different teams using different programming languages and different input/output data models. Nevertheless, the tools often depend on results provided by the other tools, leading to complicated dependencies and error-prone (manual) workflows involving copy & paste. If this sounds familiar you should try Hubit.

By defining input and results data structures that are shared between your tools, Hubit allows all your Python-wrappable tools to be seamlessly executed asynchronously as a single model. Asynchronous multi-processor execution often ensures better utilization of the available CPU resources than sequential single-processor execution. This is especially true when some time is spent in each component, i.e. for CPU-bound calculations. In practice this performance improvement often compensates for the management overhead introduced by Hubit.

Executing a fixed call graph is faster than executing the call graph that Hubit creates dynamically. Nevertheless, a fixed call graph will typically encompass all relevant calculations and provide all results, which in many cases represents wasted compute since only a subset of the results is actually needed. Hubit dynamically creates the smallest possible call graph that can provide the results that satisfy the user's query. Further, Hubit can visualize your existing tools and the data flow between them.

Teaser

The example below is taken from the in-depth tutorial in the documentation.

To get results from a Hubit model you submit a query, which tells Hubit which attributes from the results data structure you want to have calculated. After Hubit has processed the query, i.e. executed the relevant components, the values of the queried attributes are returned in the response.

import yaml
from hubit import HubitModel  # assumed top-level export

# Load model from file
hmodel = HubitModel.from_file(
    'model1.yml',
    name='car'
)

# Load the input
with open("input.yml", "r") as stream:
    input_data = yaml.load(stream, Loader=yaml.FullLoader)

# Set the input on the model object
hmodel.set_input(input_data)

# Query the model
query = ['cars[0].price']
response = hmodel.get(query)

The response looks like this

{'cars[0].price': 4280.0}

A query for parts prices for all cars looks like this

query = ['cars[:].parts[:].price']
response = hmodel.get(query)

and the corresponding response is

{
  'cars[:].parts[:].price': [
    [480.0, 1234.0, 178.0, 2343.0, 45.0],
    [312.0, 1120.0, 178.0, 3400.0]
  ]
}

From the response we can see the prices for the five parts that comprise the first car and the prices for the four parts that comprise the second car. The full example illustrates how a second calculation component can be used to calculate the total price for each car.
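The numbers in the two responses above are consistent with each car's price being the sum of its parts' prices. As a plain-Python sketch of that arithmetic (this is not Hubit code, and the summation rule is an assumption inferred from the values shown, not a statement about how the tutorial's component is implemented):

```python
# Parts prices copied from the response to 'cars[:].parts[:].price'
parts_prices = [
    [480.0, 1234.0, 178.0, 2343.0, 45.0],
    [312.0, 1120.0, 178.0, 3400.0],
]

# Assumption: a car's price is the sum of the prices of its parts
car_prices = [sum(prices) for prices in parts_prices]

print(car_prices[0])  # 4280.0, matching the response to 'cars[0].price'
```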

Hubit can render models and queries. In the example below we have rendered the query cars[0].price i.e. the price of the car at index 0 using

query = ['cars[0].price']
hmodel.render(query)

which yields the graph shown below.

The graph illustrates nodes in the input data structure, nodes in the results data structure, and the calculation components involved in creating the response. It also hints at which attributes flow in and out of these components.

Installation & requirements

Install from PyPI

pip install hubit

Install from GitHub

pip install git+https://github.com/mrsonne/hubit.git

To render Hubit models and queries you need to install Graphviz (https://graphviz.org/download/). On Ubuntu, for example, Graphviz can be installed using the command

sudo apt install graphviz

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.5.0] - 2022-03-17

Added

  • Support for negative indices in query paths. The feature is illustrated in examples/car/run.py.
  • Support for negative indices in model paths. The feature is illustrated in
    • examples/tanks/run_prices.py and discussed in examples/tanks/README.md.
    • examples/wall/run_min_temperature.py and discussed in examples/wall/README.md.
  • Reduced computational overhead

Fixed

  • Explicit indexing (e.g. 1@IDX) for non-rectangular data.
  • Occasional code stall when using component caching.
  • Component caching in the case where an "upstream" result is queried before a "downstream" result. Consider a car price calculation (downstream) that consumes the prices of all parts (upstream). The query "cars[:].price" would produce the car price as expected. The query ["cars[:].price", "cars[:].parts[:].price"] would also produce the car price as expected while spawning the same number of workers as "cars[:].price", thus ignoring the superfluous query path "cars[:].parts[:].price". The query ["cars[:].parts[:].price", "cars[:].price"] was, however, broken.
  • Image links and model excerpt example in wall example documentation.

[0.4.1] - 2021-11-06

Fixed

  • Fix broken link in README.md

[0.4.0] - 2021-11-06

Changed

  • Entrypoint functions now accept only two arguments, namely _input_consumed and results_provided. Previously three arguments were expected: _input_consumed, _results_consumed and results_provided. Now _results_consumed is simply included in _input_consumed. This change makes entrypoint functions agnostic to the origin of their input.
  • The component list in the model configuration file must now sit under a key named "components".
  • The format for cache files stored in the folder .hubit_cache has changed. To convert old cache files see the example code below. Alternatively, clear the Hubit cache using the function hubit.clear_hubit_cache().
  • Hyphen is no longer an allowed character for index identifiers. For example, this model path is no longer valid: segments[IDX_SEG].layers[IDX-LAY].
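As a rough illustration of the two-argument entrypoint signature described in the first item of this list, here is a minimal sketch. It assumes that both arguments can be used like dictionaries keyed by names defined in the model configuration. The function name car_price and the keys "parts_prices" and "car_price" are hypothetical and chosen only for this example.

```python
def car_price(_input_consumed, results_provided):
    """Hypothetical entrypoint: derive a car price from its parts' prices.

    Since 0.4.0, consumed results arrive in _input_consumed together with
    consumed input, so the function need not know where a value came from.
    """
    # "parts_prices" and "car_price" are illustrative names, not Hubit names
    results_provided["car_price"] = sum(_input_consumed["parts_prices"])


# Stand-alone call using plain dicts in place of the objects Hubit would pass
results = {}
car_price({"parts_prices": [480.0, 1234.0, 178.0, 2343.0, 45.0]}, results)
print(results)
```

Because the function only reads from one mapping and writes to another, it can be exercised without Hubit, as the last three lines show.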

The example code below converts the cache file old.yml to new.yml. In practice, the cache file will have a name such as a70300027991e56db5e3b91acf8b68a5.yml rather than old.yml.

import re
import yaml

with open("old.yml", "r") as stream:
    old_cache_data = yaml.load(stream, Loader=yaml.FullLoader)

# Replace ".DIGIT" with "[DIGIT]" in all keys (paths)
with open("new.yml", "w") as handle:
    yaml.dump(
        {
            re.sub(r"\.(\d+)", r"[\1]", path): val
            for path, val in old_cache_data.items()
        },
        handle,
    )

All files in the Hubit cache folder .hubit_cache should be converted if you want them to be compatible with Hubit 0.4.0+.
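To see what the key rewrite in the conversion code does, the same regular expression can be applied to a small in-memory dictionary. The sketch below uses only the standard library. The helper name convert_cache_keys and the sample paths are hypothetical and are not taken from a real cache file.

```python
import re


def convert_cache_keys(old_cache_data):
    """Return a new dict where ".DIGIT" in each key is rewritten as "[DIGIT]"."""
    return {
        re.sub(r"\.(\d+)", r"[\1]", path): val
        for path, val in old_cache_data.items()
    }


# Hypothetical old-format cache content, for illustration only
old = {"cars.0.parts.1.price": 1234.0, "cars.1.price": 5010.0}
new = convert_cache_keys(old)
print(new)
```

Here "cars.0.parts.1.price" becomes "cars[0].parts[1].price" and "cars.1.price" becomes "cars[1].price". The values are left untouched.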

Added

  • Support for subscriptions to other domains (compartments/cells/elements). Now you can easily configure one domain to use a result from other domains as input as well as set up boundary conditions. This new feature is illustrated in the example with connected tanks in examples/tanks/README.md. To enable connected domains Hubit now allows
    • Components to share the same entrypoint function.
    • Components to be scoped using the new field component.index_scope.
    • Components to consume specific elements in lists.
    • Index offsets, which enable one domain to refer to e.g. the previous domain.
  • Improved performance for cases
    • where only some branches in the input data tree are consumed, and
    • where branches are not consumed all the way to the leaves.
  • Improved model validation.
  • Improved documentation for model configuration file format.

Fixed

  • Fix broken example (examples/wall/run_precompute.py)
  • The elements of lists that are leaves in the input data tree can now be referenced and queried.
  • Lists of length 1 in the input were erroneously interpreted as a simple value.

[0.3.0] - 2021-05-07

Changed

  • The model configuration format is defined and documented in the HubitModelConfig class.
  • With the introduction of HubitModelConfig, four configuration attributes have been renamed. Therefore, model configuration files used with earlier versions of Hubit must be migrated to the Hubit 0.3 format. The necessary migrations are described below.
    • The top-level object provides is now named provides_results.
    • The sub-object consumes.input is now a top-level object named consumes_input.
    • The sub-object consumes.results is now a top-level object named consumes_results.
    • The value of module_path should now be specified in path and is interpreted as a path present in sys.path that can be imported as a dotted path. The most common use case is a package in site-packages. If path is a dotted path, is_python_path should be set to True.
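The three renames listed above can be sketched as a small helper, assuming that each component entry is available as a plain dictionary (for example after loading the YAML file). The helper name migrate_component and the sample entry are hypothetical. The module_path change is not handled here and would need to be migrated by hand.

```python
def migrate_component(old):
    """Return a copy of a component entry using the 0.3 attribute names."""
    # Keep every key except the two that are renamed or flattened
    new = {k: v for k, v in old.items() if k not in ("provides", "consumes")}
    if "provides" in old:
        new["provides_results"] = old["provides"]
    consumes = old.get("consumes", {})
    if "input" in consumes:
        new["consumes_input"] = consumes["input"]
    if "results" in consumes:
        new["consumes_results"] = consumes["results"]
    return new


# Hypothetical old-format component entry, for illustration only
old_component = {
    "func_name": "price",
    "provides": ["car_price"],
    "consumes": {"input": ["part_names"], "results": ["part_prices"]},
}
new_component = migrate_component(old_component)
print(new_component)
```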

Added

  • Improved model configuration validation
  • Documentation

[0.2.0] - 2021-03-26

Added

  • Model-level results caching.
  • Component-level results caching.
  • Introduced logging object accessed using my_hubit_model.log().

[0.1.0] - 2021-02-28

Added

  • First release

BSD 3-Clause License

Copyright (c) 2021, Jacob Sonne. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the Hubit Developers nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
