Minimal tool for connecting your existing models in a composite model allowing for asynchronous multi-processed execution
hubit - a calculation hub
At a glance
Hubit is an event-driven orchestration hub for your existing calculation tools. It allows you to
- execute calculation tools as one Hubit composite model with a loose coupling between the model components,
- centrally configure the interfaces between calculation tools rather than coding them. This allows true separation of responsibility between different teams,
- easily run your existing calculation tools asynchronously in multiple processes,
- query the Hubit model for specific results thus avoiding explicitly coding (fixed) call graphs and running superfluous calculations,
- make parameter sweeps,
- feed previously calculated results into new calculations thus augmenting old results,
- store results incrementally during execution and restart from previously stored results (model caching),
- reuse results if calculations are executed multiple times with the same input (component caching),
- visualize your Hubit composite model i.e. visualize your existing tools and the attributes that flow between them.
Motivation
Many workplaces have developed a rich ecosystem of stand-alone tools. These tools may be developed and maintained by different teams using different programming languages and different input/output data models. Nevertheless, the tools often depend on results provided by the other tools, leading to complicated dependencies and error-prone (manual) workflows involving copy & paste. If this sounds familiar you should try Hubit.
By defining input and results data structures that are shared between your tools, Hubit allows all your Python-wrappable tools to be seamlessly executed asynchronously as a single model. Asynchronous multi-processor execution often ensures a better utilization of the available CPU resources compared to sequential single-processor execution. This is especially true when some time is spent in each component, i.e. for CPU-bound calculations. In practice this performance improvement often compensates for the management overhead introduced by Hubit.
Executing a fixed call graph is faster than executing the call graph that Hubit creates dynamically. Nevertheless, a fixed call graph will typically encompass all relevant calculations and provide all results, which in many cases represents wasteful compute since only a subset of the results is actually needed. Hubit dynamically creates the smallest possible call graph that can provide the results that satisfy the user's query. Further, Hubit can visualize your existing tools and the data flow between them.
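To illustrate the idea of a query-driven call graph, the sketch below walks backwards from a queried attribute and collects only the components needed to produce it. This is not Hubit's implementation: the component names, the COMPONENTS mapping and the required_components helper are all hypothetical and only serve to show why a query for one result need not trigger every calculation.

```python
# Hypothetical components: each provided attribute maps to the component
# that produces it and the attributes that component consumes.
# Attributes not listed here are assumed to come straight from the input.
COMPONENTS = {
    "parts_price": {"name": "part_pricer", "consumes": ["parts_count", "parts_name"]},
    "car_price": {"name": "car_pricer", "consumes": ["parts_price"]},
    "car_weight": {"name": "car_weigher", "consumes": ["parts_name"]},
}


def required_components(query, components=COMPONENTS):
    """Return the names of the components needed to answer the query,
    ordered so that every component comes after the ones it depends on."""
    ordered = []
    visited = set()

    def visit(attribute):
        if attribute in visited or attribute not in components:
            return  # already handled, or provided directly by the input
        visited.add(attribute)
        for upstream in components[attribute]["consumes"]:
            visit(upstream)
        ordered.append(components[attribute]["name"])

    for attribute in query:
        visit(attribute)
    return ordered


print(required_components(["car_price"]))    # ['part_pricer', 'car_pricer']
print(required_components(["parts_price"]))  # ['part_pricer']
```

In this toy model the car_weigher component is never scheduled because no queried attribute depends on it.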
Teaser
The example below is taken from the in-depth tutorial in the documentation.
To get results from a Hubit model, you submit a query, which tells Hubit which attributes from the results data structure you want to have calculated. After Hubit has processed the query, i.e. executed the relevant components, the values of the queried attributes are returned in the response.
import yaml
# HubitModel must be imported from the hubit package (see the documentation)
# Load model from file
hmodel = HubitModel.from_file(
    'model1.yml',
    name='car'
)
# Load the input
with open("input.yml", "r") as stream:
    input_data = yaml.load(stream, Loader=yaml.FullLoader)
# Set the input on the model object
hmodel.set_input(input_data)
# Query the model
query = ['cars[0].price']
response = hmodel.get(query)
The response looks like this
{'cars[0].price': 4280.0}
A query for parts prices for all cars looks like this
query = ['cars[:].parts[:].price']
response = hmodel.get(query)
and the corresponding response is
{
    'cars[:].parts[:].price': [
        [480.0, 1234.0, 178.0, 2343.0, 45.0],
        [312.0, 1120.0, 178.0, 3400.0]
    ]
}
From the response we can see the prices for the five parts that comprise the first car and the prices for the four parts that comprise the second car. The full example illustrates how a second calculation component can be used to calculate the total price for each car.
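As a plain-Python illustration of how the two responses relate, the sketch below sums the parts prices per car. It is not the tutorial's second calculation component; it only assumes that the price of a car is the sum of the prices of its parts, which reproduces the value returned for cars[0].price above.

```python
# Response copied from the example above
response = {
    "cars[:].parts[:].price": [
        [480.0, 1234.0, 178.0, 2343.0, 45.0],
        [312.0, 1120.0, 178.0, 3400.0],
    ]
}

# Assumption: the price of a car is the sum of the prices of its parts
car_prices = [sum(parts) for parts in response["cars[:].parts[:].price"]]
print(car_prices)  # [4280.0, 5010.0]
```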
Hubit can render models and queries. In the example below we have rendered the query cars[0].price, i.e. the price of the car at index 0, using
query = ['cars[0].price']
hmodel.render(query)
which yields the graph shown below.
The graph illustrates nodes in the input data structure, nodes in the results data structure and the calculation components involved in creating the response. It also hints at which attributes flow in and out of these components.
Installation & requirements
Install from PyPI
pip install hubit
Install from GitHub
pip install git+https://github.com/mrsonne/hubit.git
To render Hubit models and queries you need to install Graphviz (https://graphviz.org/download/). On e.g. Ubuntu, Graphviz can be installed using the command
sudo apt install graphviz
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.5.0] - 2022-03-17
Added
- Support for negative indices in query paths. The feature is illustrated in examples/car/run.py.
- Support for negative indices in model paths. The feature is illustrated in
  - examples/tanks/run_prices.py and discussed in examples/tanks/README.md.
  - examples/wall/run_min_temperature.py and discussed in examples/wall/README.md.
- Reduced computational overhead
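Negative indices follow the usual Python convention of counting from the end of a list. The sketch below resolves a path containing negative indices against plain nested Python data. The resolve helper and the sample data are hypothetical and do not reflect how Hubit parses paths internally.

```python
import re

# Hypothetical sample data shaped like the car example
data = {
    "cars": [
        {"parts": [{"price": 480.0}, {"price": 1234.0}]},
        {"parts": [{"price": 312.0}, {"price": 3400.0}]},
    ]
}


def resolve(path, tree):
    """Follow a path such as 'cars[-1].parts[0].price' through nested
    dicts and lists. Only integer indices are supported (no ':')."""
    node = tree
    for name, index in re.findall(r"(\w+)(?:\[(-?\d+)\])?", path):
        node = node[name]
        if index:
            node = node[int(index)]
    return node


print(resolve("cars[-1].parts[-1].price", data))  # 3400.0
print(resolve("cars[0].parts[-1].price", data))   # 1234.0
```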
Fixed
- Explicit indexing (e.g. 1@IDX) for non-rectangular data.
- Occasional code stall when using component caching.
- Component caching in the case where an "upstream" result is queried before a downstream. Consider a car price calculation (downstream) that consumes the prices of all parts (upstream). The query "cars[:].price" would produce the car price as expected. The query ["cars[:].price", "cars[:].parts[:].price"] would produce the car price as expected and spawn the same number of workers as "cars[:].price", thus ignoring the superfluous query path "cars[:].parts[:].price". The query ["cars[:].parts[:].price", "cars[:].price"] was, however, broken.
- Image links and model excerpt example in wall example documentation.
[0.4.1] - 2021-11-06
Fixed
- Fix broken link in README.md
[0.4.0] - 2021-11-06
Changed
- Entrypoint functions now accept only two arguments, namely _input_consumed and results_provided. Previously three arguments were expected: _input_consumed, _results_consumed and results_provided. Now _results_consumed is simply included in _input_consumed. The change renders entrypoint functions agnostic to the origin of their input.
- The component list in the model configuration file must now sit under a key named "components".
- The format for cache files stored in the folder .hubit_cache has changed. To convert old cache files see the example code below. Alternatively, clear the Hubit cache using the function hubit.clear_hubit_cache().
- Hyphen is no longer an allowed character for index identifiers. For example, this model path is no longer valid: segments[IDX_SEG].layers[IDX-LAY].
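A minimal sketch of what a two-argument entrypoint function could look like after this change. Only the argument names _input_consumed and results_provided come from the changelog. The function name, the dictionary keys and the assumption that both arguments behave like dictionaries are illustrative, so consult the Hubit documentation for the actual contract.

```python
def car_price(_input_consumed, results_provided):
    """Hypothetical entrypoint: sum the parts prices into a car price.

    Assumption: both arguments behave like dicts and results are
    returned by writing into results_provided.
    """
    # Since 0.4.0, values that used to arrive in _results_consumed
    # are found in _input_consumed alongside the regular input.
    results_provided["car_price"] = sum(_input_consumed["parts_price"])


# Stand-alone usage with plain dicts (no Hubit model involved)
results = {}
car_price({"parts_price": [480.0, 1234.0, 178.0, 2343.0, 45.0]}, results)
print(results)  # {'car_price': 4280.0}
```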
The example code below converts the cache file old.yml to new.yml. The file name old.yml will, more realistically, be named something like a70300027991e56db5e3b91acf8b68a5.yml.
import re
import yaml
with open("old.yml", "r") as stream:
    old_cache_data = yaml.load(stream, Loader=yaml.FullLoader)
# Replace ".DIGIT" with "[DIGIT]" in all keys (paths)
with open("new.yml", "w") as handle:
    yaml.dump(
        {
            re.sub(r"\.(\d+)", r"[\1]", path): val
            for path, val in old_cache_data.items()
        },
        handle,
    )
All files in the Hubit cache folder .hubit_cache should be converted if you want them to be compatible with Hubit 0.4.0+.
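To see what the substitution in the conversion script does to an individual key, the snippet below applies the same regular expression to a couple of made-up paths. The sample keys are hypothetical and only illustrate the old dot-index notation.

```python
import re

# Hypothetical keys in the old notation (".DIGIT" marks a list index)
old_paths = ["cars.0.price", "cars.1.parts.3.price"]

# Same substitution as in the conversion script above
new_paths = [re.sub(r"\.(\d+)", r"[\1]", path) for path in old_paths]
print(new_paths)  # ['cars[0].price', 'cars[1].parts[3].price']
```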
Added
- Support for subscriptions to other domains (compartments/cells/elements). Now you can easily configure one domain to use a result from other domains as input as well as set up boundary conditions. This new feature is illustrated in the example with connected tanks in examples/tanks/README.md. To enable connected domains Hubit now allows
  - Components to share the same entrypoint function.
  - Components to be scoped using the new field component.index_scope.
  - Components to consume specific elements in lists.
  - Index offsets which enable one domain to refer to e.g. the previous domain.
- Improved performance for cases
  - where only some branches in the input data tree are consumed, and
  - where branches are not consumed all the way to the leaves.
- Improved model validation.
- Improved documentation for model configuration file format.
Fixed
- Fix broken example (examples/wall/run_precompute.py)
- The elements of lists that are leaves in the input data tree can now be referenced and queried.
- Lists of length 1 in the input were erroneously interpreted as a simple value.
[0.3.0] - 2021-05-07
Changed
- The model configuration format is defined and documented in the HubitModelConfig class.
- With the introduction of HubitModelConfig, four configuration attributes have been renamed. Therefore, model configuration files used with Hubit versions before 0.3 must be migrated to the Hubit 0.3 format. Below is a description of the necessary migrations
  - The top-level object provides is now named provides_results.
  - The sub-object consumes.input is now a top-level object named consumes_input.
  - The sub-object consumes.results is now a top-level object named consumes_results.
  - The value of module_path should now be specified in path and is interpreted as a path present in sys.path that can be imported as a dotted path. The most common use case is a package in site-packages. If path is a dotted path, is_python_path should be set to True.
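The renames listed above can be sketched as a small migration helper for a single component entry. This is an illustration only: the migrate_component function and the sample component are hypothetical, only the key names mentioned in the list are taken from the changelog, and whether is_python_path should be True depends on whether your path really is a dotted path.

```python
def migrate_component(old):
    """Return a copy of a pre-0.3 component entry using the 0.3 key names."""
    # Keep every key that is not affected by the renames
    new = {
        key: val
        for key, val in old.items()
        if key not in ("provides", "consumes", "module_path")
    }
    if "provides" in old:
        new["provides_results"] = old["provides"]
    consumes = old.get("consumes", {})
    if "input" in consumes:
        new["consumes_input"] = consumes["input"]
    if "results" in consumes:
        new["consumes_results"] = consumes["results"]
    if "module_path" in old:
        # Assumption: the old value is a dotted, importable path
        new["path"] = old["module_path"]
        new["is_python_path"] = True
    return new


# Hypothetical component entry in the old format
old_component = {
    "module_path": "my_package.my_module",
    "provides": ["car_price"],
    "consumes": {"input": ["parts_name"], "results": ["parts_price"]},
}
print(migrate_component(old_component))
```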
Added
- Improved model configuration validation
- Documentation
[0.2.0] - 2021-03-26
Added
- Model-level results caching.
- Component-level results caching.
- Introduced logging object accessed using my_hubit_model.log().
[0.1.0] - 2021-02-28
Added
- First release
BSD 3-Clause License
Copyright (c) 2021, Jacob Sonne. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the Hubit Developers nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.