Hyper-parallel multi-node task execution engine
Parallelic
Parallelic is a hyper-parallel multi-node task execution engine with shared data and workspace capabilities.
Note of warning
Parallelic is not a containerization or sandboxing engine. It does not fully isolate tasks, and it provides no guarantee of isolation. That may change in the future, and contributions towards that goal are welcome, but in the meantime, keep this in mind when giving third parties access to a Parallelic system.
Installation
From git
- Clone the git repo locally.
- Install Python 3 (3.7) and the corresponding pip
- Install Poetry
- Run
poetry install
to create a virtualenv and install dependencies
At this point, you can use parallelic through
poetry run python -m parallelic
- Run
poetry build
to build a wheel
- Run
pip install dist/parallelic-[version]-py3-none-any.whl
Now you can use parallelic without poetry:
python -m parallelic
From pip
- Run
pip install parallelic
Usage
Running a task
To run an already defined task, upload the task package (a zipped-up task root directory) to the Parallelic manager instance via the Parallelic WebUI or the Parallelic CLI client.
You may need to provide access credentials before being allowed to upload the task package, as per your Parallelic system configuration.
From there, the Parallelic manager instance will take care of everything else.
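Since the task package is a zipped-up task root directory, it can be built with the Python standard library. The following is a minimal sketch, not part of Parallelic itself: the function name package_task is hypothetical, and it assumes the manager expects task.yml at the top level of the archive rather than inside a wrapping folder. Adjust it if your Parallelic system expects a different layout.

```python
import shutil
from pathlib import Path


def package_task(task_root: str, output_dir: str = ".") -> Path:
    """Zip the contents of task_root into <output_dir>/<task root name>.zip.

    Hypothetical helper; assumes task.yml belongs at the top level of the archive.
    """
    root = Path(task_root).resolve()
    if not (root / "task.yml").is_file():
        raise FileNotFoundError(f"{root} has no task.yml; is it a task root?")
    base_name = Path(output_dir).resolve() / root.name
    # make_archive appends ".zip" itself and returns the path of the archive.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(root))
    return Path(archive)
```

Calling package_task("my_task", "dist") would write dist/my_task.zip, which can then be uploaded through the WebUI or the CLI client. Keep the output directory outside the task root so the archive does not end up inside itself.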
Defining a task
Task definitions follow a particular directory tree.
The task root contains a task.yml
file, which contains the metadata the manager needs to set up and prepare resources for the compute nodes in order to run the task.
The source
directory gets distributed to all compute nodes; it should contain only the source files required to run the task.
This directory is set up with Read+eXecute permissions. Every task has to contain this directory.
The data
directory gets distributed to all compute nodes, if "shared_data"
is requested in the task's configuration. It is set up with Read+Write permissions.
It is suitable for datasets that need to be shared across all nodes, but whose modifications by the nodes shouldn't propagate to other nodes. A task should contain this directory if "shared_data"
is requested.
It can be accessed by the task in the task's working directory.
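From inside a running task, reading the shared dataset could look like the sketch below. It assumes the directory shows up under the name data directly inside the task's working directory; the helper name list_data_files is hypothetical and not part of Parallelic.

```python
from pathlib import Path
from typing import List, Optional


def list_data_files(workdir: Optional[Path] = None) -> List[Path]:
    """Return all files under <workdir>/data, sorted for a stable order.

    Assumption: the data directory is named "data" and sits in the
    task's working directory (the current directory by default).
    """
    base = (workdir or Path.cwd()) / "data"
    if not base.is_dir():
        # "shared_data" was probably not requested for this task.
        return []
    return sorted(p for p in base.rglob("*") if p.is_file())
```

Because changes to this directory do not propagate to other nodes, a task can treat it as a node-local scratch copy of the dataset.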
The workspace
directory gets created automatically by the manager, if "shared_workspace"
is requested in the task's configuration.
It is set up with Read+Write permissions, but task.yml
can request the manager to create a scripts
subdirectory, with Read+Write+eXecute permissions.
The workspace directory should never be in the task's root directory; any workspace content placed in the task's root directory won't get sent to the compute nodes. Compute nodes should populate this directory themselves at runtime. Any changes to this directory propagate across all compute nodes, and it can be accessed in the task's working directory.
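The layout rules above can be checked before packaging a task. The following is a rough sketch under stated assumptions: check_task_root is a hypothetical helper, not a Parallelic API, and it takes the "shared_data" setting as a plain argument instead of parsing task.yml, because the task.yml schema is not described here.

```python
from pathlib import Path
from typing import List


def check_task_root(task_root: str, shared_data: bool = False) -> List[str]:
    """Return a list of layout problems; an empty list means none were found."""
    root = Path(task_root)
    problems = []
    if not (root / "task.yml").is_file():
        problems.append("missing task.yml")
    # Every task has to contain the source directory.
    if not (root / "source").is_dir():
        problems.append("missing source directory")
    # The data directory is only expected when "shared_data" is requested.
    if shared_data and not (root / "data").is_dir():
        problems.append("shared_data requested but data directory is missing")
    # The workspace is created by the manager and must not ship with the task.
    if (root / "workspace").exists():
        problems.append("workspace must not be in the task root")
    return problems
```

A call such as check_task_root("my_task", shared_data=True) returns human-readable messages that can be printed or turned into a failing exit code in a packaging script.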