NoETL: A Python package for managing workflows
NoETL Workflow Execution System Documentation
NoETL ("Not Only ETL") is a workflow automation library and framework to simplify the process of defining, managing, and executing complex workflows. Particularly well-suited for orchestrating data processing pipelines, it extends beyond just ETL tasks and is designed for task automation in distributed runtime environments.
Introduction
This repository contains the noetl library, which is available via pip.
The actual runtime applications are located in other repositories.
The NoETL system is a workflow execution engine designed to automate the execution of tasks defined in a playbook or deployed as services. It uses a publisher-subscriber pattern over the NATS messaging system to transmit commands and receive events. Inspired by Erlang's architecture, NoETL takes a plugin-based approach aimed at a scalable, resilient, and efficient execution environment.
Architecture Overview
At its core, NoETL is built around a publisher-subscriber model that utilizes the NATS messaging system. This system is designed for the automated execution of tasks as defined by a specific playbook, incorporating an Erlang-inspired, plugin-based architecture.
Plugin-Based Workflow and Erlang Design Principles
NoETL's architecture draws heavily from several key concepts of Erlang:
- Everything as a Plugin: All functional units in NoETL are treated as plugins, similar to Erlang's process encapsulation. These plugins are Docker images that can be executed as services or jobs within a Kubernetes environment. In playbooks, the term 'plugin' refers to these Docker images.
- Strong Isolation: Each plugin operates independently and in isolation, akin to Erlang's process isolation.
- Lightweight Plugin Management: Dynamic and efficient creation and destruction of plugin instances are central to NoETL, enabling a scalable architecture.
- Message Passing Interaction: Plugins communicate through message passing via NATS streams, ensuring targeted and accurate messaging.
- No Shared Resources: Plugins operate without shared resources, fostering isolated execution and reduced contention.
- Resilience and Reliability: Each plugin is designed to perform effectively or fail gracefully, ensuring the robustness of the system.
Workflow
In NoETL, workflows are defined as playbooks – YAML scripts that orchestrate the execution of tasks in a predefined sequence. Each playbook describes a series of steps within tasks, where each step corresponds to a specific plugin.
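To make the nesting concrete, the sketch below models a playbook as plain Python data, roughly what a YAML loader would produce. The key names (name, tasks, steps, plugin) and the plugin names are assumptions made for illustration only; the actual NoETL playbook schema is not shown here and may differ.

```python
# Hypothetical playbook layout; key names are illustrative, not NoETL's schema.
playbook = {
    "name": "example-playbook",
    "tasks": [
        {
            "name": "load",
            "steps": [
                {"name": "fetch", "plugin": "http-fetcher"},
                {"name": "store", "plugin": "db-writer"},
            ],
        },
        {
            "name": "notify",
            "steps": [{"name": "send", "plugin": "notifier"}],
        },
    ],
}


def plugins_by_task(pb: dict) -> dict[str, list[str]]:
    """Map each task name to the ordered list of plugins its steps use."""
    return {
        task["name"]: [step["plugin"] for step in task["steps"]]
        for task in pb["tasks"]
    }


print(plugins_by_task(playbook))
```

Each step names exactly one plugin, which matches the description above: a task is an ordered list of steps, and a step is one plugin execution.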
Tasks and Steps
Tasks are the primary operational elements within a playbook, consisting of multiple sequential steps:
- Parallel Task Execution: Tasks can be executed concurrently, enhancing playbook efficiency.
- Sequential Step Execution: Steps within a task are executed in order, with each step needing to complete before the next begins.
- Atomic Operation - Step: The smallest unit of execution in NoETL is the 'step', which is a single execution of a plugin module.
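The execution rules above (tasks in parallel, steps in sequence) can be sketched with asyncio. This is a local simulation under stated assumptions, not NoETL's scheduler: run_step, run_task, and run_playbook are hypothetical names, and a real step would run a plugin container rather than append to a list.

```python
import asyncio


async def run_step(task: str, step: str, log: list[str]) -> None:
    """Stand-in for executing one plugin (hypothetical; no container is run)."""
    await asyncio.sleep(0)  # yield control so that tasks can interleave
    log.append(f"{task}:{step}")


async def run_task(task: str, steps: list[str], log: list[str]) -> None:
    """Run the steps of one task strictly in order."""
    for step in steps:
        # The next step starts only after this one has completed.
        await run_step(task, step, log)


async def run_playbook(tasks: dict[str, list[str]]) -> list[str]:
    """Run all tasks concurrently and return the order in which steps finished."""
    log: list[str] = []
    await asyncio.gather(
        *(run_task(name, steps, log) for name, steps in tasks.items())
    )
    return log


if __name__ == "__main__":
    order = asyncio.run(run_playbook({"a": ["s1", "s2"], "b": ["s1", "s2"]}))
    print(order)
```

Steps from different tasks may interleave in the resulting log, but within a single task the first step always finishes before the second begins.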
Events and Commands
Events and commands drive the operation of NoETL:
- Commands: Trigger the execution of a step in the task.
- Events: Published when a step or task completes, signaling that it has ended.
This model maintains a decoupled and fault-tolerant playbook execution.
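The command/event cycle can be illustrated with a toy in-memory bus. This is a sketch only: InMemoryBus stands in for NATS, and the message fields (step, value, status, result) are assumptions chosen for the example, not NoETL's message format.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryBus:
    """Toy publish-subscribe bus standing in for NATS in this sketch."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, subject: str, handler: Handler) -> None:
        self._handlers[subject].append(handler)

    def publish(self, subject: str, message: dict) -> None:
        # Deliver the message to every handler subscribed to this subject.
        for handler in list(self._handlers[subject]):
            handler(message)


bus = InMemoryBus()
received_events: list[dict] = []


def on_command(message: dict) -> None:
    """A plugin reacts to a command by running its step, then emits an event."""
    result = message["value"] * 2  # placeholder for the real step logic
    bus.publish(
        "event",
        {"step": message["step"], "status": "completed", "result": result},
    )


bus.subscribe("command", on_command)
bus.subscribe("event", received_events.append)

# Publishing a command triggers the step; its completion arrives as an event.
bus.publish("command", {"step": "double", "value": 21})
print(received_events)
```

The publisher of the command never calls the plugin directly; it only sees the resulting event, which is what keeps the two sides decoupled.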
NATS Communication Subjects
Subjects in NoETL provide contextual information:
- Command Subjects: command.<plugin_name>.<workflow_instance_id>
- Event Subjects: event.<plugin_name>.<workflow_instance_id>
N.B. Error handling is a part of event subjects.
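As a small illustration of the subject patterns above, the helpers below build and parse subjects of the form kind.plugin_name.workflow_instance_id. The function names are hypothetical and not part of the noetl package. The token check reflects NATS conventions, where "." separates tokens and "*" and ">" are wildcards.

```python
def _check_token(token: str) -> str:
    """Reject tokens that would change the subject's structure in NATS."""
    if not token or any(ch in token for ch in ".*> \t\n"):
        raise ValueError(f"invalid subject token: {token!r}")
    return token


def command_subject(plugin_name: str, workflow_instance_id: str) -> str:
    """Build a subject following command.<plugin_name>.<workflow_instance_id>."""
    return f"command.{_check_token(plugin_name)}.{_check_token(workflow_instance_id)}"


def event_subject(plugin_name: str, workflow_instance_id: str) -> str:
    """Build a subject following event.<plugin_name>.<workflow_instance_id>."""
    return f"event.{_check_token(plugin_name)}.{_check_token(workflow_instance_id)}"


def parse_subject(subject: str) -> tuple[str, str, str]:
    """Split a subject into (kind, plugin_name, workflow_instance_id)."""
    parts = subject.split(".")
    if len(parts) != 3 or parts[0] not in ("command", "event"):
        raise ValueError(f"unexpected subject: {subject!r}")
    return parts[0], parts[1], parts[2]


print(command_subject("dispatcher", "wf-123"))
print(parse_subject("event.registrar.wf-123"))
```

With this layout, a subscriber interested in every event of one workflow instance could use the NATS single-token wildcard, for example event.*.wf-123.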
Mini-Plugin Architecture
NoETL includes several core service plugins:
- NoETL GraphQL API Plugin: Provides an interface for querying and interacting with NoETL using GraphQL.
- Dispatcher Plugin: Responsible for dispatching actions, handling step output, managing the task queue, and creating the commands that other plugins execute.
- Registrar Plugin: Manages playbook, plugin, and command reception and registration.
Plugins communicate using NATS messaging, driven by YAML playbooks specifying task sequences. The Kubernetes environment serves as the execution platform.
Prerequisites
- Python 3.11 or later
- pip
- Docker Desktop's Kubernetes cluster, for local development
Installation
To install NoETL, use pip:
pip install noetl