Skip to main content

NoETL: A Python package for managing workflows

Project description

NoETL Workflow Execution System Documentation

NoETL is a workflow automation framework designed to simplify the process of defining, managing, and executing complex workflows. It is especially well-suited for orchestrating data processing pipelines and task automation.

Introduction

This repository contains the noetl library which is available on pip here.

The actual runtime applications are located in other repositories:

The NoETL system is a workflow execution engine designed to automate the execution of tasks defined in a playbook or just deployed as a services. It employs a publisher-subscriber pattern for command transmission and event reception using the NATS messaging system. Inspired by Erlang's architecture, NoETL leverages a plugin-based approach, enabling a scalable, resilient, and efficient execution environment.

Architecture Overview

At its core, NoETL is built around a publisher-subscriber model that utilizes the NATS messaging system. This system is designed for the automated execution of tasks as defined by a specific playbook, incorporating an Erlang-inspired, plugin-based architecture.

Plugin-Based Workflow and Erlang Design Principles

NoETL's architecture draws heavily from several key concepts of Erlang:

  • Everything as a Plugin: All functional units in NoETL are treated as plugins, similar to Erlang's process encapsulation. These plugins are Docker images that can be executed as services or jobs within a Kubernetes environment. In playbooks, the term 'plugin' refers to these Docker images.
  • Strong Isolation: Each plugin operates independently and in isolation, akin to Erlang's process isolation.
  • Lightweight Plugin Management: Dynamic and efficient creation and destruction of plugin instances are central to NoETL, enabling a scalable architecture.
  • Message Passing Interaction: Plugins communicate through message passing via NATS streams, ensuring targeted and accurate messaging.
  • No Shared Resources: Plugins operate without shared resources, fostering isolated execution and reduced contention.
  • Resilience and Reliability: Each plugin is designed to perform effectively or fail gracefully, ensuring the robustness of the system.

Workflow

In NoETL, workflows are defined as playbooks – YAML scripts that orchestrate the execution of tasks in a predefined sequence. Each playbook describes a series of steps within tasks, where each step corresponding to a specific plugin.

Tasks and Steps

Tasks are the primary operational elements within a playbook, consisting of multiple sequential steps:

  • Parallel Task Execution: Tasks can be executed concurrently, enhancing playbook efficiency.
  • Sequential Step Execution: Steps within a task are executed in order, with each step needing to complete before the next begins.
  • Atomic Operation - Step: The most atomic operation in NoETL is the 'step,' referring to the execution of a plugin module.

Events and Commands

Events and commands drive the operation of NoETL:

  • Commands: Trigger the execution of step in the task.
  • Events: Published upon step and task completion, signaling its end.

This model maintains a decoupled and fault-tolerant playbook execution.

NATS Communication Subjects

Subjects in NoETL provide contextual information:

  • Command Subjects: command.<plugin_name>.<workflow_id>.<step_id>
  • Event Subjects: event.<plugin_name>.<workflow_id>.<step_id>

Mini-Plugin Architecture

NoETL includes several service plugins:

  • Command API Plugin: Manages playbook, plugin, and command reception and registration.
  • Event API Plugin: Tracks events and manages event data.
  • Dispatcher API Plugin: Responsible for dispatch actions, task queue management, and command execution.

These plugins communicate using NATS messaging, driven by YAML playbooks specifying task sequences. The Kubernetes environment serves as the execution platform for these plugins.

Conclusion

NoETL's architecture, embodying scalability, robustness, and fault-tolerance, is key to a reliable workflow execution system. Its distributed nature, combined with the efficiency of the plugin system and the resilience of its components, makes NoETL a great solution for complex workflow automation.

Prerequisites

  • Python 3.11 or later
  • pip
  • Docker Desktop's Kubernetes cluster, for local development

Installation

To install NoETL use pip:

pip install noetl

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

noetl-0.1.10.tar.gz (17.3 kB view hashes)

Uploaded Source

Built Distribution

noetl-0.1.10-py3-none-any.whl (28.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page