Skip to main content

Make util for applications

Project description

MAKE IT!

GNU Make for applications.

  • Extensible execution model language
  • Expressive and concise task description language
  • Task are not deferred to remote
  • Build-in async support

Rationale

Rethinking of Doit package.

Similar libraries -- what's different

See Luigi, Airflow, doit.

Main concepts

Two main concepts of the library are Task and Artifact. A task depends on artifacts and produces artifacts.

The use of artifacts is twofold. First, artifacts are used to build a graph (DAG) of the execution. On the other side, artifacts' fingerprints (i.e., md5) are used to define weather task should be run or not.

Task definition

The simplest way to define a task is to derive a dataclass class from makeit.DataclassTask and implement method execute, as in the example below.

The class makeit.DataclassTask closely interweaves with the standard dataclasses library, allowing users to benefit from a rich dataclasses support from IDEs.

One implicit dependency is the source code if a task (obtained by inspect.getsource(task.class)), if variable DataclassTask.depends_on_source_code_of_self is set (it is set by default).

import dataclasses
from makeit import File, Dependency, Target, DataclassTask
from pathlib import Path

@dataclasses.dataclass
class Process(DataclassTask):  # derive from DataclassTask -- this adds required and additional functions like dependencies, ...
    input_1: Path | Dependency  # this is a file dependency
    input_2: Path | Dependency  # many dependencies are possible
    many_files: list[Path] | Dependency  # and even list of dependencies allowed

    parameter: float  # this is a parameter -- not a dependency or a target
                      # however, it is used in label() method (via dataclasses method __str__),
                      # therefore it helps to parametrize tasks.
                      # See also method DataclassTask.md5 -- it is helpful for creating unique target names.

    target: File | Target  # many targets are possible as well
    
    target_2: File | Target = None  # this can be initialized in the dataclasses default method __post_init__

    class_variable = 12  # this is a class variable -- makeit disregards it, as dataclasses does.
    
    def __post_init__(self):
        # our task has parameters (parameter: float)
        # -- so it is better to save results into a unique target (file in this example)
        # DataclassTask.md5 is roughly equivalent to md5hash(str(self)).
        self.target_2 = Path('datafolder') / self.md5(".csv")

    def execute(self):
        # Implement task's logic
        text = self.input_1.read_text() + self.input_2.read_text()
        self.target.path.write_text(text)

Tasks execution

The simplest way to execute tasks is to use function execute_on_change:

from makeit.core import execute_on_change

tasks = [...]
execute_on_change(tasks, "makeit.json")

File "makeit.json" serves as a backend. Fingerprints of artifacts are stored there and later runs depend on those.

Execution reporting

There is a build-in logging execution reporter.

Example with TqdmReporter

from makeit.core import ExecutionReporter, TaskEvent
class TqdmReporter(ExecutionReporter):
    def report_task_event(self, event: TaskEvent, task_name: str, reason: str | None):
        pass
    
    def on_dag_start(self, dag_name: str, total_number_of_tasks: int):
        pass

    def on_dag_end(self):
        pass

Using graphviz to plot DAG

from makeit.contrib.graphviz import render_online, create_dot

tasks = [...]

dot = create_dot(tasks)  # graphviz.Digraph can be rendered to pdf, png, ...
render_online(dot, 'https://...')  # ... or simply drawn online

Some remarks:

  • dotted lines connect task with its parameters
  • file artifacts are marked by a folder-shaped box
  • green or red color denote weather file exists or not
  • labels are shortened as much as possible in order to get readable picture

Advanced tasks execution

Multiprocessing and asyncio execution. Prefixes.

Advanced task definition

Registering a type conversion.

Subclass directly from makeit.core.Task and implement all needed abstract methods.

Extra dependencies/targets. Task dependencies.

Typical workflows

Workflow Description Backend role
Make Execute task if targets are older than dependencies -
"Luigi" Execute task if and only if target does not exist -
Make 2.0 Execute task if dependency has changed Fingerprint is stored in backend, i.e. MD5 or modification time
Luigi 2.0 Pickup new files from folder, parse and process Backend remembers which files have been already seen
Always Execute all tasks (in correct order) -

Make

Compare files' timestamps, if dependencies are newer than targets then recalculate the targets.

"Luigi"

Run task if and only if targets are absent.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

makeit-2.10.8.tar.gz (18.3 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page