
Make util for applications

Project description

MAKE IT!

GNU Make for applications.

  • Extensible execution model language
  • Expressive and concise task description language
  • Tasks are not deferred to remote execution
  • Built-in async support

Rationale

A rethinking of the doit package.

Similar libraries -- what's different

See Luigi, Airflow, doit.

Main concepts

The two main concepts of the library are Task and Artifact: a task depends on artifacts and produces artifacts.
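Because every artifact records which task produces it, the dependency graph between tasks follows directly from the task/artifact relation. The sketch below illustrates this idea with plain dictionaries and the standard-library graphlib module; the dictionary layout and task names are hypothetical and are not makeit's internal representation.

```python
from graphlib import TopologicalSorter

# Hypothetical task descriptions: each task lists the artifacts (e.g. file names)
# it depends on and the artifacts it produces.
tasks = {
    "clean": {"deps": ["raw.csv"], "targets": ["clean.csv"]},
    "features": {"deps": ["clean.csv"], "targets": ["features.csv"]},
    "report": {"deps": ["clean.csv", "features.csv"], "targets": ["report.html"]},
}


def build_dag(tasks: dict) -> dict[str, set[str]]:
    """Map each task to the set of tasks that produce its dependencies."""
    producer = {}
    for name, spec in tasks.items():
        for artifact in spec["targets"]:
            producer[artifact] = name
    graph = {}
    for name, spec in tasks.items():
        # Artifacts nobody produces (like raw.csv) are plain inputs, not edges.
        graph[name] = {producer[a] for a in spec["deps"] if a in producer}
    return graph


dag = build_dag(tasks)
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['clean', 'features', 'report']
```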

Artifacts serve two purposes. First, they are used to build the execution graph (a DAG). Second, their fingerprints (e.g., MD5 hashes) determine whether a task should be run or not.
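The fingerprint check can be sketched as follows, assuming MD5 hashes of file contents and a plain dict of previously stored fingerprints. The function names `md5_of_file` and `needs_run` are hypothetical helpers for illustration, not part of makeit's API.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Return the MD5 hex digest of a file's content, read in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_run(dependencies: list[Path], stored: dict[str, str]) -> bool:
    """Run the task if any dependency is new or its fingerprint has changed."""
    for dep in dependencies:
        if stored.get(str(dep)) != md5_of_file(dep):
            return True
    return False
```

After a successful run, the current fingerprints would be written back to `stored`, so the next call returns False until a dependency changes.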

Task definition

The simplest way to define a task is to create a dataclass derived from makeit.DataclassTask and implement its execute method, as in the example below.

The class makeit.DataclassTask integrates closely with the standard dataclasses library, so users benefit from the rich dataclasses support in IDEs.

One implicit dependency is the source code of the task (obtained by inspect.getsource(task.__class__)), provided that the attribute DataclassTask.depends_on_source_code_of_self is set (it is set by default). However, the source code is not available in environments like Jupyter.

import dataclasses
from makeit import File, Dependency, Target, DataclassTask
from pathlib import Path

@dataclasses.dataclass
class Process(DataclassTask):  # deriving from DataclassTask adds the required machinery (dependencies, targets, ...)
    input_1: Path | Dependency  # this is a file dependency
    input_2: Path | Dependency  # many dependencies are possible
    many_files: list[Path] | Dependency  # even a list of dependencies is allowed

    parameter: float  # this is a parameter -- not a dependency or a target
                      # however, it is used in label() method (via dataclasses method __str__),
                      # therefore it helps to parametrize tasks.
                      # See also method DataclassTask.md5 -- it is helpful for creating unique target names.

    target: File | Target  # many targets are possible as well
    
    target_2: File | Target = None  # can be initialized in __post_init__, the standard dataclasses hook

    class_variable = 12  # this is a class variable -- makeit disregards it, as dataclasses does.
    
    def __post_init__(self):
        # our task has parameters (parameter: float)
        # -- so it is better to save results into a unique target (file in this example)
        # DataclassTask.md5 is roughly equivalent to md5hash(str(self)).
        self.target_2 = Path('datafolder') / self.md5(".csv")

    def execute(self):
        # Implement task's logic
        text = self.input_1.read_text() + self.input_2.read_text()
        self.target.path.write_text(text)
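The comment in __post_init__ says DataclassTask.md5 is roughly md5hash(str(self)). The idea can be reproduced with a plain dataclass: because the generated repr contains every field value, tasks with different parameters get different file names. The `Params` class and its `md5` method below are a hypothetical stand-in, not makeit's code.

```python
import dataclasses
import hashlib
from pathlib import Path


@dataclasses.dataclass
class Params:
    parameter: float
    name: str = "run"

    def md5(self, suffix: str = "") -> str:
        # Hypothetical stand-in for DataclassTask.md5: hash str(self), i.e. the
        # dataclass repr with all field values, then append a suffix.
        return hashlib.md5(str(self).encode()).hexdigest() + suffix


a = Params(0.5)
b = Params(0.7)
print(Path("datafolder") / a.md5(".csv"))
```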

Tasks execution

The simplest way to execute tasks is to use the function execute_on_change:

from makeit import execute_on_change

tasks = [...]
execute_on_change(tasks, backend="makeit.json")

The backend file (makeit.json in this example) stores the execution state, such as fingerprints. It is created if it does not exist.

The reporter logs DAG execution; the default implementation prints colorized output to stderr. Passing None instead of a reporter silences the output.

Set dag_name if you have several DAGs in one project and want to avoid name clashes.

If the parameter capture_only is set, tasks are not executed; only their fingerprints are captured and stored in the backend. It cannot be combined with execute_all.

If the parameter execute_all is set, all stored fingerprints are disregarded and every task is executed. The advantage over execute_always is that all fingerprints are still stored in the backend.

The parameter test_targets_fingerprints controls whether targets' fingerprints are checked. If it is set and a target has been modified externally, the task will be executed again.

The parameter strict enables strict checks: if a target is not created or a dependency does not exist, an exception is raised.
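The interplay of these flags can be summarized as a small decision function. This is a hypothetical sketch of the rules stated above, not makeit's code; the `Options` class, its default values, and the helper names are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Options:
    """Hypothetical mirror of some execute_on_change flags (defaults assumed)."""
    capture_only: bool = False
    execute_all: bool = False
    test_targets_fingerprints: bool = False

    def __post_init__(self):
        if self.capture_only and self.execute_all:
            raise ValueError("capture_only and execute_all are mutually exclusive")


def should_execute(opts: Options, deps_changed: bool, targets_changed: bool) -> bool:
    """Decide whether a task runs; fingerprints are stored afterwards either way."""
    if opts.capture_only:
        return False  # only record fingerprints
    if opts.execute_all:
        return True  # ignore stored fingerprints
    if deps_changed:
        return True
    # Externally modified targets trigger a rerun only when requested.
    return opts.test_targets_fingerprints and targets_changed


def check_exists(paths: list[Path], strict: bool, kind: str) -> None:
    """In strict mode, fail if dependencies (before) or targets (after) are missing."""
    missing = [str(p) for p in paths if not p.exists()]
    if strict and missing:
        raise FileNotFoundError(f"missing {kind}: {missing}")
```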

Execution reporting

There is a built-in logging execution reporter.

Example with TqdmReporter

from makeit.core import Reporter, TaskEvent


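class TqdmReporter(Reporter):
    def report_task_event(self, event: TaskEvent, task_name: str, reason: str | None):
        pass  # called for task events; e.g. advance a tqdm progress bar here

    def on_dag_start(self, dag_name: str, total_number_of_tasks: int):
        pass  # called before execution; e.g. create tqdm(total=total_number_of_tasks)

    def on_dag_end(self):
        pass  # called after execution; e.g. close the progress bar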
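To show how the three hooks fit together, here is a stand-alone reporter with the same method names that prints progress to a stream. It deliberately does not subclass makeit.core.Reporter so that it runs without makeit, and it makes no assumption about the members of TaskEvent: it simply counts each task name once, whatever event is reported for it. The class name `CountingReporter` is hypothetical.

```python
import sys


class CountingReporter:
    """Hypothetical reporter with the same hooks as above, printing progress."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stderr
        self.total = 0
        self.seen = set()

    def on_dag_start(self, dag_name, total_number_of_tasks):
        self.total = total_number_of_tasks
        self.seen.clear()
        print(f"DAG {dag_name!r}: {self.total} tasks", file=self.stream)

    def report_task_event(self, event, task_name, reason):
        self.seen.add(task_name)  # count each task once, whatever the event
        print(f"[{len(self.seen)}/{self.total}] {task_name}: {event} "
              f"({reason or 'no reason'})", file=self.stream)

    def on_dag_end(self):
        print(f"done: {len(self.seen)}/{self.total} tasks reported", file=self.stream)
```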

Using graphviz to plot DAG

from makeit.contrib.graphviz import render_online, create_dot

tasks = [...]

dot = create_dot(tasks)  # graphviz.Digraph can be rendered to pdf, png, ...
render_online(dot, 'https://...')  # ... or simply drawn online

Some remarks:

  • dotted lines connect a task with its parameters
  • file artifacts are drawn as folder-shaped boxes
  • green or red color denotes whether the file exists
  • labels are shortened as much as possible to keep the picture readable
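The visual conventions listed above map onto standard Graphviz DOT attributes (`shape=folder`, `color=green`/`red`, `style=dotted`). The sketch below emits DOT text for a hypothetical task description by hand; it only illustrates these conventions and is not how makeit.contrib.graphviz.create_dot works internally.

```python
from pathlib import Path


def to_dot(tasks: dict) -> str:
    """Render hypothetical task specs as Graphviz DOT text.

    tasks maps a task name to {"deps": [...], "targets": [...], "params": {...}}.
    """
    lines = ["digraph G {"]
    for name, spec in tasks.items():
        lines.append(f'  "{name}" [shape=box];')
        for artifact in spec["deps"] + spec["targets"]:
            # Folder-shaped file node, green if the file exists, red otherwise.
            color = "green" if Path(artifact).exists() else "red"
            lines.append(f'  "{artifact}" [shape=folder, color={color}];')
        for dep in spec["deps"]:
            lines.append(f'  "{dep}" -> "{name}";')
        for target in spec["targets"]:
            lines.append(f'  "{name}" -> "{target}";')
        for key, value in spec.get("params", {}).items():
            param = f"{key}={value}"
            lines.append(f'  "{param}" [shape=plaintext];')
            # Parameters are linked to their task by a dotted edge.
            lines.append(f'  "{param}" -> "{name}" [style=dotted];')
    lines.append("}")
    return "\n".join(lines)
```

The resulting string can be fed to the `dot` command-line tool or wrapped in a graphviz.Source object for rendering.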

Advanced tasks execution

Use makeit.mp_execute_on_change to run tasks with multiprocessing. This function has almost the same signature as execute_on_change, with an additional parameter n_jobs that controls the number of worker processes.
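One common way to run a DAG in parallel is to submit every task whose dependencies have finished to a worker pool, using the get_ready/done protocol of graphlib.TopologicalSorter. The sketch below shows that technique with any concurrent.futures executor; it is an illustration under that assumption, not makeit's scheduler, and `run_dag_parallel` is a hypothetical name.

```python
from concurrent.futures import FIRST_COMPLETED, Executor, wait
from graphlib import TopologicalSorter


def run_dag_parallel(graph, run, executor: Executor) -> list:
    """Call run(node) for every node as soon as all its predecessors are done.

    graph maps each node to the set of nodes it depends on.
    Returns the nodes in completion order.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError for cyclic graphs
    pending = {}  # future -> node
    finished = []
    while sorter.is_active():
        for node in sorter.get_ready():
            pending[executor.submit(run, node)] = node
        done, _ = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            node = pending.pop(future)
            future.result()  # re-raise the task's exception, if any
            sorter.done(node)
            finished.append(node)
    return finished
```

For multiprocessing, pass `ProcessPoolExecutor(max_workers=n_jobs)` as the executor; `run` must then be a picklable, module-level function.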

Use makeit.aio_execute_on_change to run async tasks; in this case, implement the method async def a_execute(self) instead of execute. This function also has its own parameters controlling the queue size and the number of concurrent async tasks.
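The async variant can be pictured in the same way with asyncio: start a coroutine for each ready task and bound concurrency with a semaphore. This is a sketch of the general technique, not makeit's implementation; `run_dag_async` and `max_concurrency` are hypothetical names, and the queue-size parameter mentioned above is not modeled.

```python
import asyncio
from graphlib import TopologicalSorter


async def run_dag_async(graph, a_run, max_concurrency: int = 4) -> list:
    """Await a_run(node) for every node, respecting dependencies.

    At most max_concurrency coroutines execute at once.
    Returns the nodes in completion order.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    semaphore = asyncio.Semaphore(max_concurrency)
    finished = []

    async def guarded(node):
        async with semaphore:
            await a_run(node)
        return node

    running = set()
    while sorter.is_active():
        for node in sorter.get_ready():
            running.add(asyncio.create_task(guarded(node)))
        done, running = await asyncio.wait(running, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            node = task.result()  # re-raises the coroutine's exception, if any
            sorter.done(node)
            finished.append(node)
    return finished
```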

Advanced task definition

Registering a type conversion.

Subclass directly from makeit.core.Task and implement all needed abstract methods.

Extra dependencies/targets. Task dependencies.

Typical workflows

| Workflow | Description | Backend role |
|---|---|---|
| Make | Execute a task if its targets are older than its dependencies | - |
| "Luigi" | Execute a task if and only if its targets do not exist | - |
| Make 2.0 | Execute a task if a dependency has changed | Stores fingerprints, e.g. MD5 or modification time |
| Luigi 2.0 | Pick up new files from a folder, parse and process them | Remembers which files have already been seen |
| Always | Execute all tasks (in correct order) | - |
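The "Luigi 2.0" workflow relies on the backend remembering which files were already processed. A minimal sketch, assuming a hypothetical JSON state file holding the names of processed files (not makeit's backend format):

```python
import json
from pathlib import Path


def load_seen(state_file: Path) -> set[str]:
    """Names of files processed in earlier runs (empty on the first run)."""
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text()))


def find_new(folder: Path, state_file: Path, pattern: str = "*.csv") -> list[Path]:
    """Files in folder matching pattern that have not been seen yet."""
    seen = load_seen(state_file)
    return sorted(p for p in folder.glob(pattern) if p.is_file() and p.name not in seen)


def mark_seen(state_file: Path, files: list[Path]) -> None:
    """Record files as processed; call this only after processing succeeded."""
    seen = load_seen(state_file) | {p.name for p in files}
    state_file.write_text(json.dumps(sorted(seen)))
```

Marking files only after they have been processed means a failed run picks the same files up again next time.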

Make

Compare file timestamps: if any dependency is newer than the targets, recompute the targets.
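The classic Make rule can be written as a timestamp comparison on file modification times. The helper `make_needs_run` is a hypothetical illustration, not part of makeit; it also treats missing targets (or a task without targets) as needing a run, as Make does.

```python
from pathlib import Path


def make_needs_run(dependencies: list[Path], targets: list[Path]) -> bool:
    """Rebuild if a target is missing or older than the newest dependency."""
    if not targets or not all(t.exists() for t in targets):
        return True
    oldest_target = min(t.stat().st_mtime for t in targets)
    newest_dep = max((d.stat().st_mtime for d in dependencies), default=float("-inf"))
    return newest_dep > oldest_target
```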

"Luigi"

Run task if and only if targets are absent.
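For contrast, the "Luigi" rule ignores timestamps and contents entirely; existence is all that matters. A one-function sketch, with the hypothetical name `luigi_needs_run`, reading "targets are absent" as "at least one target is missing":

```python
from pathlib import Path


def luigi_needs_run(targets: list[Path]) -> bool:
    """Run only if at least one target is missing; existing targets are trusted."""
    return not all(t.exists() for t in targets)
```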


Download files


Source distribution: makeit-2.10.11.tar.gz (23.5 kB)
