
Lightweight framework for building data and ML workflows with class-based Python syntax




AXL Workflows (axl) is a lightweight framework for building data and ML workflows with a class-based Python syntax. Build a workflow once, then run it locally or on Argo/Kubeflow:

  • Local runtime → fast iteration on your machine.
  • Argo Workflows YAML → run on Kubernetes; compatible with Kubeflow Pipelines (KFP) environments.

Write once → run anywhere (locally or Argo/Kubeflow in production).


🚀 Quick Start

# Install
pip install axl-workflows

# Or with uv
uv pip install axl-workflows

# Create your first workflow
axl --help

✨ Key Features

  • Class-based DSL: Define workflows as Python classes, with steps as methods and a graph() to wire them.

  • Simple params: Treat parameters as a normal step that returns a Python object (e.g., a Pydantic model or dict). No special Param/Artifact classes.

  • IO Handlers: Steps return plain Python objects; axl persists/loads them via an io_handler (default: pickle).

    • Per-step override (@step(io_handler=...))
    • Input modes: receive objects by default or file paths with input_mode="path".
  • Intermediate Representation (IR): Backend-agnostic DAG model (nodes, edges, resources, IO metadata).

  • Multiple backends:

    • Local runtime → develop and iterate quickly.
    • Argo/KFP → YAML generation for production pipelines.
  • Unified runner image: One container executes steps locally and in Argo pods.

  • Resource & retry hints: Declare CPU, memory, caching, retries, and conditions at the step level.

  • CLI tools: Compile, validate, run locally, or render DAGs.


📦 Example Workflow (params as a step, with Pydantic)

# examples/churn_workflow.py
from axl import Workflow, step
from pydantic import BaseModel

# Parameters are just a normal step output (typed with Pydantic for convenience).
class TrainParams(BaseModel):
    seed: int = 42
    input_path: str = "data/raw.csv"

class ChurnTrain(Workflow):
    # Workflow configuration via class attributes
    name = "churn-train"
    image = "ghcr.io/you/axl-runner:0.1.0"
    io_handler = "pickle"

    @step
    def params(self) -> TrainParams:
        # Use defaults here; optionally read from YAML/env if you prefer.
        return TrainParams()

    @step  # default io_handler = pickle
    def preprocess(self, p: TrainParams):
        import pandas as pd
        df = pd.read_csv(p.input_path)
        # ... feature engineering ...
        return df  # persisted via pickle (default)

    @step
    def train(self, features, p: TrainParams):
        from sklearn.ensemble import RandomForestClassifier
        import numpy as np
        y = (features.sum(axis=1) > features.sum(axis=1).median()).astype(int)
        X = features.select_dtypes(include=[np.number]).fillna(0)
        model = RandomForestClassifier(n_estimators=50, random_state=p.seed).fit(X, y)
        return model  # persisted via pickle

    @step
    def evaluate(self, model) -> float:
        # pretend evaluation
        return 0.9123

    def graph(self):
        p = self.params()
        feats = self.preprocess(p)
        model = self.train(feats, p)
        return self.evaluate(model)
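
Under the hood, the step calls inside graph() are typically lazy: they record nodes and their upstream dependencies rather than executing step bodies. A self-contained sketch of that recording idea (illustrative only; this is not axl's actual implementation):

```python
# Rough sketch of how lazy step calls can record a DAG instead of
# executing immediately. This mimics the idea behind graph() wiring;
# it is not the actual axl internals.
class Node:
    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream  # nodes this step depends on

def make_step(name):
    # Calling the "step" with other nodes as arguments records edges.
    def call(*deps):
        return Node(name, [d for d in deps if isinstance(d, Node)])
    return call

params = make_step("params")
preprocess = make_step("preprocess")
train = make_step("train")
evaluate = make_step("evaluate")

# Mirrors the graph() body above: the call chain builds the DAG.
p = params()
feats = preprocess(p)
model = train(feats, p)
final = evaluate(model)

# Walking upstream from the final node recovers the dependencies.
assert [d.name for d in final.upstream] == ["train"]
```

A recorded graph like this is what gets lowered into the backend-agnostic IR described below.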

Variations

  • Receive a file path instead of an object:

    from pathlib import Path
    
    @step(input_mode={"features": "path"})
    def profile(self, features: Path) -> dict:
        return {"bytes": features.stat().st_size}
    
  • Override the io handler (e.g., Parquet for DataFrames):

    from axl.io.parquet_io import parquet_io_handler
    
    @step(io_handler=parquet_io_handler)
    def preprocess(self, p: TrainParams):
        import pandas as pd
        return pd.read_csv(p.input_path)  # saved as .parquet; downstream gets a DataFrame
    

🛠 CLI

# Compile to Argo YAML
axl compile -m examples/churn_workflow.py:ChurnTrain --target argo --out churn.yaml

# Compile to Dagster job (Python module output)
axl compile -m examples/churn_workflow.py:ChurnTrain --target dagster --out dagster_job.py

# Run locally
axl run local -m examples/churn_workflow.py:ChurnTrain

# Validate workflow definition
axl validate -m examples/churn_workflow.py:ChurnTrain

# Render DAG graph
axl render -m examples/churn_workflow.py:ChurnTrain --out dag.png

📐 Architecture

  1. Authoring Layer

    • Python DSL: @step decorator, Workflow base class
    • Params are a normal step (often a Pydantic model)
    • Configuration via class attributes (name, image, io_handler)
    • IO handled by io_handlers (default: pickle)
    • Wire dependencies via graph()
  2. IR (Intermediate Representation)

    • Abstract DAG: nodes, edges, inputs/outputs, resources, retry policies, IO metadata
  3. Compilers

    • Argo: generates Argo Workflows YAML for execution on Argo Workflows
    • Kubeflow: compiles to pipeline YAML for execution on Kubeflow Pipelines
  4. Runtime

    • Unified runner image (axl-runner) executes steps
    • Handles env (via uv), IO handler save/load, logging, retries
  5. CLI

    • Single interface for compile, run, validate, render
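
The IR layer can be imagined as a small node model plus a dependency-respecting ordering that every backend consumes. An illustrative sketch, assuming nodes carry resource and retry hints (field names are assumptions, not the real axl IR):

```python
from dataclasses import dataclass, field

# Toy IR: backend-agnostic nodes with resource hints, plus a
# dependency-respecting execution order. Field names are illustrative
# and do not reflect axl's actual IR classes.
@dataclass
class IRNode:
    name: str
    deps: list = field(default_factory=list)   # upstream node names
    cpu: str = "500m"
    memory: str = "512Mi"
    retries: int = 0

def topo_order(nodes):
    # Kahn-style ordering: repeatedly emit nodes whose deps are done.
    done, order = set(), []
    while len(order) < len(nodes):
        ready = [n for n in nodes
                 if n.name not in done and all(d in done for d in n.deps)]
        if not ready:
            raise ValueError("cycle detected in workflow DAG")
        for n in sorted(ready, key=lambda n: n.name):
            done.add(n.name)
            order.append(n.name)
    return order
```

A local runtime can walk this order in-process, while an Argo compiler can translate the same nodes into DAG task templates, which is what makes the IR backend-agnostic.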

📂 Project Structure

axl/
  core/          # DSL: decorators, base classes, typing
  io/            # io_handlers (pickle default; parquet/npy/torch optional)
  ir/            # Intermediate Representation (nodes, edges, workflows)
  compiler/      # Backend compilers (Argo, Kubeflow)
  runtime/       # Runner container + IO + env setup (uv)
  cli.py         # CLI entrypoint
examples/
  churn_workflow.py
tests/
  test_core.py   # Tests for DSL components
  test_ir.py     # Tests for IR components
pyproject.toml
README.md

🎯 Why AXL Workflows?

  • Local development is fast and simple.

  • Kubeflow Pipelines/Argo are production-grade, but the YAML is verbose and can be hard to get started with.

  • axl bridges the gap:

    • Simple, class-based DSL
    • Params as a normal step
    • IO handlers for painless object ↔ file persistence
    • Backend-agnostic IR
    • Compile once, run anywhere

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
