
Metaflow extension to compile and deploy flows as Dagster jobs

metaflow-dagster

License: Apache-2.0 | Python 3.10+

Deploy and run Metaflow flows as Dagster jobs.

metaflow-dagster generates a self-contained Dagster definitions file from any Metaflow flow, letting you schedule, monitor, and launch your pipelines through Dagster while keeping all your existing Metaflow code unchanged.

Install

pip install metaflow-dagster

Or from source:

git clone https://github.com/npow/metaflow-dagster.git
cd metaflow-dagster
pip install -e ".[test]"

Quick start

python my_flow.py dagster create dagster_defs.py
dagster dev -f dagster_defs.py

Usage

Generate and run a Dagster job

python my_flow.py dagster create dagster_defs.py
dagster dev -f dagster_defs.py

Or execute directly in Python:

from dagster_defs import MyFlow
result = MyFlow.execute_in_process()

All graph shapes are supported

from metaflow import FlowSpec, step

# Linear
class SimpleFlow(FlowSpec):
    @step
    def start(self):
        self.value = 42
        self.next(self.end)

    @step
    def end(self):
        pass

# Split/join (branch)
class BranchFlow(FlowSpec):
    @step
    def start(self):
        self.next(self.branch_a, self.branch_b)
    ...  # remaining steps omitted

# Foreach fan-out
class ForeachFlow(FlowSpec):
    @step
    def start(self):
        self.items = [1, 2, 3]
        self.next(self.process, foreach="items")
    ...  # remaining steps omitted

Parametrised flows

Parameters defined with metaflow.Parameter are forwarded automatically as a typed Dagster Config class on the start op:

python param_flow.py dagster create param_flow_dagster.py

Then pass them via Dagster's run config:

result = ParametrizedFlow.execute_in_process(run_config={
    "ops": {"op_start": {"config": {"greeting": "Hi", "count": 5}}}
})
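
The nested run config above can also be assembled from a flat dict of parameter values. This is a minimal sketch, assuming the start op is named `op_start` as in the example; `build_run_config` is a hypothetical helper and not part of metaflow-dagster:

```python
from typing import Any


def build_run_config(params: dict[str, Any], start_op: str = "op_start") -> dict[str, Any]:
    """Wrap flat Metaflow parameter values in Dagster's run-config layout."""
    # Dagster reads op config from ops -> <op name> -> config.
    return {"ops": {start_op: {"config": dict(params)}}}


run_config = build_run_config({"greeting": "Hi", "count": 5})
print(run_config)
```

The result can be passed as `ParametrizedFlow.execute_in_process(run_config=run_config)`.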

Step decorators (--with)

Inject Metaflow step decorators at deploy time without modifying the flow source:

# Run every step in a sandbox (e.g. metaflow-sandbox extension)
python my_flow.py dagster create my_flow_dagster.py --with=sandbox

# Multiple decorators are supported
python my_flow.py dagster create my_flow_dagster.py \
  --with=sandbox \
  --with='resources:cpu=4,memory=8000'
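
To make the `--with` spec format concrete, the sketch below splits a spec of the form `name:key=value,key=value` into a decorator name and its attributes. `parse_decospec` is a hypothetical helper written for illustration only; it is not Metaflow's own parser and handles only the simple forms shown above:

```python
def parse_decospec(spec: str) -> tuple[str, dict[str, str]]:
    """Split 'name:key=val,key=val' into (decorator name, attributes)."""
    name, _, attr_text = spec.partition(":")
    attrs: dict[str, str] = {}
    # A bare name such as 'sandbox' has no attribute text to parse.
    for pair in filter(None, attr_text.split(",")):
        key, _, value = pair.partition("=")
        attrs[key.strip()] = value.strip()
    return name, attrs


print(parse_decospec("sandbox"))
print(parse_decospec("resources:cpu=4,memory=8000"))
```

Read this way, `--with='resources:cpu=4,memory=8000'` corresponds to putting `@resources(cpu=4, memory=8000)` on every step.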

Retries and timeouts

@retry and @timeout on any step are picked up automatically. The generated op gets a Dagster RetryPolicy and an op_execution_timeout tag — no extra configuration needed:

from metaflow import FlowSpec, retry, step, timeout

class MyFlow(FlowSpec):
    @retry(times=3, minutes_between_retries=2)
    @timeout(seconds=300)
    @step
    def train(self):
        ...

Generates:

@op(retry_policy=RetryPolicy(max_retries=3, delay=120),
    tags={"dagster/op_execution_timeout": "300"})
def op_train(context): ...

Each Dagster retry passes the correct --retry-count to Metaflow so attempt numbering is consistent.
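
The mapping in the example above is a unit conversion: `minutes_between_retries=2` becomes `delay=120` seconds, and the timeout is written as a string tag. A minimal sketch of that translation follows; `op_settings` is a hypothetical helper that mirrors the example output and is not the generator's actual code:

```python
def op_settings(times: int, minutes_between_retries: int, timeout_seconds: int) -> dict:
    """Translate @retry/@timeout arguments into the values shown on the generated op."""
    return {
        # Dagster's RetryPolicy takes its delay in seconds.
        "retry_policy": {"max_retries": times, "delay": minutes_between_retries * 60},
        # Tag values are strings.
        "tags": {"dagster/op_execution_timeout": str(timeout_seconds)},
    }


settings = op_settings(times=3, minutes_between_retries=2, timeout_seconds=300)
print(settings)
```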

Environment variables

@environment(vars={...}) on a step passes those variables to the metaflow step subprocess:

@environment(vars={"TOKENIZERS_PARALLELISM": "false"})
@step
def embed(self): ...
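
One common way to pass variables to a subprocess is to merge them into a copy of the parent environment. The sketch below shows that pattern with a stand-in child command instead of the real metaflow step invocation; whether the generated file does exactly this is an assumption:

```python
import os
import subprocess
import sys

step_vars = {"TOKENIZERS_PARALLELISM": "false"}

# Child environment: the parent's variables plus the step-specific ones.
child_env = {**os.environ, **step_vars}

# Stand-in for the step subprocess: print the variable as seen by the child.
completed = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['TOKENIZERS_PARALLELISM'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
print(completed.stdout.strip())
```

The variable is visible only to the child process; the parent's own environment is left unchanged.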

Project namespace

If the flow uses @project(name=...), the Dagster job name is automatically prefixed:

from metaflow import FlowSpec, project

@project(name="recommendations")
class TrainFlow(FlowSpec): ...

python train_flow.py dagster create out.py
# job name: recommendations_TrainFlow

Workflow timeout

Cap the total wall-clock time for the entire job run:

python my_flow.py dagster create my_flow_dagster.py --workflow-timeout 3600

Attach tags

Metaflow tags are baked into the generated file at compile time and forwarded to every metaflow step subprocess:

python my_flow.py dagster create my_flow_dagster.py --tag env:prod --tag version:2

Custom job name

python my_flow.py dagster create my_flow_dagster.py --name nightly_pipeline

Configuration

Metadata service and datastore

By default, metaflow-dagster uses whatever metadata and datastore backends are active in your Metaflow environment. The generated file bakes in those settings at creation time so every step subprocess uses the same backend.

To use a remote metadata service or object store, configure them before running dagster create:

python my_flow.py \
  --metadata=service \
  --datastore=s3 \
  dagster create my_flow_dagster.py

Or via environment variables:

export METAFLOW_DEFAULT_METADATA=service
export METAFLOW_DEFAULT_DATASTORE=s3
python my_flow.py dagster create my_flow_dagster.py

Scheduling

If your flow has a @schedule decorator, the generated file includes a ScheduleDefinition automatically. No extra configuration needed.

How it works

metaflow-dagster compiles your Metaflow flow's DAG into a self-contained Dagster definitions file. Each Metaflow step becomes a @op. The generated file:

  • runs each step as a subprocess via the standard metaflow step CLI
  • passes --input-paths correctly for joins and foreach splits
  • emits Metaflow artifact keys and a retrieval snippet to the Dagster UI after each step
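
The first two points can be sketched as a function that assembles a step command line. This is an illustration under assumptions: `build_step_command` is hypothetical, the run ID and task IDs are made up, and the exact arguments and input-path format used by the generated file may differ. Only `--input-paths` and `--retry-count` are named in this README; `--run-id` and `--task-id` are options of Metaflow's step command.

```python
import sys


def build_step_command(flow_file, step_name, run_id, task_id, input_paths, retry_count=0):
    """Assemble (but do not run) a metaflow step command as an argument list."""
    cmd = [
        sys.executable, flow_file, "step", step_name,
        "--run-id", run_id,
        "--task-id", str(task_id),
        "--retry-count", str(retry_count),
    ]
    if input_paths:
        # A join receives one input path per upstream task, comma-separated.
        cmd += ["--input-paths", ",".join(input_paths)]
    return cmd


# Hypothetical join step fed by two branch tasks.
cmd = build_step_command(
    "my_flow.py", "join", "dagster-abc123", 4,
    ["dagster-abc123/branch_a/2", "dagster-abc123/branch_b/3"],
)
print(" ".join(cmd))
```

A list like this could be handed to `subprocess.run(cmd, check=True)`, so that a failing step surfaces as an exception the orchestrator can retry.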

Job graph

The compiled DAG is fully visible in Dagster — typed inputs, fan-out branches, and fan-in joins:

[Image: Job graph showing split/join structure]

Launchpad

Parametrised flows get a typed config schema in the Dagster launchpad, populated from your Metaflow Parameter defaults:

[Image: Launchpad with op config schema]

Run timeline

Each Metaflow step appears as a Dagster op with real wall-clock timing. Parallel branches run concurrently:

[Image: Completed run with Gantt chart]

Artifact retrieval

After each step, the op emits the artifact keys and a ready-to-copy retrieval snippet — without loading the values themselves:

[Image: Artifact keys and retrieval snippet with copy button]

from metaflow import Task
task = Task('BranchingFlow/dagster-d75a08c398a3/start/1')
task['value'].data
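
The string passed to `Task` is a Metaflow pathspec of the form flow name, run ID, step name, task ID. The sketch below builds such a pathspec and the matching snippet text; both helper names are hypothetical and only illustrate the format:

```python
def task_pathspec(flow: str, run_id: str, step: str, task_id) -> str:
    """Join the four components of a Metaflow task pathspec."""
    return "/".join([flow, run_id, step, str(task_id)])


def retrieval_snippet(pathspec: str, artifact: str) -> str:
    """Render a copyable snippet that loads one artifact by key."""
    return "\n".join([
        "from metaflow import Task",
        f"task = Task('{pathspec}')",
        f"task['{artifact}'].data",
    ])


pathspec = task_pathspec("BranchingFlow", "dagster-d75a08c398a3", "start", 1)
print(retrieval_snippet(pathspec, "value"))
```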

Step logs

Every op logs the exact metaflow step CLI command it ran. Flow print() output streams through Dagster's log panel:

[Image: Run logs showing Metaflow CLI commands]

Development

git clone https://github.com/npow/metaflow-dagster.git
cd metaflow-dagster
pip install -e ".[test]"
pytest -v

The test suite runs real end-to-end tests: compile → load module → execute_in_process → verify Metaflow artifacts on disk. No mocks.

License

Apache 2.0
