
dFlow

Decorator-based workflow documentation for Python scripts and Jupyter notebooks. Annotate your analysis with Step markers — dFlow statically extracts the structure (no code execution) and exports it to a standalone HTML page you can share with collaborators.

Installation

pip install dflow

To also install HTML export support:

pip install dflow[export]   # adds sphinx + sphinx-dflow-ext

For development:

git clone https://github.com/ddpoe/dFlow.git
cd dFlow
poetry install --with export

Notebook Quickstart

This is the primary use case: you have a Jupyter notebook with an analysis and you want to produce a clean, shareable HTML document describing the whole pipeline.

1. Write your helper functions with @task

Mark reusable functions with @task. These can live in a .py helper module or in a notebook cell — dFlow scans both:

# helpers.py  (or a notebook cell — either works)
import scanpy as sc  # used by the task bodies below

from dflow.core.decorators import task, Step

@task(
    purpose="Load and validate 10X h5ad file",
    inputs="Path to h5ad file",
    outputs="AnnData object with raw counts",
)
def load_data(path: str):
    adata = sc.read_h5ad(path)
    assert adata.n_obs > 0, "Empty dataset"
    return adata

@task(
    purpose="Dimensionality reduction via PCA + UMAP",
    inputs="Filtered AnnData",
    outputs="AnnData with UMAP coordinates in .obsm",
    critical="1-2 minutes on large datasets",
)
def reduce_dims(adata):
     = Step(step_num=1, name="PCA", purpose="Principal component analysis")
    sc.tl.pca(adata)

     = Step(step_num=2, name="Neighbors", purpose="Build kNN graph")
    sc.pp.neighbors(adata)

     = Step(step_num=3, name="UMAP", purpose="Compute UMAP embedding")
    sc.tl.umap(adata)
    return adata

Tasks can have their own Step markers inside them. When a workflow calls this task via AutoStep, dFlow resolves those internal steps as sub-steps (see below).

2. Annotate your notebook

The workflow() declaration and all Step / AutoStep markers go in one cell. Use AutoStep when calling a @task — dFlow pulls in its docs automatically. Use Step for inline code:

import scanpy as sc

from dflow import workflow, Step, AutoStep
from helpers import load_data, reduce_dims

口 = workflow(name="scrna_pipeline", purpose="Single-cell RNA-seq analysis")

# ── Step 1: Load Data ──
口 = AutoStep(step_num=1)
adata = load_data("data.h5ad")

# ── Step 2: Quality Control ──
 = Step(step_num=2, name="Quality control",
          purpose="Filter low-quality cells and genes",
          critical="Removes 20-30% of cells")
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# ── Step 3: Normalize ──
 = Step(step_num=3, name="Normalize",
          purpose="Log-normalize and find highly variable genes",
          outputs="Normalized AnnData with HVG annotations")
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)

# ── Step 4: Reduce Dimensions ──
口 = AutoStep(step_num=4)
adata = reduce_dims(adata)

About 口: The CJK character 口 (mouth/opening) is used as a visual marker — it makes step annotations stand out from real data assignments. It's purely a convention; any variable name works, or you can omit the assignment entirely and just call Step(...) / AutoStep(...) as bare statements.
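
For example, step 2 above in the bare-statement form — identical in effect, just a stylistic choice:

Step(step_num=2, name="Quality control",
     purpose="Filter low-quality cells and genes",
     critical="Removes 20-30% of cells")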

3. Cross-module step resolution

When dflow build scans this project, it sees that reduce_dims (called at step 4 via AutoStep) has internal steps 1, 2, 3. During assembly, those become sub-steps of step 4:

Step 1: Load Data              ← from @task on load_data
Step 2: Quality control        ← inline Step
Step 3: Normalize              ← inline Step
Step 4: Reduce Dimensions      ← from @task on reduce_dims
  Step 4.1: PCA                ← resolved from reduce_dims Step 1
  Step 4.2: Neighbors          ← resolved from reduce_dims Step 2
  Step 4.3: UMAP               ← resolved from reduce_dims Step 3

This works across files — the task can be in helpers.py, another notebook, or the same notebook. dFlow resolves everything statically from the database.

4. Build and export

dflow init                          # Create .dflow/ directory and database
dflow build .                       # Scan notebook + helpers.py → populate database
dflow export -o docs/               # Generate HTML documentation

5. What you get

dflow export produces a self-contained HTML site in docs/:

docs/
├── index.html                  # Landing page listing all workflows
├── scrna_pipeline.html         # Your workflow page
├── _static/                    # CSS, JS assets
└── ...

Open docs/index.html in a browser. The workflow page shows:

  • Workflow title and purpose — from the workflow() call
  • Step-by-step outline — each Step rendered with its name, purpose, inputs, outputs, and warnings
  • Resolved AutoStep details — purpose/inputs/outputs pulled from @task decorators, with internal steps expanded as sub-steps (4.1, 4.2, …)
  • Mermaid diagram — auto-generated flowchart showing the step sequence (see the sketch below)
  • Coverage links — if any @test_workflow tests reference this workflow, they appear here

This is rendered by sphinx-dflow-ext, which reads the .dflow/workflow.db database directly.
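
For orientation, the generated flowchart is of this general shape — a hand-written Mermaid sketch of the pipeline above, not the tool's literal output:

%% Illustrative sketch only — the exported diagram may differ
flowchart TD
    S1["Step 1: Load Data"] --> S2["Step 2: Quality control"]
    S2 --> S3["Step 3: Normalize"]
    S3 --> S4["Step 4: Reduce Dimensions"]
    S4 --> S41["4.1 PCA"]
    S41 --> S42["4.2 Neighbors"]
    S42 --> S43["4.3 UMAP"]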

Step Reference

Parameter   Required          Description
step_num    Yes               Integer for major steps; float for sub-steps in loops (e.g. 1.1)
name        Yes (Step only)   Short label — AutoStep pulls this from the @task
purpose     Yes (Step only)   What this step accomplishes — AutoStep pulls this from the @task
inputs      No                What goes in
outputs     No                What comes out
critical    No                Warnings (long runtime, data loss, etc.)

How It Works

  1. You annotate code with Step() / AutoStep() markers (notebooks) or @workflow / @task decorators (modules)
  2. dflow build statically parses your code via AST — no execution — and stores the structure in .dflow/workflow.db (SQLite)
  3. dflow export generates Sphinx RST stubs referencing the database, runs sphinx-build, and produces standalone HTML

The database is the single source of truth — downstream tools (Sphinx, Cortex, etc.) read it directly.
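
For intuition, static extraction in the spirit of step 2 can be sketched with the standard-library ast module — illustrative only, not dFlow's actual scanner, and pipeline.py is a hypothetical input file:

import ast

source = open("pipeline.py").read()
tree = ast.parse(source)          # parse only — nothing is executed

for node in ast.walk(tree):
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"Step", "AutoStep"}):
        # collect literal keyword arguments such as name= and purpose=
        kwargs = {kw.arg: getattr(kw.value, "value", None)
                  for kw in node.keywords}
        print(f"line {node.lineno}: {node.func.id} {kwargs}")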

Python Modules

For .py files, use @workflow and @task decorators with Step() / AutoStep() markers:

import scanpy as sc

from dflow.core.decorators import workflow, task, Step, AutoStep

@task(purpose="Load and validate data from h5ad file")
def load_data(path: str):
    adata = sc.read_h5ad(path)
    return adata

@task(purpose="Dimensionality reduction via PCA + UMAP")
def reduce_dims(adata):
    sc.tl.pca(adata)
    sc.tl.umap(adata)
    return adata

@workflow(purpose="Single-cell RNA-seq analysis pipeline")
def run_pipeline(data_path: str):
    口 = AutoStep(step_num=1)
    adata = load_data(data_path)

    口 = Step(step_num=2, name="Filter", purpose="Remove low-quality cells")
    adata = adata[adata.obs["n_genes"] > 200]

    口 = AutoStep(step_num=3)
    adata = reduce_dims(adata)

Testing

Use @test_workflow and @test_suite to annotate test functions. dFlow scans these just like production workflows — extracting steps, purpose, and coverage links — but stores them with role="test" so documentation tooling can distinguish tests from production code.

@test_workflow

Decorator for individual test functions. Same structure as @workflow but adds an optional covers parameter linking the test to the production functions it validates:

from dflow.core.decorators import test_workflow, Step

@test_workflow(
    purpose="Execute preprocess → filter → label via CLI and verify lineage chain",
    covers=["pm.snakemake_gen.generate_snakefile"],
)
def test_three_step_pipeline(seeded_project):
     = Step(step_num=1, name="Run preprocess",
             purpose="Register and complete preprocess as the root step",
             outputs="Run ID for preprocess",
             critical="NOT IMPLEMENTED")
    # ... test body ...

     = Step(step_num=2, name="Run filter_cells",
             purpose="Register filter_cells with parent=preprocess",
             inputs="preprocess run ID")
    # ... test body ...

Parameters:

Parameter   Required   Description
purpose     Yes        What this test validates (positional)
covers      No         List of dotpaths to production functions this test covers,
                       e.g. ["pm.database.get_engine"] (keyword-only)
inputs      No         Description of test inputs
outputs     No         Description of expected outputs
critical    No         Warnings (e.g. "NOT IMPLEMENTED" for skeleton tests)

The covers list creates CoverageLink rows in the database. These are queried at Sphinx build time by sphinx_dflow_ext to inject test-coverage tables into API docs.

Note: test_workflow.__test__ is set to False so pytest won't try to collect the decorator itself as a test.
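
A one-line check of this, assuming only what the note above states (pytest skips any object whose __test__ attribute is False):

from dflow.core.decorators import test_workflow
assert test_workflow.__test__ is False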

@test_suite

Class-level decorator for grouping related test methods. Purpose is taken from the class docstring:

from dflow.core.decorators import test_suite, test_workflow, Step

@test_suite(covers=["pm.database.get_engine", "pm.database.init_db"])
class TestDatabaseSetup:
    """Verify database initialization and engine creation."""

    @test_workflow(purpose="Engine connects to correct SQLite file")
    def test_engine_path(self, tmp_path):
         = Step(step_num=1, name="Create engine", purpose="Call get_engine()")
        # ...

    @test_workflow(
        purpose="Re-init clears stale data",
        covers=["pm.database.reset_db"],  # additional function-level covers
    )
    def test_reinit(self, tmp_path):
         = Step(step_num=1, name="Reset", purpose="Call reset_db()")
        # ...

Key behaviors:

  • Class-level covers creates class-scoped CoverageLink rows (applies to all methods)
  • Individual @test_workflow methods can add their own covers on top of the class-level list (function-scoped rows)
  • Methods without @test_workflow are still discovered by the TestScanner as role="test" functions with class_id set

Decorator Reference

Decorator / Marker                          Context          Purpose
@workflow(purpose=...)                      .py files        Marks a top-level orchestration function
@task(purpose=...)                          .py files        Marks a reusable unit of work
@test_workflow(purpose=..., covers=[...])   .py test files   Marks a test function with optional coverage links
@test_suite(covers=[...])                   .py test files   Groups test methods in a class (purpose from docstring)
workflow(name=..., purpose=...)             Notebooks        Declares a workflow (plain function call)
Step(step_num, name, purpose)               Both             Inline step with explicit metadata
AutoStep(step_num)                          Both             Step that inherits docs from the next function call

Optional parameters on @workflow, @task, @test_workflow, and Step: inputs, outputs, critical.

Step Numbering

  • Major steps: integers (1, 2, 3) — sequential top-level operations
  • Minor steps: floats (1.1, 1.2) — sub-operations, use inside loops
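
A minimal sketch of the loop case — batches, normalize, and merge are hypothetical stand-ins:

for batch in batches:
    口 = Step(step_num=2.1, name="Normalize batch",
             purpose="Per-batch normalization")
    normalize(batch)

    口 = Step(step_num=2.2, name="Merge batch",
             purpose="Append the batch to the combined object")
    merge(batch)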

CLI

dflow init                        # Initialize .dflow/ directory
dflow build [paths...]            # Scan + resolve references (the main command)
dflow list                        # List discovered workflows
dflow export [workflows...] -o .  # Export to HTML (requires sphinx-dflow-ext)
dflow scan [paths...]             # Scan only (no resolve step)
dflow assemble                    # Resolve AutoStep references only
dflow validate [files...]         # Validate annotations

Common flags: -v (verbose), -d (debug), --strict, -r (project root).
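
For example, a verbose build from the current directory (assuming the flags listed above combine in the usual way):

dflow build . -v --strict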

Database

All annotation data lives in .dflow/workflow.db (SQLite). Key tables:

Table             Contents
modules           Scanned source files
functions         Decorated functions with purpose, inputs, outputs
steps             Step/AutoStep markers within functions
workflow_entries  Top-level @workflow entry points
classes           @test_suite class metadata
coverage_links    covers=[...] links from tests to production functions
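
Since it is plain SQLite, you can inspect it directly. A sketch using only the standard library — the column names (name, purpose) are assumptions about the schema, not a documented API:

import sqlite3

con = sqlite3.connect(".dflow/workflow.db")
# list each declared workflow with its stated purpose
for name, purpose in con.execute("SELECT name, purpose FROM workflow_entries"):
    print(f"{name}: {purpose}")
con.close()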

License

MIT
