dFlow
Decorator-based workflow documentation for Python scripts and Jupyter notebooks. Annotate your analysis with Step markers — dFlow statically extracts the structure (no code execution) and exports it to a standalone HTML page you can share with collaborators.
Installation
pip install dflow
To also install HTML export support:
pip install dflow[export] # adds sphinx + sphinx-dflow-ext
For development:
git clone https://github.com/ddpoe/dFlow.git
cd dFlow
poetry install --with export
Notebook Quickstart
This is the primary use case: you have a Jupyter notebook with an analysis and you want to produce a clean, shareable HTML document describing the whole pipeline.
1. Write your helper functions with @task
Mark reusable functions with @task. These can live in a .py helper module or in a notebook cell — dFlow scans both:
# helpers.py (or a notebook cell — either works)
import scanpy as sc

from dflow.core.decorators import task, Step


@task(
    purpose="Load and validate 10X h5ad file",
    inputs="Path to h5ad file",
    outputs="AnnData object with raw counts",
)
def load_data(path: str):
    adata = sc.read_h5ad(path)
    assert adata.n_obs > 0, "Empty dataset"
    return adata


@task(
    purpose="Dimensionality reduction via PCA + UMAP",
    inputs="Filtered AnnData",
    outputs="AnnData with UMAP coordinates in .obsm",
    critical="1-2 minutes on large datasets",
)
def reduce_dims(adata):
    口 = Step(step_num=1, name="PCA", purpose="Principal component analysis")
    sc.tl.pca(adata)

    口 = Step(step_num=2, name="Neighbors", purpose="Build kNN graph")
    sc.pp.neighbors(adata)

    口 = Step(step_num=3, name="UMAP", purpose="Compute UMAP embedding")
    sc.tl.umap(adata)
    return adata
Tasks can have their own Step markers inside them. When a workflow calls this task via AutoStep, dFlow resolves those internal steps as sub-steps (see below).
2. Annotate your notebook
The workflow() declaration and all Step / AutoStep markers go in one cell. Use AutoStep when calling a @task — dFlow pulls in its docs automatically. Use Step for inline code:
import scanpy as sc

from dflow import workflow, Step, AutoStep
from helpers import load_data, reduce_dims

口 = workflow(name="scrna_pipeline", purpose="Single-cell RNA-seq analysis")

# ── Step 1: Load Data ──
口 = AutoStep(step_num=1)
adata = load_data("data.h5ad")

# ── Step 2: Quality Control ──
口 = Step(step_num=2, name="Quality control",
          purpose="Filter low-quality cells and genes",
          critical="Removes 20-30% of cells")
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# ── Step 3: Normalize ──
口 = Step(step_num=3, name="Normalize",
          purpose="Log-normalize and find highly variable genes",
          outputs="Normalized AnnData with HVG annotations")
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)

# ── Step 4: Reduce Dimensions ──
口 = AutoStep(step_num=4)
adata = reduce_dims(adata)
About 口
The CJK character 口 (mouth/opening) is used as a visual marker — it makes step annotations stand out from real data assignments. It's purely a convention; any variable name works, or you can omit the assignment entirely and just call Step(...) / AutoStep(...) as bare statements.
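For instance, the two forms below should be equivalent to the scanner (the step number and labels are illustrative):

from dflow import Step

口 = Step(step_num=5, name="Save", purpose="Write results to disk")  # marker convention
Step(step_num=5, name="Save", purpose="Write results to disk")       # bare statement, same effect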
3. Cross-module step resolution
When dflow build scans this project, it sees that reduce_dims (called at step 4 via AutoStep) has internal steps 1, 2, 3. During assembly, those become sub-steps of step 4:
Step 1: Load Data           ← from @task on load_data
Step 2: Quality control     ← inline Step
Step 3: Normalize           ← inline Step
Step 4: Reduce Dimensions   ← from @task on reduce_dims
  Step 4.1: PCA             ← resolved from reduce_dims Step 1
  Step 4.2: Neighbors       ← resolved from reduce_dims Step 2
  Step 4.3: UMAP            ← resolved from reduce_dims Step 3
This works across files — the task can be in helpers.py, another notebook, or the same notebook. dFlow resolves everything statically from the database.
4. Build and export
dflow init # Create .dflow/ directory and database
dflow build . # Scan notebook + helpers.py → populate database
dflow export -o docs/ # Generate HTML documentation
5. What you get
dflow export produces a self-contained HTML site in docs/:
docs/
├── index.html # Landing page listing all workflows
├── scrna_pipeline.html # Your workflow page
├── _static/ # CSS, JS assets
└── ...
Open docs/index.html in a browser. The workflow page shows:
- Workflow title and purpose — from the workflow() call
- Step-by-step outline — each Step rendered with its name, purpose, inputs, outputs, and warnings
- Resolved AutoStep details — purpose/inputs/outputs pulled from @task decorators, with internal steps expanded as sub-steps (4.1, 4.2, …)
- Mermaid diagram — auto-generated flowchart showing the step sequence
- Coverage links — if any @test_workflow tests reference this workflow, they appear here
This is rendered by sphinx-dflow-ext, which reads the .dflow/workflow.db database directly.
Step reference
| Parameter | Required | Description |
|---|---|---|
| step_num | Yes | Integer for major steps, float for sub-steps in loops (e.g. 1.1) |
| name | Yes (Step only) | Short label — AutoStep pulls this from the @task |
| purpose | Yes (Step only) | What this step accomplishes — AutoStep pulls this from the @task |
| inputs | No | What goes in |
| outputs | No | What comes out |
| critical | No | Warnings (long runtime, data loss, etc.) |
How It Works
- You annotate code with Step() / AutoStep() markers (notebooks) or @workflow / @task decorators (modules)
- dflow build statically parses your code via AST — no execution — and stores the structure in .dflow/workflow.db (SQLite)
- dflow export generates Sphinx RST stubs referencing the database, runs sphinx-build, and produces standalone HTML
The database is the single source of truth — downstream tools (Sphinx, Cortex, etc.) read it directly.
Python Modules
For .py files, use @workflow and @task decorators with Step() / AutoStep() markers:
import scanpy as sc

from dflow.core.decorators import workflow, task, Step, AutoStep


@task(purpose="Load and validate data from h5ad file")
def load_data(path: str):
    adata = sc.read_h5ad(path)
    return adata


@task(purpose="Dimensionality reduction via PCA + UMAP")
def reduce_dims(adata):
    sc.tl.pca(adata)
    sc.tl.umap(adata)
    return adata


@workflow(purpose="Single-cell RNA-seq analysis pipeline")
def run_pipeline(data_path: str):
    口 = AutoStep(step_num=1)
    adata = load_data(data_path)

    口 = Step(step_num=2, name="Filter", purpose="Remove low-quality cells")
    adata = adata[adata.obs["n_genes"] > 200]

    口 = AutoStep(step_num=3)
    adata = reduce_dims(adata)
    return adata
Testing
Use @test_workflow and @test_suite to annotate test functions. dFlow scans these just like production workflows — extracting steps, purpose, and coverage links — but stores them with role="test" so documentation tooling can distinguish tests from production code.
@test_workflow
Decorator for individual test functions. Same structure as @workflow but adds an optional covers parameter linking the test to the production functions it validates:
from dflow.core.decorators import test_workflow, Step


@test_workflow(
    purpose="Execute preprocess → filter → label via CLI and verify lineage chain",
    covers=["pm.snakemake_gen.generate_snakefile"],
)
def test_three_step_pipeline(seeded_project):
    口 = Step(step_num=1, name="Run preprocess",
              purpose="Register and complete preprocess as the root step",
              outputs="Run ID for preprocess",
              critical="NOT IMPLEMENTED")
    # ... test body ...

    口 = Step(step_num=2, name="Run filter_cells",
              purpose="Register filter_cells with parent=preprocess",
              inputs="preprocess run ID")
    # ... test body ...
Parameters:
| Parameter | Required | Description |
|---|---|---|
| purpose | Yes | What this test validates (positional) |
| covers | No | List of dotpaths to production functions this test covers, e.g. ["pm.database.get_engine"] (keyword-only) |
| inputs | No | Description of test inputs |
| outputs | No | Description of expected outputs |
| critical | No | Warnings (e.g. "NOT IMPLEMENTED" for skeleton tests) |
The covers list creates CoverageLink rows in the database. These are queried at Sphinx build time by sphinx_dflow_ext to inject test-coverage tables into API docs.
Note: test_workflow.__test__ is set to False so pytest won't try to collect the decorator itself as a test.
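This relies on pytest's standard collection rule for the __test__ attribute. A minimal sketch of the mechanism (the decorator body and the __dflow_meta__ attribute are assumptions for illustration, not the actual dFlow source):

def test_workflow(purpose, *, covers=None, inputs=None, outputs=None, critical=None):
    def decorate(fn):
        # Hypothetical: attach metadata for tooling; the real package may differ.
        fn.__dflow_meta__ = {"purpose": purpose, "covers": covers or []}
        return fn
    return decorate

# pytest skips any object whose __test__ attribute is False, so the
# decorator function itself is never collected as a test:
test_workflow.__test__ = False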
@test_suite
Class-level decorator for grouping related test methods. Purpose is taken from the class docstring:
from dflow.core.decorators import test_suite, test_workflow, Step


@test_suite(covers=["pm.database.get_engine", "pm.database.init_db"])
class TestDatabaseSetup:
    """Verify database initialization and engine creation."""

    @test_workflow(purpose="Engine connects to correct SQLite file")
    def test_engine_path(self, tmp_path):
        口 = Step(step_num=1, name="Create engine", purpose="Call get_engine()")
        # ...

    @test_workflow(
        purpose="Re-init clears stale data",
        covers=["pm.database.reset_db"],  # additional function-level covers
    )
    def test_reinit(self, tmp_path):
        口 = Step(step_num=1, name="Reset", purpose="Call reset_db()")
        # ...
Key behaviors:
- Class-level covers creates class-scoped CoverageLink rows (applies to all methods)
- Individual @test_workflow methods can add their own covers on top of the class-level list (function-scoped rows)
- Methods without @test_workflow are still discovered by the TestScanner as role="test" functions with class_id set
Decorator Reference
| Decorator / Marker | Context | Purpose |
|---|---|---|
| @workflow(purpose=...) | .py files | Marks a top-level orchestration function |
| @task(purpose=...) | .py files | Marks a reusable unit of work |
| @test_workflow(purpose=..., covers=[...]) | .py test files | Marks a test function with optional coverage links |
| @test_suite(covers=[...]) | .py test files | Groups test methods in a class (purpose from docstring) |
| workflow(name=..., purpose=...) | Notebooks | Declares a workflow (plain function call) |
| Step(step_num, name, purpose) | Both | Inline step with explicit metadata |
| AutoStep(step_num) | Both | Step that inherits docs from the next function call |
Optional parameters on @workflow, @task, @test_workflow, and Step: inputs, outputs, critical.
Step Numbering
- Major steps: integers (1, 2, 3) — sequential top-level operations
- Minor steps: floats (1.1, 1.2) — sub-operations, use inside loops (see the sketch below)
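A sketch of float numbering inside a loop (the sample list and the elided filtering code are illustrative, not part of the dFlow API):

from dflow import Step

samples = ["S1", "S2"]  # illustrative sample IDs

口 = Step(step_num=2, name="Per-sample QC", purpose="QC every sample in the batch")
for sample in samples:
    口 = Step(step_num=2.1, name="Filter cells",
              purpose="Drop low-quality cells for the current sample")
    # ... per-sample cell filtering ...

    口 = Step(step_num=2.2, name="Filter genes",
              purpose="Drop rarely expressed genes for the current sample")
    # ... per-sample gene filtering ...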
CLI
dflow init # Initialize .dflow/ directory
dflow build [paths...] # Scan + resolve references (the main command)
dflow list # List discovered workflows
dflow export [workflows...] -o . # Export to HTML (requires sphinx-dflow-ext)
dflow scan [paths...] # Scan only (no resolve step)
dflow assemble # Resolve AutoStep references only
dflow validate [files...] # Validate annotations
Common flags: -v (verbose), -d (debug), --strict, -r (project root).
Database
All annotation data lives in .dflow/workflow.db (SQLite). Key tables:
| Table | Contents |
|---|---|
| modules | Scanned source files |
| functions | Decorated functions with purpose, inputs, outputs |
| steps | Step/AutoStep markers within functions |
| workflow_entries | Top-level @workflow entry points |
| classes | @test_suite class metadata |
| coverage_links | covers=[...] links from tests to production functions |
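Because the file is plain SQLite, any downstream tool can read it with the standard library. A minimal sketch (the column names in the second query are assumptions; inspect the schema first, as shown):

import sqlite3

con = sqlite3.connect(".dflow/workflow.db")

# Print the actual table definitions before relying on any column names:
for (ddl,) in con.execute("SELECT sql FROM sqlite_master WHERE type = 'table'"):
    print(ddl)

# Assumed columns for illustration, adjust to the schema printed above:
for name, purpose in con.execute("SELECT name, purpose FROM functions"):
    print(f"{name}: {purpose}")

con.close()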
License
MIT