
Interactive flamegraph profiling for Metaflow steps — pluggable backends, beautiful cards


metaflow-profiler


Add one decorator to any Metaflow step and get an interactive flamegraph card in the UI.


When a Metaflow step is slow or runs out of memory, finding the cause means adding profiling code, re-running locally, deciphering cProfile tables, and correlating them with a separate top session, all before you can start fixing anything.

@profile_card wraps any step with a single decorator. It captures the CPU call tree, memory allocations, and system resource usage, then renders a self-contained interactive card directly in the Metaflow UI — visible even when the step crashes.

Quick start

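Install the package with the pyinstrument backend:

pip install "metaflow-profiler[pyinstrument]"

Add the decorator to a step (saved here as flow.py):

from metaflow import FlowSpec, step
from metaflow_extensions.profiler.plugins.profile_decorator import profile_card

class MyFlow(FlowSpec):

    @step
    def start(self):
        # Metaflow requires every flow to begin with a step named start
        self.next(self.train)

    @profile_card(profiler="pyinstrument")
    @step
    def train(self):
        # ... your heavy computation ...
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    MyFlow()

Run the flow, then open the card:

python flow.py run
python flow.py card view --id profile_card_train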

What you get

Stats grid

Duration, sample count, peak/avg CPU, peak/avg memory — plus disk I/O, network, and GPU stats when present. Stat cards appear automatically and hide when zero.

Stats grid and CPU flamegraph header

CPU Flamegraph

Every function call is a coloured block; width represents time spent.

CPU flamegraph — full view
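To make "width represents time spent" concrete, here is a minimal sketch of how sampled call stacks can be folded into a flamegraph tree. It is not the card's actual rendering code; `fold_stacks`, `width_fraction`, and the sample stacks are hypothetical names and data chosen for illustration.

```python
from typing import List, Tuple


def fold_stacks(samples: List[Tuple[str, ...]]) -> dict:
    """Merge root-first call stacks into a tree.

    Each node's `value` counts the samples that passed through that frame.
    """
    root = {"name": "root", "value": 0, "children": {}}
    for stack in samples:
        root["value"] += 1
        node = root
        for frame in stack:
            child = node["children"].setdefault(
                frame, {"name": frame, "value": 0, "children": {}}
            )
            child["value"] += 1
            node = child
    return root


def width_fraction(tree: dict, path: Tuple[str, ...]) -> float:
    """Fraction of the full flamegraph width occupied by the frame at `path`."""
    node = tree
    for frame in path:
        node = node["children"][frame]
    return node["value"] / tree["value"]


# Hypothetical samples: one call stack per sampling tick, outermost frame first.
samples = [
    ("run_step", "train", "fit"),
    ("run_step", "train", "fit"),
    ("run_step", "train", "load_data"),
    ("run_step", "end"),
]
tree = fold_stacks(samples)
print(width_fraction(tree, ("run_step", "train", "fit")))  # 0.5
```

A frame seen in two of four samples takes half the width. With a statistical profiler the sample count stands in for time, which is why wide blocks are the ones worth investigating first.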

Search highlights matching frames across the whole tree while dimming everything else — useful for tracking down a specific function across multiple call paths.

CPU flamegraph — search: only "run_step" highlighted

Click to zoom into any frame. A breadcrumb trail lets you navigate back up the call stack.

CPU flamegraph — zoomed in with breadcrumb trail visible

Memory Flamegraph

When memray is installed (pip install metaflow-profiler[memray]), a second flamegraph shows which functions allocated the most memory at peak RSS, in MB.

Memory flamegraph — allocation tree by call stack
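For a rough feel of allocation attribution without installing memray, the sketch below uses the standard library's tracemalloc instead. This is a swapped-in technique, not what the card does: tracemalloc only sees allocations made through Python's allocator and reports traced bytes rather than RSS. `top_allocations` and `build_table` are hypothetical names.

```python
import tracemalloc


def top_allocations(func, limit: int = 5):
    """Run `func` under tracemalloc.

    Returns the function's result, the peak traced bytes, and the
    `limit` source lines that hold the most memory when `func` returns.
    """
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    stats = snapshot.statistics("lineno")[:limit]
    return result, peak, stats


def build_table():
    # Hypothetical workload: a list holding a million integers.
    return [i * 2 for i in range(1_000_000)]


table, peak_bytes, stats = top_allocations(build_table)
print(f"peak traced memory: {peak_bytes / 1e6:.1f} MB")
for stat in stats:
    frame = stat.traceback[0]
    print(f"{stat.size / 1e6:8.1f} MB  {frame.filename}:{frame.lineno}")
```

The output lists one line per allocation site, largest first, which is the same question the memory flamegraph answers visually across whole call stacks.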

Resource Timeline + I/O Timeline

Dual-axis time-series charts built from metrics sampled every 500 ms throughout the step.

  • Resource Timeline — CPU % (left axis) and RSS memory in MB (right axis)
  • I/O Timeline — Disk read/write MB/s (left axis) and network recv/sent MB/s (right axis)

Both charts share the same time axis so you can correlate spikes across metrics.

Resource Timeline and I/O Timeline
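The polling behind charts like these can be sketched with a background thread that wakes on a fixed interval. This is an illustration only, not the package's `_TimelineCollector`: the class name `TimelineSampler` and the `sample_fn` hook are assumptions, and the demo samples process CPU time from the standard library where the real collector is described as using psutil.

```python
import threading
import time
from typing import Callable, Dict, List


class TimelineSampler:
    """Hypothetical poller: calls `sample_fn` every `interval_s` seconds."""

    def __init__(self, sample_fn: Callable[[], Dict[str, float]],
                 interval_s: float = 0.5) -> None:
        self._sample_fn = sample_fn
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.samples: List[Dict[str, float]] = []

    def _run(self) -> None:
        started = time.monotonic()
        # Event.wait acts as an interruptible sleep; it returns True
        # as soon as the event is set, which ends the loop promptly.
        while not self._stop.wait(self._interval_s):
            row = {"t": time.monotonic() - started}
            row.update(self._sample_fn())
            self.samples.append(row)

    def __enter__(self) -> "TimelineSampler":
        self._thread.start()
        return self

    def __exit__(self, *exc_info) -> None:
        # Runs on success and on failure, so partial timelines survive.
        self._stop.set()
        self._thread.join()


def cpu_seconds() -> Dict[str, float]:
    # Stand-in metric: CPU time consumed by this process so far.
    return {"cpu_s": time.process_time()}


with TimelineSampler(cpu_seconds, interval_s=0.05) as sampler:
    sum(i * i for i in range(2_000_000))  # stand-in for the step body

print(f"collected {len(sampler.samples)} samples")
```

Because every row carries the same `t` offset, several metrics collected this way can share one time axis, which is what makes spikes comparable across charts.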

cProfile backend

The cprofile backend uses Python's built-in cProfile module, so it needs no extra dependencies. It is deterministic: it records every function call instead of sampling, so sample counts are much higher than with pyinstrument. The flamegraph is otherwise identical.

cprofile backend — stats grid and CPU flamegraph
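The "captures every call" behaviour is easy to see with the standard library directly. The sketch below profiles a small recursive function with cProfile and reads the call count back through pstats; it shows the underlying module only, not how the backend wraps it, and `fib` is a hypothetical workload.

```python
import cProfile
import io
import pstats


def fib(n: int) -> int:
    # Deliberately naive recursion so that many calls are made.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


profiler = cProfile.Profile()
profiler.enable()
fib(15)
profiler.disable()

stats = pstats.Stats(profiler, stream=io.StringIO())

# stats.stats is the raw table: it maps (filename, lineno, funcname) to
# (primitive calls, total calls, total time, cumulative time, callers).
fib_calls = sum(
    total_calls
    for (_file, _line, name), (_prim, total_calls, _tt, _ct, _callers)
    in stats.stats.items()
    if name == "fib"
)
print(fib_calls)  # 1973
```

Every one of the 1973 recursive calls is recorded, where a statistical profiler would see only the handful of stacks that happened to be live at its sampling ticks. That completeness is also the source of the higher overhead.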

Failed steps

The card renders even when the step raises an exception — it shows the full profile up to the point of failure with a red banner at the top.
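One way to get a profile "up to the point of failure" is to stop the profiler in a `finally` block, so the data is saved whether the body returns or raises. The sketch below shows that pattern with a plain Python decorator and cProfile. It does not use Metaflow's decorator hooks, and `profile_even_on_failure`, `captured`, and `flaky_step` are hypothetical names.

```python
import cProfile
import functools
import io
import pstats

# Hypothetical stand-in for the artifact a card renderer would read.
captured = {}


def profile_even_on_failure(func):
    """Profile `func` and store a report even if it raises."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        failed = True
        profiler.enable()
        try:
            result = func(*args, **kwargs)
            failed = False
            return result
        finally:
            # Reached on both success and exception; the exception,
            # if any, continues to propagate after this block.
            profiler.disable()
            out = io.StringIO()
            stats = pstats.Stats(profiler, stream=out)
            stats.sort_stats("cumulative").print_stats(5)
            captured[func.__name__] = {
                "failed": failed,
                "report": out.getvalue(),
            }

    return wrapper


@profile_even_on_failure
def flaky_step():
    sorted(range(100_000), reverse=True)  # some work before the crash
    raise RuntimeError("simulated failure")


try:
    flaky_step()
except RuntimeError:
    pass

print(captured["flaky_step"]["failed"])  # True
```

The `failed` flag is the kind of signal a renderer could use to decide whether to show a failure banner, while the report still covers everything that ran before the exception.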

Backends

| Backend | Install | Overhead | Notes |
|---|---|---|---|
| pyinstrument | pip install metaflow-profiler[pyinstrument] | ~1% | Statistical; recommended |
| cprofile | (built-in) | Medium | Deterministic; captures every call |

Optional extras

| Extra | Install | Adds |
|---|---|---|
| memray | pip install metaflow-profiler[memray] | Memory allocation flamegraph |
| gpu | pip install metaflow-profiler[gpu] | GPU utilisation % + GPU memory timeline |
| all | pip install metaflow-profiler[all] | Everything above |

How it works

@profile_card decorator
    ↓  starts backend in task_pre_step, stops in task_post_step / task_exception
Card renderer (ProfileCard)
    ↓  reads artifact, renders self-contained HTML
Backend registry
    ↓  picks best available backend
Backend implementations (pyinstrument / cprofile)
    ↓  wraps _TimelineCollector (psutil) + _MemoryTracker (memray)
Abstract interface (ProfilerBackend / ProfileData)

No upward imports between layers — enforced by structural tests.
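The "picks best available backend" step in the diagram can be sketched as a lookup that checks which profiler modules are importable. This is an assumption about the general approach, not the package's registry code: `PREFERENCE`, `MODULE_FOR`, `is_available`, and `pick_backend` are hypothetical names, and the preference order is inferred from the Backends table, which marks pyinstrument as recommended.

```python
import importlib.util
from typing import Optional

# Assumed preference order: pyinstrument first, built-in cProfile as fallback.
PREFERENCE = ("pyinstrument", "cprofile")

# Backend name -> importable module that must be present for it to work.
MODULE_FOR = {"pyinstrument": "pyinstrument", "cprofile": "cProfile"}


def is_available(backend: str) -> bool:
    """True if the module behind `backend` can be imported."""
    return importlib.util.find_spec(MODULE_FOR[backend]) is not None


def pick_backend(requested: Optional[str] = None) -> str:
    """Return the requested backend, or the best one that is installed."""
    if requested is not None:
        if requested not in MODULE_FOR:
            raise ValueError(f"unknown backend: {requested!r}")
        if not is_available(requested):
            raise RuntimeError(f"backend {requested!r} is not installed")
        return requested
    for name in PREFERENCE:
        if is_available(name):
            return name
    raise RuntimeError("no profiler backend available")


print(pick_backend())            # "pyinstrument" if installed, else "cprofile"
print(pick_backend("cprofile"))  # cprofile
```

Checking with `importlib.util.find_spec` avoids importing an optional dependency just to learn whether it exists, so the fallback to the built-in profiler costs nothing when pyinstrument is absent.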

Development

git clone https://github.com/npow/metaflow-profiler
cd metaflow-profiler
pip install -e ".[pyinstrument,dev]"

# Lint + type check
ruff check src/ tests/
mypy src/

# Tests
pytest tests/unit/
pytest tests/structural/ -m structural

License

Apache 2.0 — see LICENSE.
