A code-to-graph-to-memory profiler for Dask: know which chunk killed a worker and what source line produced it.

Maintainers

Polymood

DaskGenie

A memory profiler and live dashboard for Dask that ties a worker's memory back to the line of your code that caused it.

DaskGenie fuses memray-deep allocation tracing with Dask's task graph and streams it to a real-time dashboard. When a worker dies from an out-of-memory error, you can see the suspect task, the chunk it was holding, and the exact source line that allocated the array that killed it — not just "a worker disappeared". It works on dask.distributed and on the local (threaded / processes / synchronous) schedulers, across dask.array, dask.dataframe, dask.bag, dask.delayed, and xarray on Zarr/NetCDF.

Features

Source attribution. Maps every Dask task-graph layer back to the user source line that built it — the call site in your code, never into dask/numpy/xarray internals.
Deep memory (memray, as a library). Opt-in per-run allocation tracing, epoch-rotated and folded to the first line of your code, so the dashboard shows job.py:42 build = 12.8 GB and a per-worker flamegraph / memray-style tree — you never touch a capture file.
Worker-death post-mortem. A scheduler plugin records the tasks in flight when a worker vanishes; the collector joins in the chunk metadata and the allocation lines at the high-water mark to answer which chunk, which line.
Real-time dashboard. A Next.js app streaming over WebSocket: live worker table, a zoomable task stream (global + per-worker), the whole task-graph DAG, memory-over-time with a click-to-inspect spike explorer, per-layer allocations over time, and the deep flamegraph.
Any scheduler. register() installs worker + scheduler plugins on dask.distributed; LocalProfiler hooks the callback API for the threaded, processes, and synchronous schedulers.
Team-friendly. Every run records the machine (hostname + IP) that opened it, so a shared collector becomes one place to see everyone's runs.
Prometheus + TimescaleDB. The collector exposes /metrics for Grafana and stores to TimescaleDB (or self-contained SQLite for local use / tests).
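The folding step in the "Deep memory" feature can be pictured with a small sketch. This is not DaskGenie's implementation: the function name `fold_to_user_line`, the sample layout, and the `site-packages` marker are assumptions chosen for illustration. Each allocation stack is walked from the innermost frame outward, and its bytes are credited to the first frame that is not library code.

```python
from collections import defaultdict

# Assumed filter: any path containing one of these fragments counts as library code.
LIBRARY_MARKERS = ("site-packages",)


def fold_to_user_line(samples, library_markers=LIBRARY_MARKERS):
    """Credit each allocation to the first non-library frame of its stack.

    samples: iterable of (stack, nbytes), where stack is a list of
    (filename, lineno, function) tuples ordered innermost frame first.
    """
    totals = defaultdict(int)
    for stack, nbytes in samples:
        for filename, lineno, function in stack:
            if not any(marker in filename for marker in library_markers):
                totals[f"{filename}:{lineno} {function}"] += nbytes
                break
        else:
            # No user frame anywhere in the stack: keep the bytes visible.
            totals["<library>"] += nbytes
    return dict(totals)


# Hypothetical samples: two allocations reached through job.py:42, one with no user frame.
samples = [
    ([("/env/site-packages/numpy/core.py", 10, "empty"), ("job.py", 42, "build")], 8_000),
    ([("/env/site-packages/dask/array.py", 5, "ones"), ("job.py", 42, "build")], 4_800),
    ([("/env/site-packages/zarr/io.py", 7, "read")], 100),
]
folded = fold_to_user_line(samples)
```

Here both library-level allocations end up under the single key `job.py:42 build`, which is the shape of label the dashboard description mentions.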

Installation

pip install daskgenie

Optional extras:

pip install 'daskgenie[deep]'       # memray-backed deep memory profiling
pip install 'daskgenie[collector]'  # the FastAPI collector service (server side)
pip install 'daskgenie[examples]'   # xarray, zarr, pandas, ... for the examples

DaskGenie requires Python 3.11 or newer. Deep memory profiling needs memray (Linux/macOS + CPython); where it isn't importable the profiler silently degrades to lightweight RSS/managed-memory sampling.
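A minimal sketch of the fallback described above, assuming the decision is made by checking whether `memray` is importable. The function name `choose_memory_mode` and the mode strings are hypothetical, not part of DaskGenie's API.

```python
import importlib.util


def choose_memory_mode(deep_requested, find_spec=importlib.util.find_spec):
    """Return "deep" only when deep tracing was requested and memray is importable."""
    if deep_requested and find_spec("memray") is not None:
        return "deep"
    # Either deep mode was not requested or memray is unavailable on this
    # platform: fall back to lightweight sampling without raising.
    return "sampling"


mode = choose_memory_mode(deep_requested=True)
```

Passing `find_spec` as a parameter is only there so the choice can be exercised without installing or uninstalling memray.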

Quick start

Bring up the dashboard stack — the collector (API + /metrics + WebSocket), TimescaleDB, and the Next.js dashboard — with Docker:

docker compose up -d --build
# dashboard → http://localhost:3000      collector API → http://localhost:8765

Then point a job at the collector. Each register() opens a run that appears live on the dashboard:

from distributed import Client, LocalCluster
import daskgenie as dg
import daskgenie.client as genie

client = Client(LocalCluster(processes=True))
run_id = genie.register(client, "http://localhost:8765", run_name="nightly ETL", deep=True)

with dg.track() as source_map:
    result = build_pipeline()                 # your dask work
genie.upload_graph("http://localhost:8765", run_id, source_map, collection=result)

result.compute()

Open the run and explore it as it runs:

Overview — live stats, memory-over-time, the hottest allocation line.
Timeline — a large, zoomable memory chart; click any point to see what was running and which source lines were allocating at that instant, plus a stacked per-layer allocation timeline.
Workers — a live, native-Dask-style table (RSS vs limit, CPU, threads, executing/ready).
Task stream — global + per-worker task lanes on a zoomable time axis.
Graph — the real connected task DAG (canvas for large graphs), coloured by layer, death-suspect nodes highlighted; click a node for its source and chunks.
Memory — the deep view: allocation flamegraph, peak bytes by source line, and peak memory by task.
Post-mortem — each worker death: the allocation lines at the high-water mark, the suspect task, its source line, and the chunk it was holding.

Local schedulers

For the non-distributed schedulers (.compute(scheduler="threads" | "processes" | "synchronous"), the default for bare Dask collections), use LocalProfiler, which hooks Dask's callback API instead of installing cluster plugins:

import daskgenie as dg

with dg.track() as source_map:
    result = build_pipeline()

with dg.LocalProfiler("http://localhost:8765", run_name="threaded job",
                      source_map=source_map, collection=result, deep=True) as prof:
    result.compute(scheduler="threads")
# prof.run_id shows up in the dashboard like any other run
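For readers unfamiliar with the callback API mentioned above, here is a small sketch of what hooking it looks like. It uses Dask's public `dask.callbacks.Callback` base class; the `TaskTimer` class is a hypothetical illustration and says nothing about how LocalProfiler is written internally.

```python
import time

from dask import delayed
from dask.callbacks import Callback


class TaskTimer(Callback):
    """Record wall-clock duration per task on a local scheduler."""

    def __init__(self):
        super().__init__()
        self.started = {}
        self.durations = {}

    def _pretask(self, key, dsk, state):
        # Called just before a task is handed to the executor.
        self.started[key] = time.perf_counter()

    def _posttask(self, key, result, dsk, state, worker_id):
        # Called after the task's result has been received.
        self.durations[key] = time.perf_counter() - self.started.pop(key)


# A tiny delayed graph: square four numbers, then sum them.
total = delayed(sum)([delayed(pow)(i, 2) for i in range(4)])

with TaskTimer() as timer:
    value = total.compute(scheduler="synchronous")
```

The callback is active only inside the `with` block, which is why this approach needs no cluster plugins.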

The post-mortem: which chunk killed this worker

With register() active, a scheduler plugin watches for worker deaths. On a death it records the tasks that were in flight; the collector joins in the chunk metadata the worker had already reported and (with deep=True) the allocation lines at the high-water mark. See it end to end on a LocalCluster whose worker is really OOM-killed:

docker compose up -d --build
uv run --extra demo --extra deep python examples/deep_oom.py

Telling an OOM from a clean shutdown is a heuristic: the scheduler doesn't tell a plugin why a worker left, so DaskGenie flags an OOM only when tasks were in flight at an unexpected removal, and it labels the result as suspected rather than confirmed.
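The heuristic and the join can be sketched in a few lines. Everything here is an assumption made for illustration: the `WorkerRemoval` record, the `expected` flag, the verdict strings, and the shape of the chunk metadata are hypothetical and not DaskGenie's data model.

```python
from dataclasses import dataclass


@dataclass
class WorkerRemoval:
    worker: str        # worker address
    in_flight: list    # task keys executing when the worker left
    expected: bool     # True for a requested close or scale-down


def classify(removal):
    """Flag an OOM only for an unexpected removal with tasks in flight."""
    if removal.expected:
        return "clean-shutdown"
    if removal.in_flight:
        return "suspected-oom"
    return "unknown"


def post_mortem(removal, chunk_meta, peak_lines):
    """Join in-flight tasks with reported chunk metadata and peak allocation lines."""
    return {
        "worker": removal.worker,
        "verdict": classify(removal),
        "suspects": [
            {"task": key, "chunk": chunk_meta.get(key)} for key in removal.in_flight
        ],
        "peak_lines": peak_lines.get(removal.worker, []),
    }


# Hypothetical inputs for one unexpected removal.
removal = WorkerRemoval("tcp://10.0.0.5:4000", ["('build-abc', 3, 0)"], expected=False)
report = post_mortem(
    removal,
    chunk_meta={"('build-abc', 3, 0)": {"shape": (4096, 4096), "nbytes": 134_217_728}},
    peak_lines={"tcp://10.0.0.5:4000": ["job.py:42 build"]},
)
```

An unexpected removal with nothing in flight is reported as `unknown` rather than as an OOM, which matches the conservative stance described above.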

Configuration

Everything is configured through environment variables:

Variable	Component	Purpose
`DASKGENIE_DSN`	collector	Postgres/TimescaleDB DSN; selects the Timescale backend (default in Docker).
`DASKGENIE_DB`	collector	SQLite path (or `:memory:`) when no DSN is set.
`DASKGENIE_HOST` / `DASKGENIE_PORT`	collector	Bind address (default `127.0.0.1:8765`).
`COLLECTOR_URL`	dashboard	Where the Next.js server proxies `/api` (e.g. `http://collector:8765`).
`NEXT_PUBLIC_COLLECTOR_WS`	dashboard	WebSocket base the browser connects to (default `ws://<host>:8765`).
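How the collector variables in the table combine can be sketched as follows. The variable names and the `127.0.0.1:8765` default come from the table; the function name `resolve_collector_config`, the returned dictionary layout, and the `:memory:` fallback when `DASKGENIE_DB` is unset are assumptions.

```python
import os


def resolve_collector_config(env=None):
    """Pick the storage backend and bind address from environment variables."""
    env = os.environ if env is None else env

    dsn = env.get("DASKGENIE_DSN")
    if dsn:
        # A DSN selects the Postgres/TimescaleDB backend.
        store = {"backend": "timescale", "dsn": dsn}
    else:
        # Assumed fallback: in-memory SQLite when no path is given.
        store = {"backend": "sqlite", "path": env.get("DASKGENIE_DB", ":memory:")}

    return {
        "store": store,
        "host": env.get("DASKGENIE_HOST", "127.0.0.1"),
        "port": int(env.get("DASKGENIE_PORT", "8765")),
    }


local = resolve_collector_config({})
docker = resolve_collector_config(
    {"DASKGENIE_DSN": "postgresql://user@db/genie", "DASKGENIE_PORT": "9000"}
)
```

The DSN value shown is a placeholder; any real deployment would supply its own.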

The register() and LocalProfiler calls take deep=, sample_interval=, flush_interval=, and deep_epoch_seconds= to trade overhead for resolution.

Notes and limitations

Source attribution is strongest for dask.array and dask.delayed. For dask.dataframe (which builds graphs through dask-expr) and xarray, the heavy work runs inside library/C code, so per-line allocations fold to framework frames — the graph, memory, per-layer and flamegraph views still work, but layer→line mapping is sparse there.
memray is Linux/macOS + CPython only, and a single tracker runs per process; deep mode is opt-in and costs roughly 1.5–2× runtime.
On a hard OOM kill the worker's in-flight memray epoch can die before it flushes, so a specific post-mortem's line attribution is best-effort — the Memory tab remains the reliable place to see the culprit line.
The OOM label is a heuristic, not a certainty (see the post-mortem note).
TimescaleDB is the default store in Docker; SQLite is the zero-setup backend used locally and by the test suite.
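The note about a hard OOM kill losing the in-flight epoch is easier to see with a toy model of epoch rotation. This is a simplified sketch, not the deep memory engine: the `EpochBuffer` class and its fields are hypothetical, and `epoch_seconds` merely plays the role of `deep_epoch_seconds`.

```python
class EpochBuffer:
    """Accumulate bytes per source line and flush once per epoch."""

    def __init__(self, epoch_seconds):
        self.epoch_seconds = epoch_seconds
        self.epoch_start = 0.0
        self.pending = {}   # current epoch, not yet flushed
        self.flushed = []   # epochs that reached the collector

    def record(self, now, line, nbytes):
        if now - self.epoch_start >= self.epoch_seconds:
            self.rotate(now)
        self.pending[line] = self.pending.get(line, 0) + nbytes

    def rotate(self, now):
        # Flush the finished epoch and start a new one.
        if self.pending:
            self.flushed.append(dict(self.pending))
        self.pending = {}
        self.epoch_start = now


buf = EpochBuffer(epoch_seconds=5.0)
buf.record(0.0, "job.py:42", 100)
buf.record(2.0, "job.py:42", 50)
buf.record(6.0, "job.py:50", 999)  # crosses the epoch boundary, triggers a flush
# If the process were killed here, only buf.flushed would survive;
# the allocation at job.py:50 is still in buf.pending.
```

A shorter epoch narrows the window of lost data at the price of more frequent flushes, which is the overhead-versus-resolution trade mentioned in the configuration section.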

Examples

Runnable scripts covering the breadth of Dask live in examples/ — distributed OOMs, the deep-memory demo, a big minutes-long pipeline, a self-limiting crash, and one per collection type (dask.delayed, dask.dataframe, dask.bag, xarray on Zarr and NetCDF). See examples/README.md.

Development

uv sync --group dev --extra collector --extra deep
uv run pytest
uv run ruff check .
uv run ruff format .
uv run mypy src/

# collector + dashboard separately (dashboard proxies /api to the collector)
uv run python -m daskgenie.collector --port 8765          # terminal 1
cd web && npm install && npm run dev                      # terminal 2 → :3000

How source attribution works

HighLevelGraph.from_collections is the one classmethod almost every Dask collection operation calls to register a new named graph layer. track() patches it for the duration of the with block: each time a new layer appears, it walks the call stack outward until it finds the first frame that isn't inside dask/distributed/xarray/numpy/zarr/daskgenie or a site-packages install — that's your call site. The deep memory engine reuses the same library-path filter to fold memray stacks to the first user frame. The library-path filter is configurable via track(extra_library_paths=[...]).
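The mechanism can be sketched without Dask at all. In the snippet below, `FakeGraph` stands in for `HighLevelGraph`, and the marker strings, `first_user_frame`, and this simplified `track` are assumptions for illustration, not DaskGenie's code. It shows the two steps: patching a classmethod for the duration of a `with` block, and walking the stack outward to the first frame that is not library code.

```python
import sys
from contextlib import contextmanager

# Assumed filter: path fragments that mark a frame as library code.
LIBRARY_MARKERS = ("site-packages",)


def first_user_frame(library_markers, skip=1):
    """Return (filename, lineno, function) of the first non-library frame."""
    frame = sys._getframe(skip)
    while frame is not None:
        filename = frame.f_code.co_filename
        if not any(marker in filename for marker in library_markers):
            return filename, frame.f_lineno, frame.f_code.co_name
        frame = frame.f_back
    return None


class FakeGraph:
    """Stand-in for a graph class with a layer-registering classmethod."""

    @classmethod
    def from_collections(cls, name):
        return name


@contextmanager
def track(target=FakeGraph, library_markers=LIBRARY_MARKERS):
    """Patch target.from_collections inside the with block and record call sites."""
    source_map = {}
    original = target.__dict__["from_collections"]  # the classmethod object itself

    def patched(cls, name, *args, **kwargs):
        # skip=2 steps over first_user_frame and this wrapper.
        source_map[name] = first_user_frame(library_markers, skip=2)
        return original.__func__(cls, name, *args, **kwargs)

    target.from_collections = classmethod(patched)
    try:
        yield source_map
    finally:
        target.from_collections = original  # always restore the original
```

Any code that calls `FakeGraph.from_collections("layer")` inside `with track() as source_map:` leaves an entry in `source_map` pointing at the nearest caller whose file path matches none of the markers.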

License

MIT — see LICENSE.

Project details

Release history

This version

0.1.0

Jul 3, 2026

Download files

Source Distribution

daskgenie-0.1.0.tar.gz (284.0 kB)

Uploaded Jul 3, 2026 Source

Built Distribution

daskgenie-0.1.0-py3-none-any.whl (52.1 kB)

Uploaded Jul 3, 2026 Python 3

File details

Details for the file daskgenie-0.1.0.tar.gz.

File metadata

Download URL: daskgenie-0.1.0.tar.gz
Upload date: Jul 3, 2026
Size: 284.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for daskgenie-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3c15f0e5b4328c99c6704207e7be5b107503169223a1a9f68fd03b824fb7686e`
MD5	`b1070aa4374d643aa14f90ac1efb9267`
BLAKE2b-256	`a8d2fe7a08adb40806733ffe78995cf69bfb6f6c7079d758dd275c9f44e405c2`

Provenance

The following attestation bundles were made for daskgenie-0.1.0.tar.gz:

Publisher: workflow-pypi.yml on polymood/DaskGenie

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: daskgenie-0.1.0.tar.gz
- Subject digest: 3c15f0e5b4328c99c6704207e7be5b107503169223a1a9f68fd03b824fb7686e
- Sigstore transparency entry: 2062206542
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: polymood/DaskGenie@26df083d832d8cb3d41cba29287e33a6691729cc
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/polymood
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow-pypi.yml@26df083d832d8cb3d41cba29287e33a6691729cc
- Trigger Event: push

File details

Details for the file daskgenie-0.1.0-py3-none-any.whl.

File metadata

Download URL: daskgenie-0.1.0-py3-none-any.whl
Upload date: Jul 3, 2026
Size: 52.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for daskgenie-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d34954a8d2757fc7ad9a468bb03215ce917082bac79a30f4472f2f69828dfbe5`
MD5	`630e9582efa569a32d9b96abf8b4f33b`
BLAKE2b-256	`46346adaa5b28253376e86f788aa5cc6f607746a9a5cc0b85e86e9c8ad0b16de`

Provenance

The following attestation bundles were made for daskgenie-0.1.0-py3-none-any.whl:

Publisher: workflow-pypi.yml on polymood/DaskGenie

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: daskgenie-0.1.0-py3-none-any.whl
- Subject digest: d34954a8d2757fc7ad9a468bb03215ce917082bac79a30f4472f2f69828dfbe5
- Sigstore transparency entry: 2062207221
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: polymood/DaskGenie@26df083d832d8cb3d41cba29287e33a6691729cc
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/polymood
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow-pypi.yml@26df083d832d8cb3d41cba29287e33a6691729cc
- Trigger Event: push
