
Backend package for ipyflow's dataflow functionality


IPyflow


TL;DR

Precise reactive Python notebooks for Jupyter[Lab]:

  1. pip install ipyflow
  2. Pick Python 3 (ipyflow) from the launcher or kernel selector.
  3. For each cell execution, the (minimal) set of out-of-sync upstream and downstream cells also re-execute, so that executed cells appear as they would when running the notebook from top-to-bottom.

About

IPyflow is a next-generation Python kernel for JupyterLab and Notebook 7 that tracks dataflow relationships between symbols and cells during a given interactive session, thereby making it easier to reason about notebook state. A video of the JupyterCon talk introducing it, along with the corresponding slides, is also available.

If you'd like to skip the elevator pitch and go straight to the installation and activation instructions, jump to Quick Start below; otherwise, keep reading to learn about IPyflow's philosophy and feature set.

Goals

IPyflow provides bolt-on reactivity to Jupyter's default Python kernel, ipykernel. It was designed with the following goals in mind:

  • Full backwards-compatibility with ipykernel: IPyflow aims to be a drop-in replacement for ipykernel, providing a strict superset of its features.
  • Precise dependency inference: IPyflow understands dependencies between cells beyond just simple variables. For example, IPyflow understands when cell B depends on cell A because of a subscript reference x[0], and is smart enough not to reactively execute cell B when some other part of x, e.g. x[1], changes. As a result, it limits unnecessary re-execution to a bare minimum.
  • Fearless execution: IPyflow attempts to enforce the following invariant: whenever you execute a cell, the resulting output appears as it would if you had performed a "restart + run all" operation. The implication is that you can execute basically any cell in the notebook and trust that It Just Works™.
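To make the subscript example concrete, here is a minimal sketch of dependency tracking at (symbol, subscript) granularity. FineGrainedDeps and its methods are hypothetical names for illustration, not part of IPyflow's API, and the sketch assumes the reads have already been observed (IPyflow infers these itself at runtime).

```python
from typing import Dict, Optional, Set, Tuple

# A location is a symbol name plus an optional subscript key; key=None means "the whole object".
Location = Tuple[str, Optional[object]]


class FineGrainedDeps:
    """Hypothetical tracker mapping each location to the cells that read it."""

    def __init__(self) -> None:
        self.readers: Dict[Location, Set[str]] = {}

    def record_read(self, cell: str, name: str, key: Optional[object] = None) -> None:
        self.readers.setdefault((name, key), set()).add(cell)

    def affected_by_write(self, name: str, key: Optional[object] = None) -> Set[str]:
        """Return the cells that should re-run after writing name[key] (or all of name)."""
        affected: Set[str] = set()
        for (read_name, read_key), cells in self.readers.items():
            if read_name != name:
                continue
            # Whole-object writes touch every reader; whole-object reads see every write;
            # otherwise only a matching subscript counts.
            if key is None or read_key is None or read_key == key:
                affected |= cells
        return affected


tracker = FineGrainedDeps()
tracker.record_read("B", "x", 0)  # cell B evaluates x[0]
tracker.record_read("C", "x")     # cell C uses x as a whole, e.g. len(x)
print(tracker.affected_by_write("x", 1))  # {'C'}: cell B is left alone
```

Writing x[0] instead would affect both B and C, since C depends on all of x.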

Quick Start

To install, run:

pip install ipyflow

To run an IPyflow kernel, select "Python 3 (ipyflow)" from the list of available kernels in the Launcher tab. Similarly, you can switch to or from IPyflow in an existing notebook by selecting the "Change kernel" menu item.


Features

Reactive execution model

IPyflow ships with extensions that bring reactivity to JupyterLab and Notebook 7 by default, similar to execution behavior offered in other notebooks such as Observable, Pluto.jl, and Marimo.

IPyflow's reactivity behaves somewhat differently from these tools, however, as it was designed to meet the needs of Jupyter users in particular. When you execute cell C with IPyflow, C's output, the output of the cells C depends on, and the output of the cells that depend on C all appear as they would if the notebook were executed from top to bottom (e.g. via "restart and run-all"). When you select some cell C, all the cells that would re-execute when C is executed have an orange dot next to them, and cells that C depends on but that are up-to-date and will not re-execute have purple dots.

The cell dependency information is persisted to the notebook metadata, so that you can jump to any cell after starting a fresh kernel session, run it, and be confident that the output is what was intended by the notebook author.
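The orange/purple dot rule can be pictured as a graph computation. This is a simplified model, not IPyflow's implementation: it assumes a hypothetical `parents` map from each cell to the cells it depends on, plus a precomputed set of stale (out-of-sync) cells.

```python
from typing import Dict, List, Set, Tuple


def _reachable(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
    """All cells transitively reachable from `start` along `edges` (excluding `start`)."""
    seen: Set[str] = set()
    stack = list(edges.get(start, ()))
    while stack:
        cell = stack.pop()
        if cell not in seen:
            seen.add(cell)
            stack.extend(edges.get(cell, ()))
    return seen


def reactive_plan(order: List[str], parents: Dict[str, Set[str]],
                  stale: Set[str], target: str) -> Tuple[List[str], Set[str]]:
    """Return (cells to run in notebook order, up-to-date dependencies left alone)."""
    children: Dict[str, Set[str]] = {}
    for child, deps in parents.items():
        for parent in deps:
            children.setdefault(parent, set()).add(child)
    upstream = _reachable(target, parents)
    downstream = _reachable(target, children)
    # An upstream cell re-runs if it is stale or depends on a stale cell that will re-run.
    stale_upstream = {c for c in upstream
                      if c in stale or bool(_reachable(c, parents) & stale)}
    rerun = {target} | downstream | stale_upstream  # the "orange dot" cells
    ordered = [c for c in order if c in rerun]     # run top to bottom
    return ordered, upstream - rerun               # the "purple dot" cells


order = ["A", "B", "C", "D"]
parents = {"B": {"A"}, "C": {"B"}, "D": {"C"}}
to_run, purple = reactive_plan(order, parents, stale={"B"}, target="C")
print(to_run, purple)  # ['B', 'C', 'D'] {'A'}
```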

Autosave and recovering prior executions

Because IPyflow peeks at runtime state in order to infer dependencies, it needs to keep the content of the notebook in sync with the kernel's memory state, even across browser refreshes. As such, IPyflow enables autosave-on-change by default, so that the kernel state, the notebook UI's in-memory state, and the notebook file on disk are all in sync. If you accidentally overwrite a cell's output that you wanted to keep, e.g. during a reactive execution, and autosave overwrites the previous result on disk, fear not! IPyflow provides a library utility called reproduce_cell to recover the input and output of previous cell executions (within a given kernel session):

from ipyflow import reproduce_cell
reproduce_cell(4, lookback=1)  # to reproduce the previous execution of cell 4
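Conceptually, recovering an earlier execution only requires a per-cell log kept for the lifetime of the kernel session. The sketch below is a hypothetical stand-in (ExecutionHistory is not an IPyflow class), and its convention that lookback=0 means the latest run is an assumption chosen to match the comment above, where lookback=1 refers to the previous execution.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Execution(NamedTuple):
    source: str
    output: str


class ExecutionHistory:
    """Hypothetical in-memory log of every execution of each cell in one session."""

    def __init__(self) -> None:
        self._log: Dict[int, List[Execution]] = defaultdict(list)

    def record(self, cell_num: int, source: str, output: str) -> None:
        self._log[cell_num].append(Execution(source, output))

    def lookup(self, cell_num: int, lookback: int = 0) -> Execution:
        """lookback=0 is the latest run, lookback=1 the one before it, and so on."""
        runs = self._log[cell_num]
        if lookback >= len(runs):
            raise LookupError(f"cell {cell_num} has only {len(runs)} recorded run(s)")
        return runs[-1 - lookback]


history = ExecutionHistory()
history.record(4, "x = 1\nx", "1")
history.record(4, "x = 2\nx", "2")  # a later run replaced the visible output
print(history.lookup(4, lookback=1).output)  # 1
```

Because the log lives in kernel memory, it disappears when the session ends, which matches the "within a given kernel session" caveat above.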

Opting out of reactivity

If you'd like to temporarily opt out of reactive execution, you can use ctrl+shift+enter (on Mac, cmd+shift+enter also works) to execute only the cell in question.

You can also run the magic command %flow mode normal to opt out of the default reactive execution mode (in which case, ctrl+shift+enter / cmd+shift+enter will switch from being nonreactive to reactive). To re-enable reactive execution as the default, you can run %flow mode reactive.

If you'd like to prevent the default reactive behavior for every new kernel session, you can add this to your IPython profile (default location typically at ~/.ipython/profile_default/ipython_config.py):

c = get_config()
c.ipyflow.exec_mode = "normal"  # defaults to "reactive"

In-order and any-order semantics

IPyflow defaults to in-order semantics, meaning that, if cell B depends on cell A, then A must appear before B in the spatial order of the notebook. IPyflow doesn't prevent previous cells from referencing data created or updated by later cells, but it omits these edges when performing reactive execution.

In-order semantics, though less flexible, have some desirable properties when compared with any-order semantics, as they encourage cleaner and more reproducible notebooks that can more easily be converted to Python scripts later. Now that I may or may not have sold you on in-order semantics, you can enable any-order semantics in IPyflow by running the magic command %flow direction any_order, and re-enable the default in-order semantics using %flow direction in_order.

You can also update your IPython profile if you'd like to make any-order semantics the default behavior for new kernel sessions:

c = get_config()
c.ipyflow.flow_direction = "any_order"  # defaults to "in_order"
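The difference between the two modes can be thought of as a filter on the edges that reactive execution is allowed to follow. In this minimal sketch, reactive_edges is a hypothetical helper (only the "in_order" and "any_order" strings come from the magic command above), and an edge (A, B) means cell B reads data written by cell A.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (producer cell, consumer cell)


def reactive_edges(order: List[str], edges: Set[Edge],
                   direction: str = "in_order") -> Set[Edge]:
    """Keep only the edges that reactive execution may follow under `direction`."""
    position: Dict[str, int] = {cell: i for i, cell in enumerate(order)}
    if direction == "any_order":
        return set(edges)
    if direction == "in_order":
        # Drop edges where a later cell feeds an earlier one.
        return {(src, dst) for src, dst in edges if position[src] < position[dst]}
    raise ValueError(f"unknown direction: {direction!r}")


order = ["A", "B", "C"]
edges = {("A", "C"), ("C", "B")}  # cell B reads something that the later cell C wrote
print(reactive_edges(order, edges))  # {('A', 'C')}
```

Under in-order semantics the ("C", "B") edge still exists in the session; it is simply not used to trigger reactive re-execution.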

Execution suggestions and shortcut for resolving inconsistencies

Whenever a cell references updated data, the collapser next to it is given an orange color (similar to the color for dirty cells), and cells that (recursively) depend on it are given a purple collapser color. (An orange input with a purple output just means that the output may be out-of-sync.) When using reactive execution, you usually won't see these, since out-of-sync dependent cells will be rerun automatically, though you may see them if using ctrl+shift+enter to temporarily opt out of reactivity, or if you change which data the cell updates (thereby overwriting previous edges between cells).

If you'd like to let IPyflow fix these up for you, you can press "Space" when in command mode to automatically resolve all stale or dirty cells. This operation may introduce more stale cells, in which case you can continue pressing "Space" until all inconsistencies are resolved, if desired.
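A rough model of staleness uses logical timestamps: a cell is stale when some symbol it reads was written after the cell last ran, and re-running stale cells can make their dependents stale in turn, which is why repeated presses may be needed. The Cell and Session classes below are hypothetical, and the sketch ignores the purple (dependent-output) state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Cell:
    name: str
    reads: Set[str]
    writes: Set[str]
    last_run: int = 0  # logical time of the cell's latest execution


@dataclass
class Session:
    cells: List[Cell]
    clock: int = 0
    updated_at: Dict[str, int] = field(default_factory=dict)  # symbol -> time of last write

    def run(self, cell: Cell) -> None:
        self.clock += 1
        cell.last_run = self.clock
        for sym in cell.writes:
            self.updated_at[sym] = self.clock

    def stale(self) -> List[Cell]:
        """Cells that read a symbol written after they last ran."""
        return [c for c in self.cells
                if any(self.updated_at.get(s, 0) > c.last_run for s in c.reads)]

    def press_space(self) -> List[str]:
        """Re-run every currently stale cell once, in notebook order."""
        todo = self.stale()
        for cell in todo:
            self.run(cell)
        return [cell.name for cell in todo]


a = Cell("A", reads=set(), writes={"x"})
b = Cell("B", reads={"x"}, writes={"y"})
c = Cell("C", reads={"y"}, writes=set())
nb = Session([a, b, c])
for cell in (a, b, c):
    nb.run(cell)
nb.run(a)                # re-running A updates x, so B becomes stale
print(nb.press_space())  # ['B'] (re-running B updates y, so C becomes stale)
print(nb.press_space())  # ['C']
```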

Memoization

Cells that reference Python functions and classes, primitives like integers, floats, strings, as well as numpy arrays, pandas dataframes, and containers (lists, dicts, sets, tuples, etc.) thereof can be memoized by IPyflow using the special %%memoize pseudomagic. There's no need to specify the "inputs" to the cell, as IPyflow will infer these automatically. Memoized cells cache their results in-memory (though disk-backed caches are planned for the future), and will retrieve these cached results (rather than re-running the cell) whenever IPyflow detects inputs and cell content identical to that of a previous run.

By default, %%memoize skips all output except potential displayhook output from the last expression in the cell (when applicable). To skip this too, pass --quiet, and to include stdout, stderr, and other rich output, pass --verbose.
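The core idea behind %%memoize is a cache keyed by the cell's content together with a fingerprint of its inputs. In this minimal sketch, MemoCache is hypothetical, the inputs are passed explicitly (IPyflow infers them automatically), and pickle bytes serve only as a rough fingerprint; a robust implementation would need a canonical encoding, since equal objects (e.g. sets) are not guaranteed to pickle identically.

```python
import hashlib
import pickle
from typing import Any, Callable, Dict, Tuple


class MemoCache:
    """Hypothetical in-memory memo table keyed by (cell source, input fingerprint)."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str], Any] = {}
        self.hits = 0

    @staticmethod
    def fingerprint(inputs: Dict[str, Any]) -> str:
        # Sort by name so that dict insertion order does not change the key.
        payload = pickle.dumps(sorted(inputs.items()))
        return hashlib.sha256(payload).hexdigest()

    def run(self, source: str, inputs: Dict[str, Any], compute: Callable[..., Any]) -> Any:
        key = (source, self.fingerprint(inputs))
        if key in self._cache:
            self.hits += 1  # identical content and inputs: skip re-running the cell
            return self._cache[key]
        result = compute(**inputs)
        self._cache[key] = result
        return result


cache = MemoCache()
src = "z = x * y"
print(cache.run(src, {"x": 3, "y": 4}, lambda x, y: x * y))  # 12 (computed)
print(cache.run(src, {"y": 4, "x": 3}, lambda x, y: x * y))  # 12 (cache hit)
print(cache.hits)  # 1
```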

IPyWidgets integration

IPyflow's reactive execution engine has built-in support for ipywidgets, allowing widget changes to be propagated across cell boundaries.

This functionality can be combined with the %%memoize magic to provide near real-time rendering of interactive plots across cells.

This functionality can be paired with other extensions like stickyland to build fully reactive dashboards on top of JupyterLab + IPyflow.

Finally, IPyflow also integrates with mercury widgets.

State API

IPyflow must understand the underlying execution state at a deep level in order to provide its features. It exposes an API for interacting with some of this state, including a code function for obtaining the code necessary to reconstruct some symbol:

# Cell 1
from ipyflow import code

# Cell 2
x = 0

# Cell 3
y = x + 1

# Cell 4
print(code(y))

# Output:
"""
# Cell 2
x = 0

# Cell 3
y = x + 1
"""
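The code function is backed by IPyflow's dynamic slicer, which walks backward from the symbol's most recent definition. A simplified version of that idea, assuming each executed statement's reads and writes were recorded at runtime (Stmt and backward_slice are hypothetical names), reproduces the output above.

```python
from typing import List, NamedTuple, Set, Tuple


class Stmt(NamedTuple):
    cell_num: int
    source: str
    reads: Set[str]
    writes: Set[str]


def backward_slice(stmts: List[Stmt], symbol: str) -> str:
    """Collect the statements that (transitively) produced `symbol`, grouped by cell."""
    needed: Set[int] = set()
    # Each frontier entry is (symbol, only consider writers before this statement index).
    frontier: List[Tuple[str, int]] = [(symbol, len(stmts))]
    while frontier:
        sym, before = frontier.pop()
        for i in range(before - 1, -1, -1):
            if sym in stmts[i].writes:
                if i not in needed:
                    needed.add(i)
                    frontier.extend((r, i) for r in stmts[i].reads)
                break  # only the most recent writer matters
    blocks: List[List[str]] = []
    last_cell = None
    for i in sorted(needed):
        stmt = stmts[i]
        if stmt.cell_num != last_cell:
            blocks.append([f"# Cell {stmt.cell_num}"])
            last_cell = stmt.cell_num
        blocks[-1].append(stmt.source)
    return "\n\n".join("\n".join(block) for block in blocks)


session = [
    Stmt(1, "from ipyflow import code", set(), {"code"}),
    Stmt(2, "x = 0", set(), {"x"}),
    Stmt(3, "y = x + 1", {"x"}, {"y"}),
]
print(backward_slice(session, "y"))
```

Cell 1 is excluded because nothing that y depends on was written there.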

You can also do this at the cell level using the slice() method:

from ipyflow import cells
print(cells(4).slice())

# Output:
"""
# Cell 2
x = 0

# Cell 3
y = x + 1

# Cell 4
print(code(y))
"""

You can also see the cell (1-indexed) and statement (0-indexed) of when a symbol was last updated with the timestamp function:

from ipyflow import timestamp
timestamp(y)
# Timestamp(cell_num=3, stmt_num=0)

To see dependencies and dependents of a particular symbol, use the deps and users functions, respectively:

from ipyflow import deps, users

deps(y)
# [<x>]

users(x)
# [<y>]
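The bookkeeping behind timestamp, deps, and users can be pictured as a pair of inverse maps plus a last-update time per symbol. This sketch is keyed by names rather than live objects, and SymbolGraph is hypothetical; its Timestamp merely mirrors the two fields shown above (1-indexed cell, 0-indexed statement) and is not IPyflow's own class.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Timestamp(NamedTuple):
    cell_num: int  # 1-indexed
    stmt_num: int  # 0-indexed


class SymbolGraph:
    """Hypothetical symbol-level dependency index with inverse deps/users maps."""

    def __init__(self) -> None:
        self._deps: Dict[str, Set[str]] = defaultdict(set)
        self._users: Dict[str, Set[str]] = defaultdict(set)
        self._ts: Dict[str, Timestamp] = {}

    def assign(self, name: str, reads: Set[str], ts: Timestamp) -> None:
        # Reassignment replaces the edges from the previous definition of `name`.
        for old in self._deps.pop(name, set()):
            self._users[old].discard(name)
        for r in reads:
            self._deps[name].add(r)
            self._users[r].add(name)
        self._ts[name] = ts

    def deps(self, name: str) -> List[str]:
        return sorted(self._deps.get(name, ()))

    def users(self, name: str) -> List[str]:
        return sorted(self._users.get(name, ()))

    def timestamp(self, name: str) -> Timestamp:
        return self._ts[name]


g = SymbolGraph()
g.assign("x", set(), Timestamp(2, 0))   # Cell 2: x = 0
g.assign("y", {"x"}, Timestamp(3, 0))   # Cell 3: y = x + 1
print(g.deps("y"), g.users("x"))  # ['x'] ['y']
print(g.timestamp("y"))           # Timestamp(cell_num=3, stmt_num=0)
```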

If you want to elevate a symbol to the representation used internally by IPyflow, use the lift function (at your own risk, of course):

from ipyflow import lift

y_sym = lift(y)
y_sym.timestamp
# Timestamp(cell_num=3, stmt_num=0)

Colab, VSCode, and other Interfaces

Reactivity and other frontend features are not yet working in interfaces like Colab or VSCode, but you can still use IPyflow's dataflow API on these surfaces by initializing your notebook session with the following code:

%pip install ipyflow
%load_ext ipyflow

Citing

IPyflow started its life under the name nbsafety, which provided the initial suggestions and slicing functionality.

For the execution suggestions:

@article{macke2021fine,
  title={Fine-grained lineage for safer notebook interactions},
  author={Macke, Stephen and Gong, Hongpu and Lee, Doris Jung-Lin and Head, Andrew and Xin, Doris and Parameswaran, Aditya},
  journal={Proceedings of the VLDB Endowment},
  volume={14},
  number={6},
  pages={1093--1101},
  year={2021},
  publisher={VLDB Endowment}
}

For the dynamic slicer (used for reactivity and for the code function, for example):

@article{shankar2022bolt,
  title={Bolt-on, Compact, and Rapid Program Slicing for Notebooks},
  author={Shankar, Shreya and Macke, Stephen and Chasins, Andrew and Head, Andrew and Parameswaran, Aditya},
  journal={Proceedings of the VLDB Endowment},
  volume={15},
  number={13},
  pages={4038--4047},
  year={2022},
  publisher={VLDB Endowment}
}

For anything not covered in the above papers, you can cite the IPyflow repo:

@misc{ipyflow,
  title = {{IPyflow: A Next-Generation, Dataflow-Aware IPython Kernel}},
  howpublished = {\url{https://github.com/ipyflow/ipyflow}},
  year = {2022},
}

Acknowledgements

IPyflow would not have been possible without the amazing academic collaborators listed on the above papers. Its reactive execution features are inspired by those of other excellent tools like Hex notebooks, Pluto.jl, and Observable. IPyflow also enjoys cross-pollination of ideas with other reactive Python notebooks like Marimo, Jolin.io, and Datalore. Definitely check them out as well if you like IPyflow.

Work on IPyflow has benefited from the support of folks from a number of companies, both in the form of direct financial contributions (Databricks, Hex) and in the form of indirect moral support and encouragement (Ponder, Meta). And of course, IPyflow rests on the foundations built by the incredible Jupyter community.

License

Code in this project is licensed under the BSD-3-Clause License.

Download files

Source Distribution

ipyflow-core-0.0.198.tar.gz (161.3 kB)

Built Distribution

ipyflow_core-0.0.198-py2.py3-none-any.whl (153.2 kB)
