Build reactive networks using Cells and Pypes.
cellopype: a reactive pipeline of executable cells
Introduction
With the cellopype Cell and Pype classes, you can build a network of interconnected DataFrames, each wrapped in its own cell. Each cell has its own 'construction rules', and an update to any cell value is cascaded through the rest of the network.
Think of it as a Jupyter notebook with the cells being automatically (re-)run as required by changes at other points in the notebook. (And without the UI, obviously.)
In abstract terms, a cell consists of a custom recalc function that takes a number of inputs (sources) and recalculates the cell's value from the values of these sources. Any change in value for a given cell is propagated to any other cells depending on it. The network of cells must be one-directional, without any loops (i.e., it has to be a DAG). Lazy computation is the default, but subscription-type 'push on change' is supported.
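The lazy, dirty-flag behaviour described above can be sketched in a few lines. This is a minimal illustration using a hypothetical MiniCell class, not cellopype's actual implementation; the names invalidate and calls are assumptions made for the example.

```python
class MiniCell:
    """Toy stand-in for a reactive cell (hypothetical, not cellopype's Cell)."""

    def __init__(self, recalc, sources=()):
        self._recalc = recalc          # computes the value from the source values
        self._sources = list(sources)  # upstream cells this cell depends on
        self._dependents = []          # downstream cells to invalidate on change
        self._value = None
        self._dirty = True             # no valid cached value yet
        self.calls = 0                 # counts recalculations, for demonstration
        for src in self._sources:
            src._dependents.append(self)

    def invalidate(self):
        """Mark this cell and everything downstream as stale."""
        if not self._dirty:
            self._dirty = True
            for dep in self._dependents:
                dep.invalidate()

    @property
    def value(self):
        """Return the cached value, recalculating first if it is stale."""
        if self._dirty:
            args = [src.value for src in self._sources]  # pulls upstream lazily
            self._value = self._recalc(*args)
            self._dirty = False
            self.calls += 1
        return self._value


base = {"x": 1}
a = MiniCell(lambda: base["x"])
b = MiniCell(lambda x: x * 10, sources=[a])
c = MiniCell(lambda x, y: x + y, sources=[a, b])

first = c.value    # triggers recalculation of a, b and c
again = c.value    # served from the cache, nothing is recalculated
base["x"] = 2
a.invalidate()     # marks a, b and c as stale; nothing is recomputed yet
second = c.value   # recalculated on demand
```

Because invalidation only flips flags, the cost of a change is paid when (and only if) a downstream value is actually read.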
Practically all the work is done by the Cell class, which is relatively small (approx. 50-60 lines). Pype is a utility class that adds name spacing logic and convenience methods to a collection of Cells. See below.
An example with three Cells
import pandas as pd
from cellopype import Cell
# example from https://stackoverflow.com/questions/62671185
dfA = pd.DataFrame([1, 2, 3], columns=["value"])
dfB = pd.DataFrame([10, 20, 30], columns=["value"])
cell_a = Cell(recalc=lambda: dfA.copy())
cell_b = Cell(recalc=lambda: dfB.copy())
This is a bit contrived; cell_a and cell_b are 'recalculated' by returning a copy of an external dataframe. We make a copy in order to return a reference to a fresh object instance. Don't have your recalc functions poke around in (or return references to) existing compound objects!
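To see why returning a reference is risky, consider what happens to a cached value that aliases an external object. The sketch below uses plain lists instead of DataFrames and does not involve cellopype at all; the function names are hypothetical.

```python
import copy

external = {"rows": [1, 2, 3]}  # stands in for an external compound object


def recalc_by_reference():
    return external["rows"]                 # hands out the original list


def recalc_by_copy():
    return copy.deepcopy(external["rows"])  # hands out a fresh, independent list


cached_ref = recalc_by_reference()
cached_copy = recalc_by_copy()

external["rows"].append(4)  # the external object is mutated afterwards

ref_changed = cached_ref == [1, 2, 3, 4]  # the "cached" value changed silently
copy_stable = cached_copy == [1, 2, 3]    # the copied value is unaffected
```

A cached value that changes behind the cell's back is never marked as stale, so dependent cells would have no reason to recalculate.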
Let's define a recalc function for cell_c and create it:
def plus(a, b):
    return a + b
cell_c = Cell(recalc=plus, sources=[cell_a, cell_b]) # or: recalc=lambda a,b: a+b, sources=[...]
Check that cell_a and cell_c are initialized: they are _dirty, with no cached _value yet:
print('cell_a:', cell_a._dirty, cell_a._value)
cell_a: True None
print('cell_c:', cell_c._dirty, cell_c._value)
cell_c: True None
Now comes the nice part. Reading cell_c.value triggers recalc of cell_c, which reads the cell_a & cell_b values, which in turn triggers recalc of cell_a & cell_b:
print(cell_c.value)
value
0 11
1 22
2 33
The proof of the pudding: change the source value for cell_a and recalculate:
dfA.loc[0, "value"] = 222
cell_a.recalc()
Reading the value for cell_c again now triggers recalc across the pipeline.
The row 0 value of cell_c reflects the change in row 0 of cell_a:
print(cell_c.value)
value
0 232
1 22
2 33
Cell API: summary & example
from cellopype import Cell
Every cell must have a recalc function that defines how the cell value is (re)calculated from its sources:
def recalc_fn(a, b):
    """Source.values are passed in, returns the new value for this cell"""
    # the values of all its source cells are passed as args to this function
    # args are positional, names do not have to match source names
    return a.add(b)
Given the definitions above, we can call the Cell() constructor:
my_cell = Cell(
    sources=[cell_a, cell_b],  # specify the source cells for this cell's recalc
    recalc=recalc_fn,          # calculate new cell.value (source.values as args) [1]
    lazy=True,                 # default=True: recalculate only when necessary [2]
    on_change=plot_it          # optional handler (assumed defined elsewhere), called whenever cell 'value' changes [3]
)
[1] Our recalc function here is pretty trivial. Alternatively, we could simply pass in recalc=pd.DataFrame.add. (When you pass a method accessed on the class as the recalc function, the first argument is taken as the instance, i.e., self.)
[2] If lazy=True, the cell's value property is recalculated when:
(1) cell.recalc() is called, or
(2) cell.value is read (by your code or by another cell's recalc) and there is no valid cached _value.
If lazy=False, the cell is recalculated immediately when invalidated.
[3] on_change is called when the cell value changes (comparable to 'subscribe' in RX). The new value is passed as its single argument; no return value is expected. If an on_change handler is supplied, lazy recalculation is disabled.
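The 'push on change' mode can be pictured with a small eager cell that recalculates immediately and notifies a handler. This is a rough sketch with a hypothetical PushCell class, not cellopype's implementation; it compares values with !=, which works for scalars and lists but would need a different check for DataFrames.

```python
class PushCell:
    """Toy eager cell (hypothetical; not cellopype's implementation)."""

    def __init__(self, recalc, sources=(), on_change=None):
        self._recalc = recalc
        self._sources = list(sources)
        self._dependents = []
        self._on_change = on_change
        self._value = None
        for src in self._sources:
            src._dependents.append(self)
        self.recalc()  # eager: compute as soon as the cell exists

    @property
    def value(self):
        return self._value

    def recalc(self):
        new_value = self._recalc(*(src.value for src in self._sources))
        changed = new_value != self._value
        self._value = new_value
        if changed:
            if self._on_change is not None:
                self._on_change(new_value)  # new value is the single argument
            for dep in self._dependents:
                dep.recalc()                # push the change downstream


state = {"n": 1}
log = []  # the handler simply records every value it is given
src = PushCell(lambda: state["n"])
doubled = PushCell(lambda n: n * 2, sources=[src], on_change=log.append)

state["n"] = 5
src.recalc()  # value changed: pushed to doubled, handler is called
src.recalc()  # value unchanged: nothing is pushed
```

The handler fires once per actual change, which is the property that makes it comparable to a subscription.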
The cell's value property is a getter/setter combo that reads and updates the internal cached _value. It can trigger recalc and/or invalidate other cells. A cell value can be a DataFrame, Series or scalar, but most Python types should work, including lists and dictionaries.
If you want to force recalc for a cell at any time, you can call my_cell.recalc(). This will only make sense for cells defined with lazy=True. In general, you should not have to: reading the output cell values you need should trigger recalc across the network.
Finally, a subtle detail: in sources you specify the cell instances that the new cell depends upon (e.g., cell_a). The recalc function gets passed the values of these cells (e.g., cell_a.value). This means your recalc function has no knowledge of or access to the cell. It deals directly with DataFrames or other variables, without having to unwrap them.
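One practical consequence is that a recalc function is an ordinary function that can be exercised on its own, with no cells involved. A small illustration, with a hypothetical function name:

```python
def combine(a, b):
    """A recalc-style function: receives plain values, returns a plain value."""
    return a + b


# Called directly with ordinary values, exactly as a cell would call it
# with the values of its two sources:
result = combine([1, 2, 3], [4, 5, 6])
```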
Pype: API summary & example
Pype is a helper class to make it easier to manage a network of Cells. A Pype simulates a dictionary with the cell name as key (and with dot-name access to the attributes). Let's initialize a pype and add some cells to it.
from cellopype import Cell, Pype
pp = Pype()
pp.cell_a = Cell(recalc=lambda: [1,2,3])
pp.cell_b = Cell(recalc=lambda: [4,5,6])
pp.cell_c = Cell(recalc=lambda a, b: a+b, sources=[pp.cell_a, pp.cell_b])
# note that the reference to the source cell instances includes the pype container,
# allowing references to cells in other pypes (or outside of any pype)
print(pp.keys())
dict_keys(['cell_a', 'cell_b', 'cell_c'])
print(pp.cell_a)
<cellopype.cell.Cell at 0x7fb3100976d0>
print(pp.cell_c.value)
[1, 2, 3, 4, 5, 6]
The cell name is also added back to the cell:
print(pp.cell_a.name)
'cell_a' # handy for debugging if a cell can identify itself back to you
Pype.cells gets all cells in the Pype into a name-based dictionary:
print(pp.cells)
{ 'cell_a': <cellopype.cell.Cell object at 0x7fb3100976d0>,
'cell_b': <cellopype.cell.Cell object at 0x7fb310097790>,
'cell_c': <cellopype.cell.Cell object at 0x7fb310097910>}
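The combination of dictionary-style storage, dot-name access and writing the name back to the cell can be approximated with Python's attribute hooks. The MiniPype class below is a hypothetical sketch for illustration, not how cellopype's Pype is necessarily implemented; SimpleNamespace objects stand in for cells.

```python
from types import SimpleNamespace


class MiniPype:
    """Toy namespace container (hypothetical; not cellopype's Pype)."""

    def __init__(self):
        # bypass our own __setattr__ while creating the backing dict
        object.__setattr__(self, "_cells", {})

    def __setattr__(self, name, cell):
        cell.name = name  # write the name back onto the stored object
        self.__dict__["_cells"][name] = cell

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        try:
            return self.__dict__["_cells"][name]
        except KeyError:
            raise AttributeError(name) from None

    def keys(self):
        return self.__dict__["_cells"].keys()

    @property
    def cells(self):
        return dict(self.__dict__["_cells"])  # name-based dictionary (a copy)


pp = MiniPype()
pp.cell_a = SimpleNamespace(value=[1, 2, 3])
pp.cell_b = SimpleNamespace(value=[4, 5, 6])

names = list(pp.keys())
name_of_a = pp.cell_a.name
```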
Pype.recalc_all() forces recalculation of all cells in the Pype.
Pype.dump_values() dumps all cell values to a list of dicts while Pype.load_values(pld) restores them:
# recalculate and dump all cell values to a list of dicts
pld = pp.dump_values()
print(pld)
[ {'name': 'cell_a', 'value': [1, 2, 3]},
{'name': 'cell_b', 'value': [4, 5, 6]},
{'name': 'cell_c', 'value': [1, 2, 3, 4, 5, 6]}]
# ... mess with pp contents
pp.load_values(pld)
# pp values are restored
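Since the dump shown above is a plain list of dicts, it can be written out with a standard serializer as long as the values themselves are JSON-compatible (lists and scalars are; DataFrames would need converting first). The sketch below works on the literal structure printed above rather than calling cellopype.

```python
import json

# Literal copy of the dumped structure shown above
pld = [
    {"name": "cell_a", "value": [1, 2, 3]},
    {"name": "cell_b", "value": [4, 5, 6]},
    {"name": "cell_c", "value": [1, 2, 3, 4, 5, 6]},
]

text = json.dumps(pld)       # serialize to a JSON string
restored = json.loads(text)  # parse it back into a list of dicts
```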
Note these functions only dump and load the values. If you want to dump & restore the entire pype, including recalc logic & handlers, use dill to pickle the Pype itself:
import dill # not pickle, because lambdas
with open('pype_p.dill', 'wb') as file:
    dill.dump(pp, file)
with open('pype_p.dill', 'rb') as inp:
    pp2 = dill.load(inp)
# but heed the warnings: this is not secure unless the dill file is secured;
# and not suitable for structural persistence
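The "not pickle, because lambdas" remark refers to a limitation of the standard library: pickle stores functions by qualified name, and a lambda has no importable name. A quick check using only the standard library:

```python
import pickle

recalc = lambda: [1, 2, 3]  # the kind of recalc function used in the examples

try:
    pickle.dumps(recalc)
    pickled_ok = True
except (pickle.PicklingError, AttributeError):
    # the exact exception type depends on the Python version and
    # on where the lambda was defined
    pickled_ok = False
```

Here pickled_ok ends up False, which is why a Pype whose cells use lambdas needs a serializer such as dill that can store the function body itself.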