Skip to main content

A primitive for online sequential aggregation in Python

Project description

livefold

live + fold — incremental folds over a mutable sequence. ("live" as in "live broadcast" — rhymes with "five", not "give".)

Build Status Supported Versions license

A primitive for online sequential aggregation in Python. Maintain a mutable sequence; query exact aggregates over any range in O(√n); plug in any associative reducer (any monoid).

When to reach for it

Need Use
Immutable series, range aggregates Prefix sums
Frequent point updates, log-time queries Segment tree / Fenwick tree
Fixed-width rolling windows pandas.rolling() / polars.rolling()
Mutable series, arbitrary range queries, multi-fold livefold

Common shapes: request latencies, sensor readings, trade prices, telemetry events.

Quickstart

pip install livefold
from livefold import LiveFold

lf = LiveFold([1, 2, 3, 4, 5, 6], folds={"sum": sum, "max": max, "min": min})

lf.append(7)
lf.query(0, 5)
# → {"sum": 21, "max": 6, "min": 1}

# Mutate freely; aggregates stay current
lf[2] = -1
lf.query(2, 5)
# → {"sum": 9, "max": 6, "min": -1}

Performance

Query latency vs collection size

At n = 10⁷, livefold's median range query is 69 µs vs naive Python's 29 ms (~400× faster), and median append cost is ~2 µs across all n while every other backend with a competitive query path (numpy, pandas) degrades linearly on every append. livefold's amortized append is O(√n) — boundary-crossing rebuilds are rare (~one per √n appends) but real — so the median characterizes typical streaming latency, not the rare rebuild spike.

Full methodology, append benchmarks, comparison against four backends, and the reproduction script: benchmarks/.

Examples

Two runnable Streamlit demos in examples/:

  • system_metrics/ — live psutil-driven CPU/memory dashboard with arbitrary-range aggregate queries. Runs entirely offline.
  • crypto_ticks/ — synthetic BTC/USD tick stream with high/low/avg-price queries. Includes a drop-in recipe for real Binance ticks.

API

LiveFold(data: Iterable, folds: dict[str, Callable])
Member Returns Notes
lf.append(x) None Amortized O(√n); worst-case O(n) on perfect-square crossings
lf.query(left, right) dict[str, Any] O(√n); inclusive bounds
lf.blocks list[list] Underlying √n-sized blocks
lf.folded_values dict[str, list] Per-fold, per-block aggregates
lf.insert / pop / extend / remove / sort / ... Standard list methods; blocks and folds updated in place

LiveFold subclasses list, so it's a drop-in for any code that expected a plain list — until you start calling query.

Time-indexed queries: TimeIndexedLiveFold

For monotonic streams where you want to query by time instead of index, TimeIndexedLiveFold carries a parallel timestamp per element and exposes query_time_range — still O(√n).

from livefold import TimeIndexedLiveFold

lf = TimeIndexedLiveFold(
    [1, 2, 3],
    folds={"sum": sum, "max": max},
    timestamps=[1.0, 2.0, 3.0],
)
lf.append(4, timestamp=4.0)
lf.append(5, timestamp=5.0)

lf.query_time_range(2.0, 4.0)
# → {"sum": 9, "max": 4}

If you omit timestamps / timestamp, time.time() is used by default. The class is generic over the timestamp type — anything orderable works (float, int, datetime, ...). Pass explicit timestamps for any non-float type:

from datetime import datetime
from livefold import TimeIndexedLiveFold

lf = TimeIndexedLiveFold[datetime](
    [1, 2, 3],
    folds={"sum": sum},
    timestamps=[datetime(2025, 1, 1), datetime(2025, 6, 1), datetime(2026, 1, 1)],
)
lf.query_time_range(datetime(2025, 5, 1), datetime(2026, 6, 1))
# → {"sum": 5}

TimeIndexedLiveFold is append-only by design. Operations that would break time ordering raise MonotonicityError:

  • insert, sort, reverse — would break the ordering invariant
  • slice __setitem__, slice __delitem__ — would desync data and timestamps
  • append / extend with a timestamp earlier than the last stored one
  • + / += with anything other than another TimeIndexedLiveFold (use extend(values, timestamps=...) instead)

Integer indexing (lf[i] = x, del lf[i]), pop, remove, and clear work normally and keep the parallel timestamp list in sync. + and += between two TimeIndexedLiveFold instances concatenate timestamps and re-check monotonicity.

How it works

Query animation

LiveFold splits its underlying list into ⌊√n⌋ blocks of size √n, precomputes the configured folds for each block, and updates them incrementally on mutation. A query(left, right) walks at most two partial blocks plus the precomputed folds for whole-block spans in between — touching roughly 2√n elements per fold regardless of n. Mo's algorithm with mutability and a dict-shaped output.

Append re-block animation

append is amortized O(√n). Most appends just push onto the last block at O(1), but each time n crosses a perfect square, block_size = isqrt(n) increments and the whole structure re-blocks — an O(n) cost. The gap between consecutive perfect squares is 2k + 1 (linear in k = √n, not geometric), so over n appends total rebuild work sums to Σ_{k=1}^{√n} O(k²) = O(n^(3/2)) — i.e. O(√n) per append amortized. Boundary crossings happen only ~once per √n appends, though, so the median append latency stays in the low µs range across all n (this is what the benchmarks show); the asymptotic only shows up in mean or total cost.

TimeIndexedLiveFold layers a parallel monotonically non-decreasing timestamp list on top. query_time_range(start, end) calls bisect_left/bisect_right to map timestamps to indices in O(log n), then routes through the same √n-decomposed query path — so overall query cost stays O(√n).

For the full derivation, complexity analysis, and other implementation details, see the corresponding blog post.

Fold contract

A fold is a single-argument callable fn(items) -> result that:

  1. Accepts an iterable of elements (or, when called internally to combine block results, an iterable of prior fold results) and returns a single value.
  2. Is associative: fn([fn([a, b]), fn([c, d])]) == fn([a, b, c, d]). This lets query combine precomputed block-level folds with the partial folds at each end.
  3. Returns a value of the same shape it accepts as elements — i.e., feeding the result back through fn together with other results must work. len is a common-but-broken choice: it returns an int regardless of input shape, so len([len(block_a), len(block_b)]) gives 2, not the total element count.

Examples that satisfy the contract: sum, max, min, math.prod, math.gcd (via functools.reduce), bitwise or/and/xor, "".join, and any mergeable sketch (t-digest, HyperLogLog, Count-Min, Welford) wrapped in a fold-shaped callable. Commutativity is not required — string concatenation, matrix multiplication, and other ordered monoids work too.

Constraints

  • Not thread-safe. Single-process, single-thread workloads only.

Contributing

See CONTRIBUTING.md.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

livefold-0.2.0.tar.gz (149.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

livefold-0.2.0-py3-none-any.whl (9.2 kB view details)

Uploaded Python 3

File details

Details for the file livefold-0.2.0.tar.gz.

File metadata

  • Download URL: livefold-0.2.0.tar.gz
  • Upload date:
  • Size: 149.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for livefold-0.2.0.tar.gz
Algorithm Hash digest
SHA256 870e7f9fdc2aaac2cce856c6d481895be056fe7e06a3947b6b1b62f44a7b210c
MD5 09df62d023a4926deeff490991ae0639
BLAKE2b-256 e39d0f9178342d029e03ec5234647c4bfd2cf3a355e9d78cb530af72f4183eff

See more details on using hashes here.

File details

Details for the file livefold-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: livefold-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 9.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for livefold-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 05e610f6d05826022052040e59363ac73d6ff73e78acf7499234dd11a18c37ae
MD5 f2525ea7d956e92708f81257d66f90b7
BLAKE2b-256 06aa0c8f35a19b846503a320e3e1f65d97091f96203f8761f21f32cfa39bc8ac

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page