A from-scratch, modular discrete Bayesian-network library.

probability-flow

A small discrete Bayesian-network library written from scratch in NumPy, with no external inference dependency. The goal is full control over and understanding of the inference stack rather than a black box: every distribution and every solver is plain, readable code that can be checked against a brute-force reference.

On top of that core it carries an argumentation layer (ASPIC) that compiles typed argument graphs into networks, a domain-free metrics seam that measures a compiled network, and a generator that samples random argument graphs with controllable difficulty — the substrate for a benchmark of debate scenarios with known ground truth. Two further capabilities are opt-in extras: matplotlib renderers ([viz]) and JAX posterior calibration and sensitivities ([jax]).

The whole stack is layered and decoupled: core/ knows nothing about arguments, metrics/ knows nothing about the domain, and the optional extras pull in nothing unless you ask for them.

Installing

pip install probability-flow                 # core: NumPy only
pip install 'probability-flow[viz]'          # + matplotlib renderers
pip install 'probability-flow[jax]'          # + JAX calibration / sensitivities
pip install 'probability-flow[viz,jax]'      # both

Requires Python 3.11 or newer. The import name is probability_flow. The base install depends only on NumPy; matplotlib and JAX arrive only with the [viz] and [jax] extras. Calling a renderer or the calibration module without its extra raises an ImportError that names the extra to install. The [dev] extra adds pytest and ruff.

From a checkout, the package also runs in place with no install step: the importable package is the inner probability_flow/ directory, so import probability_flow resolves whenever the repository root is on the path (pytest puts it there). Run the tests with pytest from the repository root. The example notebooks add the path themselves, so they run as-is. CI runs ruff and the test suite on Python 3.11–3.13 on every push.

Quickstart

from probability_flow import Node, ExactSolver, LoopySolver

guilty = Node("Guilty")                        # the target; prior 0.5 by default
guilty.add_input(Node("Seen at scene"), lr=9)  # supporting evidence
guilty.add_input(Node("Alibi"), lr=0.25)       # evidence against (lr < 1)

bn = guilty.compile()
ExactSolver(bn).prob(guilty, 1)   # 0.5731, by enumerating the joint
LoopySolver(bn).prob(guilty, 1)   # 0.5731, by message passing
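
The 0.5731 above can be reproduced by hand. The sketch below is not the library's code: it is a standalone enumeration that assumes the default additive log-odds rule described under Distributions, and assumes the two input nodes also take the default prior of 0.5. The function name marginal_of_target is made up for illustration.

```python
from itertools import product
from math import exp, log


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def logit(p):
    return log(p / (1.0 - p))


def marginal_of_target(prior, inputs):
    """Return P(target = 1) by summing over every input configuration.

    inputs is a list of (p_input_active, lr) pairs (hypothetical layout).
    """
    total = 0.0
    for states in product((0, 1), repeat=len(inputs)):
        weight = 1.0          # probability of this input configuration
        z = logit(prior)      # log-odds of the target, starting at the prior
        for s, (p, lr) in zip(states, inputs):
            weight *= p if s else 1.0 - p
            if s:
                z += log(lr)  # an active input adds its weight of evidence
        total += weight * sigmoid(z)
    return total


# "Seen at scene" (lr=9) and "Alibi" (lr=0.25), each assumed active with p=0.5
p_guilty = marginal_of_target(0.5, [(0.5, 9.0), (0.5, 0.25)])
print(round(p_guilty, 4))  # 0.5731
```

The four configurations give 0.5, 0.9, 0.2 and 9/13; their average is 0.5731.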

Build, compile, solve

Inference happens in three stages.

  1. Build. You assemble a mutable graph of Node objects. add_input wires one node as an input of another and returns the input, so upstream structure can be built inline. It takes a single node or a list, and a per-edge weight that is forwarded to the node's distribution.
  2. Compile. target.compile() walks the graph from the target, checks it (acyclic, consistent shapes, one distribution per node), assigns a topological order, and freezes everything into an immutable BayesianNetwork. Authoring is ergonomic; the compiled form is vectorized.
  3. Solve. A solver answers marginal(node) and prob(node, state) queries on the compiled network, optionally conditioned on evidence.
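
The acyclicity check and topological ordering in the compile stage can be pictured with Kahn's algorithm. This is a minimal sketch under assumptions, not the library's compile(): the dict-of-lists graph representation and the function name topological_order are hypothetical.

```python
from collections import deque


def topological_order(inputs_of):
    """Order nodes so every input comes before the node it feeds.

    inputs_of maps a node name to the list of its input names.
    Raises ValueError if the graph contains a cycle.
    """
    nodes = set(inputs_of)
    for ins in inputs_of.values():
        nodes.update(ins)

    indegree = {n: len(inputs_of.get(n, ())) for n in nodes}
    outputs_of = {n: [] for n in nodes}
    for node, ins in inputs_of.items():
        for i in ins:
            outputs_of[i].append(node)

    # Start from nodes with no inputs (sorted only to make the result stable).
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for out in outputs_of[n]:
            indegree[out] -= 1
            if indegree[out] == 0:
                ready.append(out)

    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order


order = topological_order({"Guilty": ["Seen at scene", "Alibi"]})
print(order)  # ['Alibi', 'Seen at scene', 'Guilty']
```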

An edge Y -> X declares only that X's distribution is P(X | Y). It is a conditioning relationship, not a claim about causation. A causal model points its edges with causation (cause to effect); an argument graph points them against it (evidence to claim, since the claim is what produces the evidence). The library does not care which: it only ever works with P(node | inputs).

Distributions

Each node carries a conditional probability distribution (CPD) describing P(node | its inputs). The library distinguishes the combination rule (the combiner) from the object that implements it (the CPD).

  • IndependentEvidenceCPD (the default). Treats inputs as uncorrelated sources of evidence: logit P(node=1) = logit(prior) + sum of log(lr) over the active inputs. Adding weights of evidence is Bayes' rule for independent likelihood ratios. Set per edge with add_input(x, lr=...).
  • CorrelatedEvidenceCPD. Independent evidence plus pairwise couplings, for redundant inputs that the additive rule would otherwise double-count (two reports of one fact, two clues from a shared cause). It adds a pairwise term, the sum of J_ij s_i s_j over input pairs, to the independent-evidence sum: a negative J makes two inputs sub-additive when both fire, a positive one makes them synergistic. The coupling lives inside the single CPD factor, so it adds no edge to the graph and no loop to the solver, and the result stays a valid distribution for any real J.
  • NoisyOrCPD. "Any one cause can fire the effect": P(node=0) = (1 - leak) * product of (1 - activation) over present causes. Declared with node.noisy_or(leak=...) and add_input(cause, activation=...).
  • NoisyAndCPD. A native conjunction: the effect needs every condition present, each succeeding with its own activation, with leak acting as a global inhibitor. Declared with node.noisy_and(leak=...).
  • TabularCPD. An arbitrary dense table, the fallback and the reference behind every other distribution. Set with node.set_cpd(TabularCPD(...)). Its repr prints the full table as a readable bordered grid.

IndependentEvidence, NoisyOr, and NoisyAnd all belong to the same family: each input contributes independently, and they differ only in how the contributions combine.
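
To make the difference between the independent and correlated rules concrete, here is a small sketch of the two log-odds formulas. It is an illustration only: the function p_active, the dict layout for J, and the choice of s_i in {0, 1} with the sum taken over pairs i < j are assumptions, not the library's API.

```python
from math import exp, log


def p_active(prior, lrs, states, J=None):
    """P(node = 1) for one input configuration.

    lrs    : likelihood ratio per input
    states : 0/1 per input (assumed encoding of s_i)
    J      : optional {(i, j): coupling} for pairs i < j (hypothetical layout)
    """
    z = log(prior / (1.0 - prior))
    z += sum(log(lr) for lr, s in zip(lrs, states) if s)
    for (i, j), coupling in (J or {}).items():
        z += coupling * states[i] * states[j]
    # The sigmoid maps any real z into (0, 1), so any real J is valid.
    return 1.0 / (1.0 + exp(-z))


# Two reports with lr=4 each, both active.
independent = p_active(0.5, [4.0, 4.0], (1, 1))
# A coupling of -log(4) cancels the second report: the pair counts once.
redundant = p_active(0.5, [4.0, 4.0], (1, 1), {(0, 1): -log(4.0)})

print(round(independent, 4))  # 0.9412  (odds 16)
print(round(redundant, 4))    # 0.8     (odds 4)
```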

Inference

  • ExactSolver. Brute-force enumeration of the joint distribution. Obviously correct, exponential in the number of nodes, and intended for small graphs and as the ground truth that everything else is tested against. Supports evidence, so colliders and explaining-away can be exercised.
  • LoopySolver. Loopy belief propagation. It treats each distribution as a factor and passes messages on the factor graph. It is exact on loop-free graphs (it converges to the true marginals) and approximate on graphs with loops, where it iterates to a fixed point. It has the same query surface as the exact solver, plus a bulk marginals(), optional damping to settle oscillation, and per-evidence caching.
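
Explaining-away is easy to see with a brute-force enumeration of a three-node collider. The sketch below is standalone and does not use the library; the node names, priors, activations and leak are invented for illustration, and the effect node follows the noisy-OR formula given under Distributions.

```python
from itertools import product

# Hypothetical collider: burglary -> alarm <- quake
PRIOR = {"burglary": 0.1, "quake": 0.1}
ACTIVATION = {"burglary": 0.9, "quake": 0.8}
LEAK = 0.01
NAMES = ("burglary", "quake", "alarm")


def joint(burglary, quake, alarm):
    """Joint probability of one full assignment."""
    p = PRIOR["burglary"] if burglary else 1 - PRIOR["burglary"]
    p *= PRIOR["quake"] if quake else 1 - PRIOR["quake"]
    # Noisy-OR: the alarm stays off only if the leak and every present cause fail.
    p_off = 1 - LEAK
    if burglary:
        p_off *= 1 - ACTIVATION["burglary"]
    if quake:
        p_off *= 1 - ACTIVATION["quake"]
    return p * ((1 - p_off) if alarm else p_off)


def prob(query, evidence):
    """P(query = 1 | evidence) by summing the joint over consistent worlds."""
    num = den = 0.0
    for values in product((0, 1), repeat=3):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if world[query] == 1:
            num += p
    return num / den


alarm_only = prob("burglary", {"alarm": 1})
alarm_and_quake = prob("burglary", {"alarm": 1, "quake": 1})
print(alarm_only > alarm_and_quake)  # True: the quake explains the alarm away
```

Observing the alarm raises belief in a burglary; additionally observing the quake pulls it back toward the prior, because the alarm no longer needs a second cause.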

Why it stays small

A node with k binary inputs has a table of size 2^k. The design keeps that cost off the production path:

  • as_tabular() materializes a distribution's full table. It is used only by the exact solver and the tests, on deliberately small graphs, never during normal inference.
  • LoopySolver talks to a distribution only through two message methods (message_to_output and message_to_input). The engine never inspects a distribution's type.

This means a structured distribution computes its messages in time linear in the number of inputs, without ever building the 2^k table, and the engine does not change. These linear-time messages are implemented: exact for the gates (NoisyOr / NoisyAnd), and a hybrid for the default IndependentEvidence — exact below a fan-in threshold, a capped-bucket approximation above it (within 1e-3 of the exact solver). An arbitrary TabularCPD still falls back to the dense default, which is correct and unavoidable. See docs/fast_messages.md.
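
Why a gate can avoid the 2^k table is visible in the noisy-OR case. If the inputs are independent with known marginals, the expectation of the product factorizes, so P(node=0) is a product of k terms. The sketch below checks that identity against full enumeration. It illustrates the idea only; it is not the library's message_to_output, and the function names are hypothetical.

```python
from itertools import product
from math import prod


def p_off_enumerated(leak, activations, p_present):
    """P(node = 0) by summing over all 2^k input configurations."""
    total = 0.0
    for states in product((0, 1), repeat=len(activations)):
        weight = prod(p if s else 1 - p for s, p in zip(states, p_present))
        off = (1 - leak) * prod(1 - a for s, a in zip(states, activations) if s)
        total += weight * off
    return total


def p_off_linear(leak, activations, p_present):
    """Same quantity in O(k): each input contributes one factor (1 - p * a)."""
    return (1 - leak) * prod(1 - p * a for a, p in zip(activations, p_present))


leak = 0.1
activations = [0.9, 0.5, 0.3]
p_present = [0.5, 0.2, 0.7]   # assumed independent input marginals

slow = p_off_enumerated(leak, activations, p_present)
fast = p_off_linear(leak, activations, p_present)
print(abs(slow - fast) < 1e-12)  # True
```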

The ASPIC argument layer

probability_flow.aspic builds a typed argument out of premises and conclusions joined by support / rebut / undermine / undercut / strict edges, then compiles it to an ordinary BayesianNetwork. All argumentation vocabulary lives here; core/ stays free of it (ASPIC is the first of several planned domain wrappers — legal, medical, AI-safety — over the same core). Every argumentative edge is a method on its downstream node, so the whole graph traverses from the root target.

from probability_flow.aspic import Premise, Conclusion

guilty = Conclusion("Guilty", prior=0.5)
seen = guilty.support(Premise("Seen at scene", 0.5), lr=9)      # supporting argument
guilty.rebut(Premise("Alibi", 0.5), lr=0.25)                   # attacking argument (lr < 1)
guilty.undercut(seen, by=Premise("Unreliable witness", 0.5))   # attack the support *edge*

bn = guilty.compile()        # an ordinary BayesianNetwork to solve as above
arg = guilty.assemble()      # a serializable Argument handle
arg.posterior(guilty)        # 0.4615
arg.save("guilty.json")      # to_json / from_json / save / load — round-trips to identical posteriors

assemble() returns an Argument handle that owns the cached compile, the posterior queries, JSON serialization (typed at the argument level), and the metric methods below. See docs/aspic.md.

Metrics

probability_flow.metrics measures a compiled network and depends only on core — no argumentation vocabulary, so every wrapper reuses it. Every function is pure and accepts either a compiled BayesianNetwork or a target node (compiled on the spot), and results carry your own Node objects, never ids.

from probability_flow import metrics

metrics.d_separated_groups(bn, guilty)   # independent evidence branches
metrics.upstream_size(guilty)            # size of the evidence base (auto-compiles)
metrics.circuit_rank(bn)                 # distance from a tree (0 == polytree)
metrics.posterior_range(bn, guilty)      # manipulability: posterior range over prunings
bn.max_depth(guilty)                     # trivial structural ones are also BN methods

The seam covers structure (depth, size), loopiness (circuit rank, polytree test), difficulty (posterior gap, log-odds deltas, concentration Gini), and manipulability (posterior range). See docs/metrics.md.
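
For intuition about the loopiness measure, the circuit rank of a graph in the usual graph-theoretic sense is E - V + C, computed on the undirected skeleton (E edges, V nodes, C connected components). The sketch below assumes that standard definition and a plain edge-list input; it is not metrics.circuit_rank, and its signature is hypothetical.

```python
def circuit_rank(edges):
    """Circuit rank E - V + C of the undirected skeleton of a simple graph.

    edges is a list of (input, output) pairs with no duplicates or self-loops.
    """
    nodes = {n for edge in edges for n in edge}
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    components = len(nodes)
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1

    return len(edges) - len(nodes) + components


tree = [("Seen at scene", "Guilty"), ("Alibi", "Guilty")]
shared_parent = tree + [("Witness", "Seen at scene"), ("Witness", "Alibi")]

print(circuit_rank(tree))           # 0
print(circuit_rank(shared_parent))  # 1
```

The second graph has one undirected cycle (Witness, Seen at scene, Guilty, Alibi), which is the shared-parent shape where loopy BP becomes approximate.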

Generating argument graphs

probability_flow.aspic.generate rejection-samples ASPIC arguments that meet a difficulty target — random support/attack chains, optional undercutters, axiomatic leaves, strict edges, and parent sharing, screened against the real metrics layer. This is the substrate for a benchmark of debate scenarios with known ground truth.

from probability_flow.aspic import generate, StructuralParams, DifficultyTargets

arg = generate(
    seed=0,
    structural=StructuralParams(n_support=2, n_attack=1),
    targets=DifficultyTargets(target_posterior=0.7),   # reached by calibrating the root prior
)

The structural shape is set directly; difficulty targets are screened and the target posterior is hit by calibrating the root prior. See docs/generation.md.
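
Hitting a target posterior by calibrating the root prior works because the root marginal increases monotonically with the prior, so a one-dimensional search suffices. The sketch below uses bisection on the additive log-odds rule with independent inputs. It is an assumption about how such a calibration could be done, not the generator's actual method; root_marginal and calibrate_prior are hypothetical names.

```python
from itertools import product


def root_marginal(prior, inputs):
    """P(root = 1); inputs is a list of (p_input_active, lr) pairs."""
    prior_odds = prior / (1.0 - prior)
    total = 0.0
    for states in product((0, 1), repeat=len(inputs)):
        weight, odds = 1.0, prior_odds
        for s, (p, lr) in zip(states, inputs):
            weight *= p if s else 1.0 - p
            if s:
                odds *= lr   # multiplying odds == adding log-odds
        total += weight * odds / (1.0 + odds)
    return total


def calibrate_prior(target, inputs, tol=1e-12):
    """Bisect on the prior until the root marginal matches the target."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if root_marginal(mid, inputs) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


inputs = [(0.5, 9.0), (0.5, 0.25)]
prior = calibrate_prior(0.7, inputs)
print(abs(root_marginal(prior, inputs) - 0.7) < 1e-9)  # True
```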

Visualization (optional, [viz])

With the [viz] extra installed, a compiled network and an argument both render to a matplotlib figure with an in-house layout — radial by default (the root at the centre, the d-separated evidence branches fanning out under a force model), or layout="layered" for the root-at-the-bottom tree. Likelihood-ratio edges are coloured red→blue by their LR, and a long node name shrinks its font to fit:

bn.render()                       # radial by default
bn.render(layout="layered")       # root-at-bottom tree (orientation="horizontal" too)
guilty.assemble().render()        # the argument view (same layouts)

matplotlib is imported lazily, only when you draw, so importing probability_flow never pulls it in; without the extra these calls raise an ImportError pointing at pip install 'probability-flow[viz]'.
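
The lazy-import behaviour described here is a common pattern for optional extras. A minimal sketch of it follows; require_extra is a hypothetical helper, not a function from probability_flow, and the error wording is invented.

```python
import importlib


def require_extra(module_name, extra):
    """Import an optional dependency on first use, with a helpful error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install 'probability-flow[{extra}]'"
        ) from exc


def render_sketch():
    # The import happens only when drawing, never at package import time.
    plt = require_extra("matplotlib.pyplot", "viz")
    return plt.figure()
```

Because the import sits inside the function body, importing the package itself never touches matplotlib.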

Calibration and sensitivities (optional, [jax])

With the [jax] extra, probability_flow.aspic.calibrate differentiates the root posterior through the compiled argument: sensitivities(arg) gives a per-parameter importance signal (via jax.grad), and calibrate_posterior(arg, target) solves for parameters that drive the root posterior to a chosen value. JAX is imported lazily inside these calls, so without the extra they raise an ImportError pointing at pip install 'probability-flow[jax]'.
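
To show what a per-parameter sensitivity means without requiring JAX, the sketch below swaps jax.grad for central finite differences and differentiates the root posterior with respect to each log likelihood ratio. It is a stand-in under assumptions (additive log-odds rule, inputs independently active with probability 0.5), not the calibrate module; the function names are hypothetical.

```python
from itertools import product
from math import exp, log


def posterior(log_lrs, prior_logit=0.0, p_active=0.5):
    """Root marginal P(root = 1) as a function of the per-edge log(lr)."""
    total = 0.0
    for states in product((0, 1), repeat=len(log_lrs)):
        weight = 1.0
        z = prior_logit
        for s, w in zip(states, log_lrs):
            weight *= p_active if s else 1.0 - p_active
            z += w * s
        total += weight / (1.0 + exp(-z))
    return total


def finite_difference_sensitivities(log_lrs, h=1e-5):
    """Approximate d posterior / d log(lr_i) by central differences."""
    grads = []
    for i in range(len(log_lrs)):
        up = list(log_lrs)
        down = list(log_lrs)
        up[i] += h
        down[i] -= h
        grads.append((posterior(up) - posterior(down)) / (2 * h))
    return grads


grads = finite_difference_sensitivities([log(9.0), log(0.25)])
print(all(g > 0 for g in grads))  # True: raising any lr raises the posterior
```

Automatic differentiation gives the same quantity exactly and in one pass, which is why a gradient-based tool is the natural fit once the number of parameters grows.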

Layout

probability_flow/     the importable package
  core/
    node.py          authoring layer (Node, add_input, compile)
    network.py       the compiled, immutable network
    exact.py         ExactSolver
    cpd/             the distributions (base contract + the CPD classes)
    bp/              loopy belief propagation (LoopySolver)
    _logmath.py      log-space numerics
  aspic/             the ASPIC argument layer: authoring + compilation,
                     serialization, a random argument generator, and optional
                     JAX posterior calibration
  metrics/           graph metrics (d-separation, depth/size, loopiness,
                     difficulty, manipulability)
  visualization/     matplotlib renderers (arg.render() / bn.render()), [viz]
docs/                design notes (SPEC, DECISIONS, ROADMAP, and more)
tests/               checked against the exact solver
demos/               walkthrough notebooks

Status and next steps

Working today: the build/compile flow, the distributions described above, both solvers with linear-time structured messages, evidence, the ASPIC argument-compilation layer, argument serialization, the metrics seam (d-separation grouping, depth/size, loopiness, difficulty, manipulability), a random argument generator with structural and difficulty controls, optional matplotlib renderers (radial and layered layouts), and optional JAX-based posterior calibration and parameter sensitivities. Loopy BP, the metrics seam, and the renderer are validated against the exact solver on non-tree (shared-parent) graphs by the "topology zoo" harness (tools/topology_zoo.py, docs/topology_zoo.md). Planned (see docs/ROADMAP.md): a core-network serializer and the exact manipulability range.

Learning more

The notebooks in demos/ walk through the library end to end: walkthrough.ipynb for the core, noisy_gates_walkthrough.ipynb for the gate distributions, loopy_bp_walkthrough.ipynb for belief propagation, visualization_demo.ipynb for the renderers, metrics_walkthrough.ipynb for the metrics seam, and generation_demo.ipynb for the argument generator. The docs/ directory holds the design rationale, the settled decisions, and the roadmap.

