A from-scratch, modular discrete Bayesian-network library.
probability-flow
A small discrete Bayesian-network library written from scratch in NumPy, with no external inference dependency. The goal is full control over and understanding of the inference stack rather than a black box: every distribution and every solver is plain, readable code that can be checked against a brute-force reference.
On top of that core it carries an argumentation layer (ASPIC) that compiles
typed argument graphs into networks, a domain-free metrics seam that measures
a compiled network, and a generator that samples random argument graphs with
controllable difficulty — the substrate for a benchmark of debate scenarios with
known ground truth. Two further capabilities are opt-in extras: matplotlib
renderers ([viz]) and JAX posterior calibration and sensitivities
([jax]).
The whole stack is layered and decoupled: core/ knows nothing about arguments,
metrics/ knows nothing about the domain, and the optional extras pull in nothing
unless you ask for them.
Installing
pip install probability-flow # core: NumPy only
pip install 'probability-flow[viz]' # + matplotlib renderers
pip install 'probability-flow[jax]' # + JAX calibration / sensitivities
pip install 'probability-flow[viz,jax]' # both
Python 3.11 or newer. The import name is probability_flow; the base install
depends only on NumPy — matplotlib and JAX are optional extras, pulled in only
by [viz] and [jax]. Calling a renderer or the calibration module without its
extra raises an ImportError telling you which one to install. [dev] adds pytest
and ruff.
From a checkout, the package also runs in place with no install step: the
importable package is the inner probability_flow/ directory, so working from the
repository root (which pytest puts on the path) makes import probability_flow
resolve. Run the tests from this directory with pytest. The example notebooks add
the path themselves, so they run as-is. CI runs ruff and the test suite on Python
3.11–3.13 on every push.
Quickstart
from probability_flow import Node, ExactSolver, LoopySolver
guilty = Node("Guilty") # the target; prior 0.5 by default
guilty.add_input(Node("Seen at scene"), lr=9) # supporting evidence
guilty.add_input(Node("Alibi"), lr=0.25) # evidence against (lr < 1)
bn = guilty.compile()
ExactSolver(bn).prob(guilty, 1) # 0.5731, by enumerating the joint
LoopySolver(bn).prob(guilty, 1) # 0.5731, by message passing
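The 0.5731 above can be checked by hand. The sketch below does not use the library. It assumes both input nodes keep the default prior of 0.5 and that likelihood ratios multiply the prior odds for every active input, as described under Distributions. The function name is illustrative, not part of the package.

```python
from itertools import product


def posterior_by_enumeration(prior, lrs, input_priors):
    """Marginal P(target=1), summing over every on/off pattern of the inputs."""
    total = 0.0
    for states in product([0, 1], repeat=len(lrs)):
        # Probability of this input pattern (inputs assumed independent roots).
        weight = 1.0
        for s, p in zip(states, input_priors):
            weight *= p if s else 1.0 - p
        # Prior odds times the likelihood ratio of each active input.
        odds = prior / (1.0 - prior)
        for s, lr in zip(states, lrs):
            if s:
                odds *= lr
        total += weight * odds / (1.0 + odds)
    return total


p = posterior_by_enumeration(0.5, lrs=[9, 0.25], input_priors=[0.5, 0.5])
print(round(p, 4))  # 0.5731
```

The four input patterns give 0.5, 0.9, 0.2, and 2.25/3.25; their average is the reported marginal.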
Build, compile, solve
Inference happens in three stages.
- Build. You assemble a mutable graph of Node objects. add_input wires one node as an input of another and returns the input, so upstream structure can be built inline. It takes a single node or a list, and a per-edge weight that is forwarded to the node's distribution.
- Compile. target.compile() walks the graph from the target, checks it (acyclic, consistent shapes, one distribution per node), assigns a topological order, and freezes everything into an immutable BayesianNetwork. Authoring is ergonomic; the compiled form is vectorized.
- Solve. A solver answers marginal(node) and prob(node, state) queries on the compiled network, optionally conditioned on evidence.
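The acyclicity check and topological ordering in the compile stage can be illustrated with the standard library alone. This is a minimal sketch, not the library's compile(): the plain dict of input names is a hypothetical stand-in for the Node graph, and topological_order is an illustrative name.

```python
from graphlib import CycleError, TopologicalSorter


def topological_order(inputs_of):
    """Return node names with every input ahead of the node that consumes it.

    `inputs_of` maps a node name to the names of its inputs (hypothetical
    stand-in for the mutable authoring graph).
    """
    try:
        # A tuple, so the result cannot be mutated after "compilation".
        return tuple(TopologicalSorter(inputs_of).static_order())
    except CycleError as exc:
        # args[1] holds the nodes of the detected cycle.
        raise ValueError(f"graph is not acyclic: {exc.args[1]}") from exc


order = topological_order({"Guilty": ["Seen at scene", "Alibi"]})
print(order[-1])  # Guilty
```

Inputs always precede the target, so a solver can process nodes in this order and find every input already evaluated.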
An edge Y -> X declares only that X's distribution is P(X | Y). It is a
conditioning relationship, not a claim about causation. A causal model points its
edges with causation (cause to effect); an argument graph points them against it
(evidence to claim, since the claim is what produces the evidence). The library
does not care which: it only ever works with P(node | inputs).
Distributions
Each node carries a conditional probability distribution (CPD) describing
P(node | its inputs). The library distinguishes the combination rule (the
combiner) from the object that implements it (the CPD).
- IndependentEvidenceCPD (the default). Treats inputs as uncorrelated sources of evidence: logit P(node=1) = logit(prior) + sum of log(lr) over the active inputs. Adding weights of evidence is Bayes' rule for independent likelihood ratios. Set per edge with add_input(x, lr=...).
- NoisyOrCPD. "Any one cause can fire the effect": P(node=0) = (1 - leak) * product of (1 - activation) over present causes. Declared with node.noisy_or(leak=...) and add_input(cause, activation=...).
- NoisyAndCPD. A native conjunction: the effect needs every condition present, each succeeding with its own activation, with leak acting as a global inhibitor. Declared with node.noisy_and(leak=...).
- TabularCPD. An arbitrary dense table, the fallback and the reference behind every other distribution. Set with node.set_cpd(TabularCPD(...)). Its repr prints the full table as a readable bordered grid.
IndependentEvidence, NoisyOr, and NoisyAnd all belong to the same family:
each input contributes independently, and they differ only in how the
contributions combine.
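The two formulas quoted above translate directly into code. The functions below are a library-free sketch of the combination rules for a single input pattern; their names and signatures are illustrative and are not the CPD classes' API.

```python
import math


def independent_evidence(prior, lrs_of_active_inputs):
    """P(node=1): prior log-odds plus log(lr) for each active input."""
    logit = math.log(prior / (1.0 - prior))
    logit += sum(math.log(lr) for lr in lrs_of_active_inputs)
    return 1.0 / (1.0 + math.exp(-logit))


def noisy_or(leak, activations_of_present_causes):
    """P(node=1) = 1 - (1 - leak) * product of (1 - activation)."""
    p_off = 1.0 - leak
    for activation in activations_of_present_causes:
        p_off *= 1.0 - activation
    return 1.0 - p_off


print(round(independent_evidence(0.5, [9]), 4))  # 0.9
print(round(noisy_or(0.0, [0.5, 0.5]), 4))       # 0.75
```

With no active inputs, independent_evidence returns the prior and noisy_or returns the leak, which is the behaviour both formulas imply.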
Inference
- ExactSolver. Brute-force enumeration of the joint distribution. Obviously correct, exponential in the number of nodes, and intended for small graphs and as the ground truth that everything else is tested against. Supports evidence, so colliders and explaining-away can be exercised.
- LoopySolver. Loopy belief propagation. It treats each distribution as a factor and passes messages on the factor graph. It is exact on loop-free graphs (it converges to the true marginals) and approximate on graphs with loops, where it iterates to a fixed point. It has the same query surface as the exact solver, plus a bulk marginals(), optional damping to settle oscillation, and per-evidence caching.
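To make the explaining-away remark concrete, here is a library-free sketch of brute-force enumeration with evidence on a three-node collider: two causes A and B feeding a noisy-OR effect E. The priors, activation, and leak are illustrative numbers, and joint and prob_a are hypothetical helpers, not ExactSolver methods.

```python
from itertools import product

P_CAUSE = 0.1      # prior of each cause (illustrative)
ACTIVATION = 0.9   # chance that a present cause fires the effect
LEAK = 0.01        # chance the effect fires with no cause present


def joint(a, b, e):
    """P(A=a, B=b, E=e) for the collider A -> E <- B with a noisy-OR effect."""
    p = (P_CAUSE if a else 1 - P_CAUSE) * (P_CAUSE if b else 1 - P_CAUSE)
    p_e_off = (1 - LEAK) * (1 - ACTIVATION) ** (a + b)
    return p * ((1 - p_e_off) if e else p_e_off)


def prob_a(evidence):
    """P(A=1 | evidence), summing the joint over states consistent with it."""
    num = den = 0.0
    for a, b, e in product([0, 1], repeat=3):
        state = {"A": a, "B": b, "E": e}
        if any(state[name] != value for name, value in evidence.items()):
            continue
        p = joint(a, b, e)
        den += p
        if a:
            num += p
    return num / den


seen_effect = prob_a({"E": 1})
seen_effect_and_b = prob_a({"E": 1, "B": 1})
```

With these numbers, observing the effect raises belief in A from 0.1 to roughly 0.51, and additionally observing B pulls it back to roughly 0.11: B explains the effect away.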
Why it stays small
A node with k binary inputs has a table of size 2^k. The design keeps that cost off the production path:
- as_tabular() materializes a distribution's full table. It is used only by the exact solver and the tests, on deliberately small graphs, never during normal inference.
- LoopySolver talks to a distribution only through two message methods (message_to_output and message_to_input). The engine never inspects a distribution's type.
This means a structured distribution computes its messages in time linear in the
number of inputs, without ever building the 2^k table, and the engine does not
change. These linear-time messages are implemented: exact for the gates
(NoisyOr / NoisyAnd), and a hybrid for the default IndependentEvidence —
exact below a fan-in threshold, a capped-bucket approximation above it (within
1e-3 of the exact solver). An arbitrary
TabularCPD still falls back to the dense default, which is correct and
unavoidable. See docs/fast_messages.md.
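Why a gate can skip the 2^k table is easiest to see for the forward (to-output) direction of noisy-OR. The sketch below compares a dense enumeration with the factorized form, assuming each incoming belief is an independent P(input=1), as belief propagation treats its messages. Both function names are illustrative; this is not the library's message_to_output.

```python
from itertools import product


def noisy_or_dense(leak, activations, beliefs):
    """P(node=1) by enumerating all 2^k input patterns."""
    p_on = 0.0
    for states in product([0, 1], repeat=len(beliefs)):
        weight, p_off = 1.0, 1.0 - leak
        for s, activation, belief in zip(states, activations, beliefs):
            weight *= belief if s else 1.0 - belief
            if s:
                p_off *= 1.0 - activation
        p_on += weight * (1.0 - p_off)
    return p_on


def noisy_or_linear(leak, activations, beliefs):
    """Same quantity in O(k): each input contributes one factor."""
    p_off = 1.0 - leak
    for activation, belief in zip(activations, beliefs):
        # Input is off (factor 1) or on and fails to fire (1 - activation).
        p_off *= 1.0 - activation * belief
    return 1.0 - p_off


acts = [0.9, 0.5, 0.3, 0.7]
bels = [0.2, 0.6, 0.5, 0.1]
dense = noisy_or_dense(0.05, acts, bels)
linear = noisy_or_linear(0.05, acts, bels)
```

The two agree because the expectation of a product of independent terms is the product of their expectations, which is what lets the sum over 2^k patterns collapse into k factors.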
The ASPIC argument layer
probability_flow.aspic builds a typed argument out of premises and conclusions
joined by support / rebut / undermine / undercut / strict edges, then compiles
it to an ordinary BayesianNetwork. All argumentation vocabulary lives here;
core/ stays free of it (ASPIC is the first of several planned domain wrappers —
legal, medical, AI-safety — over the same core). Every argumentative edge is a
method on its downstream node, so the whole graph traverses from the root target.
from probability_flow.aspic import Premise, Conclusion
guilty = Conclusion("Guilty", prior=0.5)
seen = guilty.support(Premise("Seen at scene", 0.5), lr=9) # supporting argument
guilty.rebut(Premise("Alibi", 0.5), lr=0.25) # attacking argument (lr < 1)
guilty.undercut(seen, by=Premise("Unreliable witness", 0.5)) # attack the support *edge*
bn = guilty.compile() # an ordinary BayesianNetwork to solve as above
arg = guilty.assemble() # a serializable Argument handle
arg.posterior(guilty) # 0.4615
arg.save("guilty.json") # to_json / from_json / save / load — round-trips to identical posteriors
assemble() returns an Argument handle that owns the cached compile, the
posterior queries, JSON serialization (typed at the argument level), and the
metric methods below. See docs/aspic.md.
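The 0.4615 is consistent with one reading of undercut: while the undercutter is active, the attacked support edge contributes no likelihood ratio. The enumeration below is a library-free sketch under that assumption, with all three premises at 0.5 as in the example. It is not taken from the library's source, and guilty_posterior is an illustrative name.

```python
from itertools import product


def guilty_posterior():
    """Marginal P(Guilty=1) over all eight premise patterns."""
    total = 0.0
    for seen, alibi, unreliable in product([0, 1], repeat=3):
        odds = 1.0                    # prior 0.5 means prior odds of 1
        if seen and not unreliable:   # ASSUMED: an active undercutter mutes the edge
            odds *= 9
        if alibi:
            odds *= 0.25
        total += 0.5 ** 3 * odds / (1.0 + odds)  # each pattern has weight 1/8
    return total


print(round(guilty_posterior(), 4))  # 0.4615
```

Without the undercutter the same enumeration gives 0.5731, so the unreliable-witness premise is what moves the posterior below one half.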
Metrics
probability_flow.metrics measures a compiled network and depends only on
core — no argumentation vocabulary, so every wrapper reuses it. Every function
is pure and accepts either a compiled BayesianNetwork or a target node (compiled
on the spot), and results carry your own Node objects, never ids.
from probability_flow import metrics
metrics.d_separated_groups(bn, guilty) # independent evidence branches
metrics.upstream_size(guilty) # size of the evidence base (auto-compiles)
metrics.circuit_rank(bn) # distance from a tree (0 == polytree)
metrics.posterior_range(bn, guilty) # manipulability: posterior range over prunings
bn.max_depth(guilty) # trivial structural ones are also BN methods
The seam covers structure (depth, size), loopiness (circuit rank, polytree test),
difficulty (posterior gap, log-odds deltas, concentration Gini), and
manipulability (posterior range). See docs/metrics.md.
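For reference, the circuit rank of a graph is E - V + C on its undirected skeleton (edges, vertices, connected components). The sketch below computes it with a small union-find; it illustrates the standard graph-theory definition and may differ from how metrics.circuit_rank is implemented. The node names are illustrative.

```python
def circuit_rank(nodes, edges):
    """E - V + C for the undirected skeleton; 0 means no undirected cycle."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    components = len(parent)
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(edges) - len(parent) + components


tree = circuit_rank(
    ["Guilty", "Seen", "Alibi"],
    [("Seen", "Guilty"), ("Alibi", "Guilty")],
)
diamond = circuit_rank(
    ["Root", "X", "Y", "Target"],
    [("Root", "X"), ("Root", "Y"), ("X", "Target"), ("Y", "Target")],
)
print(tree, diamond)  # 0 1
```

The diamond has one shared parent reaching the target by two paths, which is the kind of loop where belief propagation stops being exact.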
Generating argument graphs
probability_flow.aspic.generate rejection-samples ASPIC arguments that meet a
difficulty target — random support/attack chains, optional undercutters, axiomatic
leaves, strict edges, and parent sharing, screened against the real metrics layer.
This is the substrate for a benchmark of debate scenarios with known ground truth.
from probability_flow.aspic import generate, StructuralParams, DifficultyTargets
arg = generate(
    seed=0,
    structural=StructuralParams(n_support=2, n_attack=1),
    targets=DifficultyTargets(target_posterior=0.7),  # reached by calibrating the root prior
)
The structural shape is set directly; difficulty targets are screened and the
target posterior is hit by calibrating the root prior. See docs/generation.md.
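One way to hit a target posterior by adjusting only the root prior is bisection, since the posterior rises monotonically with the prior. The sketch below is an assumption about how such a calibration could work, not the generator's actual method. It uses independent-evidence combination, input priors of 0.5, and illustrative likelihood ratios (two supports, one attack).

```python
from itertools import product


def posterior(prior, lrs, input_prior=0.5):
    """Root marginal under independent-evidence combination."""
    total = 0.0
    for states in product([0, 1], repeat=len(lrs)):
        weight, odds = 1.0, prior / (1.0 - prior)
        for s, lr in zip(states, lrs):
            weight *= input_prior if s else 1.0 - input_prior
            if s:
                odds *= lr
        total += weight * odds / (1.0 + odds)
    return total


def calibrate_prior(target, lrs, tol=1e-9):
    """Bisect on the root prior until the posterior matches the target."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior(mid, lrs) < target:
            lo = mid   # posterior too low: raise the prior
        else:
            hi = mid
    return 0.5 * (lo + hi)


LRS = [9, 9, 0.25]  # illustrative: two supports, one attack
prior = calibrate_prior(0.7, LRS)
```

Because only the root prior moves, the structure and the edge weights that were screened for difficulty stay exactly as sampled.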
Visualization (optional, [viz])
With the [viz] extra installed, a compiled network and an argument both render to
a matplotlib figure with an in-house layout — radial by default (the root at the
centre, the d-separated evidence branches fanning out under a force model), or
layout="layered" for the root-at-the-bottom tree. Likelihood-ratio edges are
coloured red→blue by their LR, and a long node name shrinks its font to fit:
bn.render() # radial by default
bn.render(layout="layered") # root-at-bottom tree (orientation="horizontal" too)
guilty.assemble().render() # the argument view (same layouts)
matplotlib is imported lazily, only when you draw, so importing probability_flow
never pulls it in; without the extra these calls raise an ImportError pointing at
pip install 'probability-flow[viz]'.
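The lazy-import behaviour can be sketched with importlib. The helper name require_extra and the exact message wording are hypothetical; only the pattern (import on first use, re-raise with an install hint) reflects what is described above.

```python
import importlib


def require_extra(module_name, extra):
    """Import an optional dependency on first use, or explain how to get it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install 'probability-flow[{extra}]'"
        ) from exc


def render():
    # Hypothetical renderer: matplotlib is imported only when drawing.
    plt = require_extra("matplotlib.pyplot", "viz")
    return plt.figure()
```

Keeping the import inside the function is what lets the base package load with NumPy alone.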
Calibration and sensitivities (optional, [jax])
With the [jax] extra, probability_flow.aspic.calibrate differentiates the root
posterior through the compiled argument: sensitivities(arg) gives a
per-parameter importance signal (via jax.grad), and calibrate_posterior(arg, target) solves for parameters that drive the root posterior to a chosen value.
JAX is imported lazily inside these calls, so without the extra they raise an
ImportError pointing at pip install 'probability-flow[jax]'.
Layout
probability_flow/ the importable package
core/
node.py authoring layer (Node, add_input, compile)
network.py the compiled, immutable network
exact.py ExactSolver
cpd/ the distributions (base contract + the four CPDs)
bp/ loopy belief propagation (LoopySolver)
_logmath.py log-space numerics
aspic/ the ASPIC argument layer: authoring + compilation,
serialization, a random argument generator, and optional
JAX posterior calibration
metrics/ graph metrics (d-separation, depth/size, loopiness,
difficulty, manipulability)
visualization/ matplotlib renderers (arg.render() / bn.render()), [viz]
docs/ design notes (SPEC, DECISIONS, ROADMAP, and more)
tests/ checked against the exact solver
demos/ walkthrough notebooks
Status and next steps
Working today: the build/compile flow, all four distributions, both solvers with
linear-time structured messages, evidence, the ASPIC argument-compilation layer,
argument serialization, the metrics seam (d-separation grouping, depth/size,
loopiness, difficulty, manipulability), a random argument generator with structural
and difficulty controls, optional matplotlib renderers (radial and layered
layouts), and optional JAX-based posterior calibration and parameter sensitivities.
Loopy BP, the metrics seam, and the renderer are validated against the exact solver
on non-tree (shared-parent) graphs by the "topology zoo" harness
(tools/topology_zoo.py, docs/topology_zoo.md). Planned (see docs/ROADMAP.md):
a core-network serializer and the exact manipulability range.
Learning more
The notebooks in demos/ walk through the library end to end:
walkthrough.ipynb for the core, noisy_gates_walkthrough.ipynb for the gate
distributions, loopy_bp_walkthrough.ipynb for belief propagation,
visualization_demo.ipynb for the renderers, metrics_walkthrough.ipynb for
the metrics seam, and generation_demo.ipynb for the argument generator. The
docs/ directory holds the design rationale, the settled decisions, and the
roadmap.