Python Toolkit for Causal and Probabilistic Reasoning

Project description

pgmpy provides the building blocks for causal and probabilistic reasoning using graphical models. It implements data structures for a range of causal and graphical models such as DAGs, PDAGs, MAGs, PAGs, Bayesian Networks, Dynamic Bayesian Networks, and Structural Equation Models, along with algorithms for various tasks such as causal discovery, causal identification, causal and probabilistic inference, model validation, parameter estimation, simulations, and more.

Algorithms for each task follow a unified composable API, making them modular and extensible. They are also scikit-learn compatible when possible. They can be used directly, combined in sklearn pipelines, or used to build higher-level tools on top of them.

Documentation · Examples · Tutorials
Supported by: GC.OS (sponsored) · FLOSS/FUND · NumFOCUS (affiliated)

Key Features

  • Causal Discovery / Structure Learning: Learn the model structure from data, with optional integration of expert knowledge.
  • Causal Validation: Assess how compatible the causal structure is with the data.
  • Parameter Learning: Estimate model parameters (e.g., conditional probability distributions) from observed data.
  • Probabilistic Inference: Compute posterior distributions conditioned on observed evidence.
  • Causal Inference: Compute interventional and counterfactual distributions using do-calculus.
  • Simulations: Generate synthetic data under specified evidence or interventions.
  • Example Datasets and Models: Collection of datasets and models from various sources.
  • Plotting: Flexible plotting functionality.
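
To make the difference between the Probabilistic Inference and Causal Inference features listed above more concrete, here is a minimal pure-Python sketch that does not use pgmpy at all. It assumes a hypothetical binary model Z -> X, Z -> Y, X -> Y with made-up probabilities, and compares P(Y=1 | X=1) obtained by conditioning with P(Y=1 | do(X=1)) obtained by backdoor adjustment over Z. All names and numbers are illustrative assumptions.

```python
# Hypothetical binary model: Z -> X, Z -> Y, X -> Y (all numbers are made up).
P_Z = {0: 0.5, 1: 0.5}               # P(Z=z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}      # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {                    # P(Y=1 | X=x, Z=z), keyed by (x, z)
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.3, (1, 1): 0.7,
}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def observational(x):
    """P(Y=1 | X=x): conditioning on X reweights Z by P(Z | X=x)."""
    numer = sum(P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    denom = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return numer / denom


def interventional(x):
    """P(Y=1 | do(X=x)) via backdoor adjustment: Z keeps its marginal P(Z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


print(round(observational(1), 4))   # 0.62
print(round(interventional(1), 4))  # 0.5
```

The two numbers differ because Z confounds X and Y: observing X=1 makes Z=1 more likely, whereas setting X=1 by intervention leaves the distribution of Z unchanged.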

Quickstart

Installation

pgmpy is available on both PyPI and conda-forge. To install from PyPI, use:

pip install pgmpy

To install from conda-forge, use:

conda install conda-forge::pgmpy
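
After installing, one way to confirm that the package is visible to the current interpreter is to query its metadata with the standard library. This is a small sketch, not part of pgmpy; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="pgmpy"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if pgmpy is installed in this environment, else None.
print(installed_version())
```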

Examples

Discrete Data

from pgmpy.example_models import load_model

# Load a Discrete Bayesian Network and simulate data.
discrete_bn = load_model("bnlearn/alarm")
alarm_df = discrete_bn.simulate(n_samples=100)

# Learn a network from simulated data.
from pgmpy.estimators import PC

dag = PC(data=alarm_df).estimate(ci_test="chi_square", return_type="dag")

# Learn the parameters from the data.
from pgmpy.models import DiscreteBayesianNetwork

discrete_bn = DiscreteBayesianNetwork(dag.edges())
discrete_bn.add_nodes_from(dag.nodes())
dag_fitted = discrete_bn.fit(alarm_df)
dag_fitted.get_cpds()

# Drop a column and predict using the learned model.
evidence_df = alarm_df.drop(columns=["FIO2"])
pred_FIO2 = dag_fitted.predict(evidence_df)
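
The parameter-learning step above fits conditional probability distributions (CPDs) to the data. As a rough illustration of the idea, and not of pgmpy's implementation, a maximum-likelihood estimate of a discrete CPD reduces to normalized counts within each parent configuration. The sketch below uses only the standard library; `estimate_cpd` and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_cpd(records, child, parents):
    """Maximum-likelihood P(child | parents) from a list of dict records."""
    counts = defaultdict(Counter)
    for row in records:
        parent_state = tuple(row[p] for p in parents)
        counts[parent_state][row[child]] += 1
    # Normalize the counts within each parent configuration.
    return {
        parent_state: {
            value: n / sum(child_counts.values())
            for value, n in child_counts.items()
        }
        for parent_state, child_counts in counts.items()
    }


# Toy data with hypothetical binary variables "rain" and "wet".
records = [
    {"rain": 1, "wet": 1},
    {"rain": 1, "wet": 1},
    {"rain": 1, "wet": 0},
    {"rain": 0, "wet": 0},
]
cpd = estimate_cpd(records, child="wet", parents=["rain"])
print(cpd[(1,)][1])  # fraction of rainy records that are wet (2 of 3)
```

Parent configurations that never occur in the data get no entry here, which is one reason practical estimators often add smoothing or priors.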

Linear Gaussian Data

from pgmpy.example_models import load_model

# Load an example Gaussian Bayesian Network and simulate data
gaussian_bn = load_model("bnlearn/ecoli70")
ecoli_df = gaussian_bn.simulate(n_samples=100)

# Learn the network from simulated data.
from pgmpy.estimators import PC

dag = PC(data=ecoli_df).estimate(ci_test="pearsonr", return_type="dag")

# Learn the parameters from the data.
from pgmpy.models import LinearGaussianBayesianNetwork

gaussian_bn = LinearGaussianBayesianNetwork(dag.edges())
dag_fitted = gaussian_bn.fit(ecoli_df)
dag_fitted.get_cpds()

# Drop a column and predict using the learned model.
evidence_df = ecoli_df.drop(columns=["ftsJ"])
pred_ftsJ = dag_fitted.predict(evidence_df)
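
In a linear Gaussian model, each node is a linear function of its parents plus Gaussian noise, so fitting a CPD with a single parent amounts to simple linear regression. The following standard-library sketch illustrates that idea only; it is not pgmpy's estimator, and `fit_linear_gaussian` is a hypothetical helper.

```python
import math
from statistics import fmean


def fit_linear_gaussian(xs, ys):
    """Least-squares fit of y = intercept + slope * x + noise.

    Returns (intercept, slope, residual_std).
    """
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Maximum-likelihood estimate of the noise standard deviation.
    residual_std = math.sqrt(fmean([r * r for r in residuals]))
    return intercept, slope, residual_std


# Noise-free toy data generated from y = 1 + 2 * x.
intercept, slope, residual_std = fit_linear_gaussian([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope, residual_std)
```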

Mixture Data with Arbitrary Relationships

from pgmpy.global_vars import config

config.set_backend("torch")

import pyro.distributions as dist

from pgmpy.models import FunctionalBayesianNetwork
from pgmpy.factors.hybrid import FunctionalCPD

# Create a Bayesian Network with mixture of discrete and continuous variables.
func_bn = FunctionalBayesianNetwork(
    [
        ("x1", "w"),
        ("x2", "w"),
        ("x1", "y"),
        ("x2", "y"),
        ("w", "y"),
        ("y", "z"),
        ("w", "z"),
        ("y", "c"),
        ("w", "c"),
    ]
)

# Define the Functional CPDs for each node and add them to the model.
cpd_x1 = FunctionalCPD("x1", fn=lambda _: dist.Normal(0.0, 1.0))
cpd_x2 = FunctionalCPD("x2", fn=lambda _: dist.Normal(0.5, 1.2))

# Continuous mediator: w = 0.7*x1 - 0.3*x2 + ε
cpd_w = FunctionalCPD(
    "w",
    fn=lambda parents: dist.Normal(0.7 * parents["x1"] - 0.3 * parents["x2"], 0.5),
    parents=["x1", "x2"],
)

# Bernoulli target with logistic link: y ~ Bernoulli(sigmoid(-0.7 + 1.5*x1 + 0.8*x2 + 1.2*w))
cpd_y = FunctionalCPD(
    "y",
    fn=lambda parents: dist.Bernoulli(
        logits=(-0.7 + 1.5 * parents["x1"] + 0.8 * parents["x2"] + 1.2 * parents["w"])
    ),
    parents=["x1", "x2", "w"],
)

# Downstream Bernoulli influenced by y and w
cpd_z = FunctionalCPD(
    "z",
    fn=lambda parents: dist.Bernoulli(
        logits=(-1.2 + 0.8 * parents["y"] + 0.2 * parents["w"])
    ),
    parents=["y", "w"],
)

# Continuous outcome depending on y and w: c = 0.2 + 0.5*y + 0.3*w + ε
cpd_c = FunctionalCPD(
    "c",
    fn=lambda parents: dist.Normal(0.2 + 0.5 * parents["y"] + 0.3 * parents["w"], 0.7),
    parents=["y", "w"],
)

func_bn.add_cpds(cpd_x1, cpd_x2, cpd_w, cpd_y, cpd_z, cpd_c)
func_bn.check_model()

# Simulate data from the model
df_func = func_bn.simulate(n_samples=1000, seed=123)

# For learning and inference in Functional Bayesian Networks, please refer to the example notebook: https://github.com/pgmpy/pgmpy/blob/dev/examples/Functional_Bayesian_Network_Tutorial.ipynb
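
Conceptually, simulating from a model like this is forward (ancestral) sampling: each node is drawn after its parents, using the parents' sampled values. The standard-library sketch below reproduces the structural equations defined above without pgmpy or Pyro. It only illustrates the idea and makes no claim about how `simulate` is implemented; `sigmoid` and `sample_once` are hypothetical helpers.

```python
import math
import random


def sigmoid(t):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def sample_once(rng):
    """Draw one joint sample, visiting nodes in topological order."""
    x1 = rng.gauss(0.0, 1.0)
    x2 = rng.gauss(0.5, 1.2)
    w = rng.gauss(0.7 * x1 - 0.3 * x2, 0.5)
    y = int(rng.random() < sigmoid(-0.7 + 1.5 * x1 + 0.8 * x2 + 1.2 * w))
    z = int(rng.random() < sigmoid(-1.2 + 0.8 * y + 0.2 * w))
    c = rng.gauss(0.2 + 0.5 * y + 0.3 * w, 0.7)
    return {"x1": x1, "x2": x2, "w": w, "y": y, "z": z, "c": c}


rng = random.Random(123)  # fixed seed so repeated runs give the same draws
samples = [sample_once(rng) for _ in range(1000)]
print(len(samples), sum(s["y"] for s in samples) / len(samples))
```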

Contributing

We welcome all contributions to pgmpy, not just code. Please refer to our contributing guide for more details. We also offer mentorship for new contributors and maintain a list of potential mentored projects. If you are interested in contributing to pgmpy, please join our Discord server and introduce yourself. We will be happy to help you get started.

Download files

Source Distribution

pgmpy-1.1.2.tar.gz (2.2 MB view details)

Uploaded Source

Built Distribution

pgmpy-1.1.2-py3-none-any.whl (2.4 MB view details)

Uploaded Python 3

File details

Details for the file pgmpy-1.1.2.tar.gz.

File metadata

  • Download URL: pgmpy-1.1.2.tar.gz
  • Upload date:
  • Size: 2.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pgmpy-1.1.2.tar.gz
  • SHA256: a62a89352f139a5d459839e54e6bfa202a7d8749e6dc6d7ceec088a71529af45
  • MD5: 2e74a2710614ea918fcdb5559c2d8e36
  • BLAKE2b-256: 729ab94ccd1d6cbf2982937c26e54e887b47ee10f848ad53c4427965578a358d

Provenance

The following attestation bundles were made for pgmpy-1.1.2.tar.gz:

Publisher: publish.yml on pgmpy/pgmpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pgmpy-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: pgmpy-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 2.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pgmpy-1.1.2-py3-none-any.whl
  • SHA256: e55c78763a4a45dd644a13b250cea86af0c7e08590cf35de489624f34a4d9a0b
  • MD5: 07a7f4577d1e81b7dab1d0cc3ca4d05f
  • BLAKE2b-256: c65dd03634ed296986abad834a69b0df21510cc9b6c40fb8afaed5df1c4b6074

Provenance

The following attestation bundles were made for pgmpy-1.1.2-py3-none-any.whl:

Publisher: publish.yml on pgmpy/pgmpy
