
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with PyTensor


PyMC (formerly PyMC3) is a Python package for Bayesian statistical modeling focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.

Check out the PyMC overview, or one of the many examples! For questions on PyMC, head on over to our PyMC Discourse forum.

Features

  • Intuitive model specification syntax, for example, x ~ N(0,1) translates to x = Normal('x',0,1)

  • Powerful sampling algorithms, such as the No U-Turn Sampler, make it possible to fit complex models with thousands of parameters with little specialized knowledge of fitting algorithms.

  • Variational inference: ADVI for fast approximate posterior estimation as well as mini-batch ADVI for large data sets.

  • Relies on PyTensor which provides:
    • Computation optimization and dynamic C or JAX compilation

    • NumPy broadcasting and advanced indexing

    • Linear algebra operators

    • Simple extensibility

  • Transparent support for missing value imputation

Linear Regression Example

Plant growth can be influenced by multiple factors, and understanding these relationships is crucial for optimizing agricultural practices.

Imagine we conduct an experiment to predict the growth of a plant based on different environmental variables.

import pymc as pm

# Taking draws from a normal distribution
seed = 42
x_dist = pm.Normal.dist(shape=(100, 3))
x_data = pm.draw(x_dist, random_seed=seed)

# Independent Variables:
# Sunlight Hours: Number of hours the plant is exposed to sunlight daily.
# Water Amount: Daily water amount given to the plant (in milliliters).
# Soil Nitrogen Content: Percentage of nitrogen content in the soil.


# Dependent Variable:
# Plant Growth (y): Measured as the increase in plant height (in centimeters) over a certain period.


# Define coordinate values for all dimensions of the data
coords={
 "trial": range(100),
 "features": ["sunlight hours", "water amount", "soil nitrogen"],
}

# Define generative model
with pm.Model(coords=coords) as generative_model:
   x = pm.Data("x", x_data, dims=["trial", "features"])

   # Model parameters
   betas = pm.Normal("betas", dims="features")
   sigma = pm.HalfNormal("sigma")

   # Linear model
   mu = x @ betas

   # Likelihood
   # Assuming we measure deviation of each plant from baseline
   plant_growth = pm.Normal("plant growth", mu, sigma, dims="trial")


# Generating data from model by fixing parameters
fixed_parameters = {
 "betas": [5, 20, 2],
 "sigma": 0.5,
}
with pm.do(generative_model, fixed_parameters) as synthetic_model:
   idata = pm.sample_prior_predictive(random_seed=seed) # Sample from prior predictive distribution.
   synthetic_y = idata.prior["plant growth"].sel(draw=0, chain=0)


# Infer parameters conditioned on observed data
with pm.observe(generative_model, {"plant growth": synthetic_y}) as inference_model:
   idata = pm.sample(random_seed=seed)

   summary = pm.stats.summary(idata, var_names=["betas", "sigma"])
   print(summary)

From the summary, we can see that the means of the inferred parameters are very close to the fixed parameters.

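| Params                | mean   | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|-----------------------|--------|-------|--------|---------|-----------|---------|----------|----------|-------|
| betas[sunlight hours] | 4.972  | 0.054 | 4.866  | 5.066   | 0.001     | 0.001   | 3003     | 1257     | 1     |
| betas[water amount]   | 19.963 | 0.051 | 19.872 | 20.062  | 0.001     | 0.001   | 3112     | 1658     | 1     |
| betas[soil nitrogen]  | 1.994  | 0.055 | 1.899  | 2.107   | 0.001     | 0.001   | 3221     | 1559     | 1     |
| sigma                 | 0.511  | 0.037 | 0.438  | 0.575   | 0.001     | 0       | 2945     | 1522     | 1     |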
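As a rough cross-check that does not use PyMC at all, the same generative process can be simulated with NumPy and fitted by ordinary least squares. This is only a sketch under stated assumptions: the random draws come from NumPy's generator, so they differ from the data PyMC produced above, and least squares is a stand-in for the Bayesian fit, not the method used in the example.

```python
import numpy as np

rng = np.random.default_rng(42)  # ASSUMPTION: NumPy RNG, so draws differ from PyMC's

# Same shapes and fixed parameters as the example.
x = rng.normal(size=(100, 3))
true_betas = np.array([5.0, 20.0, 2.0])
true_sigma = 0.5

# Generative process: y = x @ betas + Gaussian noise.
y = x @ true_betas + rng.normal(scale=true_sigma, size=100)

# Ordinary least squares without an intercept, matching mu = x @ betas.
beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)

# Unbiased estimate of the noise scale from the residuals.
dof = len(y) - x.shape[1]
residual_sd = np.sqrt(np.sum((y - x @ beta_hat) ** 2) / dof)

print("OLS betas:", np.round(beta_hat, 3))
print("residual sd:", round(float(residual_sd), 3))
```

With 100 trials and a noise scale of 0.5, the point estimates should land near the fixed values, which is consistent with the narrow posterior intervals in the summary.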

# Simulate new data conditioned on inferred parameters
new_x_data = pm.draw(
   pm.Normal.dist(shape=(3, 3)),
   random_seed=seed,
)
new_coords = coords | {"trial": [0, 1, 2]}

with inference_model:
   pm.set_data({"x": new_x_data}, coords=new_coords)
   pm.sample_posterior_predictive(
      idata,
      predictions=True,
      extend_inferencedata=True,
      random_seed=seed,
   )

pm.stats.summary(idata.predictions, kind="stats")

The new data conditioned on inferred parameters would look like:

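| Output          | mean   | sd    | hdi_3% | hdi_97% |
|-----------------|--------|-------|--------|---------|
| plant growth[0] | 14.229 | 0.515 | 13.325 | 15.272  |
| plant growth[1] | 24.418 | 0.511 | 23.428 | 25.326  |
| plant growth[2] | -6.747 | 0.511 | -7.740 | -5.797  |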
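To make the mechanics of posterior predictive sampling concrete, here is a NumPy-only sketch of what happens for each posterior draw: compute the linear predictor for the new rows, then add Gaussian noise with that draw's sigma. Two things here are assumptions made for illustration: the posterior draws are approximated as independent normals built from the means and standard deviations in the parameter summary, and the rows of `new_x` are hypothetical values, not the ones drawn in the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 4000

# ASSUMPTION: stand-in posterior draws, approximated as independent normals
# using the posterior means and sds reported in the parameter summary.
beta_draws = rng.normal(
    loc=[4.972, 19.963, 1.994],
    scale=[0.054, 0.051, 0.055],
    size=(n_draws, 3),
)
sigma_draws = rng.normal(loc=0.511, scale=0.037, size=n_draws)

# ASSUMPTION: hypothetical new feature rows (not the values used in the example).
new_x = np.array([
    [0.5, 0.4, -0.2],
    [1.0, 1.2, 0.3],
    [-1.3, 0.0, 0.1],
])

# One predictive draw per posterior draw: y ~ Normal(new_x @ betas, sigma).
mu = beta_draws @ new_x.T  # shape (n_draws, 3)
y_pred = rng.normal(loc=mu, scale=sigma_draws[:, None])

print("predictive mean:", np.round(y_pred.mean(axis=0), 2))
print("predictive sd:  ", np.round(y_pred.std(axis=0), 2))
```

Because the parameter uncertainty is small here, the predictive standard deviation is dominated by sigma, which matches the sd values of roughly 0.51 reported for the predictions.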

# Simulate new data, under a scenario where the first beta is zero
with pm.do(
 inference_model,
 {inference_model["betas"]: inference_model["betas"] * [0, 1, 1]},
) as plant_growth_model:
   new_predictions = pm.sample_posterior_predictive(
      idata,
      predictions=True,
      random_seed=seed,
   )

pm.stats.summary(new_predictions, kind="stats")

Under the above scenario, the new data would look like:

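| Output          | mean   | sd    | hdi_3% | hdi_97% |
|-----------------|--------|-------|--------|---------|
| plant growth[0] | 12.149 | 0.515 | 11.193 | 13.135  |
| plant growth[1] | 29.809 | 0.508 | 28.832 | 30.717  |
| plant growth[2] | -0.131 | 0.507 | -1.121 | 0.791   |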
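The effect of the intervention can be read off with a little arithmetic. Setting the first beta to zero removes the term betas[sunlight hours] × sunlight from the linear predictor, so the shift in each predictive mean, divided by the posterior mean of that beta, gives an approximate value for the sunlight feature of that trial. The sketch below uses only numbers reported in the tables above; the result is approximate because the means carry Monte Carlo error.

```python
# Posterior predictive means copied from the two output tables.
baseline_mean = [14.229, 24.418, -6.747]    # all betas as inferred
intervened_mean = [12.149, 29.809, -0.131]  # first beta forced to zero

# Posterior mean of betas[sunlight hours] from the parameter summary.
beta_sunlight = 4.972

# The shift in the mean is roughly beta_sunlight * sunlight, so dividing
# by beta_sunlight approximates the sunlight feature value of each trial.
implied_sunlight = [
    (b - i) / beta_sunlight
    for b, i in zip(baseline_mean, intervened_mean)
]

for trial, value in enumerate(implied_sunlight):
    print(f"trial {trial}: implied sunlight feature ~ {value:.2f}")
```

Negative implied values are expected, since the features in this example were drawn from a standard normal distribution rather than measured in hours.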

Getting started

If you already know about Bayesian statistics:

Learn Bayesian statistics with a book together with PyMC

See also the section on books using PyMC on our website.

Audio & Video

Installation

To install PyMC on your system, follow the instructions on the installation guide.
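After following the installation guide, one way to confirm that PyMC is visible to the current Python environment is to query the installed package metadata. This is a small sketch using only the standard library; the helper name `installed_pymc_version` is hypothetical and not part of PyMC.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_pymc_version():
    """Return the installed PyMC version string, or None if PyMC is absent."""
    try:
        return version("pymc")
    except PackageNotFoundError:
        return None


found = installed_pymc_version()
print(f"PyMC {found}" if found else "PyMC is not installed in this environment")
```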

Citing PyMC

Please choose from the following:

  • PyMC: A Modern and Comprehensive Probabilistic Programming Framework in Python, Abril-Pla O, Andreani V, Carroll C, Dong L, Fonnesbeck CJ, Kochurov M, Kumar R, Lao J, Luhmann CC, Martin OA, Osthege M, Vieira R, Wiecki T, Zinkov R. (2023)

    • BibTeX version

      @article{pymc2023,
        title = {{PyMC}: A Modern and Comprehensive Probabilistic Programming Framework in {P}ython},
        author = {Oriol Abril-Pla and Virgile Andreani and Colin Carroll and Larry Dong and Christopher J. Fonnesbeck and Maxim Kochurov and Ravin Kumar and Junpeng Lao and Christian C. Luhmann and Osvaldo A. Martin and Michael Osthege and Ricardo Vieira and Thomas Wiecki and Robert Zinkov},
        journal = {{PeerJ} Computer Science},
        volume = {9},
        number = {e1516},
        doi = {10.7717/peerj-cs.1516},
        year = {2023}
      }
  • Zenodo: a DOI for all versions.

  • DOIs for specific versions are shown on Zenodo and under Releases

Contact

We are using discourse.pymc.io as our main communication channel.

To ask a question regarding modeling or usage of PyMC, we encourage posting to our Discourse forum under the “Questions” category. You can also suggest a feature in the “Development” category. Requests for non-technical information about the project are also welcome on Discourse. We also use Discourse internally for general announcements and governance-related processes.

You can also follow us on these social media platforms for updates and other announcements:

To report an issue with PyMC please use the issue tracker.

License

Apache License, Version 2.0

Software using PyMC

General purpose

  • Bambi: BAyesian Model-Building Interface (BAMBI) in Python.

  • calibr8: A toolbox for constructing detailed observation models to be used as likelihoods in PyMC.

  • gumbi: A high-level interface for building GP models.

  • SunODE: Fast ODE solver, much faster than the one that comes with PyMC.

  • pymc-learn: Custom PyMC models built on top of the pymc3_models/scikit-learn API.

Domain specific

  • Exoplanet: a toolkit for modeling of transit and/or radial velocity observations of exoplanets and other astronomical time series.

  • beat: Bayesian Earthquake Analysis Tool.

  • CausalPy: A package focusing on causal inference in quasi-experimental settings.

  • PyMC-Marketing: Bayesian marketing toolbox for marketing mix modeling, customer lifetime value, and more.

See also the ecosystem page on our website. Please contact us if your software is not listed here.

Papers citing PyMC

See Google Scholar here and here for a continuously updated list.

Contributors

The GitHub contributor page shows the people who have added content to this repo which includes a large portion of contributors to the PyMC project but not all of them. Other contributors have added content to other repos of the pymc-devs GitHub organization or have contributed through other project spaces outside of GitHub like our Discourse forum.

If you are interested in contributing yourself, read our Code of Conduct and Contributing guide.

Support

PyMC is a non-profit project under the NumFOCUS umbrella. If you want to support PyMC financially, you can donate here.

Professional Consulting Support

You can get professional consulting support from PyMC Labs.

Sponsors

NumFOCUS

PyMCLabs

OpenWoundResearch

