coppuccino

Fit distributions with normalizing flows + copulas with JAX

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

aarond.johnson

These details have not been verified by PyPI

Project description

coppuccino

Density estimation for multivariate data with copula normalizing flows, in JAX.

Python 3.11 | 3.12 | 3.13 | 3.14 MIT License

coppuccino fits and samples from complex multivariate probability distributions by combining two classical ideas: copulas, which separate a joint distribution into its one-dimensional marginals and a dependency structure, and normalizing flows, which learn that dependency structure flexibly. It is designed for — though not limited to — density estimation on MCMC posterior samples, where it enables fast resampling, density evaluation, importance weighting, and calibration diagnostics.

The idea

Any continuous joint distribution can be decomposed into its marginals and a copula that couples them. This is Sklar's theorem: for a joint density $p(\mathbf{x})$ on $\mathbb{R}^d$ with marginal CDFs $F_i$ and marginal densities $p_i$,

$$ p(\mathbf{x}) = \underbrace{c\big(F_1(x_1), \dots, F_d(x_d)\big)}{\text{copula density}} \prod{i=1}^{d} \underbrace{p_i(x_i)}_{\text{marginals}}. $$

The copula density $c$ captures all of the dependence between dimensions; the $p_i$ capture the shape of each dimension on its own. Estimating these two pieces separately is often much easier than estimating $p(\mathbf{x})$ directly — especially when the marginals are sharp, bounded, skewed, or multimodal, which tends to defeat a plain normalizing flow.

coppuccino estimates the marginals nonparametrically and the copula with a normalizing flow.

How it works

Fitting proceeds in two stages.

flowchart LR
  X["data x in R^d<br/>(e.g. MCMC samples)"]
  Z["Gaussianized z<br/>standard-normal marginals,<br/>dependency retained"]
  E["base distribution<br/>N(0, I)"]
  X -->|"empirical CDF F_i, then inverse-normal Phi^-1<br/>(per dimension, frozen)"| Z
  Z -->|"triangular spline flow<br/>(learns the copula)"| E

1. Empirical marginal transforms (Gaussianization). Each dimension is mapped to a standard normal through the probability integral transform followed by the inverse-normal CDF,

$$ z_i = \Phi^{-1}\big(F_i(x_i)\big), \qquad i = 1, \dots, d, $$

where $\Phi$ is the standard normal CDF and each $F_i$ is an empirical CDF built from the data quantiles as a monotone spline (a rational-quadratic spline by default; see Marginal transforms). After this step the transformed variable $\mathbf{z}$ has exactly standard-normal marginals while retaining the full dependency structure — so the copula is all that remains to be modeled. These transforms are estimated once and then frozen.

2. The copula flow. A triangular spline flow $T$ models the joint density of $\mathbf{z}$ against a standard-normal base $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. If the dependence were exactly Gaussian, $T$ would only need to learn a correlation — the classical Gaussian copula; the flow generalizes this to arbitrary dependence. Only this stage is trained.

Putting the stages together, the log-density of a point is available in closed form via the change-of-variables formula,

$$ \log p(\mathbf{x}) = \log p_Z(\mathbf{z}) + \sum_{i=1}^{d}\Big( \log p_i(x_i) - \log \phi(z_i) \Big), $$

where $\log p_Z$ is supplied by the flow, $\phi$ is the standard normal density, and the sum is the log-Jacobian of the marginal Gaussianization. Sampling runs the pipeline in reverse: draw $\boldsymbol\varepsilon$ from the base, push it through the flow to obtain $\mathbf{z}$, then invert the marginal transforms with $x_i = F_i^{-1}\big(\Phi(z_i)\big)$.

Why this design

Separation of concerns. The flow spends all of its capacity on dependency, because the marginals are already standardized. The empirical CDFs, in turn, reproduce difficult marginal shapes (bounded, spiky, heavy-tailed, multimodal) that flows struggle to fit on their own.
Honest density and importance weights. With the default rational-quadratic marginals, the forward transform, its inverse, and its derivative all come from a single parameterization and are mutually exact to machine precision, so sample_and_log_prob returns self-consistent densities.
Built for bounded posteriors. By default the marginals clip to the observed data range; prior_bounds extends them cleanly to a known prior support, which is the common situation for MCMC chains.

Installation

pip install coppuccino

From source:

git clone https://github.com/AaronDJohnson/coppuccino.git
cd coppuccino
pip install .

Development

This project is managed with uv:

uv sync           # create the environment and install coppuccino + dev deps
uv run pytest     # run the test suite

Quick start

import numpy as np
from coppuccino import normalizing_flows_fit, sample, log_prob, save_flow, load_flow

# Fit a copula flow to multivariate data (e.g. MCMC posterior samples)
data = np.random.randn(5000, 3)
flow = normalizing_flows_fit(data, max_epochs=200)

# Draw new samples from the fitted distribution
new_samples = sample(flow, n_samples=1000, rng_seed=42)

# Evaluate the log probability density
log_probs = log_prob(flow, new_samples)

# Persist and reload the model
save_flow(flow, "my_flow.pkl")
loaded_flow = load_flow("my_flow.pkl")

The examples/ directory contains a worked, end-to-end notebook — galactic_binary_example.ipynb — that fits a flow to a real gravitational-wave posterior and compares the samples to the original chain, plus an HDR calibration walkthrough in hdr_credibility.ipynb.

API

Function	Purpose
`normalizing_flows_fit(chain, ...)`	Fit a copula flow to data; returns the fitted model.
`sample(flow, n_samples, rng_seed=...)`	Draw samples from a fitted flow.
`log_prob(flow, samples)`	Evaluate the log density at given points.
`sample_and_log_prob(flow, n_samples, ...)`	Draw samples and their log densities together (for importance sampling).
`save_flow(flow, path)` / `load_flow(path)`	Serialize / deserialize a fitted flow.
`compute_injection_hdr(samples, injection_params, ...)`	HDR credibility of injected/true parameters (calibration).
`check_in_support(samples, injection_params)`	Whether a point lies within the sample support.

Key options to normalizing_flows_fit include flow_layers and knots (flow capacity), max_epochs, learning_rate, patience (training), prior_bounds, tail_model, and marginal (the marginal model, below). See the docstrings for the full list.

HDR credibility (inference validation)

The highest-density-region (HDR) credibility of a known/injected parameter is the fraction of the fitted distribution that is more probable than that point — equivalently, the smallest HDR credible level whose region contains it. A point at the mode scores near 0; a point far in the tails scores near 1. For well-calibrated Bayesian inference these values are uniform on $[0, 1]$ across many events — a standard probability–probability check.

from coppuccino import compute_injection_hdr

posterior_samples = ...          # shape (n_samples, n_params)
true_params = np.array([1.0, 2.0, 3.0])

hdr = compute_injection_hdr(posterior_samples, true_params)
# `hdr` is an array: hdr[0] for a single injection, or one value per row for a
# 2D batch of injections. Injections outside the sample support return 1.0.

Marginal transforms

The empirical marginals are the heart of the method, and a few options control how they behave at and beyond the edges of the data.

Interpolant

flow = normalizing_flows_fit(data)                   # marginal="rqs" (default)
flow = normalizing_flows_fit(data, marginal="pchip") # original PCHIP interpolant

Both families use the same empirical-quantile knots and the same tail and prior-bound handling, so they produce nearly identical fits. "rqs" (a monotone rational-quadratic spline) is preferred because its forward map, inverse, and derivative share a single parameterization and are mutually exact to machine precision, which keeps sample_and_log_prob weights honest. The "pchip" path builds the CDF and its inverse as two independent splines that are only approximate inverses of each other; it is retained for reproducing older fits.

Prior bounds (recommended for MCMC chains)

# Extend the empirical CDF out to a known prior support
bounds = np.array([[-10, 10], [-5, 5], [0, 100]])
flow = normalizing_flows_fit(data, prior_bounds=bounds)

By default the marginals clip to the observed data range. Supplying prior_bounds extends each marginal CDF to the prior edges, so the flow can generate samples across the full prior support rather than being capped at the most extreme training sample.

Heavy-tailed marginals (experimental)

# EXPERIMENTAL: peaks-over-threshold Generalized Pareto tails
flow = normalizing_flows_fit(data, tail_model="gpd", tail_quantile=0.05)

For heavy-tailed marginals, tail_model="gpd" fits a Generalized Pareto Distribution to each tail via peaks-over-threshold, modeling the extremes more faithfully than the default Gaussian tail. tail_quantile sets the fraction of samples in each tail (default 0.05). This feature is experimental — its API may change — and it ignores tail_extension and prior_bounds.

Requirements

Python 3.11–3.14
JAX / jaxlib >=0.4.38,<0.8
NumPy >=1.26,<3
SciPy >=1.10 (>=1.11.3 on Python 3.12, the oldest with cp312 wheels)
Equinox >=0.13.2,<0.14
jaxtyping >=0.3.6,<0.4
interpax >=0.3.11,<0.4
FlowJAX >=17.2.1,<18
paramax >=0.0.3
cloudpickle >=2.2.1,<4 (used by save_flow / load_flow)

Python 3.11 through 3.14 are supported and tested. The lower bounds are the oldest versions that pass the test suite on 3.11/3.12; on 3.13 and 3.14 a normal install resolves newer versions (the oldest floors predate cp313/cp314 wheels), which is what the CI "highest" jobs exercise there. The JAX upper bound is set by the flowjax 17.2.1 pairing, not by choice — jax 0.7.x is the newest series that runs cleanly with it while still shipping cp314 wheels. The exact ranges (including per-Python markers) live in pyproject.toml and pip/uv resolve them automatically. The example notebooks need a few extra packages — see examples/requirements.txt.

Citation

If you use coppuccino in your research, please cite it:

@software{coppuccino,
  author  = {Johnson, Aaron D.},
  title   = {coppuccino: copula normalizing flows in JAX},
  year    = {2026},
  version = {1.0.0},
  url     = {https://github.com/AaronDJohnson/coppuccino}
}

References

A. Sklar (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris. — Sklar's theorem.
C. Durkan, A. Bekasov, I. Murray, G. Papamakarios (2019). Neural Spline Flows. NeurIPS. — The rational-quadratic spline transforms used here.
G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, B. Lakshminarayanan (2021). Normalizing Flows for Probabilistic Modeling and Inference. JMLR. — Review of normalizing flows.
The flow implementation builds on FlowJAX.

License

MIT © Aaron D. Johnson

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

aarond.johnson

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

1.0.0

Jun 16, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

coppuccino-1.0.0.tar.gz (2.8 MB view details)

Uploaded Jun 16, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

coppuccino-1.0.0-py3-none-any.whl (26.7 kB view details)

Uploaded Jun 16, 2026 Python 3

File details

Details for the file coppuccino-1.0.0.tar.gz.

File metadata

Download URL: coppuccino-1.0.0.tar.gz
Upload date: Jun 16, 2026
Size: 2.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for coppuccino-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`f8a4c3be91431c32b0b10dab36e29236c46cc83971e3b4e84529724dfe12a519`
MD5	`8c7a72505a7e2acb4976c8b5c30b71e5`
BLAKE2b-256	`0e2b2d329c299126ec6a9aa361ec487ba0d0e87ceea9228d2e7f9179ff31fa76`

See more details on using hashes here.

Provenance

The following attestation bundles were made for coppuccino-1.0.0.tar.gz:

Publisher: publish.yml on AaronDJohnson/coppuccino

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: coppuccino-1.0.0.tar.gz
- Subject digest: f8a4c3be91431c32b0b10dab36e29236c46cc83971e3b4e84529724dfe12a519
- Sigstore transparency entry: 1842936889
- Sigstore integration time: Jun 16, 2026
Source repository:
- Permalink: AaronDJohnson/coppuccino@97ed036598b0c7604f1cc9552c81238891fb1589
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/AaronDJohnson
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@97ed036598b0c7604f1cc9552c81238891fb1589
- Trigger Event: release

File details

Details for the file coppuccino-1.0.0-py3-none-any.whl.

File metadata

Download URL: coppuccino-1.0.0-py3-none-any.whl
Upload date: Jun 16, 2026
Size: 26.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for coppuccino-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`194ef7b69fe7767fc9a63faee4ddeb2d0be0b8c4a6d7292b719fe12b9eb6c199`
MD5	`76ce178160fd1cea125b1361b5828907`
BLAKE2b-256	`652a758d82546ce5d65789c1c86581b3f3747f2d93d945c8132b7be04c8a9096`

See more details on using hashes here.

Provenance

The following attestation bundles were made for coppuccino-1.0.0-py3-none-any.whl:

Publisher: publish.yml on AaronDJohnson/coppuccino

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: coppuccino-1.0.0-py3-none-any.whl
- Subject digest: 194ef7b69fe7767fc9a63faee4ddeb2d0be0b8c4a6d7292b719fe12b9eb6c199
- Sigstore transparency entry: 1842936970
- Sigstore integration time: Jun 16, 2026
Source repository:
- Permalink: AaronDJohnson/coppuccino@97ed036598b0c7604f1cc9552c81238891fb1589
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/AaronDJohnson
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@97ed036598b0c7604f1cc9552c81238891fb1589
- Trigger Event: release

coppuccino 1.0.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

coppuccino

The idea

How it works

Why this design

Installation

Development

Quick start

API

HDR credibility (inference validation)

Marginal transforms

Interpolant

Prior bounds (recommended for MCMC chains)

Heavy-tailed marginals (experimental)

Requirements

Citation

References

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance