Augmented Mixed-frequency Bayesian Regional Inference with Constraints
AMBRIC
AMBRIC is a Bayesian state-space model for estimating latent regional growth in a given variable from sparse and temporally misaligned observations. It combines factor analysis, autoregressive dynamics at both the factor and regional level, and observation constraints from aggregate (country-wide) and annual regional data.
When would I need this model?
- When you have a highly lagged annual release of, say, economic growth at the regional level but quarterly growth at the national level with minimal lag.
- When you want to nowcast regional growth—both quarterly and annual—from published national quarterly growth.
It was originally designed to nowcast UK gross value added (GVA) and gross disposable household income (GDHI) for regions following the publication of the equivalent quarterly national growth rate.
The core model features are:
- Hierarchical structure: Factor, macro, and bridge-signal loadings are partially pooled across regions, enabling information sharing while allowing regional heterogeneity.
- Dual AR(1) dynamics: Persistence is modelled at both the factor level (common shocks) and the regional level (idiosyncratic dynamics).
- Robust observations: Student-t likelihoods for the national (UK) constraint and the annual regional constraint provide robustness to outliers.
- Mixed frequency: Regional growth is observed only at annual frequency, but latent quarterly regional growth is estimated.
- Cross-sectional constraint: National and regional growth rates are constrained to be consistent.
- Temporal constraints: Annual regional growth rates are constrained to be consistent with latent quarterly growth rates.
- Soft implementation of constraints: Regional growth rates are not always exactly consistent with the national growth rate, because of measurement error, extra-regional activity, and rounding. Regional weights, $w$, are therefore estimated rather than fixed, allowing the model to learn the effective contribution of each region to the national total.
- Dimensionality reduction: With a rich set of indicators for each region, the number of parameters would grow so large that the model would be very hard to estimate. Factor analysis is therefore applied to the panel of regional indicators first, reducing it to a manageable number of factors.
- XGBoost bridge signal: An XGBoost model is trained on annually aggregated regional indicators and macro variables to predict annual regional growth. These annual predictions are then disaggregated to quarterly frequency via a MIDAS bridge equation, producing a quarterly signal $s_{t,r}$ that enters the state-space model with a hierarchical loading $\delta_r$. This allows the model to incorporate non-linear relationships captured by XGBoost while retaining the Bayesian uncertainty quantification of the state-space framework.
AMBRIC is currently supported on macOS and Linux only; Windows users should run AMBRIC under WSL.
Model Details
Variable Definitions
Let $Y$ be the variable of interest in levels.
- $t = 1, \ldots, T$ indexes time at quarterly frequency.
- $r = 1, \ldots, R$ denotes the $R$ regions of the nation.
- $Y^\text{UK}_t$ is the level in quarter $t$ for the whole nation (observed).
- $y^\text{UK}_t = \log(Y^\text{UK}_t) - \log(Y^\text{UK}_{t-1})$ is the quarterly growth rate for the whole nation (observed).
- $Y_{t, r}$ is the level for region $r$ in quarter $t$ (never observed).
- $Y_{t, r}^A = Y_{t, r} + Y_{t-1, r} + Y_{t-2, r} + Y_{t-3, r}$ is the annual level for region $r$. It is observed for Q4 only, and with a lag.
- $y_{t, r}^A = \log(Y_{t, r}^A) - \log(Y^A_{t-4,r})$ is the annual growth rate in region $r$, observed for Q4 only. $y^A_t = (y^A_{t,1}, \ldots, y^A_{t,R})'$ is the vector of these.
- $y_{t, r} = \log(Y_{t, r}) - \log(Y_{t-1, r})$ is the quarterly growth rate in region $r$ (never observed). $\mathbf{y}_t = (y_{t,1}, \ldots, y_{t,R})'$ is the vector of these.
- $\boldsymbol{Z}_t$ is a panel of regional indicators, with elements $Z_{j,r,t}$.
- $\mathbf{X}_t$ is a vector of $M$ national macroeconomic covariates (observed).
- $s_{t,r}$ is a quarterly bridge signal for region $r$, derived from XGBoost annual predictions disaggregated via a MIDAS bridge equation. $\mathbf{s}_t = (s_{t,1}, \ldots, s_{t,R})'$ is the vector of these.
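The growth definitions above can be sketched in NumPy. This is a minimal illustration rather than AMBRIC's own data pipeline: the function names and the array layout (one row per quarter, one column per region, with the first row a Q1) are assumptions made for the example.

```python
import numpy as np


def quarterly_growth(levels: np.ndarray) -> np.ndarray:
    """y_t = log(Y_t) - log(Y_{t-1}) for a (T, R) array of levels; row 0 is NaN."""
    growth = np.full(levels.shape, np.nan)
    growth[1:] = np.diff(np.log(levels), axis=0)
    return growth


def annual_levels(levels: np.ndarray) -> np.ndarray:
    """Y^A_t = Y_t + Y_{t-1} + Y_{t-2} + Y_{t-3}; rows 0-2 are NaN."""
    out = np.full(levels.shape, np.nan)
    for t in range(3, levels.shape[0]):
        out[t] = levels[t - 3 : t + 1].sum(axis=0)
    return out


def annual_growth(levels: np.ndarray) -> np.ndarray:
    """y^A_t = log(Y^A_t) - log(Y^A_{t-4}); rows 0-6 are NaN."""
    log_annual = np.log(annual_levels(levels))
    out = np.full(levels.shape, np.nan)
    out[4:] = log_annual[4:] - log_annual[:-4]
    return out


def q4_mask(n_quarters: int) -> np.ndarray:
    """True where t mod 4 == 0 for t = 1, ..., T (assumes the first row is a Q1)."""
    t = np.arange(1, n_quarters + 1)
    return t % 4 == 0


# Two hypothetical regions growing at a constant 1% and 2% per quarter (in logs).
T = 12
Y = np.exp(np.outer(np.arange(T), [0.01, 0.02]))
annual_q4 = annual_growth(Y)[q4_mask(T)]  # rows for t = 4, 8, 12; the first is NaN
```

With constant quarterly growth $g$ the annual growth rate is exactly $4g$, which is a convenient check on any implementation of these definitions.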
The key dimensions of the problem are:
| Symbol | Description |
|---|---|
| $T$ | Number of time periods (quarters) |
| $R$ | Number of regions |
| $J$ | Number of indicators per region in the regional data |
| $K$ | Number of latent factors drawn from the regional panel data |
| $M$ | Number of national macroeconomic covariates |
Model Equations
The core equation of AMBRIC is:
$$ \begin{equation} \mathbf{y}_t = \boldsymbol{\Phi}_r \mathbf{y}_{t-1} + (\mathbf{I} - \boldsymbol{\Phi}_r)(\boldsymbol{\Lambda} \mathbf{F}_t + \boldsymbol{\Gamma} \mathbf{X}_t + \boldsymbol{\delta} \odot \mathbf{s}_t) + \boldsymbol{\epsilon}_t \end{equation} $$
where $\mathbf{y}_t$ is the vector of regional quarterly growth rates, $\mathbf{F}_t$ is a vector of factors extracted from a regional panel of indicators, $\mathbf{X}_t$ is a vector of national statistics, and $\mathbf{s}_t$ is a quarterly bridge signal derived from XGBoost predictions via a MIDAS bridge equation. The auto-regressive coefficient matrix in equation (1) is diagonal, $\boldsymbol{\Phi}_r = \text{diag}(\boldsymbol{\phi}_r)$, so each region has its own persistence. The factors $\mathbf{F}_t$ themselves follow an AR(1) process governed by $\boldsymbol{\Phi}_f = \text{diag}(\boldsymbol{\phi}_f)$ (see below). $\boldsymbol{\Lambda}$ are the factor loadings, $\boldsymbol{\Gamma}$ are the macro loadings, and $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_R)'$ are the bridge signal loadings, with $\odot$ denoting element-wise multiplication (the Hadamard product).
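To make the shapes in equation (1) concrete, here is a sketch of a single transition step in NumPy. The function name, dimensions, and parameter values are hypothetical placeholders rather than AMBRIC defaults, and $\boldsymbol{\epsilon}_t$ is drawn as independent Gaussian noise with region-specific standard deviations $\sigma_{\varepsilon,r}$, matching the regional AR specification given in the auto-regressive section.

```python
import numpy as np


def transition_step(y_prev, phi_r, Lam, F_t, Gam, X_t, delta, s_t, sigma_eps, rng):
    """One draw of y_t from equation (1).

    Shapes: y_prev, phi_r, delta, s_t, sigma_eps -> (R,); Lam -> (R, K);
    F_t -> (K,); Gam -> (R, M); X_t -> (M,).
    """
    # Exogenous mean: factor part + macro part + element-wise bridge-signal part.
    mu_exog = Lam @ F_t + Gam @ X_t + delta * s_t
    # Phi_r is diagonal, so the products with y_prev and mu_exog are element-wise.
    mean = phi_r * y_prev + (1.0 - phi_r) * mu_exog
    return mean + sigma_eps * rng.standard_normal(y_prev.shape)


rng = np.random.default_rng(0)
R, K, M = 3, 2, 1  # hypothetical sizes
y_t = transition_step(
    y_prev=np.zeros(R),
    phi_r=np.full(R, 0.5),
    Lam=rng.normal(0.0, 0.5, (R, K)),
    F_t=rng.normal(size=K),
    Gam=rng.normal(0.0, 0.5, (R, M)),
    X_t=np.array([0.004]),
    delta=np.full(R, 0.3),
    s_t=np.full(R, 0.005),
    sigma_eps=np.full(R, 0.001),
    rng=rng,
)
```

Because $\boldsymbol{\Phi}_r$ is diagonal, everything apart from the two loading products is element-wise, so the step vectorises cleanly over regions.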
Observed vs estimated data
We observe $\boldsymbol{Z}_t$, $\mathbf{X}_t$, $y_t^\text{UK}$, and, with a significant lag, $y_{t, r}^A$ for $t \bmod 4 = 0$ (i.e. the fourth quarter only).
The model estimates many parameters, but its "outputs" are $y_{t,r}$ and $y_{t,r}^A$, the latter wherever it is not observed: for $t \bmod 4 \neq 0$, and for recent fourth quarters whose annual data have not yet been published.
Cross-sectional and temporal constraints
$\mathbf{y}_t$ are latent variables; we only observe the left-hand sides of the following assumed relationships:
$$ y_t^{\text{UK}} = \mathbf{w}^\top \mathbf{y}_t \quad \text{ and }\quad \boldsymbol{y}_t^{A} = \boldsymbol{\Omega}(L) \mathbf{y}_t $$
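although the latter is observed only with a lag. The first equation is the cross-sectional constraint and the second is the temporal constraint, which uses a lag polynomial $\boldsymbol{\Omega}(L) = \sum_{j=0}^{6} \Omega_j L^j$. Seven terms are needed because $y_{t,r}^A$ compares the four quarters ending in $t$ with the four quarters ending in $t-4$; together these span the levels $Y_{t-7,r}, \ldots, Y_{t,r}$ and hence the seven quarterly growth rates $y_{t-6,r}, \ldots, y_{t,r}$. The cross-sectional constraint ensures that quarterly regional growth is consistent with quarterly national growth, while the temporal constraint ensures that latent quarterly regional growth is consistent with observed annual regional growth. (NB: weights not shown here for brevity.)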
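The $\Omega_j$ are not listed here, so the following is an assumption for illustration only, and AMBRIC may use different values. A standard choice comes from replacing the four-quarter sum by four times the geometric mean of its quarters, $\log Y^A_{t,r} \approx \log 4 + \tfrac{1}{4}\sum_{i=0}^{3} \log Y_{t-i,r}$; differencing at lag 4 then gives the triangular weights $(\Omega_0, \ldots, \Omega_6) = (1, 2, 3, 4, 3, 2, 1)/4$, each a scalar multiple of the identity. The sketch below applies these weights to quarterly growth and compares the result with annual growth computed exactly from simulated (hypothetical) levels.

```python
import numpy as np

# Assumed triangular lag weights Omega_0, ..., Omega_6 (the same scalar for every region).
OMEGA = np.array([1, 2, 3, 4, 3, 2, 1]) / 4.0


def approx_annual_growth(y_q: np.ndarray) -> np.ndarray:
    """Apply Omega(L) to quarterly log growth y_q of shape (T, R).

    Row t holds sum_j OMEGA[j] * y_q[t - j]; rows 0-5 are NaN for lack of lags.
    """
    out = np.full(y_q.shape, np.nan)
    for t in range(6, y_q.shape[0]):
        # y_q[t-6 : t+1] holds lag 6 in its first row and lag 0 in its last,
        # so the weights are reversed to line up with the lags.
        out[t] = OMEGA[::-1] @ y_q[t - 6 : t + 1]
    return out


# Compare with exact annual growth built from simulated levels (hypothetical data).
rng = np.random.default_rng(42)
growth = 0.005 + 0.005 * rng.standard_normal((40, 2))  # (T, R) quarterly log growth
levels = 100.0 * np.exp(np.cumsum(growth, axis=0))
y_q = np.vstack([np.full((1, 2), np.nan), np.diff(np.log(levels), axis=0)])
annual_sum = np.array([levels[t - 3 : t + 1].sum(axis=0) for t in range(3, 40)])
exact = np.log(annual_sum[4:]) - np.log(annual_sum[:-4])  # rows for t = 7, ..., 39
max_error = np.max(np.abs(approx_annual_growth(y_q)[7:] - exact))
```

With constant quarterly growth $g$ the weights sum to 4 and reproduce the exact value $4g$; with the fluctuating growth simulated here, `max_error` stays well under 0.001 (0.1 percentage points).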
Because these are soft constraints, they enter the model as:
$$ y_t^{\text{UK}} \sim \mathcal{T}(\nu_{\text{UK}}, \mathbf{w}^\top \mathbf{y}_t, \sigma_{\text{UK}}), \quad \mathbf{y}_t^{\text{A}} \sim \mathcal{T}\left(\nu_{\text{A}}, \sum_{j=0}^{6} \Omega_j \mathbf{y}_{t-j}, \boldsymbol{\sigma}_{\text{A}}\right) $$
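where $\mathcal{T}(\nu, \mu, \sigma)$ denotes Student's t-distribution with $\nu$ degrees of freedom, location $\mu$, and scale $\sigma$. The annual term applies only where $y_{t,r}^A$ is observed, i.e. to published fourth-quarter data.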
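A sketch of how these two soft-constraint terms could be scored outside the model, using `scipy.stats.t.logpdf(x, df, loc, scale)`. The function name, the argument layout, and the use of NaN to mark annual values that are not (yet) observed are assumptions for this example; in the package itself these terms presumably sit inside the PyMC model. It also assumes triangular lag weights $\Omega_j = (1, 2, 3, 4, 3, 2, 1)_j / 4$, since the $\Omega_j$ are not given here.

```python
import numpy as np
from scipy import stats

# Assumed triangular lag weights Omega_0, ..., Omega_6 (not given in the text).
OMEGA = np.array([1, 2, 3, 4, 3, 2, 1]) / 4.0


def soft_constraint_loglik(y_q, y_uk, y_annual, w, nu_uk, sigma_uk, nu_a, sigma_a):
    """Student-t log-likelihood of the cross-sectional and temporal constraints.

    y_q:      (T, R) latent quarterly regional growth
    y_uk:     (T,)   observed national quarterly growth
    y_annual: (T, R) observed annual regional growth, NaN where not (yet) observed
    w:        (R,)   regional weights
    sigma_a:  (R,)   region-specific annual scales
    """
    # Cross-sectional: y_uk_t ~ T(nu_uk, w' y_t, sigma_uk) for every quarter.
    ll = stats.t.logpdf(y_uk, df=nu_uk, loc=y_q @ w, scale=sigma_uk).sum()
    # Temporal: only rows with published annual data and all seven lag terms available.
    for t in range(6, y_q.shape[0]):
        observed = ~np.isnan(y_annual[t])
        if not observed.any():
            continue
        loc = OMEGA[::-1] @ y_q[t - 6 : t + 1]  # sum_j Omega_j y_{t-j}
        ll += stats.t.logpdf(
            y_annual[t, observed], df=nu_a, loc=loc[observed], scale=sigma_a[observed]
        ).sum()
    return float(ll)
```

Masking with NaN keeps the unobserved annual quarters out of the likelihood entirely, which is what "observed for Q4 only, and with a lag" requires.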
Auto-regressive behaviours
The (Bayesian) auto-regressive behaviour of the quarterly regional growth rates is given by
$$y_{t,r} \mid y_{t-1,r} \sim \mathcal{N}\left(\phi_r \, y_{t-1,r} + (1 - \phi_r) \, \mu_{t,r}^{\text{exog}},\ \sigma_{\varepsilon,r}\right)$$
where
$$ \mu_{t,r}^{\text{exog}} = \boldsymbol{\Lambda}_r \mathbf{F}_t + \boldsymbol{\Gamma}_r \mathbf{X}_t + \delta_r \, s_{t,r} $$
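Note that the $(1 - \phi_r)$ scaling means that, if $\mu_{t,r}^{\text{exog}}$ is held at a constant value $\mu_r$ and $|\phi_r| < 1$, the long-run mean $m_r$ of $y_{t,r}$ solves $m_r = \phi_r m_r + (1 - \phi_r)\mu_r$, so $\mathbb{E}[y_{t,r}] = \mu_r$ rather than $\mu_r / (1 - \phi_r)$.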
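A quick simulation illustrates the point. It assumes a constant $\mu^{\text{exog}}$, Gaussian shocks, and arbitrary illustrative parameter values; the function name is hypothetical.

```python
import numpy as np


def simulate_regional_ar1(phi, mu_exog, sigma_eps, n_quarters, rng):
    """Simulate y_t = phi * y_{t-1} + (1 - phi) * mu_exog + eps_t, eps_t ~ N(0, sigma_eps^2)."""
    shocks = sigma_eps * rng.standard_normal(n_quarters)
    y = np.empty(n_quarters)
    y[0] = mu_exog + shocks[0]  # start near the long-run mean
    for t in range(1, n_quarters):
        y[t] = phi * y[t - 1] + (1.0 - phi) * mu_exog + shocks[t]
    return y


rng = np.random.default_rng(123)
y = simulate_regional_ar1(phi=0.5, mu_exog=0.004, sigma_eps=0.002, n_quarters=100_000, rng=rng)
sample_mean = y.mean()  # close to mu_exog = 0.004
```

Dropping the $(1 - \phi)$ factor from the simulation would move the long-run mean to $\mu^{\text{exog}} / (1 - \phi) = 0.008$, double the intended level for $\phi = 0.5$.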
To reduce the dimensionality of $\boldsymbol{Z}_t$, the panel of regional indicators, we use factor analysis, which finds an $\boldsymbol{F}_t^{\text{obs}}$ with dimension $K$ such that
$$ \mathbf{Z}_t = \mathbf{W} \mathbf{F}_t^{\text{obs}} + \boldsymbol{\mu} + \boldsymbol{\varepsilon}_t, \quad \mathbf{F}_t^{\text{obs}} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_K), \quad \boldsymbol{\varepsilon}_t \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}), \quad \text{Cov}(\mathbf{Z}_t) = \mathbf{W} \mathbf{W}^\top + \boldsymbol{\Psi} $$
where $\mathbf{W}$ is the matrix of factor loadings and $\boldsymbol{\Psi}$ is a diagonal matrix of indicator-specific noise variances. These extracted factors are then treated as noisy observations of an underlying autoregressive latent factor process $\mathbf{F}_t$ such that
$$ \mathbf{F}_t = \text{diag}(\boldsymbol{\phi}_f) \mathbf{F}_{t-1} + \boldsymbol{\eta}_t, \quad \mathbf{F}_t^{\text{obs}} = \mathbf{F}_t + \boldsymbol{\epsilon}_t^f $$
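The two-stage factor step can be sketched with scikit-learn's `FactorAnalysis`, which fits the same linear-Gaussian model written above (unit-variance latent factors, loadings in `components_` $= \mathbf{W}^\top$, and diagonal noise variances in `noise_variance_` $= \text{diag}(\boldsymbol{\Psi})$). Whether AMBRIC itself uses scikit-learn is not stated here, and the simulated panel, sizes, and AR coefficients are hypothetical. The sketch simulates persistent factors, observes them through a noisy indicator panel, and extracts $K$ factors that play the role of $\mathbf{F}^{\text{obs}}_t$.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(7)
T, K, n_cols = 200, 2, 12       # hypothetical; n_cols stands in for the R * J indicator columns
phi_f = np.array([0.7, 0.5])    # hypothetical factor persistence
sigma_f = np.array([0.1, 0.1])  # hypothetical factor innovation s.d.

# Latent AR(1) factors: F_t = diag(phi_f) F_{t-1} + eta_t.
F = np.zeros((T, K))
for t in range(1, T):
    F[t] = phi_f * F[t - 1] + sigma_f * rng.standard_normal(K)

# Indicator panel Z_t = W F_t + mu + noise, stored with one row per quarter.
W_true = rng.normal(size=(n_cols, K))
Z = F @ W_true.T + 1.0 + 0.05 * rng.standard_normal((T, n_cols))

# Static factor analysis; F_obs is the posterior mean of the factors for each quarter.
fa = FactorAnalysis(n_components=K, random_state=0)
F_obs = fa.fit_transform(Z)   # (T, K)
W_hat = fa.components_.T      # (n_cols, K), estimate of W
psi_hat = fa.noise_variance_  # (n_cols,), estimate of diag(Psi)

# Factors are identified only up to rotation and sign, so regress each true factor on
# all extracted factors (plus an intercept) and report the share of variance explained.
X = np.column_stack([np.ones(T), F_obs])
coef, *_ = np.linalg.lstsq(X, F, rcond=None)
r_squared = 1.0 - (F - X @ coef).var(axis=0) / F.var(axis=0)
```

The extracted `F_obs` would then enter the state-space model as noisy observations of the AR(1) factors, with observation noise scale $\sigma_{\text{exog}}$ as in the priors below.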
Bayesian priors
Regional panel and factors
$$\Lambda_\mu \sim \mathcal{N}(0, 0.5) \quad [K]$$
$$\Lambda_\sigma \sim \text{HalfNormal}(0.2) \quad [K]$$
$$\Lambda_{r,k} \sim \mathcal{N}(\Lambda_{\mu,k}, \Lambda_{\sigma,k}) \quad [R \times K]$$
$$\sigma_{\text{exog}} \sim \text{HalfNormal}(0.2) \quad [K]$$
$$F_{t,k}^{\text{obs}} \sim \mathcal{N}(F_{t,k}, \sigma_{\text{exog},k})$$
$$\phi_f \sim \mathcal{N}(0.7, 0.1) \quad [K]$$
$$\sigma_f \sim \text{HalfNormal}(0.1) \quad [K]$$
Macro indicators
$$\Gamma_\mu \sim \mathcal{N}(0, 0.5) \quad [M]$$
$$\Gamma_\sigma \sim \text{HalfNormal}(0.15) \quad [M]$$
$$\Gamma_{r,m} \sim \mathcal{N}(\Gamma_{\mu,m}, \Gamma_{\sigma,m}) \quad [R \times M]$$
Bridge signal loadings
$$\delta_\mu \sim \mathcal{N}(0, 0.3)$$
$$\delta_\sigma \sim \text{HalfNormal}(0.15)$$
$$\delta_r \sim \mathcal{N}(\delta_\mu, \delta_\sigma) \quad [R]$$
Weights and degrees of freedom
$$w \sim \mathcal{N}(1/R, 0.01) \quad [R]$$
$$\nu_{\text{UK}} \sim \text{Gamma}(6, 1)$$
$$\nu_{\text{A}} \sim \text{Gamma}(3, 0.5)$$
Growth
$$\sigma_\varepsilon \sim \text{HalfNormal}(0.1) \quad [R]$$
$$\sigma_{\text{A}} \sim \text{HalfNormal}(0.2) \quad [R]$$
$$\phi_r \sim \mathcal{N}(0.5, 0.15) \quad [R]$$
$$\sigma_{\text{UK}} \sim \text{HalfNormal}(0.01)$$
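A prior-predictive sketch in NumPy can help sanity-check these scales. It assumes, as in PyMC, that the second argument of $\mathcal{N}$ and the argument of HalfNormal are standard deviations, and that $\text{Gamma}(\alpha, \beta)$ uses shape $\alpha$ and rate $\beta$; the function name and dictionary keys are hypothetical.

```python
import numpy as np


def draw_priors(R: int, K: int, M: int, rng: np.random.Generator) -> dict:
    """One joint draw from the priors above.

    Assumes N(a, b) and HalfNormal(b) take standard deviations b, and that
    Gamma(a, b) has shape a and rate b (NumPy's gamma takes scale = 1 / rate).
    """

    def half_normal(sd, size=None):
        return np.abs(rng.normal(0.0, sd, size))

    p = {}
    # Regional panel and factors (loc/scale of shape (K,) broadcast over R rows)
    p["Lambda_mu"] = rng.normal(0.0, 0.5, K)
    p["Lambda_sigma"] = half_normal(0.2, K)
    p["Lambda"] = rng.normal(p["Lambda_mu"], p["Lambda_sigma"], (R, K))
    p["sigma_exog"] = half_normal(0.2, K)
    p["phi_f"] = rng.normal(0.7, 0.1, K)
    p["sigma_f"] = half_normal(0.1, K)
    # Macro indicators
    p["Gamma_mu"] = rng.normal(0.0, 0.5, M)
    p["Gamma_sigma"] = half_normal(0.15, M)
    p["Gamma"] = rng.normal(p["Gamma_mu"], p["Gamma_sigma"], (R, M))
    # Bridge signal loadings
    p["delta_mu"] = rng.normal(0.0, 0.3)
    p["delta_sigma"] = half_normal(0.15)
    p["delta"] = rng.normal(p["delta_mu"], p["delta_sigma"], R)
    # Weights and degrees of freedom
    p["w"] = rng.normal(1.0 / R, 0.01, R)
    p["nu_UK"] = rng.gamma(6.0, 1.0)  # rate 1 -> scale 1
    p["nu_A"] = rng.gamma(3.0, 2.0)   # rate 0.5 -> scale 2
    # Growth
    p["sigma_eps"] = half_normal(0.1, R)
    p["sigma_A"] = half_normal(0.2, R)
    p["phi_r"] = rng.normal(0.5, 0.15, R)
    p["sigma_UK"] = half_normal(0.01)
    return p
```

Under this parameterisation both degrees-of-freedom priors have mean 6, but $\nu_{\text{A}}$ has twice the variance (12 versus 6) and therefore more mass at small values, that is, heavier-tailed annual errors.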
Parameters
| Parameter | Shape |
|---|---|
| $\Lambda_\mu$ | $K$ |
| $\Lambda_\sigma$ | $K$ |
| $\Lambda$ | $R \times K$ |
| $\Gamma_\mu$ | $M$ |
| $\Gamma_\sigma$ | $M$ |
| $\Gamma$ | $R \times M$ |
| $\delta_\mu$ | $1$ |
| $\delta_\sigma$ | $1$ |
| $\delta_r$ | $R$ |
| $\phi_f$ | $K$ |
| $\sigma_f$ | $K$ |
| $F$ | $T \times K$ |
| $\sigma_{\text{exog}}$ | $K$ |
| $\phi_r$ | $R$ |
| $\sigma_\varepsilon$ | $R$ |
| $y_r$ | $T \times R$ |
| $w$ | $R$ |
| $\sigma_{\text{UK}}$ | $1$ |
| $\sigma_{\text{A}}$ | $R$ |
| $\nu_{\text{UK}}$ | $1$ |
| $\nu_{\text{A}}$ | $1$ |
Total: $5K + 2M + RK + RM + TK + TR + 5R + 5$
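The total can be checked by summing the sizes in the table. The helpers below are hypothetical illustrations, not part of the package API.

```python
def n_parameters(T: int, R: int, K: int, M: int) -> int:
    """Count scalar parameters by summing the shapes listed in the table above."""
    shapes = {
        "Lambda_mu": K, "Lambda_sigma": K, "Lambda": R * K,
        "Gamma_mu": M, "Gamma_sigma": M, "Gamma": R * M,
        "delta_mu": 1, "delta_sigma": 1, "delta_r": R,
        "phi_f": K, "sigma_f": K, "F": T * K, "sigma_exog": K,
        "phi_r": R, "sigma_eps": R, "y_r": T * R, "w": R,
        "sigma_UK": 1, "sigma_A": R, "nu_UK": 1, "nu_A": 1,
    }
    return sum(shapes.values())


def n_parameters_formula(T: int, R: int, K: int, M: int) -> int:
    """The closed form 5K + 2M + RK + RM + TK + TR + 5R + 5."""
    return 5 * K + 2 * M + R * K + R * M + T * K + T * R + 5 * R + 5
```

For example, with hypothetical sizes $T = 80$, $R = 12$, $K = 3$, and $M = 2$, both functions return 1344, of which 1200 are the latent paths $y_r$ ($TR = 960$) and $F$ ($TK = 240$); the latent states dominate the count.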
Model solution
Prior to estimating the Bayesian model, an XGBoost model is trained on annually aggregated regional indicators and macro variables to predict annual regional growth. These annual predictions are disaggregated to quarterly frequency using a MIDAS bridge equation, producing the bridge signal $\mathbf{s}_t$. The Bayesian state-space model is then estimated using PyMC and PyTensor via ADVI (Automatic Differentiation Variational Inference).
File details
Details for the file ambric-0.0.5.tar.gz.
File metadata
- Download URL: ambric-0.0.5.tar.gz
- Size: 218.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bfd0b6f51ddb69cb8a26a8d623fe5cee836a77b3e136258655bde7dacab627f4 |
| MD5 | 852b6bc5eeb0199c591002c3c38edc6c |
| BLAKE2b-256 | 17cd974714803d6ad05b47ddf408880a6e28140ee3d3453a9b71ac4d2b28f837 |
Provenance
The following attestation bundles were made for ambric-0.0.5.tar.gz:
Publisher: release.yml on aeturrell/ambric
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ambric-0.0.5.tar.gz
- Subject digest: bfd0b6f51ddb69cb8a26a8d623fe5cee836a77b3e136258655bde7dacab627f4
- Sigstore transparency entry: 1495613750
- Permalink: aeturrell/ambric@fd4fc6a153157a69db3b8b82da6cd526b5981ea8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/aeturrell
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fd4fc6a153157a69db3b8b82da6cd526b5981ea8
- Trigger Event: push
File details
Details for the file ambric-0.0.5-py3-none-any.whl.
File metadata
- Download URL: ambric-0.0.5-py3-none-any.whl
- Size: 40.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a5ee940d8fc6edb8473d9f68fc04e7298818e4c1a2515c01528daaefc0006cf |
| MD5 | 4f8f3b0c055f34dd366816b15e7ada62 |
| BLAKE2b-256 | 51d6ee65677af5edb0159d0c47ae7bb6ae45fbc68b2135d8eb2a8267d683775f |
Provenance
The following attestation bundles were made for ambric-0.0.5-py3-none-any.whl:
Publisher: release.yml on aeturrell/ambric
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ambric-0.0.5-py3-none-any.whl
- Subject digest: 5a5ee940d8fc6edb8473d9f68fc04e7298818e4c1a2515c01528daaefc0006cf
- Sigstore transparency entry: 1495613852
- Permalink: aeturrell/ambric@fd4fc6a153157a69db3b8b82da6cd526b5981ea8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/aeturrell
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fd4fc6a153157a69db3b8b82da6cd526b5981ea8
- Trigger Event: push