Stochastic copula models with Ornstein-Uhlenbeck latent process
Project description
pyscarcopula
Stochastic copula models with Ornstein-Uhlenbeck latent process in Python.
About
pyscarcopula fits multivariate distributions using the copula approach with time-varying dependence. The classical constant-parameter model is extended to a stochastic model where the copula parameter follows an Ornstein-Uhlenbeck process. The idea is based on the discrete SCAR model of Liesenfeld and Richard (2003) and Hafner and Manner (2012); here we develop it in the continuous-time setting.
For parameter estimation we provide five methods:
| Method | Key | Description |
|---|---|---|
| Maximum likelihood | mle |
Classical fit with constant copula parameter |
| MC p-sampler | scar-p-ou |
Monte Carlo without importance sampling |
| MC m-sampler | scar-m-ou |
Monte Carlo with efficient importance sampling (EIS) |
| Transfer matrix | scar-tm-ou |
Deterministic quadrature on a grid — numerically exact, no MC bias |
| GAS | gas |
Generalized autoregressive score (observation-driven, deterministic) |
The transfer matrix method exploits the Markov structure and known Gaussian transition density of the OU process to evaluate the likelihood function as a sequence of matrix-vector products with complexity $O(TK^2)$. The implementation automatically selects between dense and sparse transfer matrices depending on the kernel bandwidth, and adaptively refines the grid to resolve the transition kernel.
Install
pip install pyscarcopula
For development (includes data files and tests):
git clone https://github.com/AANovokhatskiy/pyscarcopula
cd pyscarcopula
pip install -e ".[test]"
pytest tests/
Dependencies: numpy, numba, scipy, joblib, tqdm.
Features
Copula families
- Archimedean: Gumbel, Frank, Clayton, Joe (with rotations 0°/90°/180°/270°)
- Elliptical: Gaussian, Student-t (MLE only)
- Independence copula (null model for automatic vine pruning)
- C-vine pair copula construction
Estimation methods
- MLE — constant copula parameter
- SCAR-TM-OU — transfer matrix with analytical gradient (recommended)
- GAS — observation-driven score model
- SCAR-P-OU / SCAR-M-OU — Monte Carlo alternatives
Vine copulas
- Automatic copula family and rotation selection per edge (AIC/BIC)
- Automatic pruning of weak edges via independence copula baseline
- Tree-level and edge-level truncation for scalability to large d
Diagnostics and risk
- Goodness-of-fit via Rosenblatt transform + Cramér–von Mises test
- Mixture Rosenblatt transform for stochastic models
- Smoothed time-varying copula parameter $\bar{\theta}k = \mathbb{E}[\Psi(x_k) \mid u{1:k-1}]$
- VaR / CVaR via
pyscarcopula.contrib(rolling window, marginals, portfolio optimization, parallel)
Mathematical background
Copula models
By Sklar's theorem, any joint distribution can be decomposed as
F(x_1, \ldots, x_d) = C(F_1(x_1), \ldots, F_d(x_d))
We focus on single-parameter Archimedean copulas defined via a generator $\phi(t; \theta)$:
C(u_1, \ldots, u_d) = \phi^{-1}(\phi(u_1; \theta) + \cdots + \phi(u_d; \theta))
| Copula | $\phi(t; \theta)$ | $\phi^{-1}(t; \theta)$ | $\theta \in$ |
|---|---|---|---|
| Gumbel | $(-\log t)^\theta$ | $\exp(-t^{1/\theta})$ | $[1, \infty)$ |
| Frank | $-\log\left(\frac{e^{-\theta t} - 1}{e^{-\theta} - 1}\right)$ | $-\frac{1}{\theta}\log(1 + e^{-t}(e^{-\theta} - 1))$ | $(0, \infty)$ |
| Joe | $-\log(1 - (1-t)^\theta)$ | $1 - (1 - e^{-t})^{1/\theta}$ | $[1, \infty)$ |
| Clayton | $\frac{1}{\theta}(t^{-\theta} - 1)$ | $(1 + t\theta)^{-1/\theta}$ | $(0, \infty)$ |
Stochastic copula (SCAR)
In the stochastic model the copula parameter is driven by a latent Ornstein-Uhlenbeck process:
\theta_t = \Psi(x_t), \qquad dx_t = \theta_{\text{OU}}(\mu - x_t)\,dt + \nu\,dW_t
where $\Psi$ maps the OU state to the copula parameter domain. The likelihood function is an integral over all latent paths:
L = \int \prod_{t} c(u_{1t}, u_{2t}; \Psi(x_t))\;p(x_t | x_{t-1})\;dx_0 \cdots dx_T
Transfer matrix method
The Markov property allows this high-dimensional integral to be factored into a chain of one-dimensional integrals, each computed as a matrix-vector product on a discretized grid. The backward recursion is:
\mathbf{m}_t = \widetilde{\mathbf{T}}(\mathbf{f}_t \odot \mathbf{m}_{t+1}), \qquad t = T-1, \ldots, 1
where $\widetilde{\mathbf{T}}$ is the transition kernel with trapezoidal weights baked in, $\mathbf{f}_t$ is the copula density evaluated at observation $t$, and $\odot$ is the element-wise product. Total complexity: $O(TK^2)$ (dense) or $O(TKb)$ (sparse, where $b$ is the kernel bandwidth).
Goodness of fit
Model quality is assessed via the Rosenblatt transform. For the stochastic model we use the mixture Rosenblatt transform:
u'_{2,t} = \mathbb{E}\left[h(u_{2t}, u_{1t}; \Psi(x_t)) \mid u_{1:t-1}\right]
which integrates the h-function over the predictive distribution of the latent state, avoiding the Jensen bias from plugging in a point estimate. Uniformity of the transformed sample is tested with the Cramér–von Mises statistic.
Examples
1. Read dataset
import pandas as pd
import numpy as np
from pyscarcopula.utils import pobs
from pyscarcopula import (GumbelCopula, FrankCopula, JoeCopula, ClaytonCopula,
IndependentCopula, GaussianCopula, StudentCopula,
CVineCopula)
from pyscarcopula.stattests import gof_test
crypto_prices = pd.read_csv("data/crypto_prices.csv", index_col=0, sep=';')
tickers = ['BTC-USD', 'ETH-USD']
returns_pd = np.log(crypto_prices[tickers] / crypto_prices[tickers].shift(1))[1:]
returns = returns_pd.values
u = pobs(returns)
2. Initialize and fit a bivariate copula
copula_mle = GumbelCopula(rotate=180)
copula_tm = GumbelCopula(rotate=180)
copula_gas = GumbelCopula(rotate=180)
# Static model (constant parameter)
fit_result_mle = copula_mle.fit(data=returns, method='mle', to_pobs=True)
gof_result_mle = gof_test(copula_mle, returns, to_pobs=True)
# Stochastic model (transfer matrix)
fit_result_tm = copula_tm.fit(data=returns, method='scar-tm-ou', to_pobs=True)
gof_result_tm = gof_test(copula_tm, returns, to_pobs=True)
# GAS model
fit_result_gas = copula_gas.fit(data=returns, method='gas', to_pobs=True)
gof_result_gas = gof_test(copula_gas, returns, to_pobs=True)
Results on daily BTC-ETH data (T = 1460):
| Model | logL | GoF p-value |
|---|---|---|
| MLE | 955.63 | 0.0087 |
| GAS | 1031.42 | 0.5282 |
| SCAR-TM | 1042.47 | 0.6201 |
SCAR-TM achieves the highest log-likelihood and the best GoF calibration. MLE is rejected at the 1% level; both dynamic models pass comfortably.
Available rotations: [0, 90, 180, 270]. Available methods: ['mle', 'scar-p-ou', 'scar-m-ou', 'scar-tm-ou', 'gas'].
3. Sample from copula
# Sample from constant-parameter copula
samples_mle = copula_mle.sample(n=1000, r=fit_result_mle.copula_param)
# Sample next state from stochastic copula
samples_tm = copula_tm.predict(n=1000)
4. Fit a multivariate C-vine copula
tickers = ['BTC-USD', 'ETH-USD', 'BNB-USD', 'ADA-USD', 'XRP-USD', 'DOGE-USD']
returns_pd = np.log(crypto_prices[tickers] / crypto_prices[tickers].shift(1))[1:251]
returns = returns_pd.values
u = pobs(returns)
# C-vine with SCAR-TM (truncated for speed)
vine = CVineCopula()
vine.fit(u, method='scar-tm-ou',
truncation_level=2, min_edge_logL=10)
vine.summary()
# Comparison models
copula_s = StudentCopula()
copula_g = GaussianCopula()
copula_s.fit(u)
copula_g.fit(u)
gof_result_vine = gof_test(vine, u, to_pobs=False)
gof_result_s = gof_test(copula_s, u, to_pobs=False)
gof_result_g = gof_test(copula_g, u, to_pobs=False)
Results on 6-dimensional crypto data (T = 250):
| Model | logL | GoF p-value | Fit time |
|---|---|---|---|
| C-vine SCAR-TM | 921.9 | 0.8971 | 13.4s |
| C-vine MLE | 869.2 | 0.2072 | 0.6s |
| Student-t | 764.4 | 0.0001 | — |
| Gaussian | 761.0 | 0.0000 | — |
The C-vine with SCAR-TM substantially outperforms all alternatives. With truncation (truncation_level=2, min_edge_logL=10), only 9 of 15 edges use SCAR — the remaining 6 fall back to MLE — achieving a good speed/accuracy tradeoff.
5. Risk metrics (VaR / CVaR)
Risk metrics are provided in pyscarcopula.contrib — optional modules for rolling VaR/CVaR estimation, marginal distribution fitting, and portfolio optimization.
from pyscarcopula.contrib.risk_metrics import risk_metrics
d = len(tickers)
result = risk_metrics(
vine, returns,
window_len=100,
gamma=[0.95],
N_mc=[100_000],
marginals_method='johnsonsu',
method='mle',
optimize_portfolio=False,
portfolio_weight=np.ones(d) / d,
n_jobs=-1, # parallel over rolling windows
)
var = result[0.95][100_000]['var']
cvar = result[0.95][100_000]['cvar']
See example.ipynb for a complete walkthrough with plots.
Performance tuning
The SCAR-TM-OU method has several parameters that control the speed/accuracy tradeoff. Here are recommended configurations:
Bivariate copula
copula = GumbelCopula(rotate=180)
# Default (analytical gradient + smart init, both enabled by default)
copula.fit(u, method='scar-tm-ou')
# Disable optimizations for debugging / backward compatibility
copula.fit(u, method='scar-tm-ou', analytical_grad=False, smart_init=False)
# Relaxed tolerance (faster, slight logL loss)
copula.fit(u, method='scar-tm-ou', tol=5e-2)
| Parameter | Default | Effect |
|---|---|---|
analytical_grad |
True |
Analytical gradient. ~3-4x fewer function evaluations. |
smart_init |
True |
Heuristic initial point. Up to 5x speedup on long series. |
tol |
1e-2 |
Gradient tolerance. 5e-2 is ~2x faster with negligible logL loss. |
K |
300 |
Minimum grid size. The adaptive rule may increase it to ensure adequate resolution of the transition kernel. |
pts_per_sigma |
2 |
Grid density: number of points per conditional standard deviation $\sigma_c$. The adaptive rule computes $K_\text{eff} = \max(K,, \lceil 2R\sigma / (\sigma_c / \texttt{pts_per_sigma}) \rceil)$, where $R$ is the grid half-range in $\sigma$ units. Higher values improve quadrature accuracy at the cost of larger $K$ and longer runtime. The default of 2 is sufficient for most cases; increasing to 4 may help for very peaked transition kernels (large $\theta$, small $\nu$). |
Vine copula
vine = CVineCopula()
# Recommended: skip SCAR on weak edges and upper trees
vine.fit(u, method='scar-tm-ou', truncation_level=2, min_edge_logL=10)
| Parameter | Default | Effect |
|---|---|---|
truncation_level |
None |
Trees ≥ this level stay MLE. Recommended: 2-3 for d > 10. |
min_edge_logL |
None |
Edges with MLE logL below threshold stay MLE. Recommended: 5-10. |
Truncated edges use MLE (constant parameter). The GoF test handles mixed SCAR/MLE models correctly.
d=6 crypto, T=250:
| Configuration | Time | logL | GoF p-value |
|---|---|---|---|
truncation_level=2, min_edge_logL=10 |
~13s | ~922 | ~0.90 |
| Full SCAR (15 edges) | ~30s | ~891 | ~0.90 |
| MLE only | ~0.6s | ~869 | ~0.21 |
License
MIT License. See LICENSE.txt.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pyscarcopula-0.1.1.tar.gz.
File metadata
- Download URL: pyscarcopula-0.1.1.tar.gz
- Upload date:
- Size: 4.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
93461a380b761d2be309cce14b02fa0930dd8ae801ecb6e3f9d4d5289ac93792
|
|
| MD5 |
b2686c0c56744ab1c01e0a5d2633dd7e
|
|
| BLAKE2b-256 |
45eda5e4ef6ce986b5b9f12c392c2649e88dea8cd446595a9f19f26f5f874e43
|
File details
Details for the file pyscarcopula-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pyscarcopula-0.1.1-py3-none-any.whl
- Upload date:
- Size: 62.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9ca08f36ee9f704ed8c864d76679ca568dc77e0d2604ab9513038af36798db41
|
|
| MD5 |
10804b99da5e7ef78af0b8992ad4304b
|
|
| BLAKE2b-256 |
ce5edfdc9a17e4ab79c135bee911817fc4671c67ad71aa020b639cceaf94faab
|