Skip to main content

Stochastic copula models with Ornstein-Uhlenbeck latent process

Project description

pyscarcopula

Stochastic copula models with Ornstein-Uhlenbeck latent process in Python.

About

pyscarcopula fits multivariate distributions using the copula approach with time-varying dependence. The classical constant-parameter model is extended to a stochastic model where the copula parameter follows an Ornstein-Uhlenbeck process. The idea is based on the discrete SCAR model of Liesenfeld and Richard (2003) and Hafner and Manner (2012); here we develop it in the continuous-time setting.

For parameter estimation we provide five methods:

Method Key Description
Maximum likelihood mle Classical fit with constant copula parameter
MC p-sampler scar-p-ou Monte Carlo without importance sampling
MC m-sampler scar-m-ou Monte Carlo with efficient importance sampling (EIS)
Transfer matrix scar-tm-ou Deterministic quadrature on a grid — numerically exact, no MC bias
GAS gas Generalized autoregressive score (observation-driven, deterministic)

The transfer matrix method exploits the Markov structure and known Gaussian transition density of the OU process to evaluate the likelihood function as a sequence of matrix-vector products with complexity $O(TK^2)$. The implementation automatically selects between dense and sparse transfer matrices depending on the kernel bandwidth, and adaptively refines the grid to resolve the transition kernel.

Install

pip install pyscarcopula

For development (includes data files and tests):

git clone https://github.com/AANovokhatskiy/pyscarcopula
cd pyscarcopula
pip install -e ".[test]"
pytest tests/

Dependencies: numpy, numba, scipy, joblib, tqdm.

Features

Copula families

  • Archimedean: Gumbel, Frank, Clayton, Joe (with rotations 0°/90°/180°/270°)
  • Elliptical: Gaussian, Student-t (MLE only)
  • Independence copula (null model for automatic vine pruning)
  • C-vine pair copula construction

Estimation methods

  • MLE — constant copula parameter
  • SCAR-TM-OU — transfer matrix with analytical gradient (recommended)
  • GAS — observation-driven score model
  • SCAR-P-OU / SCAR-M-OU — Monte Carlo alternatives

Vine copulas

  • Automatic copula family and rotation selection per edge (AIC/BIC)
  • Automatic pruning of weak edges via independence copula baseline
  • Tree-level and edge-level truncation for scalability to large d

Diagnostics and risk

  • Goodness-of-fit via Rosenblatt transform + Cramér–von Mises test
  • Mixture Rosenblatt transform for stochastic models
  • Smoothed time-varying copula parameter $\bar{\theta}k = \mathbb{E}[\Psi(x_k) \mid u{1:k-1}]$
  • VaR / CVaR via pyscarcopula.contrib (rolling window, marginals, portfolio optimization, parallel)

Mathematical background

Copula models

By Sklar's theorem, any joint distribution can be decomposed as

F(x_1, \ldots, x_d) = C(F_1(x_1), \ldots, F_d(x_d))

We focus on single-parameter Archimedean copulas defined via a generator $\phi(t; \theta)$:

C(u_1, \ldots, u_d) = \phi^{-1}(\phi(u_1; \theta) + \cdots + \phi(u_d; \theta))
Copula $\phi(t; \theta)$ $\phi^{-1}(t; \theta)$ $\theta \in$
Gumbel $(-\log t)^\theta$ $\exp(-t^{1/\theta})$ $[1, \infty)$
Frank $-\log\left(\frac{e^{-\theta t} - 1}{e^{-\theta} - 1}\right)$ $-\frac{1}{\theta}\log(1 + e^{-t}(e^{-\theta} - 1))$ $(0, \infty)$
Joe $-\log(1 - (1-t)^\theta)$ $1 - (1 - e^{-t})^{1/\theta}$ $[1, \infty)$
Clayton $\frac{1}{\theta}(t^{-\theta} - 1)$ $(1 + t\theta)^{-1/\theta}$ $(0, \infty)$

Stochastic copula (SCAR)

In the stochastic model the copula parameter is driven by a latent Ornstein-Uhlenbeck process:

\theta_t = \Psi(x_t), \qquad dx_t = \theta_{\text{OU}}(\mu - x_t)\,dt + \nu\,dW_t

where $\Psi$ maps the OU state to the copula parameter domain. The likelihood function is an integral over all latent paths:

L = \int \prod_{t} c(u_{1t}, u_{2t}; \Psi(x_t))\;p(x_t | x_{t-1})\;dx_0 \cdots dx_T

Transfer matrix method

The Markov property allows this high-dimensional integral to be factored into a chain of one-dimensional integrals, each computed as a matrix-vector product on a discretized grid. The backward recursion is:

\mathbf{m}_t = \widetilde{\mathbf{T}}(\mathbf{f}_t \odot \mathbf{m}_{t+1}), \qquad t = T-1, \ldots, 1

where $\widetilde{\mathbf{T}}$ is the transition kernel with trapezoidal weights baked in, $\mathbf{f}_t$ is the copula density evaluated at observation $t$, and $\odot$ is the element-wise product. Total complexity: $O(TK^2)$ (dense) or $O(TKb)$ (sparse, where $b$ is the kernel bandwidth).

Goodness of fit

Model quality is assessed via the Rosenblatt transform. For the stochastic model we use the mixture Rosenblatt transform:

u'_{2,t} = \mathbb{E}\left[h(u_{2t}, u_{1t}; \Psi(x_t)) \mid u_{1:t-1}\right]

which integrates the h-function over the predictive distribution of the latent state, avoiding the Jensen bias from plugging in a point estimate. Uniformity of the transformed sample is tested with the Cramér–von Mises statistic.

Examples

1. Read dataset

import pandas as pd
import numpy as np
from pyscarcopula.utils import pobs
from pyscarcopula import (GumbelCopula, FrankCopula, JoeCopula, ClaytonCopula,
                          IndependentCopula, GaussianCopula, StudentCopula,
                          CVineCopula)
from pyscarcopula.stattests import gof_test

crypto_prices = pd.read_csv("data/crypto_prices.csv", index_col=0, sep=';')
tickers = ['BTC-USD', 'ETH-USD']

returns_pd = np.log(crypto_prices[tickers] / crypto_prices[tickers].shift(1))[1:]
returns = returns_pd.values
u = pobs(returns)

2. Initialize and fit a bivariate copula

copula_mle = GumbelCopula(rotate=180)
copula_tm = GumbelCopula(rotate=180)
copula_gas = GumbelCopula(rotate=180)

# Static model (constant parameter)
fit_result_mle = copula_mle.fit(data=returns, method='mle', to_pobs=True)
gof_result_mle = gof_test(copula_mle, returns, to_pobs=True)

# Stochastic model (transfer matrix)
fit_result_tm = copula_tm.fit(data=returns, method='scar-tm-ou', to_pobs=True)
gof_result_tm = gof_test(copula_tm, returns, to_pobs=True)

# GAS model
fit_result_gas = copula_gas.fit(data=returns, method='gas', to_pobs=True)
gof_result_gas = gof_test(copula_gas, returns, to_pobs=True)

Results on daily BTC-ETH data (T = 1460):

Model logL GoF p-value
MLE 955.63 0.0087
GAS 1031.42 0.5282
SCAR-TM 1042.47 0.6201

SCAR-TM achieves the highest log-likelihood and the best GoF calibration. MLE is rejected at the 1% level; both dynamic models pass comfortably.

Available rotations: [0, 90, 180, 270]. Available methods: ['mle', 'scar-p-ou', 'scar-m-ou', 'scar-tm-ou', 'gas'].

3. Sample from copula

# Sample from constant-parameter copula
samples_mle = copula_mle.sample(n=1000, r=fit_result_mle.copula_param)

# Sample next state from stochastic copula
samples_tm = copula_tm.predict(n=1000)

4. Fit a multivariate C-vine copula

tickers = ['BTC-USD', 'ETH-USD', 'BNB-USD', 'ADA-USD', 'XRP-USD', 'DOGE-USD']
returns_pd = np.log(crypto_prices[tickers] / crypto_prices[tickers].shift(1))[1:251]
returns = returns_pd.values
u = pobs(returns)

# C-vine with SCAR-TM (truncated for speed)
vine = CVineCopula()
vine.fit(u, method='scar-tm-ou',
         truncation_level=2, min_edge_logL=10)
vine.summary()

# Comparison models
copula_s = StudentCopula()
copula_g = GaussianCopula()
copula_s.fit(u)
copula_g.fit(u)

gof_result_vine = gof_test(vine, u, to_pobs=False)
gof_result_s = gof_test(copula_s, u, to_pobs=False)
gof_result_g = gof_test(copula_g, u, to_pobs=False)

Results on 6-dimensional crypto data (T = 250):

Model logL GoF p-value Fit time
C-vine SCAR-TM 921.9 0.8971 13.4s
C-vine MLE 869.2 0.2072 0.6s
Student-t 764.4 0.0001
Gaussian 761.0 0.0000

The C-vine with SCAR-TM substantially outperforms all alternatives. With truncation (truncation_level=2, min_edge_logL=10), only 9 of 15 edges use SCAR — the remaining 6 fall back to MLE — achieving a good speed/accuracy tradeoff.

5. Risk metrics (VaR / CVaR)

Risk metrics are provided in pyscarcopula.contrib — optional modules for rolling VaR/CVaR estimation, marginal distribution fitting, and portfolio optimization.

from pyscarcopula.contrib.risk_metrics import risk_metrics

d = len(tickers)
result = risk_metrics(
    vine, returns, 
    window_len=100,
    gamma=[0.95],
    N_mc=[100_000],
    marginals_method='johnsonsu',
    method='mle',
    optimize_portfolio=False,
    portfolio_weight=np.ones(d) / d,
    n_jobs=-1,  # parallel over rolling windows
)

var = result[0.95][100_000]['var']
cvar = result[0.95][100_000]['cvar']

See example.ipynb for a complete walkthrough with plots.

Performance tuning

The SCAR-TM-OU method has several parameters that control the speed/accuracy tradeoff. Here are recommended configurations:

Bivariate copula

copula = GumbelCopula(rotate=180)

# Default (analytical gradient + smart init, both enabled by default)
copula.fit(u, method='scar-tm-ou')

# Disable optimizations for debugging / backward compatibility
copula.fit(u, method='scar-tm-ou', analytical_grad=False, smart_init=False)

# Relaxed tolerance (faster, slight logL loss)
copula.fit(u, method='scar-tm-ou', tol=5e-2)
Parameter Default Effect
analytical_grad True Analytical gradient. ~3-4x fewer function evaluations.
smart_init True Heuristic initial point. Up to 5x speedup on long series.
tol 1e-2 Gradient tolerance. 5e-2 is ~2x faster with negligible logL loss.
K 300 Minimum grid size. The adaptive rule may increase it to ensure adequate resolution of the transition kernel.
pts_per_sigma 2 Grid density: number of points per conditional standard deviation $\sigma_c$. The adaptive rule computes $K_\text{eff} = \max(K,, \lceil 2R\sigma / (\sigma_c / \texttt{pts_per_sigma}) \rceil)$, where $R$ is the grid half-range in $\sigma$ units. Higher values improve quadrature accuracy at the cost of larger $K$ and longer runtime. The default of 2 is sufficient for most cases; increasing to 4 may help for very peaked transition kernels (large $\theta$, small $\nu$).

Vine copula

vine = CVineCopula()

# Recommended: skip SCAR on weak edges and upper trees
vine.fit(u, method='scar-tm-ou', truncation_level=2, min_edge_logL=10)
Parameter Default Effect
truncation_level None Trees ≥ this level stay MLE. Recommended: 2-3 for d > 10.
min_edge_logL None Edges with MLE logL below threshold stay MLE. Recommended: 5-10.

Truncated edges use MLE (constant parameter). The GoF test handles mixed SCAR/MLE models correctly.

d=6 crypto, T=250:

Configuration Time logL GoF p-value
truncation_level=2, min_edge_logL=10 ~13s ~922 ~0.90
Full SCAR (15 edges) ~30s ~891 ~0.90
MLE only ~0.6s ~869 ~0.21

License

MIT License. See LICENSE.txt.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyscarcopula-0.1.0.tar.gz (4.6 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyscarcopula-0.1.0-py3-none-any.whl (62.1 kB view details)

Uploaded Python 3

File details

Details for the file pyscarcopula-0.1.0.tar.gz.

File metadata

  • Download URL: pyscarcopula-0.1.0.tar.gz
  • Upload date:
  • Size: 4.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for pyscarcopula-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f2cc3e0157f2b788de183fad7026707fa63f4f93be08f3d40ac315f51905e313
MD5 cfd0e419f94f7f6d61b3cfbbcb3eac42
BLAKE2b-256 164393c7e275995ae2a8049225e9522cb119beedf43ecc2a850c01d16bc457e3

See more details on using hashes here.

File details

Details for the file pyscarcopula-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pyscarcopula-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 62.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for pyscarcopula-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f7400e45ee159efc632bf1f1e93caca74f028449dcacb3a16d20d410a7bad996
MD5 4a256c1e5b06b54aeeb27cc05676b571
BLAKE2b-256 caf38ec14d154d14543807884fcf7a4b223578e423e4b1b449816ed8d291d077

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page