Goodness-of-fit diagnostics for state space models
Project description
statespacecheck
Goodness-of-fit diagnostics for state space models in neuroscience
statespacecheck provides tools to assess how well Bayesian state space models fit neural data by examining the consistency between posterior distributions and their component likelihood distributions. These diagnostics help identify issues with prior specification and model assumptions, enabling iterative model refinement.
Overview
State space models are powerful tools for relating neural activity to latent dynamic brain states (e.g., memory, attention, spatial navigation). The core assumption is that complex, high-dimensional neural activity can be related to low-dimensional latent states through:
- State transition model: How latent states evolve over time
- Observation model: How neural activity relates to the current latent state
The posterior distribution combines information from both models, weighing current data (normalized likelihood) against accumulated history (prediction distribution). When these distributions agree, the model's prior expectations and data-driven evidence are consistent. When they diverge, the mismatch reveals where and when the model fails to capture the structure of the data.
Features
- KL Divergence: Measure information divergence between posterior and likelihood distributions at each time point
- HPD Overlap: Compute spatial overlap between highest posterior density regions
- Vectorized Operations: Efficient NumPy-based implementation with no Python loops
- Flexible Dimensionality: Supports both 1D
(n_time, n_position_bins)and 2D(n_time, n_x_bins, n_y_bins)spatial arrays - Robust Edge Case Handling: Proper treatment of NaN values, zero sums, and empty distributions
Terminology
This package uses specific terminology to match standard state space model conventions:
State Distributions (state_dist parameter)
- One-step-ahead predictive distribution: p(x_t | y_{1:t-1}) - The distribution over current state given all past observations
- Smoothed distribution: p(x_t | y_{1:T}) - The distribution over state at time t given all observations (past and future)
- Filtered distribution: p(x_t | y_{1:t}) - The posterior distribution at time t (filtered estimate)
For goodness-of-fit diagnostics, you typically use the one-step predictive or smoothed distribution as state_dist. These represent your model's predictions before (predictive) or after (smoother) incorporating all available data.
Likelihood (likelihood parameter)
- Normalized likelihood: p(y_t | x_t) / Σ_x p(y_t | x_t) - The likelihood normalized across spatial positions
- This is mathematically equivalent to the posterior p(x_t | y_t) with a uniform prior
- Represents what your data alone says about the state, without temporal smoothing
Important Note: Discrete Distributions
All functions expect discrete probability distributions represented as histograms over spatial bins. For continuous distributions (e.g., Gaussian), discretize them first:
- Distributions are automatically normalized over valid (non-NaN) bins
- Each bin represents the probability mass in that spatial region
- NaN values can be used to mark invalid/inaccessible spatial bins (e.g., walls in a maze)
- Finer binning provides better approximation but increases computation
Interpretation
- Consistency: When state distribution and likelihood agree (low KL divergence, high overlap), your model's predictions align with the data
- Inconsistency: When they diverge, it indicates:
- Prior/transition model may be too rigid or misspecified
- Observation model may not capture the true relationship between states and observations
- Model capacity may be insufficient
Installation
# Using uv (recommended)
uv pip install -e .
# Using pip
pip install -e .
Quick Start
Basic Example
import numpy as np
from statespacecheck import (
kl_divergence,
hpd_overlap,
highest_density_region,
)
# Example: 1D spatial arrays (time x position)
n_time, n_bins = 100, 50
state_dist = np.random.dirichlet(np.ones(n_bins), size=n_time) # predictive or smoother
likelihood = np.random.dirichlet(np.ones(n_bins), size=n_time)
# Compute KL divergence at each time point
kl_div = kl_divergence(state_dist, likelihood)
# Returns: (n_time,) array of divergence values
# Compute HPD region overlap
overlap = hpd_overlap(state_dist, likelihood, coverage=0.95)
# Returns: (n_time,) array of overlap proportions (0 = no overlap, 1 = complete)
# Get highest density region mask
hd_mask = highest_density_region(state_dist, coverage=0.95)
# Returns: (n_time, n_bins) boolean mask
Neuroscience Example
import numpy as np
from scipy.stats import norm
from statespacecheck import kl_divergence, hpd_overlap
# Assume you have state space model output for spatial navigation task
# with position bins representing locations in a linear track
# Position bins (e.g., 50 cm track discretized into 100 bins)
position_bins = np.linspace(0, 50, 100) # cm
n_time = 1000 # Number of time steps
# Example: One-step-ahead predictive distribution from Kalman filter
# predicted_position: (n_time,) array of predicted positions in cm
# predicted_std: (n_time,) array of prediction uncertainty
predicted_position = 25 + 10 * np.sin(np.linspace(0, 4*np.pi, n_time))
predicted_std = np.ones(n_time) * 2.0
# Convert to spatial probability distribution over position bins
# Note: Distributions are automatically normalized, no need to normalize manually
state_dist = np.array([
norm.pdf(position_bins, loc=pred_pos, scale=pred_std)
for pred_pos, pred_std in zip(predicted_position, predicted_std)
])
# Example: Likelihood from place cell firing (observation model)
# spike_counts: (n_cells, n_time) array of spike counts
# place_fields: (n_cells, n_bins) array of firing rate maps
# For this example, we'll simulate the likelihood
# Note: Automatically normalized, no manual normalization needed
likelihood = np.array([
norm.pdf(position_bins, loc=pred_pos + np.random.randn(), scale=3.0)
for pred_pos in predicted_position
])
# Assess goodness-of-fit
divergence = kl_divergence(state_dist, likelihood)
overlap = hpd_overlap(state_dist, likelihood, coverage=0.95)
# Interpret results
print(f"Mean KL divergence: {np.mean(divergence):.3f}")
print(f"Mean HPD overlap: {np.mean(overlap):.3f}")
# Identify time points with poor fit
high_divergence = divergence > 1.0
low_overlap = overlap < 0.3
print(f"Time points with high divergence: {np.sum(high_divergence)}/{n_time}")
print(f"Time points with low overlap: {np.sum(low_overlap)}/{n_time}")
API Reference
kl_divergence(state_dist, likelihood)
Compute Kullback-Leibler divergence between state distribution and likelihood.
Parameters:
state_dist(np.ndarray): State distributions (one-step predictive or smoother). Non-negative values, automatically normalized. NaN marks invalid bins. Shape(n_time, ...)where...represents arbitrary spatial dimensionslikelihood(np.ndarray): Likelihood distributions. Non-negative values, automatically normalized. NaN marks invalid bins. Must have same shape as state_dist
Returns:
kl_divergence(np.ndarray): KL divergence at each time point. Shape(n_time,)
Interpretation:
- Low divergence (< 0.1): State distribution and likelihood agree well, indicating consistency between prior and data
- Moderate divergence (0.1 - 1.0): Some disagreement, worth investigating
- High divergence (> 1.0): Substantial mismatch, suggests issues with prior specification or observation model
hpd_overlap(state_dist, likelihood, coverage=0.95)
Compute overlap between highest posterior density regions.
Parameters:
state_dist(np.ndarray): State distributions (one-step predictive or smoother). Non-negative values, automatically normalized. NaN marks invalid bins. Shape(n_time, ...)where...represents arbitrary spatial dimensionslikelihood(np.ndarray): Likelihood distributions. Non-negative values, automatically normalized. NaN marks invalid bins. Must have same shape as state_distcoverage(float): Coverage probability for HPD regions (default: 0.95)
Returns:
overlap(np.ndarray): Overlap proportion at each time point. Shape(n_time,). Values range from 0 (no overlap) to 1 (complete overlap)
Interpretation:
- High overlap (> 0.7): State distribution and likelihood concentrate probability mass in similar regions
- Moderate overlap (0.3 - 0.7): Partial agreement, may indicate transition periods or model uncertainty
- Low overlap (< 0.3): Distributions are spatially inconsistent, suggests model issues
highest_density_region(distribution, coverage=0.95)
Compute boolean mask indicating highest density region membership.
Parameters:
distribution(np.ndarray): Probability distributions. Shape(n_time, ...)where...represents arbitrary spatial dimensionscoverage(float): Desired coverage probability (default: 0.95)
Returns:
isin_hd(np.ndarray): Boolean mask. Same shape as input
Notes:
- Highest density regions can be multimodal (non-contiguous)
- Regions are defined by selecting positions with highest density until cumulative mass reaches coverage
- NaN values are treated as zero mass
Development
Setup
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev]"
Running Tests
# Run all tests with coverage
pytest tests/ -v
# Run specific test file
pytest tests/test_posterior_consistency.py -v
# Run with coverage report
pytest tests/ --cov=src/statespacecheck --cov-report=html
Code Quality
# Check code style
ruff check .
# Format code
ruff format .
# Type checking
mypy src/
Standards
- Python: 3.10+ (following SPEC 0)
- Dependencies: numpy>=1.26.0, scipy>=1.11.0, matplotlib>=3.8.0
- Docstrings: NumPy format with parameter types and return values
- Type hints: Full mypy strict mode compliance
- Style: ruff for formatting and linting (100 char line length)
- No
# type: ignore: Fix type issues by refactoring, not suppressing
Scientific Context
This package implements goodness-of-fit diagnostics for state space models used in neuroscience. The methods are based on the principle that a well-specified model should have consistent posterior and likelihood distributions. Large divergences or low overlap indicate:
- Prior issues: State transition model too rigid or misspecified
- Observation model issues: Tuning curves or noise assumptions incorrect
- Model capacity: Latent state dimensionality insufficient
These diagnostics complement but are distinct from:
- Cross-validation: Measures predictive generalization to new data
- Permutation tests: Assess whether model captures structure vs. random patterns
Citation
If you use this package in your research, please cite:
@software{statespacecheck2025,
title={statespacecheck: Goodness-of-fit diagnostics for state space models},
author={Denovellis, Eric and Zeng, Sirui and Eden, Uri T.},
year={2025},
version={0.1.0},
url={https://github.com/edeno/statespacecheck},
doi={10.5281/zenodo.XXXXXXX}
}
Note: A DOI will be assigned when the package is published to Zenodo.
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass and code meets quality standards
- Submit a pull request
References
- Auger-Méthé, M., et al. (2021). A guide to state-space modeling of ecological time series. Ecological Monographs, 91(4), e01470.
- Newman, K. B., & Thomas, L. (2014). Goodness of fit for state-space models. In Statistical Inference from Stochastic Processes (pp. 153-191).
- Gelman, A., et al. (2020). Bayesian Data Analysis (3rd ed.). CRC Press.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file statespacecheck-0.1.1.tar.gz.
File metadata
- Download URL: statespacecheck-0.1.1.tar.gz
- Upload date:
- Size: 2.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
72febb66639223070ffa0833d6c054cb85bb5b896fd8a35f2cfed65c0f5adc88
|
|
| MD5 |
bece448bb611df953067a4a9fc79b132
|
|
| BLAKE2b-256 |
16e89445fae56893df562c9bba0d834fbfa1a79a7dcdd4487e1946b7fc3c8269
|
Provenance
The following attestation bundles were made for statespacecheck-0.1.1.tar.gz:
Publisher:
ci.yml on edeno/statespacecheck
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
statespacecheck-0.1.1.tar.gz -
Subject digest:
72febb66639223070ffa0833d6c054cb85bb5b896fd8a35f2cfed65c0f5adc88 - Sigstore transparency entry: 708696353
- Sigstore integration time:
-
Permalink:
edeno/statespacecheck@386811228839cdfdb8fccabb598f9247ff6b7633 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/edeno
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
ci.yml@386811228839cdfdb8fccabb598f9247ff6b7633 -
Trigger Event:
push
-
Statement type:
File details
Details for the file statespacecheck-0.1.1-py3-none-any.whl.
File metadata
- Download URL: statespacecheck-0.1.1-py3-none-any.whl
- Upload date:
- Size: 28.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
92138cfcdc60b27ec36e00473742caeeff6962de7502dcda73b0b36fcaa257d2
|
|
| MD5 |
98eb728e6c725f14dbf870b3b3b6bb7d
|
|
| BLAKE2b-256 |
5c341e3c871310a654335a397ecc91804479cba2b93b487ffd25465905759c8c
|
Provenance
The following attestation bundles were made for statespacecheck-0.1.1-py3-none-any.whl:
Publisher:
ci.yml on edeno/statespacecheck
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
statespacecheck-0.1.1-py3-none-any.whl -
Subject digest:
92138cfcdc60b27ec36e00473742caeeff6962de7502dcda73b0b36fcaa257d2 - Sigstore transparency entry: 708696359
- Sigstore integration time:
-
Permalink:
edeno/statespacecheck@386811228839cdfdb8fccabb598f9247ff6b7633 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/edeno
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
ci.yml@386811228839cdfdb8fccabb598f9247ff6b7633 -
Trigger Event:
push
-
Statement type: