Panel data econometrics in Python: Fixed Effects, Random Effects, GMM (Arellano-Bond, Blundell-Bond), Experiment Pattern with Result Containers (Validation, Comparison, Residual), Test Runners & Master Reports, Interactive Visualizations (35 Charts), Professional HTML Reports, Robust Standard Errors (HC, Clustered, Driscoll-Kraay, Newey-West), Comprehensive Diagnostics
PanelBox provides comprehensive tools for panel data econometrics, bringing Stata's xtabond2 and R's plm capabilities to Python with modern, user-friendly APIs.
Features
✅ Static Panel Models
- Pooled OLS: Standard OLS with panel data
- Fixed Effects: Control for time-invariant heterogeneity
- Random Effects: GLS estimation with random effects
- Hausman Test: Tests whether the random effects are correlated with the regressors (guides the FE vs. RE choice)
✅ Dynamic Panel GMM (v0.2.0)
- Difference GMM: Arellano-Bond (1991) estimator
- System GMM: Blundell-Bond (1998) estimator
- Robust to unbalanced panels: Instruments are selected according to data availability
- Windmeijer correction: Finite-sample standard error correction
- Comprehensive diagnostics:
  - Hansen J-test for overidentification
  - Sargan test
  - Arellano-Bond AR tests
  - Instrument ratio monitoring
🔧 Panel-Specific Features
- Unbalanced panel support: Handles missing observations gracefully
- Time effects: Time dummies, linear trends, or custom time controls
- Clustered standard errors: Robust inference
- Instrument generation: Automatic GMM-style and IV-style instruments
- Collapse option: Avoids instrument proliferation (Roodman 2009)
📊 Publication-Ready Output
- Summary tables: Professional regression output
- Diagnostic tests: Comprehensive specification testing
- LaTeX export: Ready for academic papers
- Warnings system: Guides users to correct specifications
Installation
pip install panelbox
Or install from source:
git clone https://github.com/PanelBox-Econometrics-Model/panelbox.git
cd panelbox
pip install -e .
Quick Start
🎯 Experiment Pattern (Recommended - v0.6.0+)
import panelbox as pb
import pandas as pd
# Load your panel data
data = pd.read_csv('panel_data.csv')
# Create experiment
experiment = pb.PanelExperiment(
    data=data,
    formula="invest ~ value + capital",
    entity_col="firm",
    time_col="year"
)
# Fit multiple models at once
experiment.fit_all_models(names=['pooled', 'fe', 're'])
# Validate model specification
validation_result = experiment.validate_model('fe')
print(validation_result.summary())
validation_result.save_html('validation_report.html', test_type='validation')
# Compare models and select best one
comparison_result = experiment.compare_models(['pooled', 'fe', 're'])
print(f"Best model: {comparison_result.best_model}")
comparison_result.save_html('comparison_report.html', test_type='comparison')
# Analyze residuals (v0.7.0)
residual_result = experiment.analyze_residuals('fe')
print(residual_result.summary())
# Check diagnostic tests
stat, pvalue = residual_result.shapiro_test
print(f"Shapiro-Wilk normality test: p={pvalue:.4f}")
dw = residual_result.durbin_watson
print(f"Durbin-Watson statistic: {dw:.4f}")
residual_result.save_html('residuals_report.html', test_type='residuals')
# Generate master report with all sub-reports (NEW in v0.8.0!)
experiment.save_master_report(
    'master_report.html',
    theme='professional',
    reports=[
        {'type': 'validation', 'title': 'Model Validation',
         'description': 'Specification tests', 'file_path': 'validation_report.html'},
        {'type': 'comparison', 'title': 'Model Comparison',
         'description': 'Compare pooled, FE, RE', 'file_path': 'comparison_report.html'},
        {'type': 'residuals', 'title': 'Residual Diagnostics',
         'description': 'Diagnostic tests', 'file_path': 'residuals_report.html'}
    ]
)
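As background for the residual diagnostics used above, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares; values near 2 suggest little first-order autocorrelation. The sketch below computes it for a single ordered residual series in plain Python. The function name `durbin_watson_stat` is hypothetical, and how PanelBox treats entity boundaries when it reports `durbin_watson` is not stated in this README, so this should be read as an illustration of the textbook formula rather than a reproduction of the library's implementation.

```python
def durbin_watson_stat(residuals):
    """Durbin-Watson statistic for one ordered residual series (textbook formula)."""
    e = [float(v) for v in residuals]
    if len(e) < 2:
        raise ValueError("need at least two residuals")
    denom = sum(v * v for v in e)
    if denom == 0.0:
        raise ValueError("residuals are all zero")
    # Sum of squared first differences e_t - e_{t-1}
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / denom


# Illustrative residuals only (not produced by any fitted model)
alternating = [1.0, -1.0, 1.0, -1.0]            # sign flips every period
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]  # long runs of the same sign

dw_alt = durbin_watson_stat(alternating)
dw_pers = durbin_watson_stat(persistent)
print(f"alternating: {dw_alt:.3f}, persistent: {dw_pers:.3f}")
```

Values well above 2 point to negative autocorrelation and values well below 2 to positive autocorrelation, which matches the two toy series.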
Static Panel Models (Traditional API)
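import panelbox as pb
import pandas as pd
# Load your panel data
data = pd.read_csv('panel_data.csv')
# Fixed Effects model
fe = pb.FixedEffects(
    formula="invest ~ value + capital",
    data=data,
    entity_col="firm",
    time_col="year"
)
fe_results = fe.fit(cov_type='clustered')
print(fe_results.summary())
# Random Effects model on the same specification, needed for the Hausman test
# (assumes pb.RandomEffects takes the same constructor arguments as pb.FixedEffects)
re = pb.RandomEffects(
    formula="invest ~ value + capital",
    data=data,
    entity_col="firm",
    time_col="year"
)
re_results = re.fit()
# Hausman test: compares the FE and RE coefficient estimates
hausman = pb.HausmanTest(fe_results, re_results)
print(hausman)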
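For intuition about what the Hausman test compares, the classical statistic can be computed by hand from two sets of estimates: H = d' (V_FE - V_RE)^(-1) d, where d is the difference between the FE and RE coefficient vectors. The sketch below uses made-up coefficients and covariance matrices, and the function name `hausman_statistic` is hypothetical. It illustrates the textbook formula and is not necessarily how `pb.HausmanTest` is implemented.

```python
import numpy as np


def hausman_statistic(b_fe, b_re, v_fe, v_re):
    """Classical Hausman statistic H = d' (V_FE - V_RE)^{-1} d, with d = b_FE - b_RE."""
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_diff = np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float)
    # Solve the linear system instead of forming an explicit inverse
    h = float(d @ np.linalg.solve(v_diff, d))
    dof = int(d.size)  # one degree of freedom per compared coefficient
    return h, dof


# Illustrative numbers only (not estimated from any dataset)
b_fe = [0.110, 0.310]
b_re = [0.100, 0.300]
v_fe = [[4e-4, 0.0], [0.0, 9e-4]]
v_re = [[3e-4, 0.0], [0.0, 5e-4]]

h, dof = hausman_statistic(b_fe, b_re, v_fe, v_re)
print(f"H = {h:.3f} with {dof} degrees of freedom")
```

Under the null hypothesis that the random effects are uncorrelated with the regressors, H is asymptotically chi-square with `dof` degrees of freedom, so a p-value can be obtained with, for example, `scipy.stats.chi2.sf(h, dof)`. A small p-value favors Fixed Effects.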
Dynamic Panel GMM
from panelbox import DifferenceGMM
# Arellano-Bond employment equation
gmm = DifferenceGMM(
    data=data,
    dep_var='employment',
    lags=1,
    id_var='firm',
    time_var='year',
    exog_vars=['wages', 'capital', 'output'],
    time_dummies=False,
    collapse=True,
    two_step=True,
    robust=True
)
results = gmm.fit()
print(results.summary())
# Check specification tests
print(f"Hansen J p-value: {results.hansen_j.pvalue:.3f}")
print(f"AR(2) p-value: {results.ar2_test.pvalue:.3f}")
System GMM (Blundell-Bond)
from panelbox import SystemGMM
# System GMM for persistent series
sys_gmm = SystemGMM(
    data=data,
    dep_var='y',
    lags=1,
    id_var='id',
    time_var='year',
    exog_vars=['x1', 'x2'],
    collapse=True,
    two_step=True,
    robust=True
)
results = sys_gmm.fit()
print(results.summary())
# Compare efficiency with Difference GMM
print(f"Instrument count: {results.n_instruments}")
print(f"Instrument ratio: {results.instrument_ratio:.3f}")
📖 Best Practices for GMM
Recommended: Use collapse=True
Following Roodman (2009), we strongly recommend using collapsed instruments:
# ✅ RECOMMENDED
gmm = DifferenceGMM(..., collapse=True)
Why collapse instruments?
- ✅ Better numerical stability - Avoids ill-conditioned matrices
- ✅ Reduces overfitting - Fewer instruments mean less overfitting risk
- ✅ Improves finite-sample properties - Better performance with limited data
- ✅ Instrument count grows as O(T), not O(T²) - Scales better with many time periods
When you use collapse=False:
- ⚠️ You'll see a detailed warning message
- ⚠️ May encounter numerical instability warnings
- ⚠️ Works but requires careful interpretation
See examples/gmm/unbalanced_panel_guide.py for detailed guidance.
Reference: Roodman, D. (2009). "How to do xtabond2: An introduction to difference and system GMM in Stata." The Stata Journal, 9(1), 86-136.
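The O(T) versus O(T²) growth mentioned above can be checked by counting instrument columns directly. The sketch below counts GMM-style instruments for the lagged dependent variable in a first-differenced equation when all lags from 2 onward are used. The function name `count_lag_instruments` is hypothetical, and the count ignores exogenous regressors and time dummies, so it is a simplified illustration rather than PanelBox's actual instrument builder.

```python
def count_lag_instruments(n_periods, collapse, min_lag=2):
    """Count GMM-style instrument columns for the lagged dependent variable.

    Assumes a first-differenced equation that uses every available lag
    of the level variable from `min_lag` onward.
    """
    # The differenced equation for period t can use y_{t-min_lag}, ..., y_1,
    # which is t - min_lag instruments, for t = min_lag + 1, ..., n_periods.
    per_period = [t - min_lag for t in range(min_lag + 1, n_periods + 1)]
    if collapse:
        # Collapsed form: one column per lag distance
        return max(per_period, default=0)
    # Uncollapsed form: one column per (period, lag) pair
    return sum(per_period)


for n_periods in (5, 10, 20):
    full = count_lag_instruments(n_periods, collapse=False)
    collapsed = count_lag_instruments(n_periods, collapse=True)
    print(f"T={n_periods:2d}: uncollapsed={full:3d}, collapsed={collapsed:2d}")
```

With T periods the uncollapsed count is (T-2)(T-1)/2 while the collapsed count is T-2, so doubling the panel length roughly quadruples the former and only doubles the latter.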
Key Advantages
1. Handles Unbalanced Panels Gracefully
Unlike some implementations, PanelBox:
- ✅ Automatically detects unbalanced panel structure
- ✅ Warns about problematic specifications
- ✅ Intelligently selects instruments based on data availability
- ✅ Provides clear guidance when specifications fail
# Smart warnings for unbalanced panels
gmm = DifferenceGMM(data=unbalanced_data, ...)
# UserWarning: Unbalanced panel detected (20% balanced) with 8 time dummies.
# This may result in very few observations being retained.
#
# Recommendations:
# 1. Set time_dummies=False and add a linear trend
# 2. Use only subset of key time dummies
# 3. Ensure collapse=True
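Before fitting, it can help to check how unbalanced a panel is. The sketch below summarizes balance from (entity, period) pairs in plain Python. The function name `balance_summary` is hypothetical, and "share complete" is defined here as the fraction of entities observed in every period; the README does not say how the percentage in the warning above is computed, so the two numbers need not agree.

```python
from collections import defaultdict


def balance_summary(observations):
    """Summarize panel balance from an iterable of (entity, period) pairs."""
    periods_by_entity = defaultdict(set)
    for entity, period in observations:
        periods_by_entity[entity].add(period)
    if not periods_by_entity:
        raise ValueError("no observations supplied")
    all_periods = set().union(*periods_by_entity.values())
    # An entity is "complete" if it is observed in every period of the panel
    n_complete = sum(1 for seen in periods_by_entity.values() if seen == all_periods)
    n_entities = len(periods_by_entity)
    return {
        "n_entities": n_entities,
        "n_periods": len(all_periods),
        "share_complete": n_complete / n_entities,
        "is_balanced": n_complete == n_entities,
    }


# Toy panel: firms "a" and "c" are observed in all three years, "b" and "d" are not
toy = [
    ("a", 2001), ("a", 2002), ("a", 2003),
    ("b", 2001), ("b", 2003),
    ("c", 2001), ("c", 2002), ("c", 2003),
    ("d", 2002),
]
summary = balance_summary(toy)
print(summary)
```

With a pandas DataFrame, the pairs can be supplied as `zip(data["firm"], data["year"])`.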
2. Comprehensive Specification Tests
All GMM models include:
- Hansen J-test: Overidentification test with interpretation
- Sargan test: Alternative overidentification test
- AR(1) and AR(2) tests: Serial correlation in first-differenced errors
- Instrument ratio: n_instruments / n_groups (should be < 1.0)
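The instrument-ratio rule of thumb in the last bullet is simple enough to check by hand. The helper names below are hypothetical and the counts are illustrative; in practice the inputs would come from a fitted model, for example `results.n_instruments` as used in the System GMM example.

```python
def instrument_ratio(n_instruments, n_groups):
    """Ratio of instrument count to number of cross-sectional groups."""
    if n_groups <= 0:
        raise ValueError("n_groups must be positive")
    return n_instruments / n_groups


def too_many_instruments(n_instruments, n_groups, threshold=1.0):
    """True when the ratio reaches the threshold (rule of thumb: keep it below 1.0)."""
    return instrument_ratio(n_instruments, n_groups) >= threshold


# Illustrative counts only
for n_instr, n_grp in [(36, 140), (171, 140)]:
    ratio = instrument_ratio(n_instr, n_grp)
    flag = "too many" if too_many_instruments(n_instr, n_grp) else "ok"
    print(f"{n_instr} instruments / {n_grp} groups = {ratio:.3f} ({flag})")
```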
3. Follows Best Practices
Based on Roodman (2009) "How to do xtabond2":
- Collapse option to avoid instrument proliferation
- Windmeijer (2005) standard error correction
- Automatic lag selection based on data availability
- Clear warnings for problematic specifications
4. Rich Documentation
- 📚 Comprehensive tutorial
- 📖 Interpretation guide with decision tables
- 💡 Example scripts for common use cases
- 🔬 Unbalanced panel guide
Learning Resources
📚 Interactive Tutorials (NEW!)
We've created comprehensive Jupyter notebook tutorials to help you master panel data econometrics:
Getting Started Guide - Your roadmap to learning PanelBox
Module 1: Fundamentals (3.5-4.5 hours)
Perfect for beginners! Learn the core concepts:
- 01 - Introduction to Panel Data - Loading and transforming panel data
- 02 - Model Specification with Formulas - R-style formula syntax
- 03 - Estimation and Results Interpretation - Fitting models and understanding output
- 04 - Spatial Fundamentals - Creating spatial weight matrices
More modules coming soon:
- Module 2: Classical Estimators (Fixed Effects, Random Effects)
- Module 3: Dynamic GMM (Arellano-Bond)
- Module 4: Spatial Panel Models
See the tutorials directory for the complete learning path.
💡 Example Scripts
See the examples directory for:
- OLS vs FE vs GMM comparison: Demonstrating bias in each estimator
- Firm growth model: Intermediate example with error handling
- Production function estimation: Advanced example with simultaneity bias
- Unbalanced panel guide: Practical solutions for unbalanced data
Comparison with Other Packages
| Feature | PanelBox | linearmodels | pyfixest | statsmodels |
|---|---|---|---|---|
| Difference GMM | ✅ | ❌ | ❌ | ❌ |
| System GMM | ✅ | ❌ | ❌ | ❌ |
| Unbalanced panels | ✅ Smart | ⚠️ Basic | ⚠️ Basic | ⚠️ Basic |
| Collapse option | ✅ | ❌ | ❌ | ❌ |
| Windmeijer correction | ✅ | ❌ | ❌ | ❌ |
| User warnings | ✅ Proactive | ⚠️ Reactive | ⚠️ Reactive | ⚠️ Reactive |
| Documentation | ✅ Rich | ✅ Good | ✅ Good | ✅ Good |
Requirements
- Python >= 3.9
- NumPy >= 1.24.0
- Pandas >= 2.0.0
- SciPy >= 1.10.0
- statsmodels >= 0.14.0
- patsy >= 0.5.3
Validation
PanelBox has been validated against:
- ✅ Arellano-Bond (1991) employment equation
- ✅ Stata xtabond2 (with appropriate specifications)
- ✅ Multiple synthetic datasets with known DGP
See validation directory for details.
Citation
If you use PanelBox in your research, please cite:
@software{panelbox2026,
  author = {Haase, Gustavo and Dourado, Paulo},
  title = {PanelBox: Panel Data Econometrics in Python},
  year = {2026},
  version = {0.6.0},
  url = {https://github.com/PanelBox-Econometrics-Model/panelbox}
}
References
Implemented Methods
- Arellano, M., & Bond, S. (1991). "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations." Review of Economic Studies, 58(2), 277-297.
- Blundell, R., & Bond, S. (1998). "Initial Conditions and Moment Restrictions in Dynamic Panel Data Models." Journal of Econometrics, 87(1), 115-143.
- Windmeijer, F. (2005). "A Finite Sample Correction for the Variance of Linear Efficient Two-step GMM Estimators." Journal of Econometrics, 126(1), 25-51.
- Roodman, D. (2009). "How to do xtabond2: An Introduction to Difference and System GMM in Stata." Stata Journal, 9(1), 86-136.
Textbooks
- Baltagi, B. H. (2021). Econometric Analysis of Panel Data (6th ed.). Springer.
- Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- 📫 Issues: GitHub Issues
- 📖 Documentation: GitHub Wiki
- 💬 Discussions: GitHub Discussions
Changelog
See CHANGELOG.md for complete version history.
Latest Release: v0.6.0 (2026-02-25)
🎯 Complete Econometric Toolkit
- 70+ econometric models across 13 families
- 50+ diagnostic tests with comprehensive validation against R/Stata
- Interactive HTML reports with 35+ Plotly charts
- Complete documentation overhaul with 200+ pages
- Google Colab tutorials for all model families
Comprehensive Visualization System:
- ✨ 35+ interactive Plotly charts for panel data analysis
- ✨ 3 professional themes (Professional, Academic, Presentation)
- ✨ Interactive HTML reports with embedded charts
- ✨ Multiple export formats (HTML, JSON, PNG, SVG, PDF)
- ✨ High-level convenience APIs for common visualizations
Residual Diagnostics (NEW in v0.7.0):
- ✨ Shapiro-Wilk test - Test for normality of residuals
- ✨ Jarque-Bera test - Alternative normality test
- ✨ Durbin-Watson statistic - Autocorrelation detection
- ✨ Ljung-Box test - Serial correlation up to 10 lags
- ✨ Summary statistics (mean, std, skewness, kurtosis)
- ✨ Professional summary output with interpretation guidelines
Static Panel Models:
- ✨ Pooled OLS, Fixed Effects, Random Effects, Between, First Differences
- ✨ 8 types of robust standard errors (HC0-HC3, clustered, Driscoll-Kraay, Newey-West, PCSE)
- ✨ Comprehensive specification tests
Dynamic Panel GMM:
- ✨ Difference GMM (Arellano-Bond 1991)
- ✨ System GMM (Blundell-Bond 1998)
- ✨ Smart instrument selection for unbalanced panels
- ✨ Windmeijer finite-sample correction
Advanced Features:
- ✨ Bootstrap inference (4 methods: pairs, wild, block, residual)
- ✨ Sensitivity analysis (leave-one-out, subset stability)
- ✨ 20+ validation tests (unit root, cointegration, diagnostics)
- ✨ Professional report generation (HTML, Markdown, LaTeX)
Quality & Performance:
- 🔧 Complete result container trilogy (Validation, Comparison, Residual)
- 🔧 Zero console warnings
- 🔧 16 new tests for ResidualResult (85% coverage)
- 🔧 HTML reports with embedded interactive charts
- ✅ Production-ready package
Made with ❤️ for econometricians and researchers