Explore your panel data interactively — a Python port of the ExPanDaR R package.
Project description
expdpy is a Python library for interactive analysis of panel and cross-sectional data,
organized around three modules — Explore, Analyze and Learn. It pairs composable
functions, which return interactive Plotly figures and
publication-quality Great Tables, with
fixest-style econometrics, a built-in teaching layer that explains and interprets every
result, and three no-code ExPdPy apps (one per module). It is built for beginners and
applied researchers alike.
It is built on the modern Python data and econometrics stack:
- Plotly — interactive figures
- pyfixest — fixed-effects and difference-in-differences estimators
- Great Tables — publication-quality tables
- linearmodels — random effects, between, correlated random effects, and the Hausman test
- Streamlit — the no-code ExPdPy apps
Features
Explore panel data
Descriptive, correlation and extreme-observation tables; histograms and category bar charts;
time trends and quantile trends; by-group bar, violin and trend views; scatter plots with an
optional LOESS smoother; a missing-value heatmap across the panel; and outlier treatment with
treat_outliers. Each function takes a pandas DataFrame and returns an interactive Plotly
figure or a Great Tables object you can drop straight into a notebook or report.
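To make the idea of outlier treatment concrete, here is a minimal sketch of winsorizing in plain pandas. The helper name `winsorize` and its `percentile` parameter are hypothetical and chosen for this illustration only; they are not the signature of treat_outliers, which may offer different options.

```python
import pandas as pd


def winsorize(s: pd.Series, percentile: float = 0.05) -> pd.Series:
    """Clip values outside the [percentile, 1 - percentile] quantile range.

    Hypothetical helper for illustration; not part of expdpy.
    """
    lower = s.quantile(percentile)
    upper = s.quantile(1 - percentile)
    return s.clip(lower=lower, upper=upper)


# Made-up data with one extreme observation.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0])
treated = winsorize(values, percentile=0.10)
print(pd.DataFrame({"raw": values, "winsorized": treated}))
```

Winsorizing keeps every row and only caps the tails, which is why the treated series has the same length as the input. Dropping the extreme rows instead (truncation) would be the alternative design choice.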
A dedicated set of panel-aware views makes the cross-unit-vs-over-time structure explicit:
the within/between variation table explore_xtsum_table (Stata xtsum-style) and the
within-vs-between scatter explore_scatter_plot_within_between; per-unit trajectories
(explore_spaghetti_plot); panel-structure diagnostics — a balance/gaps summary and
unit-by-period presence grid (explore_panel_structure) plus a unit-by-time value heatmap
(explore_value_heatmap); and distribution & transition dynamics — explore_distribution_over_time
(ridgeline or animated), explore_transition_matrix, and within-unit serial-correlation via
explore_within_persistence. Panel functions take an entity (unit) and a time id; declare
them once with set_panel(df, entity=..., time=...) and the rest of Explore can omit them.
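The within/between split behind an xtsum-style table can be sketched in a few lines of pandas. This is an illustration on a tiny made-up panel, not the implementation of explore_xtsum_table; the column names are assumptions, and degrees-of-freedom conventions for the standard deviations may differ from those used by Stata or expdpy.

```python
import pandas as pd

# Tiny made-up balanced panel: 3 units observed over 3 periods.
df = pd.DataFrame({
    "unit": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "year": [1, 2, 3] * 3,
    "x": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0, 10.0, 10.0],
})

overall_mean = df["x"].mean()
unit_mean = df.groupby("unit")["x"].transform("mean")

# Between part: how the unit averages differ from one another.
between_sd = df.groupby("unit")["x"].mean().std()

# Within part: deviations from each unit's own mean,
# re-centred on the overall mean (the xtsum convention).
within = df["x"] - unit_mean + overall_mean
within_sd = within.std()

summary = pd.DataFrame(
    {"mean": overall_mean, "sd": [df["x"].std(), between_sd, within_sd]},
    index=["overall", "between", "within"],
)
print(summary)
```

Unit "c" never changes over time, so it contributes nothing to the within variation and only shows up in the between part.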
Analyze panel data
OLS with multi-way fixed effects and clustered standard errors via
pyfixest, plus a richer analyze_estimation
adding stepwise / multiple-outcome comparison, serial-correlation-robust standard errors
(Newey–West, Driscoll–Kraay) and weights. Estimate pooled / between / fixed / random
effects with analyze_panel_table, bring within estimates into a random-effects frame with
the correlated-random-effects (Mundlak) estimator analyze_cre_table, and choose between
fixed and random effects with the Hausman test. Round it out with post-estimation tools
(fixed-effect plots, predictions, Wald joint tests), robust inference (randomization
inference and the wild cluster bootstrap), Frisch–Waugh–Lovell and coefficient plots,
and modern event-study / staggered difference-in-differences estimators (Gardner's did2s,
Sun–Abraham, local-projections DiD, dynamic TWFE) with a built-in pre-trend diagnostic and a
treatment-structure analyze_panel_view. For growth dynamics, analyze_beta_convergence runs
the standard β-convergence workflow — unconditional and conditional (Frisch–Waugh–Lovell)
convergence with annotated scatters, the speed of convergence and half-life, and a rolling
fixed-window view. Its distributional counterpart analyze_sigma_convergence tracks
cross-sectional dispersion over time — standard deviation, Gini index and coefficient of
variation on a dual-axis chart — and tests whether the distribution narrows (σ-convergence).
When a panel is plausibly not one homogeneous group, analyze_convergence_clubs runs the
Phillips–Sul log(t) workflow — HP-filter trend, relative transition paths, the log(t)
convergence test and a data-driven clustering algorithm — to split units into convergence
clubs (a faithful port of the Stata psecta package).
Learn panel data
Every result speaks plain language. .interpret() gives an associational reading of the
output (never a causal claim unless the design supports it); .explain(), together with
explain(topic) and list_topics(), provides concept explainers for fixed effects,
clustering, random effects, the Mundlak device, first differences, demeaning, dummy variables,
event studies, omitted-variable bias and more. Result objects also expose broom-style
.tidy() / .glance(). Concept sandboxes simulate data so a learner can see and tune a
concept — learn_omitted_variable_bias, learn_pooled_vs_fixed_effects,
learn_clustering_se, learn_first_differences, and learn_within_vs_lsdv (which shows
first differences ≈ demeaning ≈ least-squares dummy variables).
Three no-code apps — Streamlit
The whole workflow without writing code, in three apps — Explore, Analyze and Learn — that share a sidebar sample pipeline (subset filters, outlier treatment, user-defined variables) and differ only in which pages they expose. The apps deploy to Streamlit Community Cloud in one click.
Reproducibility & safety
Any in-app exploration exports to a runnable bundle — a Jupyter notebook, a .py script
and the prepared data (parquet) — that recreates every displayed result with expdpy calls.
Analysis configurations save and load as JSON. New variables can be defined live through a
restricted-AST expression evaluator (never eval/exec) with panel-aware lag/lead
that shift within each cross-section.
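The general technique of a restricted-AST evaluator can be sketched as follows. This is an assumption-laden illustration, not expdpy's evaluator: the function name `safe_eval`, the operator whitelist and the scalar-only variables are all hypothetical, and the real evaluator also supports panel-aware lag/lead, which this sketch omits.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expr: str, variables: dict[str, float]) -> float:
    """Evaluate an arithmetic expression over named variables without eval/exec."""

    def visit(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        # Calls, attribute access, subscripts, unknown names, etc. end up here.
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return visit(ast.parse(expr, mode="eval"))


row = {"gdp": 200.0, "population": 50.0}
print(safe_eval("gdp / population", row))
```

Because the expression is parsed but never executed, something like `__import__('os')` raises ValueError at the Call node instead of running.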
Bundled datasets
expdpy.data ships ready-to-explore panels — kuznets (the flagship N-shaped
Kuznets-curve demo), gapminder, staggered_did (a synthetic staggered-adoption panel
for the event-study / DiD tools), firms (a small unbalanced panel — staggered
entry/exit, interior gaps, a discrete size class and persistent revenue — for the
panel-structure, transition and persistence views), productivity (a balanced
108-country × 25-year Penn World Table panel of log GDP per capita and log labor productivity
for the club-convergence workflow), and bolivia112_gdppc (a real-world balanced
112-province × 35-year Bolivian panel of GDP per capita and its log, 1990–2024, for the
convergence workflows and subnational exploration). See the kuznets dataset page for the
data dictionary.
Installation
Install the latest release from PyPI (random effects, CRE and the Hausman test work out of the
box; the apps need the streamlit extra):
pip install expdpy
pip install "expdpy[streamlit]" # the no-code ExPdPy apps (Streamlit)
Using uv:
uv pip install expdpy
uv pip install "expdpy[streamlit]"
Development version (latest from GitHub)
For the most up-to-date, unreleased version, install straight from the main branch:
pip install "git+https://github.com/cmg777/expdpy.git"
pip install "expdpy[streamlit] @ git+https://github.com/cmg777/expdpy.git"
Pin to a release, branch, or commit for reproducible installs:
pip install "expdpy==0.4.17"
pip install "git+https://github.com/cmg777/expdpy.git@v0.4.17"
pip install "git+https://github.com/cmg777/expdpy.git@main"
Requires Python 3.10+.
Try it in Colab — no install
Every page of the docs (and the per-function guides) carries a one-click Open in Colab badge. The notebook's first cell installs expdpy and then restarts the runtime once so the upgraded NumPy loads cleanly — when it reconnects, just run the cells again (Runtime ▸ Run all).
Upgrading from 0.4.1? In 0.4.2 every analysis function gained a module prefix:
prepare_* → explore_* / analyze_* and sandbox_* → learn_*, with figures ending in _plot, tables in _table, and scope qualifiers moved to the end (e.g. prepare_by_group_violin_graph → explore_violin_plot_by_group). The utilities set_panel, resolve_panel, treat_outliers, explain and list_topics keep their names. See the changelog for the full rename map.
At a glance
The lead example throughout the docs is the bundled kuznets panel (80 countries ×
2015–2025): a synthetic dataset whose regional inequality traces an N-shaped Kuznets curve
in income — it rises, falls, then rises again at very high income.
import expdpy as ex
from expdpy.data import load_kuznets
df = load_kuznets()
# The N-shaped regional Kuznets curve: regional inequality vs (log) GDP per capita
ex.explore_scatter_plot(
    df, x="log_gdp_pc", y="gini_regional", color="continent", size="population", loess=1
).fig
Explore the panel structure. Declare the panel once, then split a variable's variation into across-unit (between) and over-time (within) parts, or trace every unit at once:
df = ex.set_panel(load_kuznets(), entity="country", time="year")
ex.explore_xtsum_table(df, var=["gini_regional", "log_gdp_pc"]).gt # within/between table
ex.explore_spaghetti_plot(df, var="gini_regional").fig # one line per country
ex.explore_scatter_plot_within_between(df, x="log_gdp_pc", y="gini_regional").fig
Run a regression and let it explain itself. Two-way fixed effects, clustered standard errors, a plain-language reading, and a coefficient plot:
res = ex.analyze_regression_table(
    df,
    dvs="gini_regional",
    idvs=["log_gdp_pc", "log_gdp_pc_sq", "log_gdp_pc_cu"],
    feffects=["country", "year"],
    clusters=["country"],
)
print(res.interpret()) # plain-language, associational reading
ex.analyze_coefficient_plot(res) # themed coefficient plot with confidence intervals
Bring within estimates into a random-effects frame with the correlated-random-effects (Mundlak) estimator — its slope equals the fixed-effects estimate, and a joint test on the unit-mean terms is the regression-form Hausman test:
ex.analyze_cre_table(
    df, dv="gini_regional", idvs=["log_gdp_pc"], entity="country", time="year"
).etable
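The claim that the Mundlak slope equals the fixed-effects estimate can be verified numerically with NumPy alone. The sketch below simulates a balanced panel (all numbers made up) and, for simplicity, fits the Mundlak regression by pooled OLS rather than random-effects GLS; it is not the code behind analyze_cre_table.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 30, 6
unit = np.repeat(np.arange(n_units), n_periods)

alpha = rng.normal(size=n_units)               # unobserved unit effects
x = alpha[unit] + rng.normal(size=unit.size)   # regressor correlated with the unit effect
y = 1.0 + 0.5 * x + alpha[unit] + rng.normal(size=unit.size)

# Unit means of x and y, broadcast back to every observation.
x_bar = (np.bincount(unit, weights=x) / n_periods)[unit]
y_bar = (np.bincount(unit, weights=y) / n_periods)[unit]

# Fixed-effects (within) slope: OLS on unit-demeaned data.
xw, yw = x - x_bar, y - y_bar
fe_slope = (xw @ yw) / (xw @ xw)

# Mundlak regression: y on a constant, x and the unit mean of x.
X = np.column_stack([np.ones_like(x), x, x_bar])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
mundlak_slope = coef[1]

print(f"FE slope={fe_slope:.6f}  Mundlak slope={mundlak_slope:.6f}")
```

The coefficient on the unit-mean term (`coef[2]`) is the quantity a regression-form Hausman test examines: if it is indistinguishable from zero, random effects and fixed effects agree.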
Event study & staggered difference-in-differences on the bundled treated panel:
from expdpy.data import load_staggered_did
did = load_staggered_did()
ex.analyze_panel_view(did, unit="unit", time="year", cohort="cohort") # treatment structure
ex.analyze_event_study(  # dynamic effects
    did, outcome="outcome", unit="unit", time="year", cohort="cohort", estimator="did2s"
).fig
Classic panel models and the Hausman test:
ex.analyze_panel_table(did, dv="outcome", idvs=["treated"], entity="unit", time="year").etable
print(ex.analyze_hausman_test(did, dv="outcome", idvs=["treated"], entity="unit", time="year").interpret())
Learn as you go — concept sandboxes and explainers:
ex.learn_first_differences() # first differences ≈ demeaning ≈ dummy variables
print(ex.explain("fixed_effects")) # a concept explainer; ex.list_topics() lists them all
Launch the Explore app on this data, pre-configured to open on the curve:
from expdpy.streamlit_app import ExploreApp
from expdpy.data import load_kuznets, load_kuznets_data_def, get_config
ExploreApp(load_kuznets(), df_def=load_kuznets_data_def(), config_list=get_config("kuznets"))
Head to Explore, Analyze and Learn to see every function in action, the kuznets dataset page for the data dictionary, or the app guide to launch the interactive apps.
Documentation
Full documentation, tutorials, and the API reference live at https://cmg777.github.io/expdpy/.
Acknowledgements
expdpy began as a Python port of the excellent
ExPanDaR package by Joachim Gassen and the
TRR 266 Accounting for Transparency project, and its foundations remain deeply inspired by
that work. Over time it has grown well beyond the original — three no-code Streamlit apps;
fixest-style estimators (fixed effects, event study and staggered difference-in-differences)
with coefficient and Frisch–Waugh–Lovell plots; random-effects, correlated-random-effects and
Hausman panel models; a built-in pedagogy layer that interprets and explains results; a
restricted-AST expression evaluator with panel-aware lag/lead; and reproducible notebook /
script / data export — and it will keep evolving. We are grateful to the ExPanDaR authors;
please cite the original work when using expdpy in research (see
CITATION.cff).
License
MIT — see LICENSE.
File details
Details for the file expdpy-0.4.17.tar.gz.
File metadata
- Download URL: expdpy-0.4.17.tar.gz
- Upload date:
- Size: 547.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f1503d03a92928e8efc9718efd6c979f25aaebd961f3ab865de7fb3c21603392 |
| MD5 | 9f144c812b3fa9f8ffd5130020dc82e0 |
| BLAKE2b-256 | 8a83c1a243f402a3eb00a785a561557440de2e4d52c91b6ae9002f1da4ea5748 |
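A downloaded file can be checked against the published SHA256 digest with the standard library. This is a minimal sketch: the helper name `sha256_of` is hypothetical, and the file is assumed to sit in the current working directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


expected = "f1503d03a92928e8efc9718efd6c979f25aaebd961f3ab865de7fb3c21603392"
sdist = Path("expdpy-0.4.17.tar.gz")  # assumed location of the downloaded sdist

if sdist.exists():
    print("match" if sha256_of(sdist) == expected else "MISMATCH")
else:
    print(f"{sdist} not found; download it first")
```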
Provenance
The following attestation bundles were made for expdpy-0.4.17.tar.gz:
Publisher: release.yml on cmg777/expdpy
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: expdpy-0.4.17.tar.gz
- Subject digest: f1503d03a92928e8efc9718efd6c979f25aaebd961f3ab865de7fb3c21603392
- Sigstore transparency entry: 1947625070
- Sigstore integration time:
- Permalink: cmg777/expdpy@5611976cb30bc7666ca58893b699849819e06cd7
- Branch / Tag: refs/tags/v0.4.17
- Owner: https://github.com/cmg777
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5611976cb30bc7666ca58893b699849819e06cd7
- Trigger Event: push
File details
Details for the file expdpy-0.4.17-py3-none-any.whl.
File metadata
- Download URL: expdpy-0.4.17-py3-none-any.whl
- Upload date:
- Size: 544.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f2a494452317ce25c0ce7eb796742c24e0a13473eab45f1d7d1bc4d0843d248f |
| MD5 | 5fc7c7920994e27d5a4e69fd334a5a68 |
| BLAKE2b-256 | 2c74f083c0110a7c57472d9a3f99c0769899d8d101914769327b5cb45bae2e22 |
Provenance
The following attestation bundles were made for expdpy-0.4.17-py3-none-any.whl:
Publisher: release.yml on cmg777/expdpy
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: expdpy-0.4.17-py3-none-any.whl
- Subject digest: f2a494452317ce25c0ce7eb796742c24e0a13473eab45f1d7d1bc4d0843d248f
- Sigstore transparency entry: 1947625268
- Sigstore integration time:
- Permalink: cmg777/expdpy@5611976cb30bc7666ca58893b699849819e06cd7
- Branch / Tag: refs/tags/v0.4.17
- Owner: https://github.com/cmg777
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5611976cb30bc7666ca58893b699849819e06cd7
- Trigger Event: push