Explore your panel data interactively — a Python port of the ExPanDaR R package.

expdpy — A Python library to explore panel data interactively

expdpy is a Python library for interactive analysis of panel and cross-sectional data, organized around three modules: Explore, Analyze and Learn. It pairs composable functions that return interactive Plotly figures and publication-quality Great Tables with fixest-style econometrics, a built-in teaching layer that explains and interprets every result, and three no-code ExPdPy apps (one per module). It is aimed at beginners and applied researchers alike.

It is built on the modern Python data and econometrics stack:

  • Plotly — interactive figures
  • pyfixest — fixed-effects and difference-in-differences estimators
  • Great Tables — publication-quality tables
  • linearmodels — random effects, between, correlated random effects, and the Hausman test
  • Streamlit — the no-code ExPdPy apps

Features

Explore panel data

Descriptive, correlation and extreme-observation tables; histograms and category bar charts; time trends and quantile trends; by-group bar, violin and trend views; scatter plots with an optional LOESS smoother; a missing-value heatmap across the panel; and outlier treatment with treat_outliers. Each function takes a pandas DataFrame and returns an interactive Plotly figure or a Great Tables object you can drop straight into a notebook or report.

A dedicated set of panel-aware views makes the cross-unit versus over-time structure explicit:

  • Within/between variation: the table explore_xtsum_table (in the style of Stata's xtsum) and the within-vs-between scatter explore_scatter_plot_within_between
  • Per-unit trajectories: explore_spaghetti_plot
  • Panel-structure diagnostics: a balance/gaps summary and unit-by-period presence grid (explore_panel_structure), plus a unit-by-time value heatmap (explore_value_heatmap)
  • Distribution and transition dynamics: explore_distribution_over_time (ridgeline or animated), explore_transition_matrix, and within-unit serial correlation via explore_within_persistence

Panel functions take an entity (unit) id and a time id; declare them once with set_panel(df, entity=..., time=...) and the rest of Explore can omit them.
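To make the within/between split concrete, here is a minimal standard-library sketch of an xtsum-style decomposition. It is not expdpy's implementation: the toy panel, the `xtsum` helper and the choice of sample standard deviations (with the grand mean added back to the within series, as Stata's xtsum reports it) are assumptions made for illustration.

```python
from statistics import mean, stdev

# Hypothetical toy panel of (unit, period, value); not a bundled dataset.
panel = [
    ("A", 1, 1.0), ("A", 2, 2.0), ("A", 3, 3.0),
    ("B", 1, 4.0), ("B", 2, 5.0), ("B", 3, 6.0),
    ("C", 1, 7.0), ("C", 2, 8.0), ("C", 3, 9.0),
]


def xtsum(rows):
    """Return overall, between and within standard deviations of the value."""
    values = [v for _, _, v in rows]
    grand_mean = mean(values)

    # Collect each unit's observations, then average them.
    by_unit = {}
    for unit, _, v in rows:
        by_unit.setdefault(unit, []).append(v)
    unit_means = {u: mean(vs) for u, vs in by_unit.items()}

    # Within series: deviation from the unit mean, re-centred on the grand mean.
    within_values = [v - unit_means[u] + grand_mean for u, _, v in rows]

    return {
        "overall_sd": stdev(values),
        "between_sd": stdev(list(unit_means.values())),
        "within_sd": stdev(within_values),
    }


summary = xtsum(panel)
print(summary)
```

In this toy panel most of the variation is between units (the unit means are 2, 5 and 8), while each unit moves by only one step per period, so the within standard deviation is much smaller than the between one.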

Analyze panel data

OLS with multi-way fixed effects and clustered standard errors via pyfixest, plus a richer analyze_estimation that adds stepwise / multiple-outcome comparison, serial-correlation-robust standard errors (Newey–West, Driscoll–Kraay) and weights. Estimate pooled / between / fixed / random effects with analyze_panel_table, bring within estimates into a random-effects frame with the correlated-random-effects (Mundlak) estimator analyze_cre_table, and choose between fixed and random effects with the Hausman test.

Round it out with post-estimation tools (fixed-effect plots, predictions, Wald joint tests), robust inference (randomization inference and the wild cluster bootstrap), Frisch–Waugh–Lovell and coefficient plots, and modern event-study / staggered difference-in-differences estimators (Gardner's did2s, Sun–Abraham, local-projections DiD, dynamic TWFE) with a built-in pre-trend diagnostic and a treatment-structure view, analyze_panel_view.

For growth dynamics, analyze_beta_convergence runs the standard β-convergence workflow: unconditional and conditional (Frisch–Waugh–Lovell) convergence with annotated scatters, the speed of convergence and half-life, and a rolling fixed-window view. Its distributional counterpart analyze_sigma_convergence tracks cross-sectional dispersion over time (standard deviation, Gini index and coefficient of variation on a dual-axis chart) and tests whether the distribution narrows (σ-convergence). When a panel is plausibly not one homogeneous group, analyze_convergence_clubs runs the Phillips–Sul log(t) workflow (HP-filter trend, relative transition paths, the log(t) convergence test and a data-driven clustering algorithm) to split units into convergence clubs. It is a faithful port of the Stata psecta package.
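The speed of convergence and half-life mentioned above follow from the β-convergence slope by simple arithmetic. The sketch below assumes the textbook setup in which average annual growth over a horizon of T years is regressed on the initial log level, so that the slope b satisfies b = -(1 - exp(-λT)) / T. Whether analyze_beta_convergence uses exactly this convention is not stated here; the function names and the example slope are hypothetical.

```python
import math


def convergence_speed(beta, horizon):
    """Implied annual speed of convergence (lambda) from a growth-regression slope.

    Assumes: average annual growth over `horizon` years regressed on the
    initial log level, so beta = -(1 - exp(-lambda * horizon)) / horizon.
    """
    inside = 1.0 + beta * horizon
    if inside <= 0:
        raise ValueError("1 + beta * horizon must be positive to take the log")
    return -math.log(inside) / horizon


def half_life(speed):
    """Years needed to close half of the gap to the steady state."""
    if speed <= 0:
        raise ValueError("half-life is only defined for a positive speed")
    return math.log(2.0) / speed


beta = -0.015   # hypothetical slope, for illustration only
horizon = 20    # hypothetical horizon in years

speed = convergence_speed(beta, horizon)
years = half_life(speed)
print(f"speed = {speed:.4f} per year, half-life = {years:.1f} years")
```

A negative slope implies a positive speed (poorer units grow faster), and a larger speed shortens the half-life. A non-negative slope gives no convergence, which is why `half_life` rejects it.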

Learn panel data

Every result speaks plain language. .interpret() gives an associational reading of the output (never a causal claim unless the design supports it); .explain(), together with explain(topic) and list_topics(), provides concept explainers for fixed effects, clustering, random effects, the Mundlak device, first differences, demeaning, dummy variables, event studies, omitted-variable bias and more. Result objects also expose broom-style .tidy() / .glance(). Concept sandboxes simulate data so a learner can see and tune a concept — learn_omitted_variable_bias, learn_pooled_vs_fixed_effects, learn_clustering_se, learn_first_differences, and learn_within_vs_lsdv (which shows first differences ≈ demeaning ≈ least-squares dummy variables).
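As an illustration of the kind of idea a concept sandbox can show, the sketch below reproduces the omitted-variable-bias identity with the standard library only. It is not the code behind learn_omitted_variable_bias: the data, the `slope` helper and the coefficient values are hypothetical. When y is an exact linear function of x and z, the slope from regressing y on x alone equals the true coefficient on x plus the coefficient on z times the slope of z on x.

```python
from statistics import mean


def slope(x, y):
    """Bivariate OLS slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical deterministic data: z is correlated with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
beta_x, beta_z = 1.0, 2.0
y = [beta_x * a + beta_z * b for a, b in zip(x, z)]

short_slope = slope(x, y)   # regression that omits z
delta = slope(x, z)         # how the omitted variable moves with x
bias = beta_z * delta       # omitted-variable-bias term

print(f"true effect {beta_x}, short regression {short_slope:.3f}, bias {bias:.3f}")
```

Because z rises with x and has a positive effect on y, the short regression overstates the effect of x; flipping the sign of either relationship would flip the sign of the bias.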

Three no-code apps — Streamlit

The whole workflow without writing code, in three apps — Explore, Analyze and Learn — that share a sidebar sample pipeline (subset filters, outlier treatment, user-defined variables) and differ only in which pages they expose. The apps deploy to Streamlit Community Cloud in one click.

Reproducibility & safety

Any in-app exploration exports to a runnable bundle — a Jupyter notebook, a .py script and the prepared data (parquet) — that recreates every displayed result with expdpy calls. Analysis configurations save and load as JSON. New variables can be defined live through a restricted-AST expression evaluator (never eval/exec) with panel-aware lag/lead that shift within each cross-section.
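The general idea of a restricted-AST evaluator can be sketched with Python's `ast` module: parse the expression, walk the tree, and refuse every node type that is not explicitly allowed. This is an illustrative sketch, not expdpy's evaluator; `safe_eval`, `panel_lag`, the allowed operator set and the row layout are all assumptions. The lag helper looks up period t - k within the same entity, so the first period of a unit (or a gap) yields None; how expdpy treats gaps is not specified here.

```python
import ast
import operator

# Whitelisted operators; anything else (calls, attributes, subscripts) is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression, variables):
    """Evaluate simple arithmetic over named variables without eval/exec."""

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant):
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
        elif isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](visit(node.left), visit(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](visit(node.operand))
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))


def panel_lag(rows, column, periods=1):
    """Lag `column` within each entity; None where no earlier period exists."""
    lookup = {(r["entity"], r["time"]): r[column] for r in rows}
    return [lookup.get((r["entity"], r["time"] - periods)) for r in rows]


rows = [
    {"entity": "A", "time": 1, "x": 10.0},
    {"entity": "A", "time": 2, "x": 11.0},
    {"entity": "B", "time": 1, "x": 20.0},
    {"entity": "B", "time": 2, "x": 22.0},
]
lagged = panel_lag(rows, "x")
ratio = safe_eval("a / b + 1", {"a": 6.0, "b": 3.0})
print(lagged, ratio)
```

The first observation of unit B gets None rather than the last value of unit A, which is the point of a panel-aware lag. An expression such as `__import__('os')` parses to a call node and is rejected before anything runs.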

Bundled datasets

expdpy.data ships ready-to-explore panels — kuznets (the flagship N-shaped Kuznets-curve demo), gapminder, staggered_did (a synthetic staggered-adoption panel for the event-study / DiD tools), firms (a small unbalanced panel — staggered entry/exit, interior gaps, a discrete size class and persistent revenue — for the panel-structure, transition and persistence views), productivity (a balanced 108-country × 25-year Penn World Table panel of log GDP per capita and log labor productivity for the club-convergence workflow), and bolivia112_gdppc (a real-world balanced 112-province × 35-year Bolivian panel of GDP per capita and its log, 1990–2024, for the convergence workflows and subnational exploration). See the kuznets dataset page for the data dictionary.

Installation

Install the latest release from PyPI (random effects, CRE and the Hausman test work out of the box; the apps need the streamlit extra):

pip install expdpy
pip install "expdpy[streamlit]"   # the no-code ExPdPy apps (Streamlit)

Using uv:

uv pip install expdpy
uv pip install "expdpy[streamlit]"

Development version (latest from GitHub)

For the most up-to-date, unreleased version, install straight from the main branch:

pip install "git+https://github.com/cmg777/expdpy.git"
pip install "expdpy[streamlit] @ git+https://github.com/cmg777/expdpy.git"

Pin to a release, branch, or commit for reproducible installs:

pip install "expdpy==0.4.13"
pip install "git+https://github.com/cmg777/expdpy.git@v0.4.13"
pip install "git+https://github.com/cmg777/expdpy.git@main"

Requires Python 3.10+.
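After installing, a quick standard-library check can confirm the interpreter meets the 3.10 requirement and report which expdpy version, if any, is visible in the current environment. The `installed_version` helper is a hypothetical name used only for this sketch.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


python_ok = sys.version_info >= (3, 10)
expdpy_version = installed_version("expdpy")

print(f"Python 3.10+: {python_ok}")
print(f"expdpy: {expdpy_version or 'not installed'}")
```

Comparing the reported version against the pin used in the install command is a simple way to confirm that a reproducible install landed where intended.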

Try it in Colab — no install

Every page of the docs (and the per-function guides) carries a one-click Open in Colab badge. The notebook's first cell installs expdpy and then restarts the runtime once so the upgraded NumPy loads cleanly — when it reconnects, just run the cells again (Runtime ▸ Run all).

Upgrading from 0.4.1? In 0.4.2 every analysis function gained a module prefix: prepare_* → explore_* / analyze_* and sandbox_* → learn_*, with figures ending in _plot, tables in _table, and scope qualifiers moved to the end (e.g. prepare_by_group_violin_graph → explore_violin_plot_by_group). The utilities set_panel, resolve_panel, treat_outliers, explain and list_topics keep their names. See the changelog for the full rename map.

At a glance

The lead example throughout the docs is the bundled kuznets panel (80 countries × 2015–2025): a synthetic dataset whose regional inequality traces an N-shaped Kuznets curve in income — it rises, falls, then rises again at very high income.

import expdpy as ex
from expdpy.data import load_kuznets

df = load_kuznets()
# The N-shaped regional Kuznets curve: regional inequality vs (log) GDP per capita
ex.explore_scatter_plot(
    df, x="log_gdp_pc", y="gini_regional", color="continent", size="population", loess=1
).fig

Explore the panel structure. Declare the panel once, then split a variable's variation into across-unit (between) and over-time (within) parts, or trace every unit at once:

df = ex.set_panel(load_kuznets(), entity="country", time="year")

ex.explore_xtsum_table(df, var=["gini_regional", "log_gdp_pc"]).gt   # within/between table
ex.explore_spaghetti_plot(df, var="gini_regional").fig              # one line per country
ex.explore_scatter_plot_within_between(df, x="log_gdp_pc", y="gini_regional").fig

Run a regression and let it explain itself. Two-way fixed effects, clustered standard errors, a plain-language reading, and a coefficient plot:

res = ex.analyze_regression_table(
    df,
    dvs="gini_regional",
    idvs=["log_gdp_pc", "log_gdp_pc_sq", "log_gdp_pc_cu"],
    feffects=["country", "year"],
    clusters=["country"],
)
print(res.interpret())            # plain-language, associational reading
ex.analyze_coefficient_plot(res)  # themed coefficient plot with confidence intervals
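The cubic specification above is what lets the fit take an N shape. With fitted values b1·x + b2·x² + b3·x³, the turning points are the roots of the derivative b1 + 2·b2·x + 3·b3·x², and the curve rises, falls, then rises again when b3 > 0 and b2² > 3·b1·b3. The sketch below works through that arithmetic with hypothetical coefficients chosen for illustration; they are not estimates from the kuznets data, and the helper names are assumptions.

```python
import math


def turning_points(b1, b2, b3):
    """Return (low, high) turning points of b1*x + b2*x**2 + b3*x**3, or None."""
    discriminant = b2 * b2 - 3.0 * b1 * b3
    if b3 == 0 or discriminant <= 0:
        return None  # no pair of distinct turning points
    root = math.sqrt(discriminant)
    first = (-b2 - root) / (3.0 * b3)
    second = (-b2 + root) / (3.0 * b3)
    return tuple(sorted((first, second)))


def is_n_shaped(b1, b2, b3):
    """True when the cubic rises, falls, then rises again."""
    return b3 > 0 and turning_points(b1, b2, b3) is not None


# Hypothetical coefficients on log GDP per capita, its square and its cube.
b1, b2, b3 = 2.4, -0.27, 0.01

points = turning_points(b1, b2, b3)
print(is_n_shaped(b1, b2, b3), points)
```

With b3 > 0 the smaller root is the local peak and the larger one the local trough. It is worth checking that both fall inside the observed range of the regressor before reading the fit as an N-shaped curve.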

Bring within estimates into a random-effects frame with the correlated-random-effects (Mundlak) estimator — its slope equals the fixed-effects estimate, and a joint test on the unit-mean terms is the regression-form Hausman test:

ex.analyze_cre_table(
    df, dv="gini_regional", idvs=["log_gdp_pc"], entity="country", time="year"
).etable
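The claim that the Mundlak slope equals the fixed-effects estimate can be checked numerically on a tiny example. The sketch below is a simplification, not what analyze_cre_table does internally: it runs pooled OLS of y on a constant, x and the unit mean of x, using a hand-rolled normal-equations solver, and compares the coefficient on x with the within (demeaned) slope. The data and helper names are hypothetical.

```python
from statistics import mean


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [value / pivot for value in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical panel: unit -> (x values, y values) over three periods.
panel = {
    "A": ([1.0, 2.0, 3.0], [2.0, 2.5, 3.5]),
    "B": ([2.0, 4.0, 6.0], [8.0, 9.0, 9.5]),
    "C": ([5.0, 6.0, 10.0], [1.0, 2.0, 3.5]),
}

X, y, num, den = [], [], 0.0, 0.0
for xs, ys in panel.values():
    x_bar, y_bar = mean(xs), mean(ys)
    for x_it, y_it in zip(xs, ys):
        X.append([1.0, x_it, x_bar])          # constant, x, unit mean of x
        y.append(y_it)
        num += (x_it - x_bar) * (y_it - y_bar)
        den += (x_it - x_bar) ** 2

within_slope = num / den                      # fixed-effects (demeaned) slope
intercept, mundlak_slope, mean_coef = ols(X, y)
print(within_slope, mundlak_slope, mean_coef)
```

The equality holds because, once the unit mean is in the regression, the only variation in x left to identify its coefficient is the deviation from that mean. The coefficient on the unit mean captures how the between relationship differs from the within one, which is what the regression-form Hausman test examines.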

Event study & staggered difference-in-differences on the bundled treated panel:

from expdpy.data import load_staggered_did

did = load_staggered_did()
ex.analyze_panel_view(did, unit="unit", time="year", cohort="cohort")   # treatment structure
ex.analyze_event_study(                                                  # dynamic effects
    did, outcome="outcome", unit="unit", time="year", cohort="cohort", estimator="did2s"
).fig

Classic panel models and the Hausman test:

ex.analyze_panel_table(did, dv="outcome", idvs=["treated"], entity="unit", time="year").etable
print(ex.analyze_hausman_test(did, dv="outcome", idvs=["treated"], entity="unit", time="year").interpret())
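For a single regressor, the classic Hausman statistic reduces to one line of arithmetic: the squared gap between the fixed- and random-effects estimates, divided by the difference of their variances, compared against a chi-squared distribution with one degree of freedom. The sketch below uses that textbook form with hypothetical inputs; it is not the code behind analyze_hausman_test, and `hausman_single` is an assumed name.

```python
import math


def hausman_single(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one coefficient (chi-squared, 1 df)."""
    variance_gap = var_fe - var_re
    if variance_gap <= 0:
        raise ValueError("var_fe must exceed var_re for the classic statistic")
    statistic = (b_fe - b_re) ** 2 / variance_gap
    # Survival function of chi-squared(1): P(Z**2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical estimates and variances, for illustration only.
statistic, p_value = hausman_single(b_fe=0.50, b_re=0.35, var_fe=0.010, var_re=0.004)
print(f"H = {statistic:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the two estimators disagree systematically, which favours fixed effects; a large one means the data do not reject the more efficient random-effects estimator. With several regressors the same idea uses the coefficient-difference vector and the difference of the covariance matrices.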

Learn as you go — concept sandboxes and explainers:

ex.learn_within_vs_lsdv()         # first differences ≈ demeaning ≈ dummy variables
print(ex.explain("fixed_effects"))  # a concept explainer; ex.list_topics() lists them all

Launch the Explore app on this data, pre-configured to open on the curve:

from expdpy.streamlit_app import ExploreApp
from expdpy.data import load_kuznets, load_kuznets_data_def, get_config

ExploreApp(load_kuznets(), df_def=load_kuznets_data_def(), config_list=get_config("kuznets"))

Head to Explore, Analyze and Learn to see every function in action, the kuznets dataset page for the data dictionary, or the app guide to launch the interactive apps.

Documentation

Full documentation, tutorials, and the API reference live at https://cmg777.github.io/expdpy/.

Acknowledgements

expdpy began as a Python port of the excellent ExPanDaR package by Joachim Gassen and the TRR 266 Accounting for Transparency project, and its foundations remain deeply inspired by that work. Over time it has grown well beyond the original — three no-code Streamlit apps; fixest-style estimators (fixed effects, event study and staggered difference-in-differences) with coefficient and Frisch–Waugh–Lovell plots; random-effects, correlated-random-effects and Hausman panel models; a built-in pedagogy layer that interprets and explains results; a restricted-AST expression evaluator with panel-aware lag/lead; and reproducible notebook / script / data export — and it will keep evolving. We are grateful to the ExPanDaR authors; please cite the original work when using expdpy in research (see CITATION.cff).

License

MIT — see LICENSE.
