Explore your panel data interactively — a Python port of the ExPanDaR R package.

expdpy — A Python library to explore panel data interactively

expdpy is a Python library for interactive analysis of panel and cross-sectional data, organized around three modules: Explore, Analyze and Learn. It pairs composable functions that return interactive Plotly figures and publication-quality Great Tables with fixest-style econometrics, a built-in teaching layer that explains and interprets every result, and three no-code ExPdPy apps (one per module). It is built for beginners and applied researchers alike.

It is built on the modern Python data and econometrics stack:

  • Plotly — interactive figures
  • pyfixest — fixed-effects and difference-in-differences estimators
  • Great Tables — publication-quality tables
  • linearmodels — random effects, between, correlated random effects, and the Hausman test
  • Streamlit — the no-code ExPdPy apps

Features

Explore panel data

Descriptive, correlation and extreme-observation tables; histograms and category bar charts; time trends and quantile trends; by-group bar, violin and trend views; scatter plots with an optional LOESS smoother; a missing-value heatmap across the panel; and outlier treatment with treat_outliers. Each function takes a pandas DataFrame and returns an interactive Plotly figure or a Great Tables object you can drop straight into a notebook or report.
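
To make the idea of outlier treatment concrete, here is a minimal sketch of winsorizing, one common approach: values beyond chosen quantiles are clamped to those quantiles. The function name winsorize, its parameters and the interpolation rule are illustrative assumptions; the exact rule and signature of treat_outliers are not shown here and may differ.

```python
def winsorize(values, lower=0.01, upper=0.99):
    """Clamp values outside the [lower, upper] quantiles to those quantiles.

    Hypothetical helper for illustration only; not expdpy's treat_outliers.
    Missing values (None) are ignored when computing cut-offs and kept as-is.
    """
    clean = sorted(v for v in values if v is not None)
    if not clean:
        return list(values)

    def quantile(q):
        # Linear interpolation between the two closest ranks.
        pos = q * (len(clean) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(clean) - 1)
        return clean[lo] + (clean[hi] - clean[lo]) * (pos - lo)

    lo_cut, hi_cut = quantile(lower), quantile(upper)
    return [v if v is None else min(max(v, lo_cut), hi_cut) for v in values]


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0, None]
treated = winsorize(data, lower=0.10, upper=0.90)
```

The extreme value 1000.0 is pulled in toward the bulk of the data while interior observations are left untouched, which is why winsorizing is often preferred to dropping rows in a panel where gaps matter.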

A dedicated set of panel-aware views makes the cross-unit-vs-over-time structure explicit: the within/between variation table explore_xtsum_table (Stata xtsum-style) and the within-vs-between scatter explore_scatter_plot_within_between; per-unit trajectories (explore_spaghetti_plot); panel-structure diagnostics, namely a balance/gaps summary and unit-by-period presence grid (explore_panel_structure) plus a unit-by-time value heatmap (explore_value_heatmap); and distribution and transition dynamics via explore_distribution_over_time (ridgeline or animated), explore_transition_matrix, and within-unit serial correlation via explore_within_persistence. Panel functions take an entity (unit) and a time id; declare them once with set_panel(df, entity=..., time=...) and the rest of Explore can omit them.
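
The within/between split behind an xtsum-style table can be sketched in a few lines of plain Python. This is an illustration of the decomposition only, not the code of explore_xtsum_table: the function name xtsum_sketch and the list-of-dicts input are assumptions, and the degrees-of-freedom convention (n - 1 throughout) may differ from what a given package reports.

```python
from collections import defaultdict
from statistics import mean, stdev


def xtsum_sketch(rows, entity, var):
    """Overall, between and within standard deviations of one variable.

    between: dispersion of the unit means across units
    within:  dispersion of x_it - mean_i + overall mean (variation over time)
    """
    by_unit = defaultdict(list)
    for row in rows:
        by_unit[row[entity]].append(row[var])

    values = [row[var] for row in rows]
    overall_mean = mean(values)
    unit_means = {unit: mean(vals) for unit, vals in by_unit.items()}
    within_values = [
        row[var] - unit_means[row[entity]] + overall_mean for row in rows
    ]
    return {
        "mean": overall_mean,
        "overall_sd": stdev(values),
        "between_sd": stdev(list(unit_means.values())),
        "within_sd": stdev(within_values),
    }


# Two units that differ a lot in level but move little over time.
rows = [
    {"country": "A", "year": 2015, "x": 1.0},
    {"country": "A", "year": 2016, "x": 2.0},
    {"country": "A", "year": 2017, "x": 3.0},
    {"country": "B", "year": 2015, "x": 11.0},
    {"country": "B", "year": 2016, "x": 12.0},
    {"country": "B", "year": 2017, "x": 13.0},
]
summary = xtsum_sketch(rows, entity="country", var="x")
```

Here the between standard deviation is much larger than the within one, which is the pattern that signals most variation is across units rather than over time.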

Analyze panel data

OLS with multi-way fixed effects and clustered standard errors via pyfixest, plus a richer analyze_estimation adding stepwise / multiple-outcome comparison, serial-correlation-robust standard errors (Newey–West, Driscoll–Kraay) and weights. Estimate pooled / between / fixed / random effects with analyze_panel_table, bring within estimates into a random-effects frame with the correlated-random-effects (Mundlak) estimator analyze_cre_table, and choose between fixed and random effects with the Hausman test.

Round it out with post-estimation tools (fixed-effect plots, predictions, Wald joint tests), robust inference (randomization inference and the wild cluster bootstrap), Frisch–Waugh–Lovell and coefficient plots, and modern event-study / staggered difference-in-differences estimators (Gardner's did2s, Sun–Abraham, local-projections DiD, dynamic TWFE) with a built-in pre-trend diagnostic and a treatment-structure view, analyze_panel_view.

For growth dynamics, analyze_beta_convergence runs the standard β-convergence workflow: unconditional and conditional (Frisch–Waugh–Lovell) convergence with annotated scatters, the speed of convergence and half-life, and a rolling fixed-window view. Its distributional counterpart analyze_sigma_convergence tracks cross-sectional dispersion over time (standard deviation, Gini index and coefficient of variation on a dual-axis chart) and tests whether the distribution narrows (σ-convergence). When a panel is plausibly not one homogeneous group, analyze_convergence_clubs runs the Phillips–Sul log(t) workflow (HP-filter trend, relative transition paths, the log(t) convergence test and a data-driven clustering algorithm) to split units into convergence clubs (a faithful port of the Stata psecta package).

Finally, analyze_kuznets_waves fits the extended Kuznets curve, the polynomial inequality–development relationship, under pooled, between and within (two-way fixed-effects) estimators side by side, with partial-residual component plots and turning-point analytics.
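
The speed of convergence and half-life reported in a β-convergence workflow follow from the growth-regression slope. The sketch below uses the textbook mapping for a regression of average annual growth over a horizon of T years on the log initial level, where the slope b implies a speed λ = -ln(1 + bT)/T and a half-life of ln(2)/λ. The function names are hypothetical, and whether analyze_beta_convergence uses exactly this parameterization is an assumption.

```python
import math


def convergence_speed(beta, horizon):
    """Annual speed of convergence implied by a growth-regression slope.

    beta:    slope on log initial level (negative under convergence)
    horizon: number of years T over which average growth was measured
    """
    inner = 1.0 + beta * horizon
    if inner <= 0:
        raise ValueError("1 + beta * horizon must be positive")
    return -math.log(inner) / horizon


def half_life(speed):
    """Years needed to close half of the gap to the steady state."""
    if speed <= 0:
        return math.inf  # no convergence: the gap never halves
    return math.log(2) / speed


beta, horizon = -0.015, 20
speed = convergence_speed(beta, horizon)
years_to_half = half_life(speed)
```

With a slope of -0.015 over 20 years, the implied speed is a little under 2% per year, so the gap takes several decades to halve. A non-negative slope implies no convergence, which the sketch reports as an infinite half-life.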

Learn panel data

Every result speaks plain language. .interpret() gives an associational reading of the output (never a causal claim unless the design supports it); .explain(), together with explain(topic) and list_topics(), browse 27 concept explainers — fixed effects, clustering, random effects, the Mundlak device, first differences, demeaning, dummy variables, event studies, omitted-variable bias, convergence and more. Result objects also expose broom-style .tidy() / .glance(). Nine concept sandboxes simulate data so a learner can see and tune a known truth — learn_first_differences and learn_within_vs_lsdv (first differences ≈ demeaning ≈ least-squares dummy variables), learn_pooled_vs_fixed_effects, learn_omitted_variable_bias, learn_clustering_se, the convergence trio learn_beta_convergence / learn_sigma_convergence / learn_convergence_clubs, and learn_kuznets_waves.
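
The equivalence that learn_first_differences and learn_within_vs_lsdv illustrate can be checked by hand in the simplest case. The following self-contained simulation is a sketch that does not use expdpy and whose data-generating process is an assumption. With exactly two periods per unit, the within (demeaned) slope and the first-difference slope coincide, while pooled OLS is biased by the unit effect.

```python
import random

random.seed(0)
N, TRUE_SLOPE = 500, 0.5

# Two-period panel: the unit effect alpha drives both x and y,
# so pooled OLS is biased while within and first differences are not.
panel = []
for _ in range(N):
    alpha = random.gauss(0, 1)
    xs = [alpha + random.gauss(0, 1) for _ in range(2)]
    ys = [2 * alpha + TRUE_SLOPE * x + random.gauss(0, 0.1) for x in xs]
    panel.append((xs, ys))


def slope(x, y, demean=True):
    """OLS slope of y on x; demean=True is equivalent to adding an intercept."""
    if demean:
        mx, my = sum(x) / len(x), sum(y) / len(y)
        x = [v - mx for v in x]
        y = [v - my for v in y]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Pooled OLS ignores the unit effect.
pooled = slope(
    [x for xs, _ in panel for x in xs],
    [y for _, ys in panel for y in ys],
)

# Within estimator: subtract each unit's own mean, then regress.
x_within = [x - sum(xs) / 2 for xs, _ in panel for x in xs]
y_within = [y - sum(ys) / 2 for _, ys in panel for y in ys]
within = slope(x_within, y_within, demean=False)

# First differences: regress the change in y on the change in x.
first_diff = slope(
    [xs[1] - xs[0] for xs, _ in panel],
    [ys[1] - ys[0] for _, ys in panel],
    demean=False,
)
```

The within and first-difference estimates agree up to floating-point error and sit close to the true slope of 0.5, whereas the pooled estimate is far above it. With more than two periods the two estimators generally differ in finite samples, though both remove the unit effect.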

Three no-code apps — Streamlit

The whole workflow without writing code, in three apps (Explore, Analyze and Learn) that share a sidebar sample pipeline (subset filters, outlier treatment, user-defined variables) and differ only in which pages they expose. Try them live in your browser, with no install required.

Each is the no-code companion to a docs case study, and deploys to Streamlit Community Cloud in one click.

Reproducibility & safety

Any in-app exploration exports to a runnable bundle — a Jupyter notebook, a .py script and the prepared data (parquet) — that recreates every displayed result with expdpy calls. Analysis configurations save and load as JSON. New variables can be defined live through a restricted-AST expression evaluator (never eval/exec) with panel-aware lag/lead that shift within each cross-section.
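
To show what a restricted-AST evaluator and a panel-aware lag involve, here is a minimal sketch built on the standard-library ast module. It is not expdpy's evaluator: the names safe_eval and panel_lag, the whitelist of operators, and the list-of-dicts data layout are all assumptions made for illustration.

```python
import ast
import operator

# Only these operators are allowed; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expr, variables):
    """Evaluate an arithmetic expression by walking its AST (no eval/exec)."""

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        # Calls, attribute access, subscripts, unknown names, etc. end up here.
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return visit(ast.parse(expr, mode="eval"))


def panel_lag(rows, entity, time, var, periods=1):
    """Lag `var` within each unit by matching on (unit, time - periods).

    Assumes an integer time index. Returns None where the earlier period
    is missing, so gaps never borrow a value from another unit.
    """
    lookup = {(row[entity], row[time]): row[var] for row in rows}
    return [lookup.get((row[entity], row[time] - periods)) for row in rows]


ratio = safe_eval("gdp / population * 100", {"gdp": 50.0, "population": 10.0})

rows = [
    {"unit": "A", "year": 2000, "y": 1.0},
    {"unit": "A", "year": 2001, "y": 2.0},
    {"unit": "B", "year": 2000, "y": 10.0},
    {"unit": "B", "year": 2001, "y": 20.0},
]
lagged = panel_lag(rows, entity="unit", time="year", var="y")
```

Walking a whitelisted AST means an input such as a function call or attribute access is rejected before anything runs. Matching the lag on the time value rather than on row position keeps the first observation of unit B from inheriting the last value of unit A.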

Bundled datasets

expdpy.data ships ready-to-explore panels — kuznets (the flagship N-shaped Kuznets-curve demo), gapminder, staggered_did (a synthetic staggered-adoption panel for the event-study / DiD tools), firms (a small unbalanced panel — staggered entry/exit, interior gaps, a discrete size class and persistent revenue — for the panel-structure, transition and persistence views), productivity (a balanced 108-country × 25-year Penn World Table panel of log GDP per capita and log labor productivity for the club-convergence workflow), and bolivia112_gdppc (a real-world balanced 112-province × 35-year Bolivian panel of GDP per capita and its log, 1990–2024, for the convergence workflows and subnational exploration). See the kuznets dataset page for the data dictionary.

Installation

Install the latest release from PyPI (random effects, CRE and the Hausman test work out of the box; the apps need the streamlit extra):

pip install expdpy
pip install "expdpy[streamlit]"   # the no-code ExPdPy apps (Streamlit)

Using uv:

uv pip install expdpy
uv pip install "expdpy[streamlit]"

Development version (latest from GitHub)

For the most up-to-date, unreleased version, install straight from the main branch:

pip install "git+https://github.com/cmg777/expdpy.git"
pip install "expdpy[streamlit] @ git+https://github.com/cmg777/expdpy.git"

Pin to a release, branch, or commit for reproducible installs:

pip install "expdpy==0.4.18"
pip install "git+https://github.com/cmg777/expdpy.git@v0.4.18"
pip install "git+https://github.com/cmg777/expdpy.git@main"

Requires Python 3.10+.
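
After a pinned install it can help to confirm what the environment actually resolved. The check below is a small sketch using only the standard library; the helper name installed_version is hypothetical and is not part of expdpy.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="expdpy"):
    """Return the installed distribution version, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


python_ok = sys.version_info >= (3, 10)  # the stated minimum Python version
found = installed_version()              # None when expdpy is not installed
print(f"Python 3.10+: {python_ok}; expdpy version: {found}")
```

Comparing the reported version against the pin (for example 0.4.18) is a quick way to spot a stale environment before rerunning an analysis.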

Try it in Colab — no install

Every page of the docs (and the per-function guides) carries a one-click Open in Colab badge. The notebook's first cell installs expdpy and then restarts the runtime once so the upgraded NumPy loads cleanly — when it reconnects, just run the cells again (Runtime ▸ Run all).

Upgrading from 0.4.1? In 0.4.2 every analysis function gained a module prefix: prepare_* → explore_* / analyze_* and sandbox_* → learn_*, with figures ending in _plot, tables in _table, and scope qualifiers moved to the end (e.g. prepare_by_group_violin_graph → explore_violin_plot_by_group). The utilities set_panel, resolve_panel, treat_outliers, explain and list_topics keep their names. See the changelog for the full rename map.

At a glance

The lead example throughout the docs is the bundled kuznets panel (80 countries × 2015–2025): a synthetic dataset whose regional inequality traces an N-shaped Kuznets curve in income — it rises, falls, then rises again at very high income.

import expdpy as ex
from expdpy.data import load_kuznets

df = load_kuznets()
# The N-shaped regional Kuznets curve: regional inequality vs (log) GDP per capita
ex.explore_scatter_plot(
    df, x="log_gdp_pc", y="gini_regional", color="continent", size="population", loess=1
).fig

Explore the panel structure. Declare the panel once, then split a variable's variation into across-unit (between) and over-time (within) parts, or trace every unit at once:

df = ex.set_panel(load_kuznets(), entity="country", time="year")

ex.explore_xtsum_table(df, var=["gini_regional", "log_gdp_pc"]).gt   # within/between table
ex.explore_spaghetti_plot(df, var="gini_regional").fig              # one line per country
ex.explore_scatter_plot_within_between(df, x="log_gdp_pc", y="gini_regional").fig

Run a regression and let it explain itself. Two-way fixed effects, clustered standard errors, a plain-language reading, and a coefficient plot:

res = ex.analyze_regression_table(
    df,
    dvs="gini_regional",
    idvs=["log_gdp_pc", "log_gdp_pc_sq", "log_gdp_pc_cu"],
    feffects=["country", "year"],
    clusters=["country"],
)
print(res.interpret())            # plain-language, associational reading
ex.analyze_coefficient_plot(res)  # themed coefficient plot with confidence intervals

Bring within estimates into a random-effects frame with the correlated-random-effects (Mundlak) estimator — its slope equals the fixed-effects estimate, and a joint test on the unit-mean terms is the regression-form Hausman test:

ex.analyze_cre_table(
    df, dv="gini_regional", idvs=["log_gdp_pc"], entity="country", time="year"
).etable
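
The claim that the Mundlak slope equals the fixed-effects estimate can be verified numerically without any library. The sketch below is a simplification: it fits the Mundlak regression by pooled OLS rather than random-effects GLS, uses simulated data, and all helper names are hypothetical. The coefficient on x is still identical to the within estimator, because x minus its unit mean is exactly what remains of x after partialling out the unit mean and the constant.

```python
import random

random.seed(1)
N, T = 100, 4

# Simulated panel rows: (unit, x, y), with a unit effect correlated with x.
rows = []
for unit in range(N):
    alpha = random.gauss(0, 1)
    for _ in range(T):
        x = alpha + random.gauss(0, 1)
        rows.append((unit, x, 2 * alpha + 0.5 * x + random.gauss(0, 0.1)))

# Unit means of x: the extra regressor in the Mundlak device.
totals = {}
for unit, x, _ in rows:
    totals[unit] = totals.get(unit, 0.0) + x
xbar = {unit: total / T for unit, total in totals.items()}


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (m[i][n] - tail) / m[i][i]
    return sol


def ols(columns, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(columns)
    xtx = [
        [sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
        for i in range(k)
    ]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


y = [row[2] for row in rows]
x = [row[1] for row in rows]
x_mean = [xbar[row[0]] for row in rows]

# Mundlak regression: y on a constant, x and the unit mean of x.
cre = ols([[1.0] * len(rows), x, x_mean], y)

# Within (fixed-effects) slope from unit-demeaned x.
x_dm = [xi - mi for xi, mi in zip(x, x_mean)]
within = sum(d * yi for d, yi in zip(x_dm, y)) / sum(d * d for d in x_dm)
```

Here cre[1] matches within up to floating-point error, and cre[2], the coefficient on the unit mean, captures how far the between relationship departs from the within one. A test that this coefficient is zero is the regression-form Hausman test mentioned above.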

Event study & staggered difference-in-differences on the bundled treated panel:

from expdpy.data import load_staggered_did

did = load_staggered_did()
ex.analyze_panel_view(did, unit="unit", time="year", cohort="cohort")   # treatment structure
ex.analyze_event_study(                                                  # dynamic effects
    did, outcome="outcome", unit="unit", time="year", cohort="cohort", estimator="did2s"
).fig

Classic panel models and the Hausman test:

ex.analyze_panel_table(did, dv="outcome", idvs=["treated"], entity="unit", time="year").etable
print(ex.analyze_hausman_test(did, dv="outcome", idvs=["treated"], entity="unit", time="year").interpret())

Learn as you go — concept sandboxes and explainers:

ex.learn_first_differences()      # first differences ≈ demeaning ≈ dummy variables
print(ex.explain("fixed_effects"))  # a concept explainer; ex.list_topics() lists them all

Launch the Explore app on this data, pre-configured to open on the curve:

from expdpy.streamlit_app import ExploreApp
from expdpy.data import load_kuznets, load_kuznets_data_def, get_config

ExploreApp(load_kuznets(), df_def=load_kuznets_data_def(), config_list=get_config("kuznets"))

Head to Explore, Analyze and Learn to see every function in action, or the kuznets dataset page for the data dictionary.

Documentation

Full documentation, tutorials, and the API reference live at https://cmg777.github.io/expdpy/.

Acknowledgements

expdpy began as a Python port of the excellent ExPanDaR package by Joachim Gassen and the TRR 266 Accounting for Transparency project, and its foundations remain deeply inspired by that work. Over time it has grown well beyond the original — three no-code Streamlit apps; fixest-style estimators (fixed effects, event study and staggered difference-in-differences) with coefficient and Frisch–Waugh–Lovell plots; random-effects, correlated-random-effects and Hausman panel models; a built-in pedagogy layer that interprets and explains results; a restricted-AST expression evaluator with panel-aware lag/lead; and reproducible notebook / script / data export — and it will keep evolving. We are grateful to the ExPanDaR authors; please cite the original work when using expdpy in research (see CITATION.cff).

License

MIT — see LICENSE.
