expdpy

Explore your panel data interactively — a Python port of the ExPanDaR R package.

Maintainers

cmg777

Project description

expdpy — A Python library to explore panel data interactively

expdpy is a Python library for interactive analysis of panel and cross-sectional data, organized around three modules: Explore, Analyze and Learn. It pairs composable functions, which return interactive Plotly figures and publication-quality Great Tables, with fixest-style econometrics, a built-in teaching layer that explains and interprets every result, and three no-code ExPdPy apps (one per module). It is built for beginners and applied researchers alike.

It is built on the modern Python data and econometrics stack:

Plotly — interactive figures
pyfixest — fixed-effects and difference-in-differences estimators
Great Tables — publication-quality tables
linearmodels — random effects, between, correlated random effects, and the Hausman test
Streamlit — the no-code ExPdPy apps

Features

Explore panel data

Descriptive, correlation and extreme-observation tables; histograms and category bar charts; time trends and quantile trends; by-group bar, violin and trend views; scatter plots with an optional LOESS smoother; a missing-value heatmap across the panel; and outlier treatment with treat_outliers. Each function takes a pandas DataFrame and returns an interactive Plotly figure or a Great Tables object you can drop straight into a notebook or report.
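The README does not spell out how treat_outliers works internally. A common treatment (and, to our understanding, the default of the R function of the same name in ExPanDaR) is winsorizing: values below a lower percentile are raised to that percentile and values above the upper percentile are lowered to it. Below is a minimal standard-library sketch assuming symmetric cut-offs and NumPy-style linear-interpolation percentiles; `quantile` and `winsorize` are hypothetical helpers, not expdpy's API.

```python
import math


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (NumPy's default rule)."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize(values, pct=0.01):
    """Clip values to the [pct, 1 - pct] quantiles; None (missing) passes through."""
    if not 0 <= pct < 0.5:
        raise ValueError("pct must be in [0, 0.5)")
    present = sorted(v for v in values if v is not None)
    if not present:
        return list(values)
    lower = quantile(present, pct)
    upper = quantile(present, 1 - pct)
    return [None if v is None else min(max(v, lower), upper) for v in values]
```

For example, with pct=0.25 the list [1, 2, 3, 4, 100] becomes [2, 2, 3, 4, 4]: the extreme 100 is pulled in to the upper cut-off instead of being dropped, so the number of observations is unchanged.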

A dedicated set of panel-aware views makes the cross-unit versus over-time structure explicit: the within/between variation table explore_xtsum_table (Stata xtsum-style) and the within-vs-between scatter explore_scatter_plot_within_between; per-unit trajectories (explore_spaghetti_plot); panel-structure diagnostics, namely a balance/gaps summary and unit-by-period presence grid (explore_panel_structure) plus a unit-by-time value heatmap (explore_value_heatmap); and distribution and transition dynamics, namely explore_distribution_over_time (ridgeline or animated), explore_transition_matrix, and within-unit serial correlation via explore_within_persistence. Panel functions take an entity (unit) id and a time id; declare them once with set_panel(df, entity=..., time=...) and the rest of Explore can omit them.
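As a rough guide to what the within/between table reports, the sketch below computes the three standard deviations in plain Python, assuming expdpy follows Stata's xtsum definitions: the overall SD across all observations, the between SD across the unit means, and the within SD of x_it - x̄_i + x̄ (the grand mean x̄ is added back so the within series stays on the variable's original scale). The helper `xtsum` and its list-of-dicts input are hypothetical, not expdpy's API.

```python
from collections import defaultdict
from statistics import mean, stdev


def xtsum(rows, entity, var):
    """Overall, between and within standard deviations of `var` (xtsum-style).

    rows: list of dicts, one per unit-period observation (hypothetical layout).
    """
    by_unit = defaultdict(list)
    for r in rows:
        by_unit[r[entity]].append(r[var])
    values = [r[var] for r in rows]
    grand = mean(values)
    unit_means = {unit: mean(v) for unit, v in by_unit.items()}
    # Within: deviation from the unit's own mean, re-centred on the grand mean.
    within = [r[var] - unit_means[r[entity]] + grand for r in rows]
    return {
        "overall": stdev(values),                      # across all observations
        "between": stdev(list(unit_means.values())),   # across the unit means
        "within": stdev(within),                       # over-time variation inside units
    }
```

A variable whose between SD dwarfs its within SD (typical of country income levels) leaves little over-time variation for a fixed-effects model to work with.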

Analyze panel data

OLS with multi-way fixed effects and clustered standard errors via pyfixest, plus a richer analyze_estimation that adds stepwise and multiple-outcome comparison, serial-correlation-robust standard errors (Newey–West, Driscoll–Kraay) and weights. Estimate pooled, between, fixed and random effects with analyze_panel_table, bring within estimates into a random-effects frame with the correlated-random-effects (Mundlak) estimator analyze_cre_table, and choose between fixed and random effects with the Hausman test.

Round it out with post-estimation tools (fixed-effect plots, predictions, Wald joint tests), robust inference (randomization inference and the wild cluster bootstrap), Frisch–Waugh–Lovell and coefficient plots, and modern event-study / staggered difference-in-differences estimators (Gardner's did2s, Sun–Abraham, local-projections DiD, dynamic TWFE) with a built-in pre-trend diagnostic and a treatment-structure view, analyze_panel_view.

For growth dynamics, analyze_beta_convergence runs the standard β-convergence workflow: unconditional and conditional (Frisch–Waugh–Lovell) convergence with annotated scatters, the speed of convergence and half-life, and a rolling fixed-window view. Its distributional counterpart, analyze_sigma_convergence, tracks cross-sectional dispersion over time (standard deviation, Gini index and coefficient of variation on a dual-axis chart) and tests whether the distribution narrows (σ-convergence). When a panel is plausibly not one homogeneous group, analyze_convergence_clubs runs the Phillips–Sul log(t) workflow (HP-filter trend, relative transition paths, the log(t) convergence test and a data-driven clustering algorithm) to split units into convergence clubs; it is a faithful port of the Stata psecta package.
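The speed of convergence and half-life in a β-convergence workflow are usually derived from the regression slope with the standard Barro and Sala-i-Martin mapping: if average annual growth (1/T)·ln(y_T / y_0) is regressed on ln(y_0) with slope β, then λ = -ln(1 + βT) / T and the half-life is ln 2 / λ. The sketch below implements that mapping plus the three σ-convergence dispersion measures for a single period, assuming population standard deviations and the pairwise-difference Gini. expdpy's exact conventions may differ, and every function name here is hypothetical.

```python
import math
from statistics import mean, pstdev


def convergence_speed(beta, years):
    """Implied annual speed of convergence from a beta-convergence slope.

    Assumes beta is the slope of average annual growth, (1/T) * ln(y_T / y_0),
    on initial log income ln(y_0) over a window of T years.
    """
    if not -1 / years < beta < 0:
        raise ValueError("speed is defined only for -1/T < beta < 0")
    return -math.log(1 + beta * years) / years


def half_life(speed):
    """Years needed to close half of the gap to the steady state."""
    return math.log(2) / speed


def gini(values):
    """Gini index: sum of absolute differences over all ordered pairs / (2 n^2 mean)."""
    n = len(values)
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean(values))


def dispersion(values):
    """Cross-sectional dispersion in one period; sigma-convergence means these fall over time."""
    sd = pstdev(values)
    return {"sd": sd, "cv": sd / mean(values), "gini": gini(values)}
```

For instance, a slope of β = -0.02 over a 10-year window implies λ = -ln(0.8)/10, roughly 2.2% a year, and a half-life of about 31 years. Computing dispersion for each year and checking whether the series declines is the descriptive core of a σ-convergence check.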

Learn panel data

Every result speaks plain language. .interpret() gives an associational reading of the output (never a causal claim unless the design supports it); .explain(), together with explain(topic) and list_topics(), provides concept explainers for fixed effects, clustering, random effects, the Mundlak device, first differences, demeaning, dummy variables, event studies, omitted-variable bias and more. Result objects also expose broom-style .tidy() / .glance(). Concept sandboxes simulate data so a learner can see and tune a concept — learn_omitted_variable_bias, learn_pooled_vs_fixed_effects, learn_clustering_se, learn_first_differences, and learn_within_vs_lsdv (which shows first differences ≈ demeaning ≈ least-squares dummy variables).
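A hedged sketch of the equivalence the within-vs-LSDV sandbox illustrates, written without any expdpy code: demeaning and first differencing give exactly the same slope when every unit is observed in two periods, but generally differ with three or more periods, which is why the relation is written with ≈. LSDV matches demeaning exactly by the Frisch–Waugh–Lovell theorem, so it is not repeated here. Rows are hypothetical (unit, x, y) tuples sorted by time within each unit.

```python
from collections import defaultdict


def _by_unit(rows):
    """Group (unit, x, y) rows into per-unit lists, keeping their time order."""
    groups = defaultdict(list)
    for unit, x, y in rows:
        groups[unit].append((x, y))
    return groups.values()


def within_slope(rows):
    """Fixed-effects slope from demeaning x and y inside each unit."""
    sxy = sxx = 0.0
    for obs in _by_unit(rows):
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - mx) * (y - my) for x, y in obs)
        sxx += sum((x - mx) ** 2 for x, _ in obs)
    return sxy / sxx


def first_difference_slope(rows):
    """Slope of Δy on Δx (no intercept) using consecutive periods inside each unit."""
    sxy = sxx = 0.0
    for obs in _by_unit(rows):
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            sxy += (x1 - x0) * (y1 - y0)
            sxx += (x1 - x0) ** 2
    return sxy / sxx
```

With two periods, demeaned values are just ±half the first difference, so both estimators reduce to Σ Δx·Δy / Σ Δx². With unevenly spaced x over three periods the two slopes can differ noticeably.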

Three no-code apps — Streamlit

The whole workflow without writing code, in three apps — Explore, Analyze and Learn — that share a sidebar sample pipeline (subset filters, outlier treatment, user-defined variables) and differ only in which pages they expose. The apps deploy to Streamlit Community Cloud in one click.

Reproducibility & safety

Any in-app exploration exports to a runnable bundle (a Jupyter notebook, a .py script and the prepared data as parquet) that recreates every displayed result with expdpy calls. Analysis configurations save and load as JSON. New variables can be defined live through a restricted-AST expression evaluator (never eval/exec) with panel-aware lag/lead functions that shift values within each cross-sectional unit, never across units.
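To make the idea of a restricted-AST evaluator concrete, here is a minimal sketch built on Python's standard ast module. The expression is parsed, and only whitelisted nodes (numbers, column names, + - * / **, unary signs, and lag/lead calls) are evaluated, so anything else, such as arbitrary function calls or attribute access, is rejected rather than executed. The column layout, the integer time index and the function names are assumptions for illustration; this is not expdpy's evaluator.

```python
import ast
import operator

# Whitelisted operators; any other node type in the expression is rejected.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def shift_within(values, entity, time, k):
    """Value at time t - k for the same entity (k > 0 lags, k < 0 leads); None if absent."""
    lookup = {(e, t): v for e, t, v in zip(entity, time, values)}
    return [lookup.get((e, t - k)) for e, t in zip(entity, time)]


def evaluate(expr, columns, entity, time):
    """Evaluate an arithmetic expression over equal-length columns without eval/exec."""

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return [node.value] * len(entity)
        if isinstance(node, ast.Name):
            if node.id not in columns:
                raise NameError(f"unknown variable {node.id!r}")
            return list(columns[node.id])
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            op = _BINOPS[type(node.op)]
            # Missing values (None) propagate instead of raising.
            return [None if a is None or b is None else op(a, b)
                    for a, b in zip(ev(node.left), ev(node.right))]
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            op = _UNARY[type(node.op)]
            return [None if a is None else op(a) for a in ev(node.operand)]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in ("lag", "lead")
                and not node.keywords and 1 <= len(node.args) <= 2):
            k = 1
            if len(node.args) == 2:
                order = node.args[1]
                if not (isinstance(order, ast.Constant) and type(order.value) is int):
                    raise ValueError("lag/lead order must be an integer literal")
                k = order.value
            shift = k if node.func.id == "lag" else -k
            return shift_within(ev(node.args[0]), entity, time, shift)
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return ev(ast.parse(expr, mode="eval"))
```

Because lags look up the value at time t - k for the same entity, a growth rate such as "gdp / lag(gdp) - 1" is None in each unit's first period instead of borrowing the previous unit's last value, and it stays None across interior time gaps. Division by zero still raises ZeroDivisionError in this sketch.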

Bundled datasets

expdpy.data ships ready-to-explore panels:

kuznets: the flagship N-shaped Kuznets-curve demo
gapminder
staggered_did: a synthetic staggered-adoption panel for the event-study / DiD tools
firms: a small unbalanced panel (staggered entry/exit, interior gaps, a discrete size class and persistent revenue) for the panel-structure, transition and persistence views
productivity: a balanced 108-country × 25-year Penn World Table panel of log GDP per capita and log labor productivity for the club-convergence workflow
bolivia112_gdppc: a real-world balanced 112-province × 35-year Bolivian panel of GDP per capita and its log (1990–2024) for the convergence workflows and subnational exploration

See the kuznets dataset page for the data dictionary.

Installation

Install the latest release from PyPI (random effects, CRE and the Hausman test work out of the box; the apps need the streamlit extra):

pip install expdpy
pip install "expdpy[streamlit]"   # the no-code ExPdPy apps (Streamlit)

Using uv:

uv pip install expdpy
uv pip install "expdpy[streamlit]"

Development version (latest from GitHub)

For the most up-to-date, unreleased version, install straight from the main branch:

pip install "git+https://github.com/cmg777/expdpy.git"
pip install "expdpy[streamlit] @ git+https://github.com/cmg777/expdpy.git"

Pin to a release or tag (a commit SHA after @ also works) for reproducible installs, or track a branch:

pip install "expdpy==0.4.17"
pip install "git+https://github.com/cmg777/expdpy.git@v0.4.17"
pip install "git+https://github.com/cmg777/expdpy.git@main"

Requires Python 3.10+.

Try it in Colab — no install

Every page of the docs (and the per-function guides) carries a one-click Open in Colab badge. The notebook's first cell installs expdpy and then restarts the runtime once so the upgraded NumPy loads cleanly — when it reconnects, just run the cells again (Runtime ▸ Run all).

Upgrading from 0.4.1? In 0.4.2 every analysis function gained a module prefix: prepare_* → explore_* / analyze_* and sandbox_* → learn_*, with figures ending in _plot, tables in _table, and scope qualifiers moved to the end (e.g. prepare_by_group_violin_graph → explore_violin_plot_by_group). The utilities set_panel, resolve_panel, treat_outliers, explain and list_topics keep their names. See the changelog for the full rename map.

At a glance

The lead example throughout the docs is the bundled kuznets panel (80 countries × 2015–2025): a synthetic dataset whose regional inequality traces an N-shaped Kuznets curve in income — it rises, falls, then rises again at very high income.

import expdpy as ex
from expdpy.data import load_kuznets

df = load_kuznets()
# The N-shaped regional Kuznets curve: regional inequality vs (log) GDP per capita
ex.explore_scatter_plot(
    df, x="log_gdp_pc", y="gini_regional", color="continent", size="population", loess=1
).fig

Explore the panel structure. Declare the panel once, then split a variable's variation into across-unit (between) and over-time (within) parts, or trace every unit at once:

df = ex.set_panel(load_kuznets(), entity="country", time="year")

ex.explore_xtsum_table(df, var=["gini_regional", "log_gdp_pc"]).gt   # within/between table
ex.explore_spaghetti_plot(df, var="gini_regional").fig              # one line per country
ex.explore_scatter_plot_within_between(df, x="log_gdp_pc", y="gini_regional").fig

Run a regression and let it explain itself. Two-way fixed effects, clustered standard errors, a plain-language reading, and a coefficient plot:

res = ex.analyze_regression_table(
    df,
    dvs="gini_regional",
    idvs=["log_gdp_pc", "log_gdp_pc_sq", "log_gdp_pc_cu"],
    feffects=["country", "year"],
    clusters=["country"],
)
print(res.interpret())            # plain-language, associational reading
ex.analyze_coefficient_plot(res)  # themed coefficient plot with confidence intervals
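Because the N-shaped curve is a cubic in log GDP per capita, its peak and trough follow from setting the first derivative to zero: b1 + 2·b2·x + 3·b3·x² = 0. The sketch below solves that quadratic and classifies each root by the sign of the second derivative. The coefficients in the usage note are hypothetical, not estimates from the kuznets data, and reading them off res is left out because the result object's coefficient accessor is not shown here. Fixed effects shift the curve up or down but do not move its turning points.

```python
import math


def cubic_turning_points(b1, b2, b3):
    """Roots of dy/dx for y = b1*x + b2*x**2 + b3*x**3, sorted ascending."""
    a, b, c = 3 * b3, 2 * b2, b1  # dy/dx = a*x**2 + b*x + c
    if a == 0:  # the curve is at most quadratic
        return [] if b == 0 else [-c / b]
    disc = b * b - 4 * a * c
    if disc <= 0:  # slope never changes sign: no peak or trough
        return []
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def classify(x, b2, b3):
    """'peak' where the second derivative 2*b2 + 6*b3*x is negative, else 'trough'."""
    return "peak" if 2 * b2 + 6 * b3 * x < 0 else "trough"
```

With hypothetical coefficients b1 = 2.4, b2 = -0.27 and b3 = 0.01 (the N-shape sign pattern +, -, +), the turning points are x = 8 (peak) and x = 10 (trough); if log_gdp_pc is a natural log, math.exp converts them back to GDP per capita levels.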

Bring within estimates into a random-effects frame with the correlated-random-effects (Mundlak) estimator — its slope equals the fixed-effects estimate, and a joint test on the unit-mean terms is the regression-form Hausman test:

ex.analyze_cre_table(
    df, dv="gini_regional", idvs=["log_gdp_pc"], entity="country", time="year"
).etable
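To see why the Mundlak slope equals the fixed-effects slope, the sketch below runs pooled OLS of y on a constant, x and the unit mean of x, and compares the coefficient on x with the demeaned (within) estimate. It is a plain-Python illustration with one regressor and no time effects, using a small Gaussian-elimination solver; analyze_cre_table itself builds on linearmodels, and none of these helper names are part of expdpy.

```python
from collections import defaultdict


def solve(a, b):
    """Solve a small dense system a @ z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ols(X, y):
    """OLS coefficients via the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def unit_means(rows):
    """Mean of the second field for each unit; rows are hypothetical (unit, x, y) tuples."""
    acc = defaultdict(list)
    for unit, x, _ in rows:
        acc[unit].append(x)
    return {unit: sum(xs) / len(xs) for unit, xs in acc.items()}


def mundlak_slope(rows):
    """Coefficient on x in pooled OLS of y on [1, x, unit mean of x]."""
    xbar = unit_means(rows)
    X = [[1.0, x, xbar[unit]] for unit, x, _ in rows]
    y = [yv for _, _, yv in rows]
    return ols(X, y)[1]


def within_slope(rows):
    """Fixed-effects (demeaned) slope, for comparison."""
    xbar = unit_means(rows)
    ybar = unit_means([(u, yv, None) for u, _, yv in rows])  # y placed in the x slot
    sxy = sum((x - xbar[u]) * (yv - ybar[u]) for u, x, yv in rows)
    sxx = sum((x - xbar[u]) ** 2 for u, x, _ in rows)
    return sxy / sxx
```

The identity holds, balanced panel or not, because x_it - x̄_i is orthogonal to anything that is constant within a unit: once x̄_i is in the regression, only within variation is left to identify the coefficient on x, while the coefficient on x̄_i absorbs the between variation.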

Event study & staggered difference-in-differences on the bundled treated panel:

from expdpy.data import load_staggered_did

did = load_staggered_did()
ex.analyze_panel_view(did, unit="unit", time="year", cohort="cohort")   # treatment structure
ex.analyze_event_study(                                                  # dynamic effects
    did, outcome="outcome", unit="unit", time="year", cohort="cohort", estimator="did2s"
).fig
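Event-study estimators index effects by event time, the number of periods since a unit adopted treatment. The sketch below builds that index and the binned event-time indicators of a dynamic TWFE specification, omitting period -1 as the reference. It assumes never-treated units carry a cohort of None; how load_staggered_did actually codes them is not stated here. This covers only the design-matrix step, not did2s or Sun–Abraham, which exist precisely because plain dynamic TWFE can be biased under staggered adoption with heterogeneous effects.

```python
def relative_time(year, cohort, window=(-4, 4)):
    """Periods since adoption (year - cohort), binned at the window ends.

    Never-treated units are assumed to have cohort=None and get None back.
    """
    if cohort is None:
        return None
    lo, hi = window
    return max(lo, min(hi, year - cohort))


def event_dummies(rel, window=(-4, 4), reference=-1):
    """One 0/1 indicator per event-time bin, omitting the reference period.

    Never-treated rows (rel is None) get all zeros, so they act as controls.
    """
    lo, hi = window
    return {k: int(rel == k) for k in range(lo, hi + 1) if k != reference}
```

For example, a unit adopting in 2012 has event time -2 in 2010, 0 in 2012, and falls into the binned +4 category from 2016 onward.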

Classic panel models and the Hausman test:

ex.analyze_panel_table(did, dv="outcome", idvs=["treated"], entity="unit", time="year").etable
print(ex.analyze_hausman_test(did, dv="outcome", idvs=["treated"], entity="unit", time="year").interpret())
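For a single coefficient, the Hausman statistic reduces to H = (b_FE - b_RE)² / (Var(b_FE) - Var(b_RE)), compared with a χ² distribution with one degree of freedom under the null that random effects is consistent. The sketch below computes it from hypothetical estimates and standard errors; the multivariate version replaces the division with a quadratic form in the inverse of the variance difference. It is an illustration only, not the implementation behind analyze_hausman_test, which builds on linearmodels.

```python
import math


def hausman_1d(b_fe, se_fe, b_re, se_re):
    """Hausman statistic for one coefficient and its chi-square(1) p-value."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive for this form of the test")
    h = (b_fe - b_re) ** 2 / var_diff
    # A chi-square(1) variable is Z**2, so P(chi2_1 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
    p = math.erfc(math.sqrt(h / 2))
    return h, p
```

With b_FE = 0.5 (SE 0.1) and b_RE = 0.3 (SE 0.06), H = 6.25 and p ≈ 0.012, so at conventional levels the random-effects assumption would be rejected in favor of fixed effects.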

Learn as you go — concept sandboxes and explainers:

ex.learn_within_vs_lsdv()         # first differences ≈ demeaning ≈ least-squares dummy variables
print(ex.explain("fixed_effects"))  # a concept explainer; ex.list_topics() lists them all

Launch the Explore app on this data, pre-configured to open on the curve:

from expdpy.streamlit_app import ExploreApp
from expdpy.data import load_kuznets, load_kuznets_data_def, get_config

ExploreApp(load_kuznets(), df_def=load_kuznets_data_def(), config_list=get_config("kuznets"))

In the documentation, head to the Explore, Analyze and Learn pages to see every function in action, the kuznets dataset page for the data dictionary, or the app guide to launch the interactive apps.

Documentation

Full documentation, tutorials, and the API reference live at https://cmg777.github.io/expdpy/.

Acknowledgements

expdpy began as a Python port of the excellent ExPanDaR package by Joachim Gassen and the TRR 266 Accounting for Transparency project, and its foundations remain deeply inspired by that work. Over time it has grown well beyond the original — three no-code Streamlit apps; fixest-style estimators (fixed effects, event study and staggered difference-in-differences) with coefficient and Frisch–Waugh–Lovell plots; random-effects, correlated-random-effects and Hausman panel models; a built-in pedagogy layer that interprets and explains results; a restricted-AST expression evaluator with panel-aware lag/lead; and reproducible notebook / script / data export — and it will keep evolving. We are grateful to the ExPanDaR authors; please cite the original work when using expdpy in research (see CITATION.cff).

License

MIT — see LICENSE.

Release history

0.4.19 (Jun 26, 2026)
0.4.18 (Jun 26, 2026)
0.4.17 (Jun 25, 2026; this version)
0.4.16 (Jun 25, 2026)
0.4.15 (Jun 25, 2026)
0.4.14 (Jun 24, 2026)
0.4.13 (Jun 24, 2026)
0.4.11 (Jun 23, 2026)
0.4.9 (Jun 22, 2026)
0.4.8 (Jun 22, 2026)
0.4.7 (Jun 22, 2026)
0.4.5 (Jun 21, 2026)
0.4.4 (Jun 21, 2026)
0.2.0 (Jun 19, 2026)

Download files

Source Distribution

expdpy-0.4.17.tar.gz (547.8 kB)

Uploaded Jun 25, 2026 Source

Built Distribution

expdpy-0.4.17-py3-none-any.whl (544.0 kB)

Uploaded Jun 25, 2026 Python 3

File details

Details for the file expdpy-0.4.17.tar.gz.

File metadata

Download URL: expdpy-0.4.17.tar.gz
Upload date: Jun 25, 2026
Size: 547.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for expdpy-0.4.17.tar.gz
Algorithm	Hash digest
SHA256	`f1503d03a92928e8efc9718efd6c979f25aaebd961f3ab865de7fb3c21603392`
MD5	`9f144c812b3fa9f8ffd5130020dc82e0`
BLAKE2b-256	`8a83c1a243f402a3eb00a785a561557440de2e4d52c91b6ae9002f1da4ea5748`

Provenance

The following attestation bundles were made for expdpy-0.4.17.tar.gz:

Publisher: release.yml on cmg777/expdpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: expdpy-0.4.17.tar.gz
- Subject digest: f1503d03a92928e8efc9718efd6c979f25aaebd961f3ab865de7fb3c21603392
- Sigstore transparency entry: 1947625070
- Sigstore integration time: Jun 25, 2026
Source repository:
- Permalink: cmg777/expdpy@5611976cb30bc7666ca58893b699849819e06cd7
- Branch / Tag: refs/tags/v0.4.17
- Owner: https://github.com/cmg777
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5611976cb30bc7666ca58893b699849819e06cd7
- Trigger Event: push

File details

Details for the file expdpy-0.4.17-py3-none-any.whl.

File metadata

Download URL: expdpy-0.4.17-py3-none-any.whl
Upload date: Jun 25, 2026
Size: 544.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for expdpy-0.4.17-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f2a494452317ce25c0ce7eb796742c24e0a13473eab45f1d7d1bc4d0843d248f`
MD5	`5fc7c7920994e27d5a4e69fd334a5a68`
BLAKE2b-256	`2c74f083c0110a7c57472d9a3f99c0769899d8d101914769327b5cb45bae2e22`

Provenance

The following attestation bundles were made for expdpy-0.4.17-py3-none-any.whl:

Publisher: release.yml on cmg777/expdpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: expdpy-0.4.17-py3-none-any.whl
- Subject digest: f2a494452317ce25c0ce7eb796742c24e0a13473eab45f1d7d1bc4d0843d248f
- Sigstore transparency entry: 1947625268
- Sigstore integration time: Jun 25, 2026
Source repository:
- Permalink: cmg777/expdpy@5611976cb30bc7666ca58893b699849819e06cd7
- Branch / Tag: refs/tags/v0.4.17
- Owner: https://github.com/cmg777
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5611976cb30bc7666ca58893b699849819e06cd7
- Trigger Event: push
