paneleventstudy
This Python package implements the panel (and single entity) event study models, covering the naive two-way fixed effects implementation, and the interaction-weighted implementation from Sun and Abraham (2021) (derived from https://github.com/lsun20/EventStudyInteract).
The package includes three sets of functions:
- Data cleaning: Functions to prepare data frames for the analytical set of functions, e.g., ensuring that they are in the right format, and have the right columns (with the right content)
- Analytical: Direct implementation of the event study models
- Utilities: Tools to assist the user in setting up input-output flows
Installation
pip install paneleventstudy
Examples
Refer to the Jupyter notebook example_paneleventstudy.ipynb
Implementation Notes (Data Cleaning)
Counting and dropping missing observations
Documentation
paneleventstudy.dropmissing(data, event)
Parameters
- `data`: pandas dataframe
- `event`: String matching the label of the column in `data` corresponding to the event variable; this should be a dummy variable indicating the pre- (values 0 prior to relative time 0) and post- periods (values 1 from relative time 0 onwards)

Output
- A copy of `data` with rows corresponding to missing `event` dropped
- Displays on the interface the number of rows in `data`, and the number of rows in the output data frame
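The behaviour described above can be approximated in plain pandas. This is a minimal sketch, not the package's own code: the helper name `drop_missing_event` and the toy column names are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_missing_event(data: pd.DataFrame, event: str) -> pd.DataFrame:
    """Hypothetical helper: drop rows where the event dummy is missing."""
    cleaned = data.dropna(subset=[event]).copy()
    # Report how many rows went in and how many came out
    print(f"Rows in input: {len(data)}; rows in output: {len(cleaned)}")
    return cleaned


# Toy panel with one missing event value (illustrative data only)
df = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "ct": [0, 1, 0, 1],
        "treated": [0, 1, np.nan, 1],
    }
)
cleaned = drop_missing_event(df, "treated")
```

Note that dropping rows can leave the panel unbalanced, so a balance check afterwards is usually worthwhile.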
Checking if input dataframe is a balanced panel
Most panel event study methods in the literature, including all methods currently covered in this package, only work with balanced panel data. That is, all entities in the data set must have the same number of time periods. This package checks if the input data is indeed a balanced panel with entities $i \in [0, 1, ..., N-1, N]$ and time periods $t \in [0, 1, 2, ..., T-1, T]$ in two steps.
1. Check if all entities $i$ have the same number of time periods
   $$L(\mathbf{t}_{i}) = L_T \ \forall \ i \ \text{for} \ L_T \in \mathcal{N}^{+}$$
2. Optionally, if the calendar time variable in the input data frame is numeric, further check if the smallest and largest time values are the same for all entities $i$
   $$\min(\mathbf{t}_{i}) = L_{T\min} \ \forall \ i \ \text{for} \ L_{T\min} \in \mathcal{N}^{0+}$$
   $$\max(\mathbf{t}_{i}) = L_{T\max} \ \forall \ i \ \text{for} \ L_{T\max} \in \mathcal{N}^{+}$$
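The two-step check can be sketched in pandas as follows. This is an illustration under stated assumptions rather than the package's implementation: the helper name `is_balanced_panel` and the toy column labels `id` and `ct` are hypothetical.

```python
import pandas as pd


def is_balanced_panel(data, group, calendartime, check_minmax=True):
    """Hypothetical helper mirroring the two-step balance check."""
    by_entity = data.groupby(group)[calendartime]
    # Step 1: every entity must have the same number of time periods
    if by_entity.count().nunique() != 1:
        return False
    # Step 2 (optional): every entity must share the same first and last period
    if check_minmax:
        same_start = by_entity.min().nunique() == 1
        same_end = by_entity.max().nunique() == 1
        return same_start and same_end
    return True


balanced = pd.DataFrame({"id": list("AAABBB"), "ct": [0, 1, 2, 0, 1, 2]})
# Same number of periods per entity, but entity B starts one period later
shifted = pd.DataFrame({"id": list("AAABBB"), "ct": [0, 1, 2, 1, 2, 3]})
# Entity B is missing its last period
ragged = balanced.iloc[:-1]
```

The `shifted` frame shows why the second step matters: it passes the count check alone but fails once the minimum and maximum times are compared.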
Documentation
paneleventstudy.balancepanel(data, group, event, calendartime, check_minmax=True)
Parameters
- `data`: pandas dataframe
- `group`: String matching the label of the column in `data` containing the categorical levels of the individual entities
- `event`: String matching the label of the column in `data` corresponding to the event variable; this should be a dummy variable indicating the pre- (values 0 prior to relative time 0) and post- periods (values 1 from relative time 0 onwards)
- `calendartime`: String matching the label of the column in `data` containing calendar times going from the earliest to the last time period; this can be user-fed or generated from `gencalendartime_numerics`
- `check_minmax`: Boolean to trigger a deeper check, which verifies if all entities in `group` have the same minimum and maximum values in `calendartime`; default is `True`, and can be used when the `calendartime` column is generated from `gencalendartime_numerics`, or is already preset as integers

Output
A Boolean indicating if `data` is balanced.
Generate column indicating control groups (never-treated / last-treated)
In the difference-in-differences (DiD) methodology, of which event studies are a variant, if treatment is truly exogenous, the treatment effect is estimated by comparing the average outcomes of the treated group (received treatment) against the control group (did not receive treatment).
Setting endogeneity aside, choosing the right control group is essential in establishing the right counterfactual, on which unbiased or consistent treatment effect estimates are conditioned. In panel event studies, and indeed dynamic DiD, it is possible for no groups to be never-treated, e.g., in staggered DiD setups. In these cases, we may want to use the last-treated group as a control group. This was argued prominently in recent DiD papers, such as Callaway and Sant'Anna (2021) and Sun and Abraham (2021).
This function identifies which group(s) serve as the control group, whether never-treated or last-treated, which is essential for the analytical functions later.
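One way to express this rule is sketched below. It assumes a balanced panel and an absorbing treatment (once the event dummy switches to 1 it stays at 1), so that the entities with the fewest post-treatment periods are either never-treated (zero post periods) or last-treated. The helper name `flag_controls` and the 0/1 coding of the flag are assumptions for illustration; the package's own rule may differ.

```python
import pandas as pd


def flag_controls(data, group, event, label="control_group"):
    """Hypothetical helper: flag never-treated, else last-treated, entities."""
    out = data.copy()
    # Number of post-treatment periods per entity, repeated on every row
    post_periods = out.groupby(group)[event].transform("sum")
    # Fewest post periods: 0 if never-treated entities exist, else last-treated
    out[label] = (post_periods == post_periods.min()).astype(int)
    return out


# A is treated from period 1, B from period 2, C is never treated
df = pd.DataFrame(
    {
        "id": list("AAABBBCCC"),
        "treated": [0, 1, 1, 0, 0, 1, 0, 0, 0],
    }
)
with_never = flag_controls(df, "id", "treated")
# Without C, the last-treated entity B becomes the control group
without_never = flag_controls(df[df["id"] != "C"], "id", "treated")
```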
Documentation
paneleventstudy.identifycontrols(data, group, event)
Parameters
- `data`: pandas dataframe
- `group`: String matching the label of the column in `data` containing the categorical levels of the individual entities
- `event`: String matching the label of the column in `data` corresponding to the event variable; this should be a dummy variable indicating the pre- (values 0 prior to relative time 0) and post- periods (values 1 from relative time 0 onwards)

Output
A copy of `data` with a new column labelled `control_group` indicating if the entity in `group` is a control group (never-treated or last-treated).
Generate relative time column (treatment onset = 0)
Event study methodologies essentially estimate the dynamic treatment effect relative to the onset of treatment (i.e., before and after treatment). This is akin to asking "what is the effect of treatment $D$ on outcome $Y$ at different time points relative to the treatment timing?".
This function generates a column containing these relative times from two sets of information:
- Calendar time; and
- When the treatment or event happened
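Combining these two pieces of information amounts to subtracting each entity's treatment onset from its calendar time. The sketch below is illustrative only: the helper name `add_reltime` is hypothetical, and leaving never-treated entities with missing relative times is an assumption of this sketch, not a documented behaviour of the package.

```python
import pandas as pd


def add_reltime(data, group, event, calendartime, reltime="reltime"):
    """Hypothetical helper: relative time = calendar time minus onset."""
    out = data.copy()
    # Onset is the first calendar time at which the event dummy equals 1
    onset = out.loc[out[event] == 1].groupby(group)[calendartime].min()
    # Entities without an onset (never-treated) get NaN in this sketch
    out[reltime] = out[calendartime] - out[group].map(onset)
    return out


# A is treated from calendar time 2; B is never treated
df = pd.DataFrame(
    {
        "id": list("AAAABBBB"),
        "ct": [0, 1, 2, 3] * 2,
        "treated": [0, 0, 1, 1, 0, 0, 0, 0],
    }
)
out = add_reltime(df, "id", "treated", "ct")
```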
Documentation
paneleventstudy.genreltime(data, group, event, calendartime, reltime='reltime', check_balance=True)
Parameters
- `data`: pandas dataframe
- `group`: String matching the label of the column in `data` containing the categorical levels of the individual entities
- `event`: String matching the label of the column in `data` corresponding to the event variable; this should be a dummy variable indicating the pre- (values 0 prior to relative time 0) and post- periods (values 1 from relative time 0 onwards)
- `calendartime`: String matching the label of the column in `data` containing integer calendar times going from 0 (earliest time period) to T (last time period); this can be generated from `gencalendartime_numerics`
- `reltime`: String to be used as the label of a new column containing relative times going from -L to +K as integers, with 0 being the timing of treatment onset; default is `'reltime'`
- `check_balance`: Checks if `data` is a balanced panel; default option is `True`

Output
A copy of `data` with a new column labelled `reltime` containing the relative times for all calendar times in `calendartime` by entities in `group`.
Generate column indicating treatment cohorts
Sun and Abraham (2021)'s interaction-weighted event study methodology requires (1) the estimation of cohort-specific treatment effects, and (2) cohort shares by relative times. To do this, the methodology requires an identifier for groups that were treated in the same calendar time.
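As a rough illustration, a cohort identifier can be built by labelling each entity with the calendar time at which it was first treated. The helper name `add_cohort` and the toy data are hypothetical; this is a sketch, not the package's implementation.

```python
import pandas as pd


def add_cohort(data, group, event, calendartime, cohort="cohort"):
    """Hypothetical helper: label entities by their first treated period."""
    out = data.copy()
    first_treated = out.loc[out[event] == 1].groupby(group)[calendartime].min()
    # Every row of an entity carries that entity's treatment onset
    out[cohort] = out[group].map(first_treated)
    return out


# A and B are first treated in period 1, C in period 2
df = pd.DataFrame(
    {
        "id": list("AAABBBCCC"),
        "ct": [0, 1, 2] * 3,
        "treated": [0, 1, 1, 0, 1, 1, 0, 0, 1],
    }
)
out = add_cohort(df, "id", "treated", "ct")
# Number of entities in each treatment cohort
cohort_sizes = out.groupby("cohort")["id"].nunique()
```

Entities sharing an onset period end up in the same cohort, which is what the cohort-specific estimates and cohort shares are later built on.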
Documentation
paneleventstudy.gencohort(data, group, event, calendartime, cohort='cohort', check_balance=True)
Parameters
- `data`: pandas dataframe
- `group`: String matching the label of the column in `data` containing the categorical levels of the individual entities
- `event`: String matching the label of the column in `data` corresponding to the event variable; this should be a dummy variable indicating the pre- (values 0 prior to relative time 0) and post- periods (values 1 from relative time 0 onwards)
- `calendartime`: String matching the label of the column in `data` containing integer calendar times going from 0 (earliest time period) to T (last time period); this can be generated from `gencalendartime_numerics`
- `cohort`: String to be used as the label of a new column containing the treatment cohort of respective entities in `group`; default is `'cohort'`
- `check_balance`: Checks if `data` is a balanced panel; default option is `True`

Output
A copy of `data` with a new column labelled `cohort` indicating the treatment cohort that the entities in `group` belong to.
Generate calendar time with integers
To generalise across the many possible formats in which calendar times can be presented (e.g., milliseconds, seconds, days, months, quarters, years, or even custom ones), calendar times can be converted into numerics. This eases computation in the rest of the package, by converting the calendar time column into integers running from 0 (earliest) to T (latest).
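A sketch of such a conversion, assuming the calendar time column is sortable, is to map its sorted unique values onto 0, 1, ..., T. The helper name `add_integer_time` is hypothetical, and `pandas.factorize` is used here for illustration; the package may do this differently.

```python
import pandas as pd


def add_integer_time(data, calendartime, calendartime_numerics="ct"):
    """Hypothetical helper: map sorted unique calendar times to 0..T."""
    out = data.copy()
    # sort=True makes the earliest period code 0 and the latest code T
    codes, _ = pd.factorize(out[calendartime], sort=True)
    out[calendartime_numerics] = codes
    return out


# Quarterly dates, deliberately out of order within each entity
df = pd.DataFrame(
    {
        "id": list("AAABBB"),
        "quarter": pd.to_datetime(["2020-07-01", "2020-01-01", "2020-04-01"] * 2),
    }
)
out = add_integer_time(df, "quarter")
```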
Documentation
paneleventstudy.gencalendartime_numerics(data, group, event, calendartime, calendartime_numerics='ct')
Parameters
- `data`: pandas dataframe
- `group`: String matching the label of the column in `data` containing the categorical levels of the individual entities
- `event`: String matching the label of the column in `data` corresponding to the event variable; this should be a dummy variable indicating the pre- (values 0 prior to relative time 0) and post- periods (values 1 from relative time 0 onwards)
- `calendartime`: String matching the label of the column in `data` containing calendar times going from the earliest time period to the last time period
- `calendartime_numerics`: String to be used as the label of a new column containing the calendar times converted into nonnegative integers with 0 being the earliest, and T being the latest period; default is `'ct'`

Output
A copy of `data` with a new column labelled `calendartime_numerics` containing a numeric version of `calendartime`, which can then be passed to the analytical functions.
Identify collinear or invariant columns
The basic functional form of estimating equations in the DiD and event study methodology is a linear regression, which requires variables in the RHS of the equation to not be multicollinear, or invariant. This function checks if this is indeed the case.
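A simplified version of such a check is sketched below. It flags invariant columns (a single unique value) and, among the rest, any column that is perfectly correlated with a column appearing later in the list. The helper name `columns_to_drop`, the tolerance, and the pairwise-correlation criterion are assumptions for illustration; pairwise correlation does not catch dependence among three or more columns.

```python
import pandas as pd


def columns_to_drop(data, rhs, tol=1e-10):
    """Hypothetical helper: invariant or pairwise-collinear columns in rhs."""
    # Invariant columns carry no variation at all
    invariant = [c for c in rhs if data[c].nunique() <= 1]
    varying = [c for c in rhs if c not in invariant]
    corr = data[varying].corr().abs()
    # Later columns take precedence: an earlier column is dropped if it is
    # perfectly correlated with any column that appears after it
    collinear = [
        earlier
        for pos, earlier in enumerate(varying)
        if any(corr.loc[earlier, later] > 1 - tol for later in varying[pos + 1:])
    ]
    return invariant + collinear


df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "x_scaled": [3, 5, 7, 9],  # exactly 2 * x + 1
        "z": [1, 0, 1, 1],
        "const": [5, 5, 5, 5],
    }
)
to_drop = columns_to_drop(df, ["x", "x_scaled", "z", "const"])
```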
Documentation
paneleventstudy.checkcollinear(data, rhs)
Parameters
- `data`: pandas dataframe
- `rhs`: A list containing strings matching the labels of the columns in `data` to be checked for collinearity and invariance; columns appearing later in `rhs` take precedence (if two columns are collinear, the one appearing later in `rhs` is kept, i.e., not included in the output)

Output
A list of labels in `rhs` which should be dropped to avoid multicollinearity or invariance in `rhs` columns in `data`.
Identify linearly dependent columns
The basic functional form of estimating equations in the DiD and event study methodology is a linear regression, which requires the matrix containing the variables in the RHS of the equation to satisfy full column rank. This function checks if this is indeed the case.
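The idea can be sketched with a greedy rank check in numpy: columns are considered in order of precedence, and a column is flagged if adding it does not raise the rank of the matrix built so far. This is an illustrative stand-in using `numpy.linalg.matrix_rank`, not the package's own procedure; the helper name `rank_deficient_columns` is hypothetical.

```python
import numpy as np
import pandas as pd


def rank_deficient_columns(data, rhs, intercept="Intercept"):
    """Hypothetical helper: columns to drop so the rest have full column rank."""
    # Precedence: the intercept first, then columns from rightmost to leftmost
    order = list(reversed(rhs))
    if intercept is not None and intercept in order:
        order.remove(intercept)
        order.insert(0, intercept)
    kept, drop = [], []
    for col in order:
        candidate = data[kept + [col]].to_numpy(dtype=float)
        # Keep the column only if it adds a new linearly independent direction
        if np.linalg.matrix_rank(candidate) == len(kept) + 1:
            kept.append(col)
        else:
            drop.append(col)
    return drop


# d0 + d1 equals the intercept column (the dummy variable trap)
df = pd.DataFrame(
    {
        "Intercept": [1.0, 1.0, 1.0, 1.0],
        "d0": [1, 0, 1, 0],
        "d1": [0, 1, 0, 1],
        "x": [0.5, 1.5, 2.0, 4.0],
    }
)
to_drop = rank_deficient_columns(df, ["Intercept", "d0", "d1", "x"])
```

Because the intercept and the rightmost columns are considered first, the leftmost member of a dependent set is the one flagged for removal.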
Documentation
paneleventstudy.checkfullrank(data, rhs, intercept='Intercept')
Parameters
- `data`: pandas dataframe
- `rhs`: A list containing strings matching the labels of the columns in `data` to be checked for full rank; columns appearing later in `rhs` take precedence
- `intercept`: String containing the label of the intercept column (a column of ones), which will be given precedence in the procedure; set as `None` if no intercepts are contained in `data`; the default is `'Intercept'`, which is the default label when using `patsy.dmatrices()`

Output
A list of labels in `rhs` which should be dropped for the matrix containing `rhs` columns in `data` to satisfy full rank.
Implementation Notes (Analytical)
Naive Two-Way Fixed Effects (TWFE) Panel Event Study
Estimates dynamic treatment effects using a standard TWFE model. Specifically, we are interested in estimating $\beta_{l}$, the coefficients on leads and lags of treatment dummies, where $l$ is relative time as in Sun and Abraham (2021), i.e., the time period relative to treatment onset. $l=0$ refers to when the treatment was applied to entity $i$.
$D_{i,t}^{l}$ are dummies switching on if entity $i$ in calendar time $t$ is $l$ periods relative to the treatment onset. That is also to say that $D_{i, t}^{l}$ takes the value $0$ for all $t, l$ for never-treated entities. The period $l=-1$ is omitted as the reference period. A TWFE regression model includes entity fixed effects ($\alpha_i$), and time fixed effects ($\alpha_t$). $\mathbf{X_{i, t}}$ is an optional vector of time-varying (within-entity) controls. $\epsilon_{i, t}$ are the errors.
$$Y_{i,t} = \alpha_i + \alpha_t + \sum_{l=-K}^{-2} \beta_{l} D_{i, t}^{l} + \sum_{l=0}^{M} \beta_{l} D_{i, t}^{l} + \mathbf{X_{i, t} \gamma} + \epsilon_{i, t}$$
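To make the specification concrete, the sketch below builds the lead and lag dummies (omitting $l=-1$), adds entity and time dummies, and solves the regression by least squares on a noiseless toy panel. It uses plain numpy rather than `linearmodels`, omits covariates and standard errors, and all names and numbers in it are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy balanced panel: A treated at t=2, B at t=3, C never treated
onset = {"A": 2, "B": 3, "C": None}
true_beta = {-3: 0.0, -2: 0.0, 0: 1.0, 1: 2.0, 2: 3.0}
rows = []
for i, (g, e) in enumerate(onset.items()):
    for t in range(5):
        l = np.nan if e is None else t - e
        # Outcome = entity effect + time effect + dynamic treatment effect
        y = 10.0 * i + 0.5 * t + true_beta.get(l, 0.0)
        rows.append({"group": g, "ct": t, "reltime": l, "y": y})
df = pd.DataFrame(rows)

# Lead and lag dummies D^l, leaving out the reference period l = -1;
# never-treated rows have missing reltime, so all their dummies are 0
leads_lags = sorted(int(l) for l in df["reltime"].dropna().unique() if l != -1)
D = pd.DataFrame(
    {f"D[{l}]": (df["reltime"] == l).astype(float) for l in leads_lags}
)

# Entity fixed effects (all levels) and time fixed effects (first level dropped)
entity_fe = pd.get_dummies(df["group"], dtype=float)
time_fe = pd.get_dummies(df["ct"], prefix="t", dtype=float).iloc[:, 1:]

X = pd.concat([D, entity_fe, time_fe], axis=1)
coef, *_ = np.linalg.lstsq(X.to_numpy(), df["y"].to_numpy(), rcond=None)
beta_hat = pd.Series(coef[: len(leads_lags)], index=leads_lags)
```

With treatment effects that are identical across cohorts, as in this toy panel, the TWFE coefficients recover the true dynamic effects; the interaction-weighted estimator is motivated by cases where effects differ across cohorts.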
Documentation
paneleventstudy.naivetwfe_eventstudy(data, outcome, event, group, reltime, calendartime, covariates, vcov_type='robust', check_balance=True)
Parameters
- `data`: pandas dataframe
- `outcome`: String matching the label of the column in `data` corresponding to the outcome variable; this is the LHS variable in the regression
- `event`: String matching the label of the column in `data` corresponding to the event variable; this should be a dummy variable indicating the pre- (values 0 prior to relative time 0) and post- periods (values 1 from relative time 0 onwards)
- `group`: String matching the label of the column in `data` containing the categorical levels of the individual entities
- `reltime`: String matching the label of the column in `data` containing integer relative times going from -L to +K, with 0 being the timing of treatment onset; this can be generated from `calendartime` using `genreltime`, and `reltime=-1` is automatically chosen as the reference period
- `calendartime`: String matching the label of the column in `data` containing integer calendar times going from 0 (earliest time period) to T (last time period); this can be generated from `gencalendartime_numerics`
- `covariates`: List of columns corresponding to control variables in `data` to be included in the RHS of the regression; if no covariates are to be included, set `covariates=[]`
- `vcov_type`: String corresponding to the type of variance-covariance estimator in `linearmodels.PanelOLS.fit()`, which is called during the estimation process; default option is `'robust'`
- `check_balance`: Checks if `data` is a balanced panel; default option is `True`

Output
Returns a pandas dataframe with 3 columns, indexed to `reltime`:
- `parameter`: The point estimates of the dynamic treatment effects ($\beta_{l}$)
- `lower`: The lower confidence bound of `parameter`
- `upper`: The upper confidence bound of `parameter`
Interaction-Weighted Panel Event Study
Estimates dynamic treatment effects using the interaction-weighted estimator described in Sun and Abraham (2021). Again, for the following structural equation, we are interested in estimating $\beta_{l}$, the coefficients on leads and lags of treatment dummies, where $l$ is relative time as in Sun and Abraham (2021), i.e., the time period relative to treatment onset. $l=0$ refers to when the treatment was applied to entity $i$.
$$Y_{i,t} = \alpha_i + \alpha_t + \sum_{l=-K}^{-2} \beta_{l} D_{i, t}^{l} + \sum_{l=0}^{M} \beta_{l} D_{i, t}^{l} + \mathbf{X_{i, t} \gamma} + \eta_{i, t}$$
This implementation has 3 broad steps.
1. Calculate the cohort shares by relative time, $\mathbb{E} (E_i = e \mid E_i \in g )$ where $g$ is the set of relative times included in the analysis. This package uses a no-constant linear regression model with an OLS estimator, as per Sun and Abraham (2021)'s original Stata package (EventStudyInteract). Using a linear regression approach, instead of simple tabulation, allows for calculation of standard errors of the cohort share estimates.
   $$1\{E_i = e \mid E_i \in g\} = w_{e,l} D_{i, t}^{l} + e_i$$
2. Estimate the cohort-specific average treatment effects, $CATT_{e, l}$, by interacting the cohort dummy with the treatment / relative time dummy, $1(E_i = e) D_{i,t}^{l}$.
   $$Y_{i,t} = \alpha_i + \alpha_t + \sum_{e} \sum_{l=-K}^{-2} \delta_{e,l} 1(E_i = e) D_{i,t}^{l} + \sum_{e} \sum_{l=0}^{M} \delta_{e,l} 1(E_i = e) D_{i,t}^{l} + \mathbf{X_{i, t} \gamma} + \varepsilon_{i, t}$$
3. Calculate the interaction-weighted average treatment effects using output from steps 1 and 2 for every relative time $l$. In the current version, the estimated confidence bands are scaled the same way.
   $$\hat{\beta}_l = \sum_{e} \hat{\delta}_{e,l} \hat{w}_{e,l} \ \forall \ l$$
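The weighting in the first and last steps can be illustrated with a small pandas sketch. It obtains the cohort shares by simple tabulation (not by the no-constant regression used in the package) and takes the cohort-specific effects as given, so the `catt` values below are made-up numbers rather than estimates.

```python
import pandas as pd

# Toy panel: three entities in cohort 2, one entity in cohort 3, periods 0..4
rows = [
    {"entity": i, "cohort": e, "reltime": t - e}
    for i, e in enumerate([2, 2, 2, 3])
    for t in range(5)
]
panel = pd.DataFrame(rows)
panel = panel[panel["reltime"] != -1]  # reference period is excluded

# Step 1 (by tabulation): share of each cohort among observations at each l
counts = panel.groupby(["reltime", "cohort"]).size()
shares = counts / counts.groupby(level="reltime").transform("sum")

# Step 2 is taken as given here: hypothetical cohort-specific effects,
# 1.0 for cohort 2 and 2.0 for cohort 3 after onset, 0 before onset
catt = pd.Series(
    [(1.0 if e == 2 else 2.0) * float(l >= 0) for l, e in shares.index],
    index=shares.index,
)

# Step 3: interaction-weighted effect per relative time
beta = (catt * shares).groupby(level="reltime").sum()
```

At $l=0$ the two cohorts have shares 0.75 and 0.25, so the weighted effect is $0.75 \times 1.0 + 0.25 \times 2.0 = 1.25$; at $l=2$ only cohort 2 is observed and the weighted effect equals its own effect.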
Documentation
paneleventstudy.interactionweighted_eventstudy(data, outcome, event, group, cohort, reltime, calendartime, covariates, vcov_type='robust', check_balance=True)
Parameters
- `data`: pandas dataframe
- `outcome`: String matching the label of the column in `data` corresponding to the outcome variable; this is the LHS variable in the regression
- `event`: String matching the label of the column in `data` corresponding to the event variable; this should be a dummy variable indicating the pre- (values 0 prior to relative time 0) and post- periods (values 1 from relative time 0 onwards)
- `group`: String matching the label of the column in `data` containing the categorical levels of the individual entities
- `cohort`: String matching the label of the column in `data` containing the integer cohort of each entity, as generated from `gencohort` (e.g., all entities treated in calendar time 3 should take the value 3 in this column)
- `reltime`: String matching the label of the column in `data` containing integer relative times going from -L to +K, with 0 being the timing of treatment onset; this can be generated from `calendartime` using `genreltime`, and `reltime=-1` is automatically chosen as the reference period
- `calendartime`: String matching the label of the column in `data` containing integer calendar times going from 0 (earliest time period) to T (last time period); this can be generated from `gencalendartime_numerics`
- `covariates`: List of columns corresponding to control variables in `data` to be included in the RHS of the regression; if no covariates are to be included, set `covariates=[]`
- `vcov_type`: String corresponding to the type of variance-covariance estimator in `linearmodels.PanelOLS.fit()`, which is called during the estimation process; default option is `'robust'`
- `check_balance`: Checks if `data` is a balanced panel; default option is `True`

Output
Returns a pandas dataframe with 3 columns, indexed to `reltime`:
- `parameter`: The point estimates of the interaction-weighted average treatment effects
- `lower`: The lower confidence bound of `parameter`
- `upper`: The upper confidence bound of `parameter`
Single Entity Event Study
Estimates dynamic treatment effects ($\beta_{l}$ coefficients on leads and lags of treatment dummies) using a single-entity linear regression model with an OLS estimator, where $l=0$ refers to when the treatment was applied. $D_{l}$ are dummies switching on when the entity is $l$ periods relative to the treatment onset. The linear regression includes a constant ($\alpha$) and an optional vector of controls ($\mathbf{X_{t}}$). $\epsilon_{t}$ are the errors.
$$Y_{t} = \alpha + \sum_{l=-K}^{-2} \beta_{l} D_{l} + \sum_{l=0}^{M} \beta_{l} D_{l} + \mathbf{X_{t} \gamma} + \epsilon_{t}$$
Documentation
paneleventstudy.timeseries_eventstudy(data, outcome, reltime, covariates, vcov_type='HC3')
Parameters
- `data`: pandas dataframe
- `outcome`: String matching the label of the column in `data` corresponding to the outcome variable; this is the LHS variable in the regression
- `reltime`: String matching the label of the column in `data` containing integer relative times going from -L to +K, with 0 being the timing of treatment onset
- `covariates`: List of columns corresponding to control variables in `data` to be included in the RHS of the regression; if no covariates are to be included, set `covariates=[]`
- `vcov_type`: String corresponding to the type of variance-covariance estimator in `statsmodels.regression.linear_model.RegressionResults.get_robustcov_results`, which is called during the estimation process; default option is `'HC3'`

Output
Returns a pandas dataframe with 3 columns, indexed to `reltime`:
- `parameter`: The point estimates of the dynamic treatment effects ($\beta_{l}$)
- `lower`: The lower confidence bound of `parameter`
- `upper`: The upper confidence bound of `parameter`
Implementation Notes (Utilities)
Plotting Event Study Lead and Lag Coefficients
This function calls plotly's graph_objects module to show the event study estimates (dynamic treatment effects) as a line chart, together with their confidence bands (which can be manually excluded). It also exports an interactive graph as an html file via plotly's plotly.io.write_html(), and a static graph as a png file via plotly's plotly.io.write_image(). Users of this package may, of course, opt for other charting packages, modules, or scripts to plot the event study estimates.
Documentation
paneleventstudy.eventstudyplot(input, big_title='Event Study Plot (With 95% CIs)', path_output='', name_output='eventstudyplot')
Parameters
- `input`: Output from the analytical functions (`paneleventstudy.naivetwfe_eventstudy()`, `paneleventstudy.interactionweighted_eventstudy()`, `paneleventstudy.timeseries_eventstudy()`); manually exclude columns `lower` and `upper` if only the point estimates are to be shown
- `big_title`: String containing the main title of the figure; default is `'Event Study Plot (With 95% CIs)'`
- `path_output`: String containing the directory in which the output files should be saved; default is `''`, i.e., the present working directory
- `name_output`: String containing the file name of the image and html file to be generated; default is `'eventstudyplot'`
Output
Requirements
Python Packages
- pandas>=1.4.3
- numpy>=1.23.0
- linearmodels>=4.27
- plotly>=5.9.0
- statsmodels>=0.13.2
- sympy>=1.10.1