Skip to main content

Solutions to linear model with high dimensional fixed effects.

Project description

FixedEffectModelPyHDFE: A Python Package for Linear Model with High Dimensional Fixed Effects.

FixedEffectModel is a Python Package designed and built by Kuaishou DA ecology group. It provides solutions for linear model with high dimensional fixed effects,including support for calculation in variance (robust variance and multi-way cluster variance), fixed effects, and standard error of fixed effects. It also supports model with instrument variables (will upgrade in late Nov.2020).

As You may have noticed, this is not FixedEffectModel, but rather FixedEffectModelPyHDFE. The goal of this library is to reproduce the brilliant regHDFE Stata package on Python. To this end, the algorithm FEM used to calculate fixed effects has been replaced with PyHDFE, and a number of further changes have been made.

Presently, this package replicates regHDFE functionality for most use cases. For examples, please see tests/test_clustering.py.

If You find a regression whose output is different in FEMPyHDFE than what regHDFE produces please open an issue on this repo!

Installation

Install this package directly from PyPI

$ pip install FixedEffectModelPyHDFE

Limitations

The original FEM package includes functionality other than absorbing and clustering variables - for example it includes instrumental variable functionality. The focus of this package for the moment is solely on the absorption and clustering functions, so no guarantees on any other functionality.

Documentation

Documentation is provided by Kuaishou DA group here. Below is a copy of their README for convenience (and slight modification to reflect that PyHDFE is also being used in this package)

Main Functions

Function name Description Usage
ols_high_d_category get main result ols_high_d_category(data_df, consist_input=None, out_input=None, category_input=None, cluster_input=[],fake_x_input=[], iv_col_input=[], formula=None, robust=False, c_method='cgm', psdef=True, epsilon=1e-8, max_iter=1e6, process=5)
ols_high_d_category_multi_results get results of multiple models based on same dataset ols_high_d_category_multi_results(data_df, models, table_header)
getfe get fixed effects getfe(result, epsilon=1e-8)
alpha_std get standard error of fixed effects alpha_std(result, formula, sample_num=100)

Example

For a plethora of examples, please also see tests/test_clustering.py

import FixedEffectModelPyHDFE.api as FEM
import pandas as pd

df = pd.read_csv('path/to/yourdata.csv')

#define model
#you can define the model through defining formula like 'dependent variable ~ continuous variable|fixed_effect|clusters|(endogenous variables ~ instrument variables)'
formula_without_iv = 'y~x+x2|id+firm|id+firm'
formula_without_cluster = 'y~x+x2|id+firm|0|(Q|W~x3+x4+x5)'
formula = 'y~x+x2|id+firm|id+firm|(Q|W~x3+x4+x5)'
result1 = FEM.ols_high_d_category(df, formula = formula,robust=False,c_method = 'cgm',epsilon = 1e-8,psdef= True,max_iter = 1e6)

#or you can define the model through defining each part
# a.k.a. predictors
consist_input = ['x','x2']
# a.k.a. target
output_input = ['y']
# a.k.a. variables to be absorbed
category_input = ['id','firm']
cluster_input = ['id','firm']
endo_input = ['Q','W']
iv_input = ['x3','x4','x5']
c_method='cgm'
result1 = FEM.ols_high_d_category(df,consist_input,out_input,category_input,cluster_input,endo_input,iv_input,formula=None,robust=False,c_method = c_method,epsilon = 1e-8,max_iter = 1e6)

#show result
result1.summary()

Requirements

  • Python 3.6+
  • Pandas and its dependencies (Numpy, etc.)
  • Scipy and its dependencies
  • statsmodels and its dependencies
  • networkx
  • PyHDFE

Citation

If you use FixedEffectModel in your research, please cite the following:

Kuaishou DA Ecology. FixedEffectModel: A Python Package for Linear Model with High Dimensional Fixed Effects.https://github.com/ksecology/FixedEffectModel,2020.Version 0.x

BibTex:

@misc{FixedEffectModel,
  author={Kuaishou DA Ecology},
  title={{FixedEffectModel: {A Python Package for Linear Model with High Dimensional Fixed Effects}},
  howpublished={https://github.com/ksecology/FixedEffectModel},
  note={Version 0.x},
  year={2020}
}

Jeff Gortmaker and Anya Tarascina. PyHDFE: High Dimensional Fixed Effect Absorption.https://github.com/jeffgortmaker/pyhdfe,2019.Version 0.x

BibTex:

@misc{PyHDFE,
  author={Jeff Gortmaker with Anya Tarascina},
  title={{PyHDFE: {High Dimensional Fixed Effect Absorption},
  howpublished={https://github.com/jeffgortmaker/pyhdfe},
  note={Version 0.x},
  year={2019}
}

Feedback

This package welcomes feedback. If you have any additional questions or comments, please contact da_ecology@kuaishou.com.

Reference

[1] Simen Gaure(2019). lfe: Linear Group Fixed Effects. R package. version:v2.8-5.1 URL:https://www.rdocumentation.org/packages/lfe/versions/2.8-5.1

[2] A Colin Cameron and Douglas L Miller. A practitioner’s guide to cluster-robust inference. Journal of human resources, 50(2):317–372, 2015.

[3] Simen Gaure. Ols with multiple high dimensional category variables. Computational Statistics & Data Analysis, 66:8–18, 2013.

[4] Douglas L Miller, A Colin Cameron, and Jonah Gelbach. Robust inference with multi-way clustering. Technical report, Working Paper, 2009.

[5] Jeffrey M Wooldridge. Econometric analysis of cross section and panel data. MIT press, 2010.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

FixedEffectModelPyHDFE-0.0.5.tar.gz (21.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

FixedEffectModelPyHDFE-0.0.5-py3-none-any.whl (27.3 kB view details)

Uploaded Python 3

File details

Details for the file FixedEffectModelPyHDFE-0.0.5.tar.gz.

File metadata

  • Download URL: FixedEffectModelPyHDFE-0.0.5.tar.gz
  • Upload date:
  • Size: 21.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.5

File hashes

Hashes for FixedEffectModelPyHDFE-0.0.5.tar.gz
Algorithm Hash digest
SHA256 5cbcceeb8b9a2031e94b0b7bb5b58fb3b750054b7145c3ac8dd9d0aa20007cda
MD5 9d3469198c158a1b0960c7bcb06c83a0
BLAKE2b-256 d345aff4e7ba13ca6ea4c4f968a1d6640f2062847ce97e481eda52979fe0f804

See more details on using hashes here.

File details

Details for the file FixedEffectModelPyHDFE-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: FixedEffectModelPyHDFE-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 27.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.5

File hashes

Hashes for FixedEffectModelPyHDFE-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 2abda4c73f043419ea0a938ce63a435c2de42f464c7420306ae9bb37f92f4f75
MD5 250a1057a50ac0814037e6fe5a197229
BLAKE2b-256 38ee06a0a9a8a93015cfbcca2783d3e5c0368b63d0e804861756462ee69550ce

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page