Estimate two way fixed effect labor models

PyTwoWay

PyTwoWay is the Python package associated with the following paper:

“How Much Should We Trust Estimates of Firm Effects and Worker Sorting?” by Stéphane Bonhomme, Kerstin Holzheu, Thibaut Lamadon, Elena Manresa, Magne Mogstad, and Bradley Setzler. No. w27368. National Bureau of Economic Research, 2020.

The package provides implementations for a series of estimators for models with two sided heterogeneity:

  1. two way fixed effect estimator as proposed by Abowd, Kramarz, and Margolis

  2. homoskedastic bias correction as in Andrews, et al.

  3. heteroskedastic bias correction as in Kline, Saggio, and Sølvsten

  4. group fixed estimator as in Bonhomme, Lamadon, and Manresa

  5. group correlated random effect as presented in the main paper

  6. fixed-point revealed preference estimator as in Sorkin

  7. non-parametric sorting estimator as in Borovičková and Shimer
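To make the first estimator in the list concrete, here is a minimal sketch of a two-way fixed effect regression on a toy panel. It is not PyTwoWay's implementation: the function name `akm_fixed_effects` and the toy data are hypothetical, the dummy matrix is dense (only workable for tiny samples), and the last firm's effect is normalized to zero as an assumed identification choice.

```python
import numpy as np

def akm_fixed_effects(worker_ids, firm_ids, wages):
    """Estimate worker and firm effects by OLS on dummy variables.

    Illustrative only: the last firm's effect is normalized to zero,
    and the data are assumed to form a single connected set.
    """
    worker_ids = np.asarray(worker_ids)
    firm_ids = np.asarray(firm_ids)
    wages = np.asarray(wages, dtype=float)
    n_obs = wages.size
    n_workers = worker_ids.max() + 1
    n_firms = firm_ids.max() + 1

    # One dummy column per worker, and one per firm except the last
    X = np.zeros((n_obs, n_workers + n_firms - 1))
    rows = np.arange(n_obs)
    X[rows, worker_ids] = 1.0
    keep = firm_ids < n_firms - 1
    X[rows[keep], n_workers + firm_ids[keep]] = 1.0

    # Least-squares fit of wages on the worker and firm dummies
    coef, *_ = np.linalg.lstsq(X, wages, rcond=None)
    alpha_hat = coef[:n_workers]
    psi_hat = np.append(coef[n_workers:], 0.0)  # re-attach normalized firm
    return alpha_hat, psi_hat

# Toy panel: 3 workers, 2 firms, 2 periods; worker 0 moves from firm 0 to firm 1
worker_ids = [0, 0, 1, 1, 2, 2]
firm_ids = [0, 1, 0, 0, 1, 1]
true_alpha = np.array([1.0, 2.0, 3.0])
true_psi = np.array([0.5, 0.0])
wages = true_alpha[worker_ids] + true_psi[firm_ids]  # noise-free log wages

alpha_hat, psi_hat = akm_fixed_effects(worker_ids, firm_ids, wages)
print(alpha_hat, psi_hat)
```

The firm effect is identified only through worker 0, the single mover, which is why limited mobility matters for the bias corrections listed above.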

If you want to give it a try, you can start an example notebook for each estimator: FE (binder_fe), CRE (binder_cre), BLM (binder_blm), Sorkin (binder_sorkin), and Borovickova-Shimer (binder_bs). Each link starts a fully interactive notebook with a simple example that simulates data and runs the estimator.

The package provides a Python interface. Installation is handled by pip or Conda (TBD). The source of the package is available on GitHub at PyTwoWay. The online documentation is hosted here.

The code is relatively efficient. The benchmark below compares PyTwoWay’s speed with that of LeaveOutTwoWay, a MATLAB package for estimating AKM and its bias corrections; six of the eight PyTwoWay solvers tested finish faster than LeaveOutTwoWay.

Quick Start

To install via pip, from the command line run:

pip install pytwoway

To make sure you are running the most up-to-date version of PyTwoWay, from the command line run:

pip install --upgrade pytwoway

Please DO NOT download the Conda version of the package, as it is outdated!
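To confirm which version ended up installed, one option is to query the package metadata from Python. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of PyTwoWay.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# Prints the installed PyTwoWay version, or None if it is not installed
print(installed_version("pytwoway"))
```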

Help with Running the Package

Please check out the documentation for detailed examples of how to use PyTwoWay. If you have a question that the documentation doesn’t answer, please also check the past Issues to see if someone else has already asked this question and an answer has been provided. If you still can’t find an answer, please open a new Issue and we will try to answer as quickly as possible.

Benchmarking

Data is simulated from BipartitePandas using the following code:

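import numpy as np
import bipartitepandas as bpd

# Simulation parameters (number of workers, firm size, move probability)
sim_params = bpd.sim_params({'n_workers': 500000, 'firm_size': 10, 'p_move': 0.05})
# Seeded random number generator, so the simulation is reproducible
rng = np.random.default_rng(1234)

# Simulate the bipartite worker-firm data
sim_data = bpd.SimBipartite(sim_params).simulate(rng)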

The models are then estimated on this data using the PyTwoWay class FEEstimator and using the MATLAB package LeaveOutTwoWay. For estimation using PyTwoWay, all solvers other than AMG use the incomplete Cholesky decomposition as a preconditioner.
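As a rough illustration of what a preconditioned iterative solver does here, the sketch below solves the two-way fixed effect normal equations with SciPy's conjugate gradient routine. It does not use PyTwoWay's internals. The toy panel, the variable names, and the normalization of the last firm effect are assumptions, and a simple Jacobi (diagonal) preconditioner stands in for the incomplete Cholesky preconditioner mentioned above.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, hstack
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
n_workers, n_firms, n_periods = 200, 20, 4

# Toy panel: in period t, worker i is at firm (i + t) % n_firms,
# so every firm has observations and all firms are linked by movers
worker_ids = np.repeat(np.arange(n_workers), n_periods)
periods = np.tile(np.arange(n_periods), n_workers)
firm_ids = (worker_ids + periods) % n_firms

alpha = rng.normal(size=n_workers)   # worker effects
psi = rng.normal(size=n_firms)       # firm effects
psi[-1] = 0.0                        # assumed normalization
y = alpha[worker_ids] + psi[firm_ids]  # noise-free outcomes

# Sparse dummy matrices for workers and firms (last firm column dropped)
n_obs = y.size
rows = np.arange(n_obs)
W = csr_matrix((np.ones(n_obs), (rows, worker_ids)), shape=(n_obs, n_workers))
F = csr_matrix((np.ones(n_obs), (rows, firm_ids)), shape=(n_obs, n_firms))
X = hstack([W, F[:, :-1]]).tocsr()

# Normal equations A coef = b, with A symmetric positive definite
A = (X.T @ X).tocsr()
b = X.T @ y

# Jacobi preconditioner: the inverse of the diagonal of A
M = diags(1.0 / A.diagonal())

coef, info = cg(A, b, M=M, maxiter=5000)  # info == 0 signals convergence
rel_resid = np.linalg.norm(A @ coef - b) / np.linalg.norm(b)
print(info, rel_resid)
```

A better preconditioner reduces the number of iterations needed, which is the reason the benchmark pairs most solvers with one.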

Run times were measured on a 2021 MacBook Pro 14” with 16 GB RAM and an Apple M1 Pro processor with 8 cores.

Some summary statistics about the largest leave-one-match-out set:

Package     #obs        #firms   #movers
---------   ---------   ------   -------
KSS         2,255,370   44,510   88,542
PyTwoWay    2,269,665   44,601   89,098
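The three statistics reported for each package can be computed from a worker-firm panel along the following lines. This is a sketch with a hypothetical helper, `panel_summary`, and it assumes a mover is a worker observed at more than one firm; the packages may define or count these quantities differently.

```python
def panel_summary(worker_ids, firm_ids):
    """Count observations, distinct firms, and movers in a worker-firm panel."""
    # Collect the set of firms each worker is observed at
    firms_by_worker = {}
    for worker, firm in zip(worker_ids, firm_ids):
        firms_by_worker.setdefault(worker, set()).add(firm)
    # Assumption: a mover is a worker observed at more than one firm
    n_movers = sum(1 for firms in firms_by_worker.values() if len(firms) > 1)
    return {
        "n_obs": len(worker_ids),
        "n_firms": len(set(firm_ids)),
        "n_movers": n_movers,
    }

# Toy panel: worker 0 moves from firm 0 to firm 1; workers 1 and 2 stay put
summary = panel_summary([0, 0, 1, 1, 2, 2], [0, 1, 0, 0, 1, 1])
print(summary)
```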

Run time:

Solver          Cleaning   Estimation   Total
-------------   --------   ----------   ------
KSS             N/A        N/A          55.2s
PYTW-AMG        4.0s       3m2s         3m6s
PYTW-BICG       4.0s       20.4s        24.4s
PYTW-BICGSTAB   4.0s       21.9s        25.9s
PYTW-CG         4.0s       19.6s        23.6s
PYTW-CGS        4.0s       20.6s        24.6s
PYTW-GMRES      4.0s       32.9s        36.9s
PYTW-MINRES     4.0s       10.7s        14.7s
PYTW-QMR        4.0s       3m53s        3m57s

Contributing to the Package

If you want to contribute to the package, the easiest way is to test that it’s working properly! If you notice a part of the package is giving incorrect results, please add a new post in Issues and we will do our best to fix it as soon as possible.

We are also happy to consider any suggestions to improve the package and documentation, whether to add a new feature, make a feature more user-friendly, or make the documentation clearer. Please also post suggestions in Issues.

Finally, if you would like to help with developing the package, please make a fork of the repository and submit pull requests with any changes you make! These will be promptly reviewed, and hopefully accepted!

We are extremely grateful for all contributions made by the community!

Dependencies

Solving large sparse linear models relies on a combination of PyAMG (this is the package we use to estimate the different decompositions on US data) and SciPy’s iterative sparse linear solvers.

Many tools for handling sparse matrices come from SciPy.

Additional preconditioners for linear solvers come from PyMatting (installing the package is not required, as the necessary files have been copied into the submodule preconditioners). The incomplete Cholesky preconditioner in turn relies on Numba.

Constrained optimization is handled by QPSolvers.

Progress bars are generated with tqdm.

Parameter dictionaries are constructed using ParamsDict.

Data cleaning is handled by BipartitePandas.

We also rely on a number of standard libraries, such as NumPy, Pandas, matplotlib, etc.

Optionally, the code is compatible with:

  - multiprocess. Installing this may help if multiprocessing is raising errors related to pickling objects.

  - PyTorch. This may speed up BLM estimation, and adds the option to compute some operations using the GPU.

Citation

Please use the following citation to cite PyTwoWay in academic publications:

BibTeX entry:

@techreport{bhlmms2020,
  title={How Much Should We Trust Estimates of Firm Effects and Worker Sorting?},
  author={Bonhomme, St{\'e}phane and Holzheu, Kerstin and Lamadon, Thibaut and Manresa, Elena and Mogstad, Magne and Setzler, Bradley},
  year={2020},
  institution={National Bureau of Economic Research}
}

Authors

Thibaut Lamadon, Assistant Professor in Economics, University of Chicago, lamadon@uchicago.edu

Adam A. Oppenheimer, Research Professional, University of Chicago, oppenheimer@uchicago.edu
