PyMarkup
A Python toolkit for estimating firm-level markups using production function-based marginal cost recovery.
Installation
git clone https://github.com/immortalsRDJ/PyMarkup
cd PyMarkup
uv sync
For WRDS data downloads, add the wrds extra:
uv sync --extra wrds
Quick Start
Option 1: Command Line (Recommended)
# 1. Set up config file
cp config.example.yaml config.yaml
# Edit config.yaml with your API keys and settings
# 2. Run the full pipeline
uv run pymarkup run-all --config config.yaml
# Or skip data download if you already have the data
uv run pymarkup run-all --config config.yaml --skip-download
Option 2: Python Script
from PyMarkup import MarkupPipeline, PipelineConfig, EstimatorConfig
config = PipelineConfig(
compustat_path="Input/DLEU/Compustat_annual.csv",
macro_vars_path="Input/DLEU/macro_vars_new.xlsx",
estimator=EstimatorConfig(method="wooldridge_iv"),
)
pipeline = MarkupPipeline(config)
results = pipeline.run()
results.save(output_dir="Output/", format="csv")
Command Line Reference
Full Pipeline
# Run everything (download + estimate + figures)
uv run pymarkup run-all --config config.yaml
# Skip all downloads (use existing data)
uv run pymarkup run-all --config config.yaml --skip-download
# Skip only Compustat download (no WRDS credentials needed)
uv run pymarkup run-all --config config.yaml --skip-compustat
# Skip figure generation
uv run pymarkup run-all --config config.yaml --no-figures
# Verbose output for debugging
uv run pymarkup run-all --config config.yaml -v
Individual Commands
# Download data only
uv run pymarkup download ppi # PPI (no credentials needed)
uv run pymarkup download cpi --config config.yaml # CPI (needs FRED API key)
uv run pymarkup download all --config config.yaml # All datasets
# Run estimation only (requires existing data)
uv run pymarkup estimate --config config.yaml
# Validate input data
uv run pymarkup validate Input/DLEU/Compustat_annual.csv
# Check version
uv run pymarkup version
Configuration
Setting Up Credentials
1. Copy the example config file:
   cp config.example.yaml config.yaml
2. Edit config.yaml with your credentials:
   fred_api_key: "your-fred-api-key"
   wrds_username: "your-wrds-username"
Alternatively, set the environment variables FRED_API_KEY and WRDS_USERNAME.
Data Requirements
| Data Source | Credentials | How to Get |
|---|---|---|
| Compustat (WRDS) | WRDS account | Register at WRDS |
| CPI (FRED) | FRED API key | Free at FRED |
| PPI (BLS) | None | Public data from BLS |
| Macro variables | N/A | Included in repo: Input/DLEU/macro_vars_new.xlsx |
| NAICS descriptions | N/A | Included in repo: Input/Other/NAICS_2D_Description.xlsx |
| DEU observations | N/A | Optional: Original DLEU paper firm-year sample (see below) |
Pipeline Overview
Download -> Data Preparation -> Elasticity Estimation -> Markup Calculation -> Figures & Decomposition
1. Data Download
Downloads raw data from external sources:
from PyMarkup.data import download_compustat, download_cpi, download_ppi, load_config
config = load_config("config.yaml")
download_ppi(config) # No credentials needed
download_cpi(config) # Requires FRED API key
download_compustat(config) # Requires WRDS credentials
Data Sources:
- PPI: Bureau of Labor Statistics Producer Price Index data from https://download.bls.gov/pub/time.series/pc/
- CPI: Federal Reserve Economic Data (FRED) Consumer Price Index
- Compustat: WRDS Compustat Fundamentals Annual/Quarterly
2. Data Preparation
Cleans and prepares the Compustat panel:
- Deduplicates firm-year observations
- Extracts NAICS industry codes
- Deflates monetary values using the GDP deflator
- Computes market shares
- Trims outliers
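The preparation steps above can be sketched with pandas. This is a minimal illustration, not the package's implementation; the column names (gvkey, year, sale, cogs, naics) and the 1st/99th-percentile trimming rule are assumptions:

```python
import pandas as pd

def prepare_panel(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the cleaning steps; column names and cutoffs are illustrative."""
    # Deduplicate firm-year observations (keep the first occurrence)
    df = df.drop_duplicates(subset=["gvkey", "year"])
    # Extract 2-digit NAICS industry codes
    df["naics2"] = df["naics"].astype(str).str[:2]
    # Compute market shares within each industry-year cell
    df["mkt_share"] = df["sale"] / df.groupby(["naics2", "year"])["sale"].transform("sum")
    # Trim cost-share (COGS / sales) outliers at the 1st and 99th percentiles
    cost_share = df["cogs"] / df["sale"]
    lo, hi = cost_share.quantile([0.01, 0.99])
    return df[cost_share.between(lo, hi)]
```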
3. Elasticity Estimation
Estimates output elasticity of variable inputs (θ) at the industry-year level:
| Method | Class | Use Case |
|---|---|---|
| Wooldridge IV | WooldridgeIVEstimator | Main method; addresses endogeneity via IV/2SLS |
| Cost Share | CostShareEstimator | Fast baseline; no regression needed |
| ACF | ACFEstimator | Robustness; two-stage GMM with control function |
from PyMarkup.estimators import WooldridgeIVEstimator
estimator = WooldridgeIVEstimator(specification="spec2")
elasticities = estimator.estimate_elasticities(panel_data)
SG&A Configuration
All three estimators support including SG&A (Selling, General & Administrative expenses) as a third input in the production function:
| Estimator | Parameter | Options | Default |
|---|---|---|---|
| Wooldridge IV | specification | "spec1" (COGS+K), "spec2" (COGS+K+SG&A) | "spec2" |
| Cost Share | include_sga | True, False | False |
| ACF | include_sga | True, False | False |
from PyMarkup.estimators import ACFEstimator, CostShareEstimator, WooldridgeIVEstimator
# Wooldridge IV: use spec2 for 3-input (COGS + Capital + SG&A)
iv_est = WooldridgeIVEstimator(specification="spec2")
# Cost Share: include SG&A in cost share calculation
cs_est = CostShareEstimator(include_sga=True)
# ACF: include SG&A as third input
acf_est = ACFEstimator(include_sga=True)
Via pipeline config:
from PyMarkup import PipelineConfig, EstimatorConfig
config = PipelineConfig(
compustat_path="Input/DLEU/Compustat_annual.csv",
macro_vars_path="Input/DLEU/macro_vars_new.xlsx",
estimator=EstimatorConfig(
method="all",
iv_specification="spec2", # Wooldridge IV with SG&A
cs_include_sga=True, # Cost Share with SG&A
acf_include_sga=True, # ACF with SG&A
),
)
Aggregation Weights
When aggregating firm-level markups to industry or economy level, you can choose the weighting scheme:
| Weight Type | Formula | Use Case |
|---|---|---|
"revenue" (default) |
firm_revenue / total_revenue |
Standard approach, larger firms weighted more |
"cost" |
firm_cogs / total_cogs |
Weight by production scale |
from PyMarkup.core.markup_calculation import aggregate_markups
# Revenue-weighted aggregation (default)
agg = aggregate_markups(
firm_markups, by="year", method="weighted_mean",
weight_type="revenue", panel_data=panel_data
)
# Cost-weighted aggregation
agg = aggregate_markups(
firm_markups, by="year", method="weighted_mean",
weight_type="cost", panel_data=panel_data
)
Via pipeline config:
config = PipelineConfig(
...
aggregation_weight="revenue", # or "cost"
)
DEU Sample Filtering
To replicate the results of the original De Loecker, Eeckhout, and Unger (2020) paper, you can filter the Compustat data to the firm-year observations used in the original study:
# config.yaml
use_deu_sample: true
deu_observations_path: "Input/DLEU/DEU_observations.dta"
Or via Python:
config = PipelineConfig(
compustat_path="Input/DLEU/Compustat_annual.csv",
macro_vars_path="Input/DLEU/macro_vars_new.xlsx",
use_deu_sample=True,
deu_observations_path="Input/DLEU/DEU_observations.dta",
...
)
When enabled, the pipeline performs an inner merge on gvkey and year to filter to the original DLEU sample (approximately 242,000 firm-year observations from 1955-2016).
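That filtering step amounts to an inner merge, sketched here with pandas (the gvkey and year keys follow the text above; everything else is illustrative):

```python
import pandas as pd

def filter_to_deu_sample(compustat: pd.DataFrame, deu_obs: pd.DataFrame) -> pd.DataFrame:
    """Keep only firm-year rows that appear in the original DLEU sample."""
    keys = deu_obs[["gvkey", "year"]].drop_duplicates()
    # Inner merge drops any firm-year not present in the DEU observation list
    return compustat.merge(keys, on=["gvkey", "year"], how="inner")
```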
4. Markup Calculation
Computes firm-level markups using the De Loecker & Warzynski formula:
markup = θ / cost_share
where cost_share = COGS / Revenue
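As a quick sanity check, the formula in plain Python (the numbers are illustrative, not from the package):

```python
def markup(theta: float, cogs: float, revenue: float) -> float:
    """De Loecker & Warzynski markup: output elasticity over the COGS revenue share."""
    cost_share = cogs / revenue
    return theta / cost_share

# A firm with elasticity 0.85 spending 70 cents of COGS per dollar of revenue:
mu = markup(theta=0.85, cogs=70.0, revenue=100.0)  # 0.85 / 0.70 ≈ 1.21
```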
5. Figures
| Figure | Function | Description |
|---|---|---|
| Aggregate Markup | plot_aggregate_markup() | Time series of aggregate markups |
| PPI vs Markup | plot_markup_vs_ppi() | Scatter plot with weighted OLS regression |
6. Decomposition
Dynamic Olley-Pakes decomposition of aggregate markup changes (DLEU 2020). The decomposition runs automatically in the pipeline for Wooldridge IV and Cost Share methods.
Decomposes markup growth into three components:
| Component | Description |
|---|---|
| Within | Markup changes within continuing firms |
| Reallocation | Market share shifts toward high/low-markup firms |
| Net Entry | Difference between entering and exiting firms |
The components sum to the total change in the aggregate markup: Within + Reallocation + Net Entry = ΔMarkup (benchmark)
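To see why such components can sum exactly to the total change, here is a toy two-firm example with continuing firms only, so the net-entry term is zero (made-up numbers; this is one common accounting split, and the exact DLEU formula differs in detail):

```python
# Two continuing firms; each entry is (market share, markup) in that period.
p0 = {"A": (0.6, 1.2), "B": (0.4, 1.4)}  # base period
p1 = {"A": (0.5, 1.3), "B": (0.5, 1.5)}  # next period

def aggregate(p):
    """Aggregate markup as a share-weighted mean of firm markups."""
    return sum(share * mu for share, mu in p.values())

total = aggregate(p1) - aggregate(p0)

# Within: markup growth at initial shares; Reallocation: share shifts at new markups.
within = sum(p0[f][0] * (p1[f][1] - p0[f][1]) for f in p0)
realloc = sum((p1[f][0] - p0[f][0]) * p1[f][1] for f in p0)

# The split is an exact identity: within + realloc == total change.
```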
Output files:
| File | Description |
|---|---|
| Output/intermediate/decomposition_wooldridge_iv.csv | IV decomposition results |
| Output/intermediate/decomposition_cost_share.csv | Cost Share decomposition results |
| Output/figures/Decomposition - Wooldridge IV (YYYY-YYYY).pdf | IV decomposition figure |
| Output/figures/Decomposition - Cost Share (YYYY-YYYY).pdf | Cost Share decomposition figure |
Standalone usage:
from PyMarkup.decomposition import OlleyPakesDecomposition, plot_decomposition
op = OlleyPakesDecomposition(
firm_var="gvkey",
time_var="year",
markup_var="markup",
weight_var="sale_D",
)
decomp_results = op.decompose(firm_markups)
# Plot with cumulative markup levels (DLEU Figure IV style)
# All lines start at the same baseline and show counterfactual paths:
# "What would markup be if only this component operated?"
plot_decomposition(
decomp_results,
cumulative=True,
base_markup=1.21, # Base period aggregate markup (e.g., 1980 value)
save_path="Output/decomposition.pdf",
)
Project Structure
src/PyMarkup/
├── core/ # Data preparation, markup calculation, figures
├── data/ # Data downloaders and loaders
├── estimators/ # WooldridgeIV, CostShare, ACF estimators
├── pipeline/ # MarkupPipeline orchestrator, config
├── decomposition/ # Dynamic Olley-Pakes decomposition
├── io/ # I/O schemas (Pydantic)
└── cli/ # CLI commands
Input/ # Raw data (not version controlled)
Intermediate/ # Generated datasets, theta estimates
Output/ # Figures and tables
License
MIT License