A Stochastic Frontier Analysis (SFA) library featuring MLE and Bayesian (PyMC) estimations.
Project description
Depp_sfa: stochastic frontier analysis
Depp_sfa is a python library dedicated to the estimation of stochastic frontier analysis (sfa) models.
It is designed to provide high robustness against the numerical convergence issues frequently encountered in applied econometrics. For cross-sectional data, the library relies primarily on maximum likelihood estimation (mle), featuring an automatic fallback to bayesian estimation (mcmc via pymc) in the event of optimization failure. For panel data, it implements a strictly bayesian estimation of the dynamic battese and coelli (1992) model.
Main features
- Production and cost frontiers: supports both orientations.
- Functional forms: linear, cobb-douglas, and translog specifications. Includes support for dummy variables.
- Cross-sectional data: standard estimation with optional inclusion of inefficiency determinants (bc95 model).
- Panel data (time-varying): implementation of the battese and coelli (1992) model, capturing the temporal evolution of technical inefficiency.
- True effects (greene 2005): integration of unobserved heterogeneity separation.
- Efficiency decomposition methods: jondrow et al. (1982), battese and coelli (1988), and a modified approach.
Mle versus pymc
The library offers two distinct inference methods to adapt to your data structure and overcome traditional solver limitations.
Mle (maximum likelihood estimation) is the standard frequentist approach. It is computationally fast and works well on large, balanced cross-sectional datasets. However, it frequently suffers from boundary failures (where the inefficiency variance collapses to zero) on unbalanced panels or when dealing with high ratios of singletons.
Pymc (bayesian inference) is the robust alternative. By integrating prior distributions, it prevents boundary collapses and successfully separates signal from noise even when the majority of the panel consists of single observations. It provides full posterior distributions but requires more computation time.
Installation
The library can be installed directly from its git repository:
pip install depp_sfa
Usage and data preparation
It is strongly recommended to standardize (center and scale) continuous variables prior to estimation to ensure the convergence of the optimization algorithms.
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from depp_SFA import SFA, FUN_COST
# 1. data preparation
df = pd.read_csv("data.csv")
df = df[(df["cost"] > 0) & (df["output"] > 0)]
# standardization
vars_to_scale = ["density", "quality_index"]
scaler = StandardScaler()
df[vars_to_scale] = scaler.fit_transform(df[vars_to_scale])
# 2. vector extraction
y = df["cost"].to_numpy(dtype=float)
X = df[["output"] + vars_to_scale].to_numpy(dtype=float)
firm_ids = df["firm_id"].to_numpy()
years = df["year"].to_numpy(dtype=float)
Z_vars = df[["contract_type", "vandalism_risk"]].to_numpy(dtype=float)
Implementation guide
1. Panel data with time evolution (bc92)
The battese and coelli (1992) model captures how inefficiency evolves over time using a decay parameter. It is best estimated using pymc for short or unbalanced panels.
model_bc92 = SFA(
y=y,
x=X,
id_var=firm_ids,
time_var=years,
fun=FUN_COST,
panel_model='bc92',
inference_method='pymc', # or mle
draws=2000
)
# display results
model_bc92.summary()
# retrieve efficiency scores
efficiency_scores = model_bc92.get_technical_efficiency()
2. Inefficiency effects model (bc95)
The battese and coelli (1995) model uses environmental variables (z) to explain why certain units are less efficient, rather than treating these variables as direct cost drivers.
model_bc95 = SFA(
y=y,
x=X,
z=Z_vars,
fun=FUN_COST,
inference_method='pymc', #or mle
draws=2000
)
# display results
model_bc95.summary()
3. True random effects (greene 2005)
This specification separates unobserved structural heterogeneity (specific to each firm) from pure managerial inefficiency. It prevents the model from penalizing firms for structural geographical disadvantages.
model_greene = SFA(
y=y,
x=X,
id_var=firm_ids,
time_var=years,
fun=FUN_COST,
panel_model='greene',
inference_method='pymc', # or mle
draws=2000
)
# display results
model_greene.summary()
Api documentation (sfa class)
Instantiation: sfa(...)
The class constructor configures the model parameters and transforms the data according to the specified functional form.
Parameters:
- y (array-like): dependent variable (output for production frontier, cost for cost frontier).
- x (array-like, 2d): independent variables (inputs or prices/outputs).
- z (array-like, 2d, optional): explanatory variables for inefficiency (cross-sectional data only).
- id_var (array-like, optional): individual identifiers for panel data.
- time_var (array-like, optional): time variable for panel data.
- fun (constant): frontier type. Use sfa.fun_prod (default) or sfa.fun_cost.
- intercept (bool): include an intercept in the model (default: true).
- lamda0 (float): initial value of lambda for mle optimization (default: 1).
- method (constant): method for computing technical efficiency. Choose among sfa.te_tej (jondrow et al.), sfa.te_te (battese and coelli), or sfa.te_temod.
- form (str): functional form. Choose among 'linear', 'cobb_douglas', or 'translog'.
- dummy_indices (list): list of column indices in x that are indicator variables (0/1) and should not be log-transformed.
Public methods
Once the model is instantiated, the following methods are available. Calling any of these methods will automatically trigger the estimation process (optimize()) if it has not already been executed.
- optimize(): triggers the estimation algorithm. Automatically routes to panel estimation (mcmc) or cross-sectional estimation (mle, with an mcmc fallback in case of convergence failure).
- summary(): computes and prints a summary table of the results to the console, including estimated coefficients, standard errors, t-values, p-values, associated significance levels, and the log-likelihood value (for mle).
- get_beta(): returns a numpy array containing the estimated coefficients for the frontier variables (including the intercept if intercept=true).
- get_residuals(): returns a numpy array containing the model residuals.
- get_sigma2(): returns the estimated total variance of the composite error.
- get_lambda(): returns the ratio of standard deviations, which measures the relative importance of inefficiency compared to statistical noise.
- get_technical_efficiency(): returns a numpy array containing the technical efficiency scores (bounded between 0 and 1) computed for each observation in the sample, using the method specified during instantiation.
Credits and licenses
This software is distributed under the mit license.
- The base architecture, matrix processing, and classic log-likelihood derivations are inspired by and adapted from the work of sheng dai (copyright (c) 2023, mit license).
- The integration of bayesian estimators (markov chain monte carlo via pymc), the numerical stabilization of panel models (battese and coelli 1992), and the algorithmic exception handling were developed specifically for this project.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file depp_sfa-0.1.1.tar.gz.
File metadata
- Download URL: depp_sfa-0.1.1.tar.gz
- Upload date:
- Size: 18.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fda5af419041e140bb7ce28aea03fe3649841ac69d0e385c6d6926df35ff4877
|
|
| MD5 |
606c2c813dafc8643851b9a4cb04e91e
|
|
| BLAKE2b-256 |
ed180354c8792343757614fd17e02e3b3fafabc0bc220bb4e1ced6de69eb56aa
|
File details
Details for the file depp_sfa-0.1.1-py3-none-any.whl.
File metadata
- Download URL: depp_sfa-0.1.1-py3-none-any.whl
- Upload date:
- Size: 16.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
182169de5f123aff0a3a9fda64ca381c5116b39d84f917ecd7f3aa246085e6fd
|
|
| MD5 |
226b5e55f9220f9551eff471f7322935
|
|
| BLAKE2b-256 |
646574caa4db1786a486a69c0d70d92519b94e5f36aaec40c326c4436ca39b9e
|