Skip to main content

Python replication of Stata's shapley2: Shapley-Owen decomposition for regression fit statistics

Project description

pyshapley2

PyPI version Python 3.9+ License: MIT Tests

Python replication of Stata's shapley2 command (Chavez Juarez, 2013).

Computes the Shapley-Owen decomposition of any regression fit statistic (R², adjusted R², log-likelihood, AIC, …) across independent variables or user-defined variable groups, with optional parallel computation support.


Installation

pip install pyshapley2

For development:

pip install "pyshapley2[dev]"

Quick Start

import pandas as pd
from pyshapley2 import shapley2

# Sample data
df = pd.read_csv("your_data.csv")

# Basic R² decomposition
result = shapley2(df, depvar="wage", indepvars=["edu", "exp", "tenure"])
result.summary()

Output (1:1 replica of Stata's table format):

Shapley-Owen decomposition  |  depvar: wage  |  stat: r2  |  command: ols
Observations: 500  |  Subsets: 8  |  K=3

Factor     │ Shapley value │ Per cent  │Shapley value │  Per cent
           │  (estimate)   │(estimate) │ (normalized) │(normalized)
───────────┼───────────────┼───────────┼──────────────┼─────────────
edu        │       0.35420 │    51.23 % │      0.31876 │      46.12 %
exp        │       0.27816 │    40.25 % │      0.25034 │      36.21 %
tenure     │       0.05918 │     8.56 % │      0.05326 │       7.70 %
───────────┼───────────────┼───────────┼──────────────┼─────────────
Residual   │      -0.00204 │    -0.04 % │              │
───────────┼───────────────┼───────────┼──────────────┼─────────────
TOTAL      │       0.68954 │   100.00 % │      0.68954 │     100.00 %
───────────┼───────────────┼───────────┼──────────────┼─────────────

Features

All stat options

stat= Meaning Stata equivalent
"r2" e(r2)
"r2_a" Adjusted R² e(r2_a)
"ll" Log-likelihood e(ll)
"aic" AIC computed
"bic" BIC computed
"rmse" Root MSE computed

Custom extractor via stat_func:

result = shapley2(df, "y", ["x1", "x2", "x3"], stat_func=lambda r: r.rsquared)

All command options

command= Model Stata equivalent
"ols" / "reg" OLS regress
"logit" Logit logit
"probit" Probit probit
"poisson" Poisson poisson
"glm" GLM glm
callable Custom any e() command

Group decomposition (Stata group() option)

result = shapley2(
    df, "wage", ["edu", "exp", "tenure", "age"],
    stat="r2",
    groups={
        "Human Capital":  ["edu", "exp"],
        "Job Tenure":     ["tenure"],
        "Demographics":   ["age"],
    },
)
result.summary()

Parallel computation

# Use all available CPU cores
result = shapley2(
    df, "wage", ["x1", "x2", "x3", "x4", "x5"],
    stat="r2",
    n_jobs=-1,       # -1 = all cores; N = exactly N processes
    backend="loky",  # "loky" (default) | "threading" | "multiprocessing"
    verbose=1,       # show progress bar (requires tqdm)
)

When to use parallel?
Parallel is beneficial when K ≥ 10 (≥ 1,024 regressions).
For small K (≤ 8), the process-spawning overhead outweighs the benefit.

Visualization

fig, ax = result.plot(
    kind="norm_pct",   # "pct" | "norm_pct" | "shapley" | "norm"
    figsize=(8, 5),
)
fig.savefig("shapley_decomp.pdf", dpi=300)

Stata → Python mapping

Stata Python
shapley2, stat(r2) shapley2(df, "y", ["x1","x2"], stat="r2")
shapley2, stat(r2) command(logit) shapley2(..., stat="ll", command="logit")
shapley2, stat(r2) group(x1 x2, x3) shapley2(..., groups={"G1":["x1","x2"],"G2":["x3"]})
shapley2, stat(r2) force shapley2(..., force=True)
(not available in Stata) shapley2(..., n_jobs=-1)

Result object attributes

result.table           # pd.DataFrame: shapley, shapley_pct, shapley_norm, shapley_norm_pct
result.full_stat       # float: full-model stat (e.g. R²)
result.residual        # float: full_stat − sum(shapley)
result.K               # int: number of variables/groups
result.runs            # int: number of regressions run (2^K)
result.n_obs           # int: number of observations used
result.summary()       # prints Stata-style table, returns str
result.plot()          # matplotlib bar chart
result.to_dict()       # serializable dict

Algorithm

Shapley2 implements the Shapley-Owen regression decomposition (also known as the LMG method):

  1. Enumerate all 2^K subsets of K variables/groups.
  2. Regress the outcome on each subset; record the fit statistic.
  3. OLS (with intercept): regress the vector of fit statistics on the binary inclusion indicators; slope coefficients are the Shapley values.
  4. Normalize: compute four output forms (raw, relative %, normalized, normalized %).

This is a 1:1 algorithmic replication of Stata's shapley2 v1.1.


Validation against Stata

Results are verified to match Stata's shapley2 (v1.1) output to ≥ 5 decimal places on two public benchmark datasets.

Test 1 — mtcars (individual variables)

Data: Motor Trend Cars Road Tests (1974), N = 32 Model: regress mpg hp wt disp Stata: reg mpg hp wt dispshapley2, stat(r2)

Variable Shapley (est.) % (est.) Shapley (norm.) % (norm.)
hp 0.18805 22.74% 0.22511 27.23%
wt 0.27959 33.81% 0.33469 40.48%
disp 0.22307 26.98% 0.26704 32.30%
Residual 0.13612 16.46%
TOTAL 0.82684 100% 0.82684 100%
import pandas as pd
from pyshapley2 import shapley2

df = pd.read_csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv")
result = shapley2(df, "mpg", ["hp", "wt", "disp"], stat="r2")
result.summary()

Test 2 — Boston Housing (grouped variables)

Data: Boston Housing (Harrison & Rubinfeld, 1978), N = 506 Model: regress medv lstat rm dis ptratio Stata: reg medv lstat rm dis ptratioshapley2, stat(r2) group(lstat,rm,dis ptratio)

Group Variables Shapley (est.) % (est.) Shapley (norm.) % (norm.)
Group 1 lstat 0.29427 42.63% 0.31257 45.28%
Group 2 rm 0.23205 33.61% 0.24648 35.71%
Group 3 dis, ptratio 0.12358 17.90% 0.13126 19.01%
Residual 0.04041 5.85%
TOTAL 0.69031 100% 0.69031 100%
import pandas as pd
from pyshapley2 import shapley2

df = pd.read_csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/MASS/Boston.csv")
result = shapley2(
    df, "medv", ["lstat", "rm", "dis", "ptratio"],
    stat="r2",
    groups={
        "lstat":       ["lstat"],
        "rm":          ["rm"],
        "dis_ptratio": ["dis", "ptratio"],
    },
)
result.summary()

References

  • Chavez Juarez, F. (2013). shapley2: Stata module to compute Shapley values from regressions. Statistical Software Components S457543, Boston College.
  • Shapley, L. S. (1953). A value for n-person games. Contributions to the Theory of Games, 2, 307–317.
  • Owen, G. (1977). Values of games with a priori unions. Essays in Mathematical Economics and Game Theory, 76–88.
  • Kruskal, W. (1987). Relative importance by averaging over orderings. American Statistician, 41(1), 6–10.

License

MIT © 2026 luzhiyu-econ

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyshapley2-0.1.1.tar.gz (14.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyshapley2-0.1.1-py3-none-any.whl (15.0 kB view details)

Uploaded Python 3

File details

Details for the file pyshapley2-0.1.1.tar.gz.

File metadata

  • Download URL: pyshapley2-0.1.1.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyshapley2-0.1.1.tar.gz
Algorithm Hash digest
SHA256 9c6c01680d919fda352f914ca09b996235eee430b2ec2e41d12bbfaa43da1c41
MD5 be1f39cde7e23247c5044b4efca7b53b
BLAKE2b-256 ed8205a3a2d24baa65b3010082ccbc53048e5ddb23f0e18e81ef0c3f924aba1e

See more details on using hashes here.

Provenance

The following attestation bundles were made for pyshapley2-0.1.1.tar.gz:

Publisher: publish.yml on luzhiyu-econ/pyshapley2

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyshapley2-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pyshapley2-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyshapley2-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3b30ec2d3566899d6d6110a61d48db85de985e440544f09e1335001611128c5e
MD5 de5f7728651d32143a1e303ec8595207
BLAKE2b-256 ed7283bca4affb8295f3fe53e4254266937d580bca92c1d782f71e5db71deb50

See more details on using hashes here.

Provenance

The following attestation bundles were made for pyshapley2-0.1.1-py3-none-any.whl:

Publisher: publish.yml on luzhiyu-econ/pyshapley2

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page