Python replication of Stata's shapley2: Shapley-Owen decomposition for regression fit statistics

Maintainers

ZhiyuLu

Project description

pyshapley2

Python replication of Stata's shapley2 command (Chavez Juarez, 2013).

Computes the Shapley-Owen decomposition of any regression fit statistic (R², adjusted R², log-likelihood, AIC, …) across independent variables or user-defined variable groups, with optional parallel computation support.

Installation

# Core (serial only)
pip install pyshapley2

# With parallel support (recommended)
pip install "pyshapley2[parallel]"

# With all optional features
pip install "pyshapley2[all]"

Optional extras:

Extra	Installs	Needed for
`parallel`	`joblib`	`n_jobs != 1`
`plot`	`matplotlib`	`.plot()`
`progress`	`tqdm`	`verbose=1`
`all`	all of above	everything
`dev`	above + pytest, ruff	development

Quick Start

import pandas as pd
from pyshapley2 import shapley2

# Sample data
df = pd.read_csv("your_data.csv")

# Basic R² decomposition
result = shapley2(df, depvar="wage", indepvars=["edu", "exp", "tenure"])
result.summary()

Output (the layout replicates Stata's table format; the figures shown are placeholders for illustration and are not internally consistent):

Shapley-Owen decomposition  |  depvar: wage  |  stat: r2  |  command: ols
Observations: 500  |  Subsets: 8  |  K=3

Factor     │ Shapley value │ Per cent  │Shapley value │  Per cent
           │  (estimate)   │(estimate) │ (normalized) │(normalized)
───────────┼───────────────┼───────────┼──────────────┼─────────────
edu        │       0.35420 │    51.23 % │      0.31876 │      46.12 %
exp        │       0.27816 │    40.25 % │      0.25034 │      36.21 %
tenure     │       0.05918 │     8.56 % │      0.05326 │       7.70 %
───────────┼───────────────┼───────────┼──────────────┼─────────────
Residual   │      -0.00204 │    -0.04 % │              │
───────────┼───────────────┼───────────┼──────────────┼─────────────
TOTAL      │       0.68954 │   100.00 % │      0.68954 │     100.00 %
───────────┼───────────────┼───────────┼──────────────┼─────────────

Features

All `stat` options

`stat=`	Meaning	Stata equivalent
`"r2"`	R²	`e(r2)`
`"r2_a"`	Adjusted R²	`e(r2_a)`
`"ll"`	Log-likelihood	`e(ll)`
`"aic"`	AIC	computed
`"bic"`	BIC	computed
`"rmse"`	Root MSE	computed

Custom extractor via stat_func:

result = shapley2(df, "y", ["x1", "x2", "x3"], stat_func=lambda r: r.rsquared)
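A named function can replace the lambda when the extractor needs more than one expression. The sketch below assumes, as the `r.rsquared` example suggests, that the extractor receives a fitted-result object with statsmodels-style attributes (`llf`, `llnull`). The `SimpleNamespace` stand-in and the name `mcfadden_r2` are hypothetical and are used only so the snippet runs without fitting a model.

```python
from types import SimpleNamespace


def mcfadden_r2(res):
    """McFadden pseudo-R²: 1 - (model log-likelihood / null log-likelihood).

    Assumption: `res` exposes statsmodels-style `llf` and `llnull` attributes.
    """
    return 1.0 - res.llf / res.llnull


# Stand-in for a fitted result object (hypothetical values).
fake_result = SimpleNamespace(llf=-120.0, llnull=-200.0)
print(round(mcfadden_r2(fake_result), 4))

# With the package installed, the same callable would be passed as:
# result = shapley2(df, "y", ["x1", "x2", "x3"], command="logit", stat_func=mcfadden_r2)
```

Whether the object handed to `stat_func` carries these attributes for every `command` is not stated here, so check the fitted-result type before relying on them.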

All `command` options

`command=`	Model	Stata equivalent
`"ols"` / `"reg"`	OLS	`regress`
`"logit"`	Logit	`logit`
`"probit"`	Probit	`probit`
`"poisson"`	Poisson	`poisson`
`"glm"`	GLM	`glm`
callable	Custom	any `e()` command

Group decomposition (Stata `group()` option)

result = shapley2(
    df, "wage", ["edu", "exp", "tenure", "age"],
    stat="r2",
    groups={
        "Human Capital":  ["edu", "exp"],
        "Job Tenure":     ["tenure"],
        "Demographics":   ["age"],
    },
)
result.summary()
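With groups, the variables of a group enter and leave the model together, so the enumeration runs over groups rather than individual variables. The following is a minimal sketch of that idea, not the package's internal code; the helper name `group_subsets` is hypothetical.

```python
from itertools import product


def group_subsets(groups):
    """Yield (included group names, flattened variable list) for all 2^G subsets."""
    names = list(groups)
    for mask in product([0, 1], repeat=len(names)):
        included = [name for name, flag in zip(names, mask) if flag]
        variables = [var for name in included for var in groups[name]]
        yield included, variables


groups = {
    "Human Capital": ["edu", "exp"],
    "Job Tenure": ["tenure"],
    "Demographics": ["age"],
}
subsets = list(group_subsets(groups))
print(len(subsets))  # 2**3 = 8 regressions instead of 2**4 = 16
for included, variables in subsets[-2:]:
    print(included, "->", variables)
```

Grouping four variables into three groups halves the number of regressions, and the saving grows quickly as more variables are grouped.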

Parallel computation

# Use all available CPU cores
result = shapley2(
    df, "wage", ["x1", "x2", "x3", "x4", "x5"],
    stat="r2",
    n_jobs=-1,       # -1 = all cores; N = exactly N workers
    backend="loky",  # "loky" (default) | "threading" | "multiprocessing"
    verbose=1,       # show progress bar (requires tqdm)
)

When to use parallel?
Parallel execution pays off when K ≥ 10 (2^10 = 1,024 regressions or more).
For small K (≤ 8, at most 256 regressions), the process-spawning overhead outweighs the benefit.
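The number of regressions doubles with each added variable or group, which is what drives the rule of thumb above. The helper below is a hypothetical convenience, not part of pyshapley2; it encodes the stated thresholds and treats the unspecified case K = 9 as serial, which is an assumption.

```python
def n_regressions(k):
    """Number of subset regressions for K variables or groups."""
    return 2 ** k


def suggest_n_jobs(k):
    """Rule of thumb: parallel (-1) for K >= 10, serial (1) otherwise.

    Hypothetical helper; K = 9 is treated as serial by assumption.
    """
    return -1 if k >= 10 else 1


for k in (3, 8, 10, 15, 20):
    print(f"K={k:>2}: {n_regressions(k):>9,} regressions, n_jobs={suggest_n_jobs(k)}")
```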

Visualization

fig, ax = result.plot(
    kind="norm_pct",   # "pct" | "norm_pct" | "shapley" | "norm"
    figsize=(8, 5),
)
fig.savefig("shapley_decomp.pdf", dpi=300)

Stata → Python mapping

Stata	Python
`shapley2, stat(r2)`	`shapley2(df, "y", ["x1","x2"], stat="r2")`
`shapley2, stat(ll) command(logit)`	`shapley2(..., stat="ll", command="logit")`
`shapley2, stat(r2) group(x1 x2, x3)`	`shapley2(..., groups={"G1":["x1","x2"],"G2":["x3"]})`
`shapley2, stat(r2) force`	`shapley2(..., force=True)`
(not available in Stata)	`shapley2(..., n_jobs=-1)`

Result object attributes

result.table           # pd.DataFrame: shapley, shapley_pct, shapley_norm, shapley_norm_pct
result.full_stat       # float: full-model stat (e.g. R²)
result.residual        # float: full_stat − sum(shapley)
result.K               # int: number of variables/groups
result.runs            # int: number of regressions run (2^K)
result.n_obs           # int: number of observations used
result.summary()       # prints Stata-style table, returns str
result.plot()          # matplotlib bar chart
result.to_dict()       # serializable dict

Algorithm

pyshapley2 implements the Shapley-Owen regression decomposition (also known as the LMG method) in four steps:

1. Enumerate all 2^K subsets of K variables/groups.
2. Regress the outcome on each subset; record the fit statistic.
3. OLS (with intercept): regress the vector of fit statistics on the binary inclusion indicators; slope coefficients are the Shapley values.
4. Normalize: compute four output forms (raw, relative %, normalized, normalized %).

This is a 1:1 algorithmic replication of Stata's shapley2 v1.1.
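The steps listed in this section can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the package's implementation: the helpers `solve`, `ols`, and `decompose_r2` are hypothetical, the toy data are invented, only R² is handled, and the normalization (rescaling the estimates so they sum to the full-model statistic) is inferred from the validation tables in this README.

```python
from itertools import product


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(columns, y):
    """OLS with intercept via the normal equations; returns (coefficients, R²)."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in X]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def decompose_r2(data, depvar, indepvars):
    y = data[depvar]
    k = len(indepvars)
    # Step 1: all 2^K inclusion patterns (0 = excluded, 1 = included).
    masks = list(product([0, 1], repeat=k))
    # Step 2: fit statistic (here R²) for each subset.
    stats = [ols([data[v] for v, m in zip(indepvars, mask) if m], y)[1]
             for mask in masks]
    # Step 3: regress the statistics on the inclusion indicators.
    indicators = [[float(mask[j]) for mask in masks] for j in range(k)]
    beta, _ = ols(indicators, stats)
    estimate = dict(zip(indepvars, beta[1:]))  # drop the intercept
    # Step 4: the four output forms (normalization rule is an assumption).
    full_stat = stats[-1]  # the last mask is all ones, i.e. the full model
    total = sum(estimate.values())
    table = {
        v: {
            "shapley": s,
            "shapley_pct": 100 * s / full_stat,
            "shapley_norm": s * full_stat / total,
            "shapley_norm_pct": 100 * s / total,
        }
        for v, s in estimate.items()
    }
    return {"table": table, "full_stat": full_stat,
            "residual": full_stat - total, "runs": len(masks),
            "masks": masks, "stats": stats}


# Hypothetical toy data, invented for illustration only.
data = {
    "y":  [3.1, 3.9, 7.2, 6.8, 11.1, 9.7, 14.9, 13.2],
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 9, 2, 7, 4],
}
out = decompose_r2(data, "y", ["x1", "x2", "x3"])
for name, row in out["table"].items():
    print(name, {key: round(val, 5) for key, val in row.items()})
print("residual:", round(out["residual"], 5), "runs:", out["runs"])
```

Because the 2^K inclusion patterns form a full factorial design, each slope in step 3 equals the mean statistic over subsets that include the variable minus the mean over subsets that exclude it. The estimates therefore need not sum to the full-model statistic, and the difference is what the residual row reports.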

Validation against Stata

Results are verified to match Stata's shapley2 (v1.1) output to ≥ 5 decimal places on two public benchmark datasets.

Test 1 — mtcars (individual variables)

Data: Motor Trend Cars Road Tests (1974), N = 32
Model: regress mpg hp wt disp
Stata: reg mpg hp wt disp → shapley2, stat(r2)

Variable	Shapley (est.)	% (est.)	Shapley (norm.)	% (norm.)
hp	0.18805	22.74%	0.22511	27.23%
wt	0.27959	33.81%	0.33469	40.48%
disp	0.22307	26.98%	0.26704	32.30%
Residual	0.13612	16.46%	—	—
TOTAL	0.82684	100%	0.82684	100%

import pandas as pd
from pyshapley2 import shapley2

df = pd.read_csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv")
result = shapley2(df, "mpg", ["hp", "wt", "disp"], stat="r2")
result.summary()
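The columns of the Test 1 table are linked by simple arithmetic, which can be checked from the printed values alone. The sketch below uses only the five-decimal figures from the table; the normalization rule (scale each estimate by full statistic / sum of estimates) is inferred from those figures rather than taken from the package source, and small last-digit differences come from rounding of the inputs.

```python
# Values copied from the Test 1 table (rounded to five decimals there).
est = {"hp": 0.18805, "wt": 0.27959, "disp": 0.22307}
full_stat = 0.82684

total_est = sum(est.values())
residual = full_stat - total_est   # full-model R² minus the sum of estimates
scale = full_stat / total_est      # inferred normalization factor

rows = {}
for name, value in est.items():
    norm = value * scale
    rows[name] = {
        "pct": 100 * value / full_stat,
        "norm": norm,
        "norm_pct": 100 * norm / full_stat,
    }
    print(f"{name:<5} {value:.5f} {rows[name]['pct']:6.2f}% "
          f"{norm:.5f} {rows[name]['norm_pct']:6.2f}%")
print(f"residual {residual:.5f} ({100 * residual / full_stat:.2f}%)")
```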

Test 2 — Boston Housing (grouped variables)

Data: Boston Housing (Harrison & Rubinfeld, 1978), N = 506
Model: regress medv lstat rm dis ptratio
Stata: reg medv lstat rm dis ptratio → shapley2, stat(r2) group(lstat,rm,dis ptratio)

Group	Variables	Shapley (est.)	% (est.)	Shapley (norm.)	% (norm.)
Group 1	lstat	0.29427	42.63%	0.31257	45.28%
Group 2	rm	0.23205	33.61%	0.24648	35.71%
Group 3	dis, ptratio	0.12358	17.90%	0.13126	19.01%
Residual	—	0.04041	5.85%	—	—
TOTAL	—	0.69031	100%	0.69031	100%

import pandas as pd
from pyshapley2 import shapley2

df = pd.read_csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/MASS/Boston.csv")
result = shapley2(
    df, "medv", ["lstat", "rm", "dis", "ptratio"],
    stat="r2",
    groups={
        "lstat":       ["lstat"],
        "rm":          ["rm"],
        "dis_ptratio": ["dis", "ptratio"],
    },
)
result.summary()

References

Chavez Juarez, F. (2013). shapley2: Stata module to compute Shapley values from regressions. Statistical Software Components S457543, Boston College.
Shapley, L. S. (1953). A value for n-person games. Contributions to the Theory of Games, 2, 307–317.
Owen, G. (1977). Values of games with a priori unions. Essays in Mathematical Economics and Game Theory, 76–88.
Kruskal, W. (1987). Relative importance by averaging over orderings. American Statistician, 41(1), 6–10.

License

Project details

Release history

0.1.1 (Apr 18, 2026)
0.1.0 (Apr 18, 2026), this version

Download files

Source Distribution

pyshapley2-0.1.0.tar.gz (15.2 kB view details)

Uploaded Apr 18, 2026 Source

Built Distribution

pyshapley2-0.1.0-py3-none-any.whl (15.3 kB view details)

Uploaded Apr 18, 2026 Python 3

File details

Details for the file pyshapley2-0.1.0.tar.gz.

File metadata

Download URL: pyshapley2-0.1.0.tar.gz
Upload date: Apr 18, 2026
Size: 15.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyshapley2-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`137576556a24c62db449efdd0f2836c5871fb38b7ab64c057f71b2efa11e666e`
MD5	`8b5529daac877290a700de18a9b55444`
BLAKE2b-256	`54fe5a45c6a21478540f7d8862f878156045f6e2c5a68a51f22b4e1db672bb9d`

Provenance

The following attestation bundles were made for pyshapley2-0.1.0.tar.gz:

Publisher: publish.yml on luzhiyu-econ/pyshapley2

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyshapley2-0.1.0.tar.gz
- Subject digest: 137576556a24c62db449efdd0f2836c5871fb38b7ab64c057f71b2efa11e666e
- Sigstore transparency entry: 1337940879
- Sigstore integration time: Apr 18, 2026
Source repository:
- Permalink: luzhiyu-econ/pyshapley2@090861f76dddc63e4507b1be15214c77b5a5c5d1
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/luzhiyu-econ
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@090861f76dddc63e4507b1be15214c77b5a5c5d1
- Trigger Event: push

File details

Details for the file pyshapley2-0.1.0-py3-none-any.whl.

File metadata

Download URL: pyshapley2-0.1.0-py3-none-any.whl
Upload date: Apr 18, 2026
Size: 15.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyshapley2-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bb0560cb69e619350a4f0e3de10a64a55bff72f5ffa6582a745101ac29dbf97e`
MD5	`605813c4682b63435b7c77b733cc4506`
BLAKE2b-256	`aac57a51f5684825ee3b773784e817d6fc7351d9fbb02dc9baa1e1051045423b`

Provenance

The following attestation bundles were made for pyshapley2-0.1.0-py3-none-any.whl:

Publisher: publish.yml on luzhiyu-econ/pyshapley2

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyshapley2-0.1.0-py3-none-any.whl
- Subject digest: bb0560cb69e619350a4f0e3de10a64a55bff72f5ffa6582a745101ac29dbf97e
- Sigstore transparency entry: 1337940961
- Sigstore integration time: Apr 18, 2026
Source repository:
- Permalink: luzhiyu-econ/pyshapley2@090861f76dddc63e4507b1be15214c77b5a5c5d1
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/luzhiyu-econ
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@090861f76dddc63e4507b1be15214c77b5a5c5d1
- Trigger Event: push
