A comprehensive Python package for calculating Shapley values in cooperative game theory
Project description
Shapley Value Calculator
A comprehensive Python package for calculating Shapley values in cooperative game theory. Shapley values provide a mathematically principled method for fairly distributing payoffs among players based on their marginal contributions to coalitions.
📖 Table of Contents
- Overview
- Installation
- Quick Start
- Usage Examples
- API Reference
- Features
- Performance
- Testing
- Contributing
- License
🎯 Overview
The Shapley value is a solution concept from cooperative game theory that assigns a unique distribution (among the players) of a total surplus generated by the coalition of all players. This package provides four approaches:
| Class | When to use | Scales to |
|---|---|---|
ShapleyCombinations |
Pre-defined coalition values (exact) | ~15 players |
ShapleyValue |
Pre-defined coalition values, alt algorithm | ~15 players |
ShapleyValueCalculator |
Dynamic evaluation function (exact, parallel) | ~20 players |
MonteCarloShapleyValue |
Dynamic evaluation function (approximate, parallel) | 100+ players |
🚀 Installation
pip install shapley-value
⚡ Quick Start
Exact calculation from pre-defined coalition values
from shapley_value import ShapleyCombinations
players = ['Alice', 'Bob', 'Charlie']
coalition_values = {
(): 0,
('Alice',): 10, ('Bob',): 20, ('Charlie',): 30,
('Alice', 'Bob'): 50, ('Alice', 'Charlie'): 60, ('Bob', 'Charlie'): 70,
('Alice', 'Bob', 'Charlie'): 100,
}
calculator = ShapleyCombinations(players)
shapley_values = calculator.calculate_shapley_values(coalition_values)
# {'Alice': 16.67, 'Bob': 33.33, 'Charlie': 50.0}
Monte Carlo approximation for large games
from shapley_value import MonteCarloShapleyValue
def my_model(coalition):
"""Any callable – ML model, simulation, or analytic function."""
return sum(coalition) ** 0.9 if coalition else 0.0
mc = MonteCarloShapleyValue(
my_model,
players=list(range(50)), # 50 players – exact is infeasible
num_samples=5000,
random_seed=42,
n_jobs=-1, # use all CPU cores
)
values = mc.calculate_shapley_values() # Dict[player, float]
📋 Usage Examples
Method 1 – Pre-defined coalition values (ShapleyCombinations)
Use this when every coalition value is already known:
from shapley_value import ShapleyCombinations
players = ['Player1', 'Player2', 'Player3']
coalition_values = {
(): 0,
('Player1',): 100, ('Player2',): 200, ('Player3',): 300,
('Player1', 'Player2'): 450,
('Player1', 'Player3'): 500,
('Player2', 'Player3'): 600,
('Player1', 'Player2', 'Player3'): 900,
}
calculator = ShapleyCombinations(players)
shapley_values = calculator.calculate_shapley_values(coalition_values)
print(shapley_values)
Method 2 – Exact evaluation function (ShapleyValueCalculator)
Use this when a callable computes the coalition value and you have ≤ ~20 players:
from shapley_value import ShapleyValueCalculator
def profit_function(coalition):
if not coalition:
return 0
return sum(coalition) + len(coalition) * 10 # synergy bonus
players = [100, 200, 300]
calculator = ShapleyValueCalculator(profit_function, players, n_jobs=-1)
shapley_values = calculator.calculate_shapley_values()
print(shapley_values)
raw_data = calculator.get_raw_data() # detailed per-coalition data
calculator.save_raw_data('analysis.csv') # export to CSV
n_jobs follows the scikit-learn convention: 1 = sequential, -1 = all cores, k = exactly k cores.
Method 3 – Monte Carlo approximation (MonteCarloShapleyValue)
Use this for large games (20+ players) where exact computation is infeasible.
Complexity is O(m × n) where m = num_samples and n = number of players,
versus O(2ⁿ) for exact methods.
from shapley_value import MonteCarloShapleyValue
def complex_model(coalition):
"""Expensive callable – e.g. a trained ML model."""
if not coalition:
return 0.0
return float(sum(p ** 1.2 for p in coalition))
players = list(range(1, 51)) # 50 players
mc = MonteCarloShapleyValue(
complex_model,
players=players,
num_samples=5000, # more samples → lower error
random_seed=0, # set for reproducibility
n_jobs=-1, # parallelise across all CPU cores
)
# Core result
values = mc.calculate_shapley_values()
# Diagnose how many samples you need
convergence_df = mc.get_convergence_data() # DataFrame shape (5000, 50)
# Inspect per-permutation marginal contributions
raw_df = mc.get_raw_data() # columns: iteration, permutation, player, marginal_contribution
Choosing num_samples
| Players | Suggested num_samples |
Typical error |
|---|---|---|
| ≤ 10 | 1 000 | < 0.5 % |
| 10–30 | 5 000 | < 1 % |
| 30–100 | 10 000–50 000 | 1–3 % |
Use get_convergence_data() to plot running estimates and verify convergence
for your specific game.
n_jobs quick reference
# Sequential (default – no overhead, good for small games / debugging)
mc = MonteCarloShapleyValue(f, players, n_jobs=1)
# All available CPU cores
mc = MonteCarloShapleyValue(f, players, n_jobs=-1)
# Exactly 4 cores
mc = MonteCarloShapleyValue(f, players, n_jobs=4)
Note: Permutations are generated in a fixed order before the parallel step, so
random_seedguarantees bit-identical results regardless ofn_jobs.
📚 API Reference
ShapleyCombinations
class ShapleyCombinations:
def __init__(self, players: List[Any])
def calculate_shapley_values(
self, coalition_values: Dict[Tuple, float]
) -> Dict[Any, float]
def get_all_coalitions(self) -> List[Tuple]
ShapleyValueCalculator
class ShapleyValueCalculator:
def __init__(
self,
evaluation_function: Callable[[List[Any]], float],
players: List[Any],
n_jobs: int = 1, # sklearn-style: 1=seq (default), -1=all cores, k=k cores
)
def calculate_shapley_values(self) -> Dict[Any, float]
def get_raw_data(self) -> pd.DataFrame
def save_raw_data(self, file_path: str) -> None
MonteCarloShapleyValue
class MonteCarloShapleyValue:
def __init__(
self,
evaluation_function: Callable[[List[Any]], float],
players: List[Any],
num_samples: int = 1000, # permutations to sample
random_seed: Optional[int] = None,
n_jobs: int = 1, # sklearn-style parallelism
)
def calculate_shapley_values(self) -> Dict[Any, float]
def get_convergence_data(self) -> pd.DataFrame # running estimates per iteration
def get_raw_data(self) -> pd.DataFrame # per-permutation detail
ShapleyValue
Low-level calculator using the weighted-marginal-contribution formula directly.
class ShapleyValue:
def __init__(
self,
players: List[Any],
coalition_values: Dict[Tuple[Any, ...], float],
)
def calculate_shapley_values(self) -> Dict[Any, float]
✨ Features
- Four calculation methods – exact (pre-defined values, evaluation function) and Monte Carlo approximation for large games
- sklearn-style
n_jobson bothShapleyValueCalculatorandMonteCarloShapleyValue:1= sequential,-1= all cores,k= exactly k cores - Reproducible parallel results –
random_seedinMonteCarloShapleyValuegives bit-identical output regardless ofn_jobs - Convergence diagnostics –
get_convergence_data()lets you plot running estimates and tunenum_samplesfor your accuracy target - Data export –
get_raw_data()andsave_raw_data()for downstream analysis - Type flexibility – any hashable player type (strings, ints, tuples, …)
- Comprehensive test suite – 53 tests covering correctness, reproducibility, parallelism, edge cases, and stress / performance
⚡ Performance
Exact methods (ShapleyCombinations / ShapleyValueCalculator)
Exact computation enumerates all 2ⁿ coalitions and becomes impractical beyond ~20 players.
| Players | Coalitions | Sequential | Parallel (n_jobs=-1) |
Speedup |
|---|---|---|---|---|
| 5 | 32 | < 0.001 s | < 0.001 s | 1× |
| 10 | 1 024 | ~ 6.6 s | ~ 1.1 s | ~ 6× |
| 12 | 4 096 | ~ 31 s | ~ 2.6 s | ~ 12× |
| 15 | 32 768 | ~ 4 min | ~ 20 s | ~ 12× |
Monte Carlo (MonteCarloShapleyValue)
O(m × n) cost – scales to 100+ players. Parallelism is beneficial when the evaluation function is expensive (e.g. ML model inference, simulations).
| Players | num_samples |
n_jobs=1 |
n_jobs=-1 |
Notes |
|---|---|---|---|---|
| 20 | 5 000 | < 1 s | < 1 s | cheap eval function |
| 50 | 5 000 | < 1 s | < 1 s | cheap eval function |
| 50 | 10 000 | ~ 2 s | ~ 0.7 s | expensive eval (~1 ms/call) |
| 100 | 10 000 | ~ 5 s | ~ 1.5 s | expensive eval (~1 ms/call) |
Rule of thumb: if your evaluation function is cheap (< 10 µs), use
n_jobs=1to avoid joblib process-spawn overhead. If it's expensive (ML model, simulation), setn_jobs=-1for maximum throughput.
🧪 Testing
# Install dev dependencies
pip install -e ".[dev,examples,performance]"
# Run the full test suite (53 tests)
python -m pytest tests/ -v
# Run only stress / performance tests (with printed benchmark table)
python -m pytest tests/test_montecarlo_stress.py -v -s
The suite is organised into:
| File | Tests | Coverage |
|---|---|---|
test_calculator.py |
11 | ShapleyValue, ShapleyCombinations, ShapleyValueCalculator |
test_montecarlo.py |
29 | Correctness, reproducibility, parallelism, edge cases |
test_montecarlo_stress.py |
13 | 20–100 player stress, timing bounds, benchmark table |
🤝 Contributing
We welcome contributions!
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Make your changes and add tests
- Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
Development Setup
git clone https://github.com/Bowenislandsong/shapley-value.git
cd shapley-value
pip install -e ".[dev,examples,performance]"
python -m pytest tests/
📄 License
This project is licensed under the MIT License – see the LICENSE file for details.
📝 Citation
If you use this package in your research or project, please cite it as:
@software{song2026shapley,
author = {Song, Bowen},
title = {Shapley Value Calculator},
year = {2026},
publisher = {GitHub},
url = {https://github.com/Bowenislandsong/shapley-value},
version = {0.0.9}
}
APA Format:
Song, B. (2026). Shapley Value Calculator (Version 0.0.9) [Computer software]. https://github.com/Bowenislandsong/shapley-value
For more citation formats see CITATION.cff.
🏷️ Version History
- 0.0.9: PyPI publish on
v*tag push; citation / version metadata - 0.0.8:
ShapleyValueCalculatorusesn_jobs(sklearn-style); doc and landing-page sync; example runner includes Monte Carlo example - 0.0.7:
MonteCarloShapleyValuewith sklearn-stylen_jobs; comprehensive stress tests; convergence and raw-data diagnostics - 0.0.6: Citation information for academic use
- 0.0.5: CI improvements and Python 3.13 support
- 0.0.4: Comprehensive examples, parallel processing, performance optimizations
- 0.0.3: Enhanced parallel processing
- 0.0.2: Function-based evaluation and data export
- 0.0.1: Initial release
👥 Authors
- Bowen Song – Profile
🔗 Links
For more information about Shapley values and cooperative game theory, see the Wikipedia article.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file shapley_value-0.0.9.tar.gz.
File metadata
- Download URL: shapley_value-0.0.9.tar.gz
- Upload date:
- Size: 52.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8f973beb943c8fdbc08009b7e8b6955a9a4e29e901578f48a0f134acd62a2059
|
|
| MD5 |
76ca1728bc8b0eacb0800f3096e16277
|
|
| BLAKE2b-256 |
70b90686b55f70b46e443c5538bebf497534e16ac1955b6c871f85469ca16893
|
File details
Details for the file shapley_value-0.0.9-py3-none-any.whl.
File metadata
- Download URL: shapley_value-0.0.9-py3-none-any.whl
- Upload date:
- Size: 15.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c0d59f34c0ec9598fa8893ed3103ed1ee567797b98ecc3a07c9dc0f6d9314c55
|
|
| MD5 |
aa4f550e53c11cb29845c57c0906f128
|
|
| BLAKE2b-256 |
cda7d765ab952bb47a504f21ad728872c16c2719d7489819b49767f2dce15fe2
|