SSAT: Statistical Sports Analysis Toolkit
SSAT is a Python package implementing statistical models for sports analytics. The package provides a collection of frequentist and Bayesian statistical models for analyzing and predicting sports match outcomes.
Key Features
- Multiple Statistical Models:
  - Frequentist Models:
    - Bradley-Terry Model: Paired comparison model for team rankings
    - TOOR (Team Offense-Offense Rating): Offensive performance analysis
    - GSSD (Goal Scoring Statistical Distribution): Goal distribution modeling
    - ZSD (Zero-Score Distribution): Special case handling for 0-0 outcomes
    - PRP (Possession-based Rating Process): Team rating based on possession metrics
    - Poisson Model: Classic goal-scoring probability distribution
  - Bayesian Models:
    - Bayesian Poisson: Probabilistic goal-scoring model with team attack/defense capabilities
    - Negative Binomial: Overdispersed goal-scoring model for higher variance
    - Zero-Inflated Negative Binomial: Handles excess zeros in low-scoring matches
    - Skellam: Direct modeling of goal differences
    - Zero-Inflated Skellam: Enhanced modeling of draws and low-scoring games
- Data Processing: Integrated with flashscore-scraper for automated data collection
- Visualization: Comprehensive plotting utilities for model analysis
- Model Comparison: Tools for comparing predictions across different models
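To make "direct modeling of goal differences" concrete, here is a minimal standard-library sketch of the distribution a Skellam model describes: the difference of two independent Poisson scores. It does not use SSAT; the function names and the rates 1.6 and 1.1 are illustrative assumptions only.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """P(exactly k goals) under a Poisson distribution; requires rate > 0."""
    # Work in log space so large k (e.g. handball scores) does not overflow.
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def goal_difference_pmf(diff: int, home_rate: float, away_rate: float,
                        max_goals: int = 100) -> float:
    """P(home_goals - away_goals == diff) for independent Poisson scores.

    Sums over every away score, pairing it with the home score that gives
    the requested difference. Scores above max_goals are ignored.
    """
    total = 0.0
    for away in range(max_goals + 1):
        home = away + diff
        if 0 <= home <= max_goals:
            total += poisson_pmf(home, home_rate) * poisson_pmf(away, away_rate)
    return total


# Illustrative rates, not fitted values.
draw_probability = goal_difference_pmf(0, home_rate=1.6, away_rate=1.1)
```

The truncated sum is only accurate when max_goals sits well above both rates.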
Installation
pip install ssat
For full functionality including all optional dependencies:
pip install ssat[all]
Dependencies
- Core: numpy, pandas, scipy
- Optional:
- Development: ipykernel, ipywidgets, jupyter
- Visualization: matplotlib, seaborn
- Data Collection: flashscore-scraper, requests, beautifulsoup4
- Machine Learning: scikit-learn, statsmodels
- Bayesian (planned): arviz, cmdstanpy
Quick Start
import pandas as pd
from ssat.frequentist import BradleyTerry, Poisson
# Load data
df = pd.read_pickle("ssat/data/sample_handball_data.pkl")
X = df[["home_team", "away_team"]]
Z = df[["home_goals", "away_goals"]]
y = df["spread"]
# Initialize models
bt_model = BradleyTerry()
poisson_model = Poisson()
# Fit models
bt_model.fit(X, y, Z)
poisson_model.fit(X, y, Z)
# Make predictions
bt_predictions = bt_model.predict(X)
poisson_predictions = poisson_model.predict(X)
# Predict probabilities
bt_probas = bt_model.predict_proba(X, Z, point_spread=0, include_draw=True)
poisson_probas = poisson_model.predict_proba(X, Z, point_spread=0, include_draw=True)
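The semantics of point_spread and include_draw are not spelled out above. One plausible reading, offered purely as an assumption, is that the home side "wins" when the goal difference exceeds the spread, "draws" when it equals the spread, and that dropping the draw renormalizes the remaining two outcomes. The sketch below applies that reading to a hand-written goal-difference distribution; outcome_probabilities is a hypothetical helper, not part of SSAT.

```python
def outcome_probabilities(diff_pmf, point_spread=0, include_draw=True):
    """Collapse a goal-difference distribution into outcome probabilities.

    diff_pmf maps (home_goals - away_goals) to its probability.
    Assumed reading: home covers if diff > point_spread, draw if equal.
    """
    home = sum(p for d, p in diff_pmf.items() if d > point_spread)
    draw = sum(p for d, p in diff_pmf.items() if d == point_spread)
    away = sum(p for d, p in diff_pmf.items() if d < point_spread)
    if include_draw:
        return {"home": home, "draw": draw, "away": away}
    # Without a draw outcome, rescale the two decided outcomes to sum to 1.
    decided = home + away
    return {"home": home / decided, "away": away / decided}


# Made-up distribution for illustration only.
toy_pmf = {-2: 0.1, -1: 0.2, 0: 0.2, 1: 0.3, 2: 0.2}
three_way = outcome_probabilities(toy_pmf)
two_way = outcome_probabilities(toy_pmf, include_draw=False)
home_minus_one = outcome_probabilities(toy_pmf, point_spread=1)
```

With a spread of 1, only a two-goal home win counts as covering in this toy distribution, so the home probability drops from about 0.5 to 0.2.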
Bayesian Models
SSAT's Bayesian models provide probabilistic predictions with uncertainty quantification. These models use MCMC sampling via Stan to estimate team strengths and predict match outcomes.
Available Models
- Bayesian Poisson
from ssat.bayesian import Poisson
model = Poisson()
model.fit(X)  # X contains [home_team, away_team, home_goals, away_goals]
# Make predictions
predictions = model.predict(new_matches)
probabilities = model.predict_proba(new_matches, point_spread=0)
# Visualize team strengths
model.plot_team_stats()
- Negative Binomial
from ssat.bayesian import NegBinom
model = NegBinom()
model.fit(X)
Better suited for matches with higher scoring variance.
- Zero-Inflated Models
from ssat.bayesian import NegBinomZero, SkellamZero
model = NegBinomZero()  # or SkellamZero()
model.fit(X)
Specifically designed for competitions with frequent low scores or draws.
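The two ideas behind these variants can be shown without Stan. The sketch below assumes a mean/dispersion parameterization of the negative binomial (variance = mean + mean^2 / dispersion) and a simple mixture for zero inflation. Whether SSAT's Stan programs use exactly these forms is not stated here, and the function names are illustrative.

```python
import math
from functools import partial


def neg_binom_pmf(k: int, mean: float, dispersion: float) -> float:
    """Negative binomial PMF with variance mean + mean**2 / dispersion."""
    log_p = (
        math.lgamma(k + dispersion) - math.lgamma(k + 1) - math.lgamma(dispersion)
        + dispersion * math.log(dispersion / (dispersion + mean))
        + k * math.log(mean / (dispersion + mean))
    )
    return math.exp(log_p)


def zero_inflated_pmf(k: int, base_pmf, zero_prob: float) -> float:
    """Mix a point mass at zero (weight zero_prob) with a base count PMF."""
    value = (1.0 - zero_prob) * base_pmf(k)
    return value + zero_prob if k == 0 else value


# Illustrative parameters: mean 2.5 goals, dispersion 3, 10% extra zeros.
base = partial(neg_binom_pmf, mean=2.5, dispersion=3.0)
p_zero_plain = base(0)
p_zero_inflated = zero_inflated_pmf(0, base, zero_prob=0.1)
```

A smaller dispersion value gives a heavier tail, and as the dispersion grows the distribution approaches a Poisson with the same mean.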
Model Features
- Uncertainty Quantification: All predictions include credible intervals
- Team Statistics: Analyze attack and defense capabilities per team
- Visualization Tools:
# View MCMC diagnostics
model.plot_trace()
# Analyze team strengths
model.plot_team_stats()
- Flexible Predictions:
- Win/Draw/Loss probabilities
- Expected goals
- Custom point spread predictions
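As a rough illustration of how a credible interval comes out of MCMC output, the sketch below takes an equal-tailed interval from a list of posterior draws. The draws here are simulated stand-ins, not output from an SSAT model, and credible_interval is a hypothetical helper.

```python
import random


def credible_interval(draws, level=0.9):
    """Equal-tailed credible interval from posterior draws (nearest rank)."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    lower = ordered[round(tail * last)]
    upper = ordered[round((1.0 - tail) * last)]
    return lower, upper


# Stand-in "posterior" for a team's scoring rate: gamma draws, mean 1.5.
rng = random.Random(0)
fake_draws = [rng.gammavariate(50.0, 0.03) for _ in range(2000)]
low, high = credible_interval(fake_draws, level=0.9)
```

About 90% of the draws fall between low and high. A narrower interval means the data pin the quantity down more tightly.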
Example Usage
import pandas as pd
from ssat.bayesian import Poisson
# Load match data
matches = pd.DataFrame({
'home_team': ['TeamA', 'TeamB', ...],
'away_team': ['TeamB', 'TeamC', ...],
'home_goals': [2, 1, ...],
'away_goals': [1, 2, ...]
})
# Initialize and fit model
model = Poisson()
model.fit(matches)
# Predict new matches
new_matches = pd.DataFrame({
'home_team': ['TeamA'],
'away_team': ['TeamC']
})
# Get win probabilities
probs = model.predict_proba(new_matches)
print(f"Win probability: {probs[0]:.2%}")
# Analyze team strengths
model.plot_team_stats()
Data Sources
Match data is collected using the flashscore-scraper package. The package includes sample handball data in ssat/data/sample_handball_data.pkl for testing and examples.
API Documentation
Base Model
All models inherit from BaseModel, which provides common functionality:
- fit(X, y, Z): Fit the model to training data
- predict(X): Predict match outcomes
- predict_proba(X, Z, point_spread, include_draw): Predict outcome probabilities
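The shape of such a shared interface can be sketched with an abstract base class. This is a hypothetical stand-in: SketchBaseModel and MeanSpreadModel are invented for illustration, work on plain lists rather than DataFrames, and say nothing about how SSAT's BaseModel is implemented.

```python
from abc import ABC, abstractmethod


class SketchBaseModel(ABC):
    """Hypothetical common interface; not SSAT's actual BaseModel."""

    @abstractmethod
    def fit(self, X, y, Z=None):
        """Learn from fixtures X, spreads y and optional scores Z."""

    @abstractmethod
    def predict(self, X):
        """Return one predicted spread per fixture in X."""


class MeanSpreadModel(SketchBaseModel):
    """Toy model that predicts the average training spread everywhere."""

    def fit(self, X, y, Z=None):
        self.mean_spread_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_spread_ for _ in X]


fixtures = [("TeamA", "TeamB"), ("TeamB", "TeamA")]
toy = MeanSpreadModel().fit(fixtures, y=[3, -1])
toy_predictions = toy.predict([("TeamA", "TeamB")])
```

Because every subclass exposes the same methods, code that compares models can loop over them without special cases.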
Specific Models
Bradley-Terry Model
from ssat.frequentist import BradleyTerry
model = BradleyTerry()
model.fit(X, y, Z)
Implements paired comparison modeling for team strength estimation.
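For intuition, a Bradley-Terry fit can be sketched in a few lines with the classic iterative (minorization-maximization) update. The sketch assumes decided results only, ignoring draws and margins, and assumes every team has at least one win. It is not SSAT's fitting routine, and the function names are illustrative.

```python
from collections import defaultdict


def fit_bradley_terry(results, iterations=200):
    """Estimate strengths from (winner, loser) pairs via MM updates."""
    wins = defaultdict(int)
    meetings = defaultdict(int)  # keyed by the sorted pair of team names
    teams = set()
    for winner, loser in results:
        wins[winner] += 1
        meetings[tuple(sorted((winner, loser)))] += 1
        teams.update((winner, loser))

    strength = {team: 1.0 for team in teams}
    for _ in range(iterations):
        updated = {}
        for team in teams:
            denom = 0.0
            for (a, b), n in meetings.items():
                if team in (a, b):
                    other = b if team == a else a
                    denom += n / (strength[team] + strength[other])
            updated[team] = wins[team] / denom
        # Rescale so strengths average 1; only ratios matter.
        total = sum(updated.values())
        strength = {t: v * len(teams) / total for t, v in updated.items()}
    return strength


def win_probability(strength, team_a, team_b):
    """P(team_a beats team_b) under the Bradley-Terry model."""
    return strength[team_a] / (strength[team_a] + strength[team_b])


# Made-up results: TeamA beat TeamB three times and lost once.
toy_strength = fit_bradley_terry([("TeamA", "TeamB")] * 3 + [("TeamB", "TeamA")])
```

With only two teams, the fitted win probability simply reproduces the observed win rate, which makes a convenient sanity check.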
Poisson Model
from ssat.frequentist import Poisson
model = Poisson()
model.fit(X, y, Z)
Models goal-scoring as a Poisson process.
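A Poisson model needs a scoring rate for each side. One simple way to obtain rates, shown here only as a rough heuristic and not as SSAT's estimation method, is to scale the league-average goals by each team's attack and defense ratios. The sketch ignores home advantage, and all names in it are illustrative.

```python
from collections import defaultdict


def estimate_strengths(matches):
    """matches: iterable of (home_team, away_team, home_goals, away_goals)."""
    scored = defaultdict(int)
    conceded = defaultdict(int)
    played = defaultdict(int)
    total_goals = 0
    match_count = 0
    for home, away, home_goals, away_goals in matches:
        scored[home] += home_goals
        conceded[home] += away_goals
        scored[away] += away_goals
        conceded[away] += home_goals
        played[home] += 1
        played[away] += 1
        total_goals += home_goals + away_goals
        match_count += 1
    average = total_goals / (2 * match_count)  # goals per team per match
    attack = {t: scored[t] / played[t] / average for t in played}
    defense = {t: conceded[t] / played[t] / average for t in played}
    return average, attack, defense


def expected_goals(home, away, average, attack, defense):
    """Poisson rates for both sides; a defense ratio above 1 is a leaky defense."""
    home_rate = average * attack[home] * defense[away]
    away_rate = average * attack[away] * defense[home]
    return home_rate, away_rate


# Two made-up results for illustration.
toy_matches = [("TeamA", "TeamB", 3, 1), ("TeamB", "TeamA", 2, 2)]
avg, att, dfn = estimate_strengths(toy_matches)
rates = expected_goals("TeamA", "TeamB", avg, att, dfn)
```

The two rates can then be plugged into Poisson probabilities for each side's score.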
[Additional model documentation available in the wiki]
Development Roadmap
- Current Release (v0.0.1):
  - Frequentist models implementation
  - Basic data processing utilities
  - Example notebooks
- Upcoming Features:
  - Bayesian implementations using Stan
  - Enhanced visualization tools
  - Additional sport-specific models
  - Performance optimization
- Future Plans:
  - Real-time prediction updates
  - Web API integration
  - Additional sports support
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Setup
git clone https://github.com/bjrnsa/ssat.git
cd ssat
pip install -e ".[all]"
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use SSAT in your research, please cite:
@software{ssat2025,
author = {Aagaard, Bjørn},
title = {SSAT: Statistical Sports Analysis Toolkit},
year = {2025},
publisher = {GitHub},
url = {https://github.com/bjrnsa/ssat}
}
Acknowledgments
- Andrew Mack's "Statistical Sports Models in Excel" (ISBN: 978-1079013450)
- Contributors and maintainers of dependent packages