Bayesian models for football leagues

bpl

bpl is a Python 3 library for fitting Bayesian versions of the Dixon & Coles (1997) model to football match data. It uses Stan to fit the models.

Installation

pip install bpl

Usage

bpl provides a class, BPLModel, that can be used to forecast the outcome of football matches. Data should be provided to the model as a pandas dataframe with columns home_team, away_team, home_goals and away_goals. You can optionally provide a set of numerical covariates for each team (e.g. their FIFA ratings), which will then be used in the fit. Example usage:

import bpl
import numpy as np
import pandas as pd

df_train = pd.read_csv("<path-to-training-data>")
df_X = pd.read_csv("<path-to-team-level-covariates>")
forecaster = bpl.BPLModel(data=df_train, X=df_X)
forecaster.fit(seed=42)

# calculate the probability that team 1 beats team 2 3-0 at home:
forecaster.score_probability("Team 1", "Team 2", 3, 0)

# calculate the probabilities of a home win, an away win and a draw:
forecaster.overall_probabilities("Team 1", "Team 2")

# compute home win, away win and draw probabilities for a collection of matches:
df_test = pd.read_csv("<path-to-test-data>") # must have columns "home_team" and "away_team"
forecaster.predict_future_matches(df_test)

# add a new, previously unseen team to the model by sampling from the prior
X_3 = np.array([0.1, -0.5, 3.0]) # the covariates for the new team
forecaster.add_new_team("Team 3", X=X_3, seed=43)

Statistical model

The statistical model behind bpl is a slight variation on the Dixon & Coles approach. The likelihood is:

equation

where y_h and y_a are the numbers of goals scored by the home team and the away team, respectively. a_i is the attacking aptitude of team i and b_i is the defending aptitude of team i. gamma_i represents the home advantage for team i, and tau is a correlation term that was introduced by Dixon and Coles to produce more realistic scorelines in low-scoring matches. The model uses the following bivariate, hierarchical prior for a and b:

equation

X_i is a set of (optional) team-level covariates (these could be, for example, the attack and defence ratings of team i in FIFA). beta are coefficient vectors, and mu_b is an offset for the defence parameter. rho encodes the correlation between a and b, since teams that are strong at attacking also tend to be strong at defending. The home advantage has a log-normal prior:

equation
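To make the structure of these priors concrete, here is a minimal sketch of drawing parameters for a single team, which is roughly the kind of operation "sampling from the prior" for a new team suggests. It is not bpl's implementation. It assumes that the bivariate prior is a normal distribution on (log a, log b) with means beta_a . X_i and mu_b + beta_b . X_i; the scale parameters sigma_a, sigma_b, mu_gamma and sigma_gamma, the function names, and all numerical values are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_team_parameters(x, beta_a, beta_b, mu_b, sigma_a, sigma_b, rho, rng):
    """Draw one (a, b) pair from a correlated bivariate normal on the log scale.

    Assumption: the prior is normal in (log a, log b); sigma_a and sigma_b
    are hypothetical names for the marginal standard deviations.
    """
    # Prior means are linear in the team-level covariates.
    mean_a = sum(w * v for w, v in zip(beta_a, x))
    mean_b = mu_b + sum(w * v for w, v in zip(beta_b, x))
    # Two independent standard normals, mixed so that corr(log a, log b) = rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    log_a = mean_a + sigma_a * z1
    log_b = mean_b + sigma_b * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
    return math.exp(log_a), math.exp(log_b)


def sample_home_advantage(mu_gamma, sigma_gamma, rng):
    """Draw gamma from a log-normal prior (mu_gamma, sigma_gamma are hypothetical names)."""
    return rng.lognormvariate(mu_gamma, sigma_gamma)


rng = random.Random(43)
x_new = [0.1, -0.5, 3.0]  # covariates for the new team, as in the usage example
a_new, b_new = sample_team_parameters(
    x_new,
    beta_a=[0.2, 0.0, 0.1],   # illustrative values only
    beta_b=[0.0, 0.3, -0.1],  # illustrative values only
    mu_b=-0.2,
    sigma_a=0.3,
    sigma_b=0.3,
    rho=0.5,
    rng=rng,
)
gamma_new = sample_home_advantage(mu_gamma=0.2, sigma_gamma=0.1, rng=rng)
```

In a fitted model the hyper-parameters would themselves be posterior draws rather than fixed numbers, so a draw like this would be repeated once per posterior sample.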

Finally, the hyper-priors are

equation
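The likelihood described at the start of this section can be sketched in plain Python. This is an illustration under stated assumptions, not bpl's code: it assumes the home and away scoring rates are a_h * b_a * gamma_h and a_a * b_h, and it uses the low-score correction tau from the original Dixon & Coles (1997) paper. The dependence parameter is called dc_rho here (a hypothetical name) to avoid confusion with the prior correlation rho; the function names are also hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Probability of observing k events under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def dixon_coles_tau(y_h, y_a, rate_h, rate_a, dc_rho):
    """Low-score correction from Dixon & Coles (1997); equals 1 outside 0-0, 0-1, 1-0, 1-1."""
    if y_h == 0 and y_a == 0:
        return 1.0 - rate_h * rate_a * dc_rho
    if y_h == 0 and y_a == 1:
        return 1.0 + rate_h * dc_rho
    if y_h == 1 and y_a == 0:
        return 1.0 + rate_a * dc_rho
    if y_h == 1 and y_a == 1:
        return 1.0 - dc_rho
    return 1.0


def scoreline_probability(y_h, y_a, a_h, b_h, a_a, b_a, gamma_h, dc_rho):
    """Probability of a y_h - y_a scoreline (assumed multiplicative rates)."""
    rate_h = a_h * b_a * gamma_h  # assumption: home attack x away defence x home advantage
    rate_a = a_a * b_h            # assumption: away attack x home defence
    return (
        dixon_coles_tau(y_h, y_a, rate_h, rate_a, dc_rho)
        * poisson_pmf(y_h, rate_h)
        * poisson_pmf(y_a, rate_a)
    )


def outcome_probabilities(a_h, b_h, a_a, b_a, gamma_h, dc_rho, max_goals=15):
    """Sum scoreline probabilities into (home win, draw, away win).

    The grid is truncated at max_goals per team, so the three values sum to
    slightly less than 1 when the scoring rates are large.
    """
    home = draw = away = 0.0
    for y_h in range(max_goals + 1):
        for y_a in range(max_goals + 1):
            p = scoreline_probability(y_h, y_a, a_h, b_h, a_a, b_a, gamma_h, dc_rho)
            if y_h > y_a:
                home += p
            elif y_h == y_a:
                draw += p
            else:
                away += p
    return home, draw, away


# Illustrative parameter values only.
p_home, p_draw, p_away = outcome_probabilities(
    a_h=1.2, b_h=0.9, a_a=1.0, b_a=1.1, gamma_h=1.3, dc_rho=0.05
)
```

The correction only redistributes probability among the four lowest scorelines, so the scoreline probabilities still sum to one. A Bayesian forecast would average quantities like these over posterior draws of the parameters rather than evaluate them at a single point.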
