
Library from http://pena.lt/y/blog for modelling and working with football (soccer) data

Penalty Blog

The penaltyblog package contains code from http://pena.lt/y/blog for working with football (soccer) data.

Installation

pip install penaltyblog

Example

There are examples of all the functions available in the Examples section.

Download Data from football-data.co.uk

penaltyblog contains some helper functions for downloading data from football-data.co.uk.

List the countries available

import penaltyblog as pb
pb.footballdata.list_countries()
['belgium',
 'england',
 'france',
 'germany',
 'greece',
 'italy',
 'portugal',
 'scotland',
 'spain',
 'turkey']

Fetch the data

The first parameter is the country of interest, the second is the starting year of the season, and the third parameter is the level of the division of interest, where 0 is the highest division (e.g. England's Premier League), 1 is the second highest (e.g. England's Championship), and so on.

df = pb.footballdata.fetch_data("england", 2018, 0)
df[["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG"]].head()
   Date                 HomeTeam      AwayTeam        FTHG  FTAG
0  2018-08-10 00:00:00  Man United    Leicester          2     1
1  2018-08-11 00:00:00  Bournemouth   Cardiff            2     0
2  2018-08-11 00:00:00  Fulham        Crystal Palace     0     2
3  2018-08-11 00:00:00  Huddersfield  Chelsea            0     3
4  2018-08-11 00:00:00  Newcastle     Tottenham          1     2
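
To make the three parameters concrete, here is a rough sketch of how a season and division could map onto a CSV address on football-data.co.uk. The helper names, the England-only division codes, and the idea that penaltyblog builds its URLs this way are all assumptions for illustration, not the library's documented behaviour.

```python
# Hypothetical helpers: these are NOT part of penaltyblog's public API.
BASE_URL = "https://www.football-data.co.uk/mmz4281"

# Assumed division codes for England only (0 = Premier League, 1 = Championship, ...).
ENGLAND_CODES = {0: "E0", 1: "E1", 2: "E2", 3: "E3"}


def season_code(start_year: int) -> str:
    """Turn a starting year such as 2018 into a '1819' style season code."""
    return f"{start_year % 100:02d}{(start_year + 1) % 100:02d}"


def build_csv_url(start_year: int, division: int) -> str:
    """Build the assumed CSV address for an English division and season."""
    return f"{BASE_URL}/{season_code(start_year)}/{ENGLAND_CODES[division]}.csv"


print(build_csv_url(2018, 0))
```

The starting year alone identifies the season, which is why fetch_data only needs 2018 to mean 2018/19.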

Predicting Goals

penaltyblog contains models designed for predicting the number of goals scored in football (soccer) games. Although aimed at football (soccer), they may also be useful for other sports, such as hockey.

The Basic Poisson Model

Let's start off by downloading some example scores from the awesome football-data website.

import penaltyblog as pb
df = pb.footballdata.fetch_data("england", 2018, 0)
df[["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG"]].head()
   Date                 HomeTeam      AwayTeam        FTHG  FTAG
0  2018-08-10 00:00:00  Man United    Leicester          2     1
1  2018-08-11 00:00:00  Bournemouth   Cardiff            2     0
2  2018-08-11 00:00:00  Fulham        Crystal Palace     0     2
3  2018-08-11 00:00:00  Huddersfield  Chelsea            0     3
4  2018-08-11 00:00:00  Newcastle     Tottenham          1     2

Next, we create a basic Poisson model and fit it to the data.

pois = pb.PoissonGoalsModel(
    df["FTHG"], df["FTAG"], df["HomeTeam"], df["AwayTeam"])
pois.fit()

Let's take a look at the fitted parameters.

pois
Module: Penaltyblog

Model: Poisson

Number of parameters: 42
Log Likelihood: -1065.077
AIC: 2214.154

Team                 Attack               Defence             
------------------------------------------------------------
Arsenal              1.362                -1.062              
Bournemouth          1.115                -0.761              
Brighton             0.634                -0.937              
Burnley              0.894                -0.801              
Cardiff              0.614                -0.798              
Chelsea              1.202                -1.341              
Crystal Palace       1.004                -1.045              
Everton              1.055                -1.184              
Fulham               0.626                -0.637              
Huddersfield         0.184                -0.712              
Leicester            0.999                -1.145              
Liverpool            1.532                -1.889              
Man City             1.598                -1.839              
Man United           1.249                -1.013              
Newcastle            0.805                -1.153              
Southampton          0.891                -0.846              
Tottenham            1.264                -1.337              
Watford              1.03                 -0.937              
West Ham             1.026                -1.007              
Wolves               0.916                -1.191              
------------------------------------------------------------
Home Advantage: 0.225
Intercept: 0.206
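
One way to read these numbers is as terms in a log-linear scoring rate. The sketch below assumes the common form in which the home rate is exp(intercept + home advantage + home attack + away defence) and the away rate is exp(intercept + away attack + home defence). That form is an assumption rather than something stated here, and the helper names are made up for illustration.

```python
import math

# Values copied from the fitted Poisson model output above.
intercept, home_advantage = 0.206, 0.225
attack = {"Man City": 1.598, "Huddersfield": 0.184}
defence = {"Man City": -1.839, "Huddersfield": -0.712}

MAX_GOALS = 20  # large enough that the ignored tail is negligible


def expected_goals(home, away):
    """Assumed log-linear form: rate = exp(sum of the relevant parameters)."""
    home_rate = math.exp(intercept + home_advantage + attack[home] + defence[away])
    away_rate = math.exp(intercept + attack[away] + defence[home])
    return home_rate, away_rate


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


home_rate, away_rate = expected_goals("Man City", "Huddersfield")

# Independent Poisson scorelines: grid[i][j] = P(home scores i, away scores j).
grid = [[poisson_pmf(i, home_rate) * poisson_pmf(j, away_rate)
         for j in range(MAX_GOALS)]
        for i in range(MAX_GOALS)]

print(round(home_rate, 2), round(away_rate, 2))
```

Under this reading, a more negative defence value lowers the opponent's scoring rate, which fits Liverpool and Man City having the most negative defence parameters.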

The Dixon and Coles Adjustment

The basic Poisson model struggles somewhat with the probabilities for low-scoring games. Dixon and Coles (1997) added an adjustment factor (rho) that modifies the probabilities for the 0-0, 1-0, 0-1 and 1-1 scorelines to account for this.

dc = pb.DixonColesGoalModel(
    df["FTHG"], df["FTAG"], df["HomeTeam"], df["AwayTeam"])
dc.fit()
dc
Module: Penaltyblog

Model: Dixon and Coles

Number of parameters: 43
Log Likelihood: -1064.943
AIC: 2215.886

Team                 Attack               Defence             
------------------------------------------------------------
Arsenal              1.36                 -0.982              
Bournemouth          1.115                -0.679              
Brighton             0.632                -0.858              
Burnley              0.897                -0.717              
Cardiff              0.615                -0.715              
Chelsea              1.205                -1.254              
Crystal Palace       1.007                -0.961              
Everton              1.054                -1.102              
Fulham               0.625                -0.557              
Huddersfield         0.18                 -0.631              
Leicester            0.996                -1.064              
Liverpool            1.534                -1.803              
Man City             1.599                -1.762              
Man United           1.251                -0.931              
Newcastle            0.806                -1.07               
Southampton          0.897                -0.761              
Tottenham            1.259                -1.261              
Watford              1.031                -0.854              
West Ham             1.023                -0.927              
Wolves               0.914                -1.113              
------------------------------------------------------------
Home Advantage: 0.225
Intercept: 0.124
Rho: -0.041
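
As a rough illustration of what rho does, the sketch below applies the correction factor from the 1997 paper, which rescales the four low scorelines (0-0, 1-0, 0-1 and 1-1) and leaves every other score untouched. The scoring rates are made-up illustrative values, and the function names are not penaltyblog's.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def tau(home_goals, away_goals, home_rate, away_rate, rho):
    """Dixon and Coles (1997) low-score correction factor."""
    if home_goals == 0 and away_goals == 0:
        return 1 - home_rate * away_rate * rho
    if home_goals == 0 and away_goals == 1:
        return 1 + home_rate * rho
    if home_goals == 1 and away_goals == 0:
        return 1 + away_rate * rho
    if home_goals == 1 and away_goals == 1:
        return 1 - rho
    return 1.0


def score_probability(home_goals, away_goals, home_rate, away_rate, rho):
    """Independent Poisson probability multiplied by the correction factor."""
    base = poisson_pmf(home_goals, home_rate) * poisson_pmf(away_goals, away_rate)
    return tau(home_goals, away_goals, home_rate, away_rate, rho) * base


# Illustrative rates (not taken from the fitted model) with the fitted rho.
home_rate, away_rate, rho = 1.5, 1.1, -0.041
total = sum(score_probability(i, j, home_rate, away_rate, rho)
            for i in range(20) for j in range(20))
print(round(total, 6))
```

With a negative rho, the 0-0 and 1-1 draws gain probability at the expense of 1-0 and 0-1, while the grid as a whole still sums to one.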

The Rue and Salvesen Adjustment

Rue and Salvesen (1999) added a psychological effect factor (gamma), where Team A will underestimate Team B if Team A is the stronger of the two. They also truncate scorelines to a maximum of five goals, e.g. a score of 7-3 becomes 5-3, stating that any goals above five are non-informative.

rs = pb.RueSalvesenGoalModel(
    df["FTHG"], df["FTAG"], df["HomeTeam"], df["AwayTeam"])

rs.fit()
rs
Module: Penaltyblog

Model: Rue Salvesen

Number of parameters: 44
Log Likelihood: -1061.167
AIC: 2210.334

Team                 Attack               Defence             
------------------------------------------------------------
Arsenal              1.435                -1.068              
Bournemouth          1.2                  -0.776              
Brighton             0.594                -0.831              
Burnley              0.935                -0.766              
Cardiff              0.6                  -0.712              
Chelsea              1.194                -1.281              
Crystal Palace       1.019                -0.985              
Everton              1.044                -1.126              
Fulham               0.641                -0.585              
Huddersfield         0.096                -0.573              
Leicester            0.988                -1.067              
Liverpool            1.487                -1.768              
Man City             1.533                -1.743              
Man United           1.315                -1.006              
Newcastle            0.761                -1.036              
Southampton          0.921                -0.814              
Tottenham            1.244                -1.274              
Watford              1.067                -0.902              
West Ham             1.045                -0.961              
Wolves               0.881                -1.091              
------------------------------------------------------------
Home Advantage: 0.222
Intercept: 0.141
Rho: -0.04
Gamma: 0.373
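
The truncation step described above is simple enough to sketch directly. The function name is hypothetical, and how penaltyblog applies the cap internally is not shown here.

```python
MAX_GOALS = 5  # cap described by Rue and Salvesen


def truncate_score(home_goals, away_goals, cap=MAX_GOALS):
    """Clip each side's goals at the cap, leaving lower scores unchanged."""
    return min(home_goals, cap), min(away_goals, cap)


print(truncate_score(7, 3))  # (5, 3)
print(truncate_score(2, 1))  # (2, 1)
```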

Making Predictions

To make a prediction using any of the above models, pass the names of the home and away teams to the predict function. This returns a FootballProbabilityGrid object that converts the model's output into probabilities for various betting markets.

probs = my_model.predict("Liverpool", "Stoke")

Home / Draw / Away

# also known as 1x2
probs.home_draw_away
[0.5193995875820345, 0.3170596913687951, 0.1635407210315597]

Total Goals

probs.total_goals("over", 2.5)
0.31911650768322447
probs.total_goals("under", 2.5)
0.680883492299145
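
Markets like these can be read off a grid of scoreline probabilities by summing the relevant cells. The sketch below shows the idea with an independent Poisson grid and made-up scoring rates. It is not the FootballProbabilityGrid implementation, and the function names are illustrative only.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def score_grid(home_rate, away_rate, max_goals=15):
    """grid[i][j] = P(home scores i, away scores j), assuming independence."""
    return [[poisson_pmf(i, home_rate) * poisson_pmf(j, away_rate)
             for j in range(max_goals)]
            for i in range(max_goals)]


def cells(grid):
    """Yield (home_goals, away_goals, probability) for every scoreline."""
    for i, row in enumerate(grid):
        for j, p in enumerate(row):
            yield i, j, p


def home_draw_away(grid):
    """Sum the cells below, on, and above the diagonal."""
    home = sum(p for i, j, p in cells(grid) if i > j)
    draw = sum(p for i, j, p in cells(grid) if i == j)
    away = sum(p for i, j, p in cells(grid) if i < j)
    return [home, draw, away]


def total_goals(grid, side, line):
    """Probability that total goals finish over or under the line."""
    if side == "over":
        return sum(p for i, j, p in cells(grid) if i + j > line)
    return sum(p for i, j, p in cells(grid) if i + j < line)


grid = score_grid(1.4, 0.8)  # illustrative rates, not fitted values
hda = home_draw_away(grid)
over = total_goals(grid, "over", 2.5)
under = total_goals(grid, "under", 2.5)
print([round(p, 3) for p in hda], round(over, 3), round(under, 3))
```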

Asian Handicaps

probs.asian_handicap("home", 1.5)
0.2602616248461783
probs.asian_handicap("away", -1.5)
0.7397383751361912

Model Parameters

You can access the model's parameters via the get_params function.

from pprint import pprint
params = my_model.get_params()
pprint(params)
{'attack_Arsenal': 1.3650671020694474,
 'attack_Aston Villa': 0.6807140182913024,
 'attack_Blackburn': 0.971135574781119,
 'attack_Bolton': 0.9502712140456423,
 'attack_Chelsea': 1.235466344414206,
 'attack_Everton': 0.9257685468926837,
 'attack_Fulham': 0.9122902202053228,
 'attack_Liverpool': 0.8684673939949753,
 'attack_Man City': 1.543379586931267,
 'attack_Man United': 1.4968564161865994,
 'attack_Newcastle': 1.1095636706231062,
 'attack_Norwich': 1.0424304866584615,
 'attack_QPR': 0.827439335780754,
 'attack_Stoke': 0.6248927873330669,
 'attack_Sunderland': 0.8510292333101492,
 'attack_Swansea': 0.8471368133406263,
 'attack_Tottenham': 1.2496040004504756,
 'attack_West Brom': 0.8625207332372105,
 'attack_Wigan': 0.8177807129177644,
 'attack_Wolves': 0.8181858085358248,
 'defence_Arsenal': -1.2192247076852236,
 'defence_Aston Villa': -1.0566859588325535,
 'defence_Blackburn': -0.7430288162188969,
 'defence_Bolton': -0.7268011436918458,
 'defence_Chelsea': -1.2065700516830344,
 'defence_Everton': -1.3564763976122773,
 'defence_Fulham': -1.1159544166204092,
 'defence_Liverpool': -1.3293118049518535,
 'defence_Man City': -1.6549894606952225,
 'defence_Man United': -1.5728126940204685,
 'defence_Newcastle': -1.1186158411320268,
 'defence_Norwich': -0.8865413401238464,
 'defence_QPR': -0.9124617361500764,
 'defence_Stoke': -1.0766419199030601,
 'defence_Sunderland': -1.2049421203955355,
 'defence_Swansea': -1.1077243368907703,
 'defence_Tottenham': -1.3160823704397775,
 'defence_West Brom': -1.1014569193066301,
 'defence_Wigan': -0.932997180492951,
 'defence_Wolves': -0.6618461794219439,
 'home_advantage': 0.2655860528422758,
 'intercept': 0.23467961435272489,
 'rho': -0.1375912978446625,
 'rue_salvesen': 0.1401430558820631}

Implied Probabilities

These functions remove the overround (the bookmaker's margin) from a set of odds and return the implied probabilities, using a variety of methods.

Multiplicative

Normalizes the probabilities so they sum to 1.0 by dividing the inverse of the odds by the sum of the inverse of the odds.

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.multiplicative(odds)
{'implied_probabilities': array([0.35873804, 0.42112726, 0.2201347 ]),
 'margin': 0.03242570633874986,
 'method': 'multiplicative'}
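
The same numbers can be reproduced by hand, which makes the method easy to check. This is a minimal sketch of the calculation described above, not the library's code, and the function name is made up.

```python
def multiplicative_by_hand(odds):
    """Divide each inverse odd by the sum of all inverse odds."""
    inverse = [1 / o for o in odds]   # raw probabilities, summing to more than 1
    total = sum(inverse)
    margin = total - 1                # the overround
    probabilities = [p / total for p in inverse]
    return probabilities, margin


probabilities, margin = multiplicative_by_hand([2.7, 2.3, 4.4])
print([round(p, 8) for p in probabilities], margin)
```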

Additive

Normalizes the probabilities so they sum to 1.0 by removing an equal amount from each.

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.additive(odds)
{'implied_probabilities': array([0.3595618 , 0.42397404, 0.21646416]),
 'margin': 0.03242570633874986,
 'method': 'additive'}
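
As a hand calculation, the additive method subtracts an equal share of the margin from each inverse odd. The sketch below is illustrative only and uses a made-up function name.

```python
def additive_by_hand(odds):
    """Subtract margin / n from each inverse odd."""
    inverse = [1 / o for o in odds]
    margin = sum(inverse) - 1
    share = margin / len(odds)        # equal amount removed from each outcome
    probabilities = [p - share for p in inverse]
    return probabilities, margin


probabilities, margin = additive_by_hand([2.7, 2.3, 4.4])
print([round(p, 8) for p in probabilities], margin)
```

Because the same amount is removed from every outcome, very long odds can in principle produce a negative probability with this method.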

Power

Solves for the power coefficient that normalizes the inverse of the odds to sum to 1.0.

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.power(odds)
{'implied_probabilities': array([0.3591711 , 0.42373075, 0.21709815]),
 'margin': 0.03242570633874986,
 'method': 'power',
 'k': 1.0309132393169356}
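
A sketch of the idea: raise each inverse odd to a power k and search for the k at which the results sum to one. Plain bisection is used here as an assumption for illustration, so the library's own solver may differ, and the function names are made up.

```python
def bisect_decreasing(f, lo, hi, iterations=100):
    """Find the root of f, assuming f(lo) > 0 and f(hi) < 0."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def power_by_hand(odds):
    """Solve sum((1 / odds_i) ** k) == 1 for k, then return the probabilities."""
    inverse = [1 / o for o in odds]
    # The bracket [1, 2] suits typical three-way odds; widen it for extreme cases.
    k = bisect_decreasing(lambda k: sum(p**k for p in inverse) - 1, 1.0, 2.0)
    return [p**k for p in inverse], k


probabilities, k = power_by_hand([2.7, 2.3, 4.4])
print([round(p, 6) for p in probabilities], round(k, 4))
```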

Shin

Uses the Shin (1992, 1993) method to calculate the implied probabilities.

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.shin(odds)
{'implied_probabilities': array([0.35934392, 0.42324385, 0.21741223]),
 'margin': 0.03242570633874986,
 'method': 'shin',
 'z': 0.016236442857291165}
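
For readers who want to see the arithmetic, the sketch below uses the usual closed form for Shin probabilities and finds z, the assumed share of insider money, by bisection so that the probabilities sum to one. The solver and function names are illustrative assumptions, not the library's code.

```python
import math


def shin_probabilities(inverse, z):
    """Shin's closed form for a given z."""
    booksum = sum(inverse)
    return [
        (math.sqrt(z**2 + 4 * (1 - z) * p**2 / booksum) - z) / (2 * (1 - z))
        for p in inverse
    ]


def shin_by_hand(odds, iterations=100):
    """Bisect on z until the Shin probabilities sum to 1."""
    inverse = [1 / o for o in odds]
    lo, hi = 0.0, 0.4  # the sum is above 1 at z = 0 and below 1 at z = 0.4 here
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(shin_probabilities(inverse, mid)) > 1:
            lo = mid
        else:
            hi = mid
    z = (lo + hi) / 2
    return shin_probabilities(inverse, z), z


probabilities, z = shin_by_hand([2.7, 2.3, 4.4])
print([round(p, 6) for p in probabilities], round(z, 5))
```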

Differential Margin Weighting

Uses the differential margin approach described by Joseph Buchdahl in his wisdom of the crowd article.

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.differential_margin_weighting(odds)
{'implied_probabilities': array([0.3595618 , 0.42397404, 0.21646416]),
 'margin': 0.03242570633874986,
 'method': 'differential_margin_weighting'}
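
A sketch of the calculation, assuming the formula fair odds = n * odds / (n - margin * odds), where n is the number of outcomes. That formula is an assumption about the method rather than something stated here, and the function name is made up.

```python
def differential_margin_by_hand(odds):
    """Assumed form: fair_odds = n * odds / (n - margin * odds)."""
    n = len(odds)
    margin = sum(1 / o for o in odds) - 1
    fair_odds = [n * o / (n - margin * o) for o in odds]
    return [1 / f for f in fair_odds], margin


probabilities, margin = differential_margin_by_hand([2.7, 2.3, 4.4])
print([round(p, 8) for p in probabilities])
```

Rearranged, 1 / fair odds equals 1 / odds minus margin / n, which is why the output matches the additive method for the same odds.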

Odds Ratio

Uses Keith Cheung's odds ratio approach, as discussed by Joseph Buchdahl in his wisdom of the crowd article, to calculate the implied probabilities.

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.odds_ratio(odds)
{'implied_probabilities': array([0.35881036, 0.42256142, 0.21862822]),
 'margin': 0.03242570633874986,
 'method': 'odds_ratio',
 'c': 1.05116912729218}
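
The sketch below assumes the odds ratio form p = q / (c + q - c * q), where q is the inverse of the odds, and searches for the c that makes the probabilities sum to one. Bisection and the function names are illustrative choices, not the library's implementation.

```python
def odds_ratio_probabilities(inverse, c):
    """Assumed form: p = q / (c + q - c * q) for each inverse odd q."""
    return [q / (c + q - c * q) for q in inverse]


def odds_ratio_by_hand(odds, iterations=100):
    """Bisect on c until the adjusted probabilities sum to 1."""
    inverse = [1 / o for o in odds]
    lo, hi = 1.0, 2.0  # c = 1 leaves the margin in place; c = 2 removes too much here
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(odds_ratio_probabilities(inverse, mid)) > 1:
            lo = mid
        else:
            hi = mid
    c = (lo + hi) / 2
    return odds_ratio_probabilities(inverse, c), c


probabilities, c = odds_ratio_by_hand([2.7, 2.3, 4.4])
print([round(p, 6) for p in probabilities], round(c, 4))
```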

Rank Probability Scores

Based on Constantinou and Fenton (2012), penaltyblog contains a function for calculating Rank Probability Scores for assessing home, draw, away probability forecasts.

predictions is a list of home, draw, away probabilities and observed is the zero-based index for which outcome actually occurred.

import penaltyblog as pb

predictions = [
    [1, 0, 0],
    [0.9, 0.1, 0],
    [0.8, 0.1, 0.1],
    [0.5, 0.25, 0.25],
    [0.35, 0.3, 0.35],
    [0.6, 0.3, 0.1],
    [0.6, 0.25, 0.15],
    [0.6, 0.15, 0.25],
    [0.57, 0.33, 0.1],
    [0.6, 0.2, 0.2],
]

observed = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]

for p, o in zip(predictions, observed):
    rps = pb.rps(p, o)
    print(round(rps, 4))
0.0
0.005
0.025
0.1562
0.1225
0.185
0.0913
0.1113
0.0975
0.1
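
To show where these numbers come from, here is a minimal sketch of the score itself: the mean squared difference between the cumulative forecast and the cumulative outcome, taken over all but the last category. The function name is made up so that it is not mistaken for pb.rps.

```python
def ranked_probability_score(probabilities, observed):
    """Mean squared gap between cumulative forecast and cumulative outcome."""
    cumulative_forecast = 0.0
    cumulative_outcome = 0.0
    total = 0.0
    # The last category is skipped because both cumulative sums reach 1 there.
    for index, p in enumerate(probabilities[:-1]):
        cumulative_forecast += p
        cumulative_outcome += 1.0 if index == observed else 0.0
        total += (cumulative_forecast - cumulative_outcome) ** 2
    return total / (len(probabilities) - 1)


print(ranked_probability_score([0.5, 0.25, 0.25], 0))  # 0.15625
print(ranked_probability_score([0.8, 0.1, 0.1], 0))
```

Lower is better, and a forecast that puts all its probability on the observed outcome scores zero.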

Download Elo ratings from clubelo.com

Download Elo ratings for a given date

import penaltyblog as pb
df = pb.clubelo.fetch_rankings_by_date(2010, 1, 1)
df.head()
   Rank  Club         Country  Level  Elo      From                 To
0  1     Barcelona    ESP      1      1987.68  2009-12-18 00:00:00  2010-01-02 00:00:00
1  2     Chelsea      ENG      1      1945.54  2009-12-29 00:00:00  2010-01-16 00:00:00
2  3     Man United   ENG      1      1928.53  2009-12-31 00:00:00  2010-01-09 00:00:00
3  4     Real Madrid  ESP      1      1902.72  2009-12-20 00:00:00  2010-01-03 00:00:00
4  5     Inter        ITA      1      1884.49  2009-12-21 00:00:00  2010-01-06 00:00:00

List all teams with ratings available

import penaltyblog as pb
teams = pb.clubelo.list_all_teams()
teams[:5]
['Man City', 'Bayern', 'Liverpool', 'Real Madrid', 'Man United']

Download historical Elo ratings for a given team

import penaltyblog as pb
df = pb.clubelo.fetch_rankings_by_team("barcelona")
df.head()
   Rank  Club       Country  Level  Elo      From                 To
0  None  Barcelona  ESP      1      1636.7   1939-10-22 00:00:00  1939-12-03 00:00:00
1  None  Barcelona  ESP      1      1626.1   1939-12-04 00:00:00  1939-12-10 00:00:00
2  None  Barcelona  ESP      1      1636.73  1939-12-11 00:00:00  1939-12-17 00:00:00
3  None  Barcelona  ESP      1      1646.95  1939-12-18 00:00:00  1939-12-24 00:00:00
4  None  Barcelona  ESP      1      1637.42  1939-12-25 00:00:00  1939-12-31 00:00:00

References

  • Mark J. Dixon and Stuart G. Coles (1997) Modelling Association Football Scores and Inefficiencies in the Football Betting Market.
  • Håvard Rue and Øyvind Salvesen (1999) Prediction and Retrospective Analysis of Soccer Matches in a League.
  • Anthony C. Constantinou and Norman E. Fenton (2012) Solving the Problem of Inadequate Scoring Rules for Assessing Probabilistic Football Forecast Models.
  • Hyun Song Shin (1992) Prices of State Contingent Claims with Insider Traders, and the Favourite-Longshot Bias.
  • Hyun Song Shin (1993) Measuring the Incidence of Insider Trading in a Market for State-Contingent Claims.
  • Joseph Buchdahl (2015) The Wisdom of the Crowd.
