Library from http://pena.lt/y/blog for modelling and working with football (soccer) data

Penalty Blog

The penaltyblog package contains code from http://pena.lt/y/blog for working with football (soccer) data.

Installation

pip install penaltyblog

Example

There are examples of all the functions available in the Examples section.

Download Data from football-data.co.uk

penaltyblog contains some helper functions for downloading data from football-data.co.uk.

List the countries available

import penaltyblog as pb
pb.footballdata.list_countries()

['belgium',
 'england',
 'france',
 'germany',
 'greece',
 'italy',
 'portugal',
 'scotland',
 'spain',
 'turkey']

Fetch the data

The first parameter is the country of interest, the second is the starting year of the season, and the third parameter is the level of the division of interest, where 0 is the highest division (e.g. England's Premier League), 1 is the second highest (e.g. England's Championship), and so on.

df = pb.footballdata.fetch_data("england", 2018, 0)
df[["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG"]].head()

	Date	HomeTeam	AwayTeam	FTHG	FTAG
0	2018-08-10 00:00:00	Man United	Leicester	2	1
1	2018-08-11 00:00:00	Bournemouth	Cardiff	2	0
2	2018-08-11 00:00:00	Fulham	Crystal Palace	0	2
3	2018-08-11 00:00:00	Huddersfield	Chelsea	0	3
4	2018-08-11 00:00:00	Newcastle	Tottenham	1	2

Predicting Goals

penaltyblog contains models designed for predicting the number of goals scored in football (soccer) games. Although aimed at football (soccer), they may also be useful for other sports, such as hockey.

The Basic Poisson Model

Let's start off by downloading some example scores from the awesome football-data website.

import penaltyblog as pb
df = pb.footballdata.fetch_data("england", 2018, 0)
df[["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG"]].head()

	Date	HomeTeam	AwayTeam	FTHG	FTAG
0	2018-08-10 00:00:00	Man United	Leicester	2	1
1	2018-08-11 00:00:00	Bournemouth	Cardiff	2	0
2	2018-08-11 00:00:00	Fulham	Crystal Palace	0	2
3	2018-08-11 00:00:00	Huddersfield	Chelsea	0	3
4	2018-08-11 00:00:00	Newcastle	Tottenham	1	2

Next, we create a basic Poisson model and fit it to the data.

pois = pb.PoissonGoalsModel(
    df["FTHG"], df["FTAG"], df["HomeTeam"], df["AwayTeam"])
pois.fit()

Let's take a look at the fitted parameters.

pois

Module: Penaltyblog

Model: Poisson

Number of parameters: 42
Log Likelihood: -1065.077
AIC: 2214.154

Team                 Attack               Defence             
------------------------------------------------------------
Arsenal              1.362                -1.062              
Bournemouth          1.115                -0.761              
Brighton             0.634                -0.937              
Burnley              0.894                -0.801              
Cardiff              0.614                -0.798              
Chelsea              1.202                -1.341              
Crystal Palace       1.004                -1.045              
Everton              1.055                -1.184              
Fulham               0.626                -0.637              
Huddersfield         0.184                -0.712              
Leicester            0.999                -1.145              
Liverpool            1.532                -1.889              
Man City             1.598                -1.839              
Man United           1.249                -1.013              
Newcastle            0.805                -1.153              
Southampton          0.891                -0.846              
Tottenham            1.264                -1.337              
Watford              1.03                 -0.937              
West Ham             1.026                -1.007              
Wolves               0.916                -1.191              
------------------------------------------------------------
Home Advantage: 0.225
Intercept: 0.206
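
As a rough illustration of how parameters like these turn into expected goals, here is a minimal sketch. It assumes the usual log-linear form for this kind of model (intercept, plus home advantage for the home side, plus the attacking team's attack and the opponent's defence, all exponentiated); that form and the helper names are assumptions for illustration, not something the output above confirms. The values are the rounded figures from the table.

```python
import math

# Rounded values copied from the fitted-parameter table above.
params = {
    "attack_Man City": 1.598,
    "defence_Man City": -1.839,
    "attack_Huddersfield": 0.184,
    "defence_Huddersfield": -0.712,
    "home_advantage": 0.225,
    "intercept": 0.206,
}


def expected_goals(params, home, away):
    """Return (home, away) expected goals under the assumed log-linear form."""
    home_rate = math.exp(
        params["intercept"]
        + params["home_advantage"]
        + params["attack_" + home]
        + params["defence_" + away]
    )
    away_rate = math.exp(
        params["intercept"]
        + params["attack_" + away]
        + params["defence_" + home]
    )
    return home_rate, away_rate


def poisson_pmf(k, rate):
    """Probability of exactly k goals for a Poisson-distributed count."""
    return math.exp(-rate) * rate**k / math.factorial(k)


home_rate, away_rate = expected_goals(params, "Man City", "Huddersfield")

# Probability of a 2-0 home win, treating the two goal counts as independent.
p_2_0 = poisson_pmf(2, home_rate) * poisson_pmf(0, away_rate)
print(round(home_rate, 2), round(away_rate, 2), round(p_2_0, 3))
```

Under this assumption, a larger attack value raises a team's own expected goals and a more negative defence value lowers the opponent's.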

The Dixon and Coles Adjustment

The basic Poisson model tends to misjudge the probabilities of low-scoring games. Dixon and Coles (1997) added an adjustment factor (rho) that modifies the probabilities of the 0-0, 1-0, 0-1 and 1-1 scorelines to account for this.

dc = pb.DixonColesGoalModel(
    df["FTHG"], df["FTAG"], df["HomeTeam"], df["AwayTeam"])
dc.fit()
dc

Module: Penaltyblog

Model: Dixon and Coles

Number of parameters: 43
Log Likelihood: -1064.943
AIC: 2215.886

Team                 Attack               Defence             
------------------------------------------------------------
Arsenal              1.36                 -0.982              
Bournemouth          1.115                -0.679              
Brighton             0.632                -0.858              
Burnley              0.897                -0.717              
Cardiff              0.615                -0.715              
Chelsea              1.205                -1.254              
Crystal Palace       1.007                -0.961              
Everton              1.054                -1.102              
Fulham               0.625                -0.557              
Huddersfield         0.18                 -0.631              
Leicester            0.996                -1.064              
Liverpool            1.534                -1.803              
Man City             1.599                -1.762              
Man United           1.251                -0.931              
Newcastle            0.806                -1.07               
Southampton          0.897                -0.761              
Tottenham            1.259                -1.261              
Watford              1.031                -0.854              
West Ham             1.023                -0.927              
Wolves               0.914                -1.113              
------------------------------------------------------------
Home Advantage: 0.225
Intercept: 0.124
Rho: -0.041
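
The low-score adjustment can be sketched as a correction factor applied to the independent Poisson probability of each scoreline. The sketch below follows the form given in Dixon and Coles (1997); the function names and the expected-goal values are made up for illustration, and it is not taken from penaltyblog's own code.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals for a Poisson-distributed count."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def dixon_coles_tau(home_goals, away_goals, home_rate, away_rate, rho):
    """Correction factor for the four low-scoring results, 1.0 otherwise."""
    if home_goals == 0 and away_goals == 0:
        return 1 - home_rate * away_rate * rho
    if home_goals == 0 and away_goals == 1:
        return 1 + home_rate * rho
    if home_goals == 1 and away_goals == 0:
        return 1 + away_rate * rho
    if home_goals == 1 and away_goals == 1:
        return 1 - rho
    return 1.0


def score_probability(home_goals, away_goals, home_rate, away_rate, rho):
    """Independent Poisson probability scaled by the low-score correction."""
    independent = poisson_pmf(home_goals, home_rate) * poisson_pmf(
        away_goals, away_rate
    )
    tau = dixon_coles_tau(home_goals, away_goals, home_rate, away_rate, rho)
    return tau * independent


# Hypothetical expected goals; rho is the rounded value fitted above.
home_rate, away_rate, rho = 1.5, 1.1, -0.041

grid_total = sum(
    score_probability(h, a, home_rate, away_rate, rho)
    for h in range(15)
    for a in range(15)
)
print(round(grid_total, 6))
```

With a negative rho, the factor is above 1 for 0-0 and 1-1 and below 1 for 1-0 and 0-1, so probability shifts towards low-scoring draws while the grid still sums to 1.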

The Rue and Salvesen Adjustment

Rue and Salvesen (1999) added a psychological effect factor (gamma), where Team A will underestimate Team B if Team A is stronger than Team B. They also truncate scorelines to a maximum of five goals, e.g. a score of 7-3 becomes 5-3, stating that any goals above five are non-informative.

rs = pb.RueSalvesenGoalModel(
    df["FTHG"], df["FTAG"], df["HomeTeam"], df["AwayTeam"])

rs.fit()
rs

Module: Penaltyblog

Model: Rue Salvesen

Number of parameters: 44
Log Likelihood: -1061.167
AIC: 2210.334

Team                 Attack               Defence             
------------------------------------------------------------
Arsenal              1.435                -1.068              
Bournemouth          1.2                  -0.776              
Brighton             0.594                -0.831              
Burnley              0.935                -0.766              
Cardiff              0.6                  -0.712              
Chelsea              1.194                -1.281              
Crystal Palace       1.019                -0.985              
Everton              1.044                -1.126              
Fulham               0.641                -0.585              
Huddersfield         0.096                -0.573              
Leicester            0.988                -1.067              
Liverpool            1.487                -1.768              
Man City             1.533                -1.743              
Man United           1.315                -1.006              
Newcastle            0.761                -1.036              
Southampton          0.921                -0.814              
Tottenham            1.244                -1.274              
Watford              1.067                -0.902              
West Ham             1.045                -0.961              
Wolves               0.881                -1.091              
------------------------------------------------------------
Home Advantage: 0.222
Intercept: 0.141
Rho: -0.04
Gamma: 0.373
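
The scoreline truncation described above is simple to reproduce by hand. This is a minimal sketch of that one step only (the helper name is hypothetical and the gamma effect is not modelled here):

```python
def truncate_score(home_goals, away_goals, max_goals=5):
    """Cap each side's goals at max_goals, as Rue and Salvesen describe."""
    return min(home_goals, max_goals), min(away_goals, max_goals)


scores = [(7, 3), (2, 1), (0, 6)]
truncated = [truncate_score(h, a) for h, a in scores]
print(truncated)  # [(5, 3), (2, 1), (0, 5)]
```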

Making Predictions

To make a prediction using any of the above models, pass the names of the home and away teams to the predict function. This returns a FootballProbabilityGrid object that can convert the output from the model into probabilities for various betting markets. In the examples below, my_model stands for any fitted model.

probs = my_model.predict("Liverpool", "Stoke")

Home / Draw / Away

# also known as 1x2
probs.home_draw_away

[0.5193995875820345, 0.3170596913687951, 0.1635407210315597]

Total Goals

probs.total_goals("over", 2.5)

0.31911650768322447

probs.total_goals("under", 2.5)

0.680883492299145

Asian Handicaps

probs.asian_handicap("home", 1.5)

0.2602616248461783

probs.asian_handicap("away", -1.5)

0.7397383751361912
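
To see where market probabilities like these come from, the sketch below builds a grid of scoreline probabilities from two independent Poisson distributions and sums the cells that belong to each market. The expected-goal values and helper names are hypothetical, and this is not the FootballProbabilityGrid implementation; in particular the handicap line is read here simply as "home wins by more than the line", which is an assumption.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals for a Poisson-distributed count."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def score_grid(home_rate, away_rate, max_goals=15):
    """Build grid[h][a], the probability of an h-a scoreline."""
    return [
        [
            poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            for a in range(max_goals + 1)
        ]
        for h in range(max_goals + 1)
    ]


def market_probability(grid, condition):
    """Sum the probabilities of every scoreline that satisfies condition."""
    return sum(
        p
        for h, row in enumerate(grid)
        for a, p in enumerate(row)
        if condition(h, a)
    )


grid = score_grid(1.4, 0.8)  # hypothetical expected goals

home = market_probability(grid, lambda h, a: h > a)
draw = market_probability(grid, lambda h, a: h == a)
away = market_probability(grid, lambda h, a: h < a)
over_2_5 = market_probability(grid, lambda h, a: h + a > 2.5)
under_2_5 = market_probability(grid, lambda h, a: h + a < 2.5)
home_by_two = market_probability(grid, lambda h, a: h - a > 1.5)

print([round(p, 3) for p in (home, draw, away)])
print(round(over_2_5, 3), round(under_2_5, 3), round(home_by_two, 3))
```

Because every scoreline falls into exactly one of home, draw or away, those three values sum to 1 (up to the tiny mass beyond max_goals), and the same holds for over and under on a half-goal line.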

Model Parameters

You can access the model's parameters via the get_params function.

from pprint import pprint
params = my_model.get_params()
pprint(params)

{'attack_Arsenal': 1.3650671020694474,
 'attack_Aston Villa': 0.6807140182913024,
 'attack_Blackburn': 0.971135574781119,
 'attack_Bolton': 0.9502712140456423,
 'attack_Chelsea': 1.235466344414206,
 'attack_Everton': 0.9257685468926837,
 'attack_Fulham': 0.9122902202053228,
 'attack_Liverpool': 0.8684673939949753,
 'attack_Man City': 1.543379586931267,
 'attack_Man United': 1.4968564161865994,
 'attack_Newcastle': 1.1095636706231062,
 'attack_Norwich': 1.0424304866584615,
 'attack_QPR': 0.827439335780754,
 'attack_Stoke': 0.6248927873330669,
 'attack_Sunderland': 0.8510292333101492,
 'attack_Swansea': 0.8471368133406263,
 'attack_Tottenham': 1.2496040004504756,
 'attack_West Brom': 0.8625207332372105,
 'attack_Wigan': 0.8177807129177644,
 'attack_Wolves': 0.8181858085358248,
 'defence_Arsenal': -1.2192247076852236,
 'defence_Aston Villa': -1.0566859588325535,
 'defence_Blackburn': -0.7430288162188969,
 'defence_Bolton': -0.7268011436918458,
 'defence_Chelsea': -1.2065700516830344,
 'defence_Everton': -1.3564763976122773,
 'defence_Fulham': -1.1159544166204092,
 'defence_Liverpool': -1.3293118049518535,
 'defence_Man City': -1.6549894606952225,
 'defence_Man United': -1.5728126940204685,
 'defence_Newcastle': -1.1186158411320268,
 'defence_Norwich': -0.8865413401238464,
 'defence_QPR': -0.9124617361500764,
 'defence_Stoke': -1.0766419199030601,
 'defence_Sunderland': -1.2049421203955355,
 'defence_Swansea': -1.1077243368907703,
 'defence_Tottenham': -1.3160823704397775,
 'defence_West Brom': -1.1014569193066301,
 'defence_Wigan': -0.932997180492951,
 'defence_Wolves': -0.6618461794219439,
 'home_advantage': 0.2655860528422758,
 'intercept': 0.23467961435272489,
 'rho': -0.1375912978446625,
 'rue_salvesen': 0.1401430558820631}

Implied Probabilities

penaltyblog can remove the overround (the bookmaker's margin) from a set of odds and return the implied probabilities using a variety of methods.

Multiplicative

Normalizes the probabilities so they sum to 1.0 by dividing the inverse of each of the odds by the sum of the inverses.

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.multiplicative(odds)

{'implied_probabilities': array([0.35873804, 0.42112726, 0.2201347 ]),
 'margin': 0.03242570633874986,
 'method': 'multiplicative'}
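
The arithmetic behind the multiplicative method is short enough to write out by hand. This is a plain-Python sketch of the calculation described above, not the library's code:

```python
odds = [2.7, 2.3, 4.4]

# The inverse of decimal odds is the bookmaker's (inflated) probability.
inverse = [1 / o for o in odds]

# The margin is how far those inflated probabilities overshoot 1.0.
margin = sum(inverse) - 1

# Scale every probability by the same factor so they sum to 1.0.
probabilities = [p / sum(inverse) for p in inverse]

print(round(margin, 6), [round(p, 6) for p in probabilities])
```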

Additive

Normalizes the probabilities so they sum to 1.0 by removing an equal amount from each.

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.additive(odds)

{'implied_probabilities': array([0.3595618 , 0.42397404, 0.21646416]),
 'margin': 0.03242570633874986,
 'method': 'additive'}
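
Written out by hand, the additive method subtracts an equal share of the margin from each inverse odd. A plain-Python sketch of that calculation (not the library's code):

```python
odds = [2.7, 2.3, 4.4]

inverse = [1 / o for o in odds]
margin = sum(inverse) - 1

# Remove the same share of the margin from every outcome.
share = margin / len(odds)
probabilities = [p - share for p in inverse]

print([round(p, 6) for p in probabilities])
```

One consequence of subtracting a fixed amount is that a very long-priced outcome can end up with a negative value, which is worth checking for when the margin is large.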

Power

Solves for the power coefficient that normalizes the inverse of the odds to sum to 1.0

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.power(odds)

{'implied_probabilities': array([0.3591711 , 0.42373075, 0.21709815]),
 'margin': 0.03242570633874986,
 'method': 'power',
 'k': 1.0309132393169356}
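
One way to find the power coefficient is a simple bisection search, sketched below in plain Python. The search bounds and function name are assumptions for illustration (the sketch expects odds with a positive margin and a coefficient between 1 and 10); penaltyblog may well use a different solver.

```python
def power_method(odds, tol=1e-12):
    """Find k such that sum((1 / odds) ** k) == 1, by bisection."""
    inverse = [1 / o for o in odds]
    low, high = 1.0, 10.0  # assumed bracket for k
    while high - low > tol:
        mid = (low + high) / 2
        # Raising values below 1 to a larger power shrinks them,
        # so the sum falls as k grows.
        if sum(p**mid for p in inverse) > 1:
            low = mid
        else:
            high = mid
    k = (low + high) / 2
    return [p**k for p in inverse], k


probabilities, k = power_method([2.7, 2.3, 4.4])
print(round(k, 6), [round(p, 6) for p in probabilities])
```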

Shin

Uses the Shin (1992, 1993) method to calculate the implied probabilities

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.shin(odds)

{'implied_probabilities': array([0.35934392, 0.42324385, 0.21741223]),
 'margin': 0.03242570633874986,
 'method': 'shin',
 'z': 0.016236442857291165}
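
For readers curious about what happens under the hood, here is a sketch of Shin's formula with the z parameter found by bisection. The search bracket for z and the function names are assumptions made for this example, and the solver is not necessarily the one penaltyblog uses.

```python
import math


def shin_probabilities(odds, z):
    """Shin's implied probabilities for a given value of z."""
    inverse = [1 / o for o in odds]
    booksum = sum(inverse)
    return [
        (math.sqrt(z * z + 4 * (1 - z) * p * p / booksum) - z) / (2 * (1 - z))
        for p in inverse
    ]


def shin_method(odds, tol=1e-12):
    """Find the z that makes the probabilities sum to 1, by bisection."""
    low, high = 0.0, 0.5  # assumed bracket for z
    while high - low > tol:
        mid = (low + high) / 2
        if sum(shin_probabilities(odds, mid)) > 1:
            low = mid
        else:
            high = mid
    z = (low + high) / 2
    return shin_probabilities(odds, z), z


probabilities, z = shin_method([2.7, 2.3, 4.4])
print(round(z, 6), [round(p, 6) for p in probabilities])
```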

Differential Margin Weighting

Uses the differential margin approach described by Joseph Buchdahl in his Wisdom of the Crowd article.

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.differential_margin_weighting(odds)

{'implied_probabilities': array([0.3595618 , 0.42397404, 0.21646416]),
 'margin': 0.03242570633874986,
 'method': 'differential_margin_weighting'}
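
A hand calculation along these lines reproduces the output above. The sketch assumes the "margin weights proportional to the odds" formula, fair odds = n * odds / (n - margin * odds), where n is the number of outcomes; whether penaltyblog computes it in exactly this way is an assumption.

```python
odds = [2.7, 2.3, 4.4]

n = len(odds)
margin = sum(1 / o for o in odds) - 1

# Longer odds carry a larger share of the margin, so they lengthen more.
fair_odds = [n * o / (n - margin * o) for o in odds]
probabilities = [1 / f for f in fair_odds]

print([round(f, 4) for f in fair_odds])
print([round(p, 6) for p in probabilities])
```

Algebraically, 1 / fair_odds simplifies to 1 / odds - margin / n, which is why the probabilities shown above match the additive method.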

Odds Ratio

Uses Keith Cheung's odds ratio approach, as discussed by Joseph Buchdahl in his Wisdom of the Crowd article, to calculate the implied probabilities.

import penaltyblog as pb

odds = [2.7, 2.3, 4.4]
pb.implied.odds_ratio(odds)

{'implied_probabilities': array([0.35881036, 0.42256142, 0.21862822]),
 'margin': 0.03242570633874986,
 'method': 'odds_ratio',
 'c': 1.05116912729218}
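
The odds ratio approach can be sketched as follows: each fair probability p is tied to the bookmaker's probability q = 1 / odds by a constant odds ratio c, giving p = q / (c + q - c * q), and c is chosen so the fair probabilities sum to 1. The bisection solver, its bracket and the function names are assumptions for illustration.

```python
def odds_ratio_probabilities(odds, c):
    """Fair probabilities for a given odds ratio c."""
    inverse = [1 / o for o in odds]
    return [q / (c + q - c * q) for q in inverse]


def odds_ratio_method(odds, tol=1e-12):
    """Find the c that makes the probabilities sum to 1, by bisection."""
    low, high = 1.0, 10.0  # assumed bracket for c
    while high - low > tol:
        mid = (low + high) / 2
        # A larger c shrinks every probability, so the sum falls as c grows.
        if sum(odds_ratio_probabilities(odds, mid)) > 1:
            low = mid
        else:
            high = mid
    c = (low + high) / 2
    return odds_ratio_probabilities(odds, c), c


probabilities, c = odds_ratio_method([2.7, 2.3, 4.4])
print(round(c, 6), [round(p, 6) for p in probabilities])
```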

Rank Probability Scores

Based on Constantinou and Fenton (2012), penaltyblog contains a function for calculating Rank Probability Scores for assessing home, draw, away probability forecasts.

predictions is a list of home, draw, away probabilities and observed is the zero-based index for which outcome actually occurred.

import penaltyblog as pb

predictions = [
    [1, 0, 0],
    [0.9, 0.1, 0],
    [0.8, 0.1, 0.1],
    [0.5, 0.25, 0.25],
    [0.35, 0.3, 0.35],
    [0.6, 0.3, 0.1],
    [0.6, 0.25, 0.15],
    [0.6, 0.15, 0.25],
    [0.57, 0.33, 0.1],
    [0.6, 0.2, 0.2],
]

observed = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]

for p, o in zip(predictions, observed):
    rps = pb.rps(p, o)
    print(round(rps, 4))
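
For reference, the score itself compares cumulative forecast probabilities with cumulative outcomes, so a forecast is penalised more the further its probability mass sits from what actually happened. Below is a minimal standalone sketch of that calculation; the function name is made up here to avoid confusion with pb.rps, and it is not the library's implementation.

```python
def ranked_probability_score(probabilities, outcome_index):
    """Mean squared difference between cumulative forecast and outcome."""
    n = len(probabilities)
    cumulative_forecast = 0.0
    cumulative_outcome = 0.0
    total = 0.0
    # The final cumulative term is always 1 - 1 = 0, so it is skipped.
    for i in range(n - 1):
        cumulative_forecast += probabilities[i]
        cumulative_outcome += 1.0 if i == outcome_index else 0.0
        total += (cumulative_forecast - cumulative_outcome) ** 2
    return total / (n - 1)


perfect = ranked_probability_score([1, 0, 0], 0)
confident = ranked_probability_score([0.9, 0.1, 0], 0)
vague = ranked_probability_score([0.35, 0.3, 0.35], 1)
print(perfect, round(confident, 4), round(vague, 4))
```

Lower is better: a perfect forecast scores 0, and the score grows as probability is placed on outcomes further from the observed one.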

Download ELO ratings from clubelo.com

Download ELO ratings for a given date

import penaltyblog as pb
df = pb.clubelo.fetch_rankings_by_date(2010, 1, 1)
df.head()

	Rank	Club	Country	Level	Elo	From	To
0	1	Barcelona	ESP	1	1987.68	2009-12-18 00:00:00	2010-01-02 00:00:00
1	2	Chelsea	ENG	1	1945.54	2009-12-29 00:00:00	2010-01-16 00:00:00
2	3	Man United	ENG	1	1928.53	2009-12-31 00:00:00	2010-01-09 00:00:00
3	4	Real Madrid	ESP	1	1902.72	2009-12-20 00:00:00	2010-01-03 00:00:00
4	5	Inter	ITA	1	1884.49	2009-12-21 00:00:00	2010-01-06 00:00:00

List all teams with ratings available

import penaltyblog as pb
teams = pb.clubelo.list_all_teams()
teams[:5]

['Man City', 'Bayern', 'Liverpool', 'Real Madrid', 'Man United']

Download Historical ELO ratings for a given team

import penaltyblog as pb
df = pb.clubelo.fetch_rankings_by_team("barcelona")
df.head()

	Rank	Club	Country	Level	Elo	From	To
0	None	Barcelona	ESP	1	1636.7	1939-10-22 00:00:00	1939-12-03 00:00:00
1	None	Barcelona	ESP	1	1626.1	1939-12-04 00:00:00	1939-12-10 00:00:00
2	None	Barcelona	ESP	1	1636.73	1939-12-11 00:00:00	1939-12-17 00:00:00
3	None	Barcelona	ESP	1	1646.95	1939-12-18 00:00:00	1939-12-24 00:00:00
4	None	Barcelona	ESP	1	1637.42	1939-12-25 00:00:00	1939-12-31 00:00:00

References

Mark J. Dixon and Stuart G. Coles (1997) Modelling Association Football Scores and Inefficiencies in the Football Betting Market.
Håvard Rue and Øyvind Salvesen (1999) Prediction and Retrospective Analysis of Soccer Matches in a League.
Anthony C. Constantinou and Norman E. Fenton (2012) Solving the Problem of Inadequate Scoring Rules for Assessing Probabilistic Football Forecast Models.
Hyun Song Shin (1992) Prices of State Contingent Claims with Insider Traders, and the Favourite-Longshot Bias.
Hyun Song Shin (1993) Measuring the Incidence of Insider Trading in a Market for State-Contingent Claims.
Joseph Buchdahl (2015) The Wisdom of the Crowd.

Release history

0.8.1: Sep 30, 2023
0.8.0: Aug 31, 2023
0.7.0: Mar 13, 2023
0.6.1: Jan 6, 2023
0.6.0: Dec 2, 2022
0.5.1: Nov 3, 2022
0.5.0: Oct 11, 2022
0.4.0: Aug 8, 2022
0.3.1: Jul 25, 2022
0.3.0: Apr 4, 2022
0.2.1: Jun 25, 2021
0.2.0: Jun 22, 2021
0.1.3: Jun 9, 2021
0.1.2: Jun 9, 2021
0.1.1: Jun 9, 2021 (this version)
