Probabilistic soccer match prediction — Dixon-Coles, Bradley-Terry, Elo, and distribution-based models with a scikit-learn compatible API.
pespila
Probabilistic soccer match prediction with a scikit-learn compatible API.
Four fitted models (Dixon-Coles, Bradley-Terry, Elo, and a dual-perspective distribution model) each expose fit(), predict(), and predict_proba() to produce calibrated Home / Draw / Away probabilities from historical match data. A library of goal-scoring distributions (Poisson, Zero-Inflated Poisson, Negative Binomial, Geometric, Uniform, Zero-Inflated Weibull, Double Poisson) powers the underlying frequency analysis.
Installation
pip install pespila
Quick Start
import numpy as np
from pespila.models.dixon_coles import DixonColesModel
# X: (n_matches, 3) = [home_id, away_id, days_ago]
# y: (n_matches, 2) = [home_goals, away_goals]
# X_train, y_train, X_test, y_test are NumPy arrays you supply in these layouts
model = DixonColesModel(xi=0.005)
model.fit(X_train, y_train)
probs = model.predict_proba(X_test) # → (n, 3): [P(Home), P(Draw), P(Away)]
preds = model.predict(X_test) # → (n,): 0=Home, 1=Draw, 2=Away
accuracy = model.score(X_test, y_test)
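The Quick Start assumes the X and y arrays already exist. A minimal sketch of building them from raw match records is shown here; the `build_arrays` helper, the toy records, and the choice of contiguous integer team ids starting at 0 are assumptions for illustration, not part of pespila.

```python
import numpy as np
from datetime import date

# Toy records (made up for illustration):
# (match_date, home_team, away_team, home_goals, away_goals)
records = [
    (date(2024, 1, 1), "A", "B", 2, 0),
    (date(2024, 1, 8), "C", "A", 1, 1),
    (date(2024, 1, 15), "B", "C", 0, 3),
    (date(2024, 1, 22), "A", "C", 1, 2),
]


def build_arrays(records, reference_date):
    """Hypothetical helper: encode records as X (n, 3) and y (n, 2)."""
    ordered = sorted(records, key=lambda r: r[0])  # oldest match first
    teams = sorted({team for _, h, a, _, _ in ordered for team in (h, a)})
    team_ids = {name: i for i, name in enumerate(teams)}
    X = np.array(
        [[team_ids[h], team_ids[a], (reference_date - d).days]
         for d, h, a, _, _ in ordered],
        dtype=int,
    )
    y = np.array([[hg, ag] for _, _, _, hg, ag in ordered], dtype=int)
    return X, y, team_ids


reference_date = date(2024, 1, 31)
X, y, team_ids = build_arrays(records, reference_date)

# Chronological split: fit on older matches, evaluate on the most recent one.
X_train, y_train = X[:3], y[:3]
X_test, y_test = X[3:], y[3:]
```

A chronological split avoids training on matches played after the ones being predicted.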
Models
| Model | Class | Input X | Input y | Method |
|---|---|---|---|---|
| Dixon-Coles | DixonColesModel | (n, 3) home_id, away_id, days_ago | (n, 2) home_goals, away_goals | Bivariate Poisson with low-scoring correction (τ) and time-decay weighting |
| Bradley-Terry | BradleyTerryModel | (n, 2) home_id, away_id | (n,) result codes | MM algorithm with Davidson draw extension |
| Elo | EloModel | (n, 2) home_id, away_id | (n, 2) home_goals, away_goals | Incremental ratings with goal-difference scaling |
| SvS/CvC | SvSCvCPredictor | n/a | n/a | Dual-perspective 6×6 probability matrix from fitted goal distributions |
All models follow the scikit-learn estimator contract (fit / predict / predict_proba / score). Result encoding: 0 = Home win, 1 = Draw, 2 = Away win.
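Since Bradley-Terry takes result codes while the other models take goal counts, a small conversion is often needed. The sketch below derives the 0 / 1 / 2 encoding from a goals array; `goals_to_result_codes` and `probs_to_prediction` are hypothetical helper names, and taking the arg-max of the probabilities is an assumption about how a class label could be chosen, not a statement about what predict() does internally.

```python
import numpy as np


def goals_to_result_codes(y_goals):
    """Map (n, 2) [home_goals, away_goals] to codes 0=Home, 1=Draw, 2=Away."""
    y_goals = np.asarray(y_goals)
    diff = y_goals[:, 0] - y_goals[:, 1]
    # sign(diff) is +1 for a home win, 0 for a draw, -1 for an away win.
    return (1 - np.sign(diff)).astype(int)


def probs_to_prediction(probs):
    """Pick the most likely outcome per row of an (n, 3) probability array."""
    return np.argmax(np.asarray(probs), axis=1)


codes = goals_to_result_codes([[2, 0], [1, 1], [0, 3]])
picks = probs_to_prediction([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
```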
Dixon-Coles
from pespila.models.dixon_coles import DixonColesModel
model = DixonColesModel(xi=0.005, max_goals=6)
model.fit(X, y)
model.predict_proba(X_new) # → (n, 3)
model.predict_match(home_id, away_id) # single prediction with full scoreline matrix
Estimates per-team attack/defense strengths, a home advantage factor, and a dependence parameter ρ. Recent matches are up-weighted via exponential decay controlled by xi.
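To make the roles of ρ and xi concrete, here is a minimal NumPy sketch of the standard Dixon-Coles construction: two Poisson goal distributions, the τ adjustment on the four low-scoring cells, and exponential time-decay weights. The function names, the `size` parameter, and the renormalisation after truncation are assumptions for illustration; pespila's internals may differ.

```python
import math
import numpy as np


def poisson_pmf(k, rate):
    return math.exp(-rate) * rate**k / math.factorial(k)


def dixon_coles_matrix(home_rate, away_rate, rho, size=6):
    """Scoreline matrix: rows are home goals, columns are away goals."""
    home = np.array([poisson_pmf(k, home_rate) for k in range(size)])
    away = np.array([poisson_pmf(k, away_rate) for k in range(size)])
    matrix = np.outer(home, away)
    # Low-scoring correction tau applied to 0-0, 0-1, 1-0 and 1-1.
    matrix[0, 0] *= 1.0 - home_rate * away_rate * rho
    matrix[0, 1] *= 1.0 + home_rate * rho
    matrix[1, 0] *= 1.0 + away_rate * rho
    matrix[1, 1] *= 1.0 - rho
    # Renormalise because the grid is truncated at size - 1 goals.
    return matrix / matrix.sum()


def outcome_probs(matrix):
    """Collapse a scoreline matrix into [P(Home), P(Draw), P(Away)]."""
    home = np.tril(matrix, -1).sum()  # home goals > away goals
    draw = np.trace(matrix)
    away = np.triu(matrix, 1).sum()   # away goals > home goals
    return np.array([home, draw, away])


def decay_weights(days_ago, xi):
    """Exponential down-weighting of older matches."""
    return np.exp(-xi * np.asarray(days_ago, dtype=float))


matrix = dixon_coles_matrix(home_rate=1.6, away_rate=1.1, rho=-0.1)
probs = outcome_probs(matrix)
weights = decay_weights([0, 200], xi=0.005)
```

With xi=0.005, a match played 200 days ago carries a weight of exp(-1), roughly 37% of a match played today.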
Bradley-Terry
from pespila.models.bradley_terry import BradleyTerryModel
model = BradleyTerryModel(max_iter=200, home_advantage=True)
model.fit(X, y)
model.predict_proba(X_new) # → (n, 3)
Pairwise comparison model. Draws are handled via the Davidson extension parameter ν, fitted jointly with team strengths through Minorization-Maximization.
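The Davidson extension turns two team strengths and a draw parameter ν into three outcome probabilities. The following is a sketch of that textbook formula; the function name and the multiplicative `home_factor` are assumptions, since how pespila parameterises home advantage is not stated here.

```python
import numpy as np


def davidson_probs(strength_home, strength_away, nu, home_factor=1.0):
    """Return [P(Home), P(Draw), P(Away)] under the Davidson tie model."""
    s_home = strength_home * home_factor  # assumed multiplicative home boost
    tie = nu * np.sqrt(s_home * strength_away)
    denom = s_home + strength_away + tie
    return np.array([s_home, tie, strength_away]) / denom


even = davidson_probs(1.0, 1.0, nu=1.0)
no_draws = davidson_probs(2.0, 1.0, nu=0.0)
```

Setting ν to 0 recovers the plain Bradley-Terry model with no draws, while a larger ν shifts probability mass toward the draw, most strongly when the two teams are evenly matched.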
Elo
from pespila.models.elo import EloModel
model = EloModel(k_factor=20.0, home_advantage=100.0)
model.fit(X, y)
model.predict_proba(X_new) # → (n, 3)
Processes matches sequentially, updating ratings after each result. The K-factor is scaled by log(1 + goal_difference) to reward dominant wins.
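A single rating update under this rule might look like the sketch below. The 400-point logistic scale, the natural logarithm, the use of the absolute goal difference, and the function names are assumptions based on conventional Elo; only the K-factor scaling and the default parameter values come from the description above.

```python
import math


def expected_home_score(r_home, r_away, home_advantage=100.0):
    """Conventional Elo expectation with the home side given a rating bonus."""
    return 1.0 / (1.0 + 10 ** ((r_away - r_home - home_advantage) / 400.0))


def elo_update(r_home, r_away, home_goals, away_goals,
               k_factor=20.0, home_advantage=100.0):
    """Return updated (home, away) ratings after one match."""
    expected = expected_home_score(r_home, r_away, home_advantage)
    if home_goals > away_goals:
        actual = 1.0
    elif home_goals == away_goals:
        actual = 0.5
    else:
        actual = 0.0
    # K scaled by log(1 + goal difference), read literally (assumption).
    k = k_factor * math.log(1 + abs(home_goals - away_goals))
    delta = k * (actual - expected)
    return r_home + delta, r_away - delta


new_home, new_away = elo_update(1500.0, 1500.0, home_goals=2, away_goals=0)
```

Read literally, log(1 + 0) = 0, so a draw leaves both ratings unchanged in this sketch; the library may handle draws differently.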
SvS/CvC (Scored-vs-Scored / Conceded-vs-Conceded)
from pespila.models.svs_cvc import SvSCvCPredictor
predictor = SvSCvCPredictor()
result = predictor.predict_match("Arsenal", "Chelsea", "2024-25", "E0")
# → {'svs': {...}, 'cvc': {...}, 'combined': {...}}
A dual-perspective model ported from the original R implementation. Fits goal-scoring distributions to each team's historical frequencies, then builds a 6×6 scoreline probability matrix from two independent views (scored-vs-scored and conceded-vs-conceded).
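One way to read the two views is sketched below: the scored view pairs each team's goals-scored distribution, and the conceded view infers each team's goals from what the opponent tends to concede. This interpretation, the helper names, the toy probability vectors, and the simple average used for the combined matrix are all assumptions; the actual combination rule in SvSCvCPredictor is not described here.

```python
import numpy as np


def scoreline_matrix(home_goal_pmf, away_goal_pmf):
    """6x6 matrix over 0..5+ goals: rows home goals, columns away goals."""
    matrix = np.outer(home_goal_pmf, away_goal_pmf)
    return matrix / matrix.sum()


def combine_views(home_scored, away_scored, home_conceded, away_conceded):
    svs = scoreline_matrix(home_scored, away_scored)
    # Home goals follow what the away side concedes, and vice versa.
    cvc = scoreline_matrix(away_conceded, home_conceded)
    return {"svs": svs, "cvc": cvc, "combined": (svs + cvc) / 2.0}


# Toy probability vectors for 0, 1, 2, 3, 4, 5+ goals (made up for illustration).
home_scored = np.array([0.20, 0.30, 0.25, 0.15, 0.07, 0.03])
away_scored = np.array([0.30, 0.35, 0.20, 0.10, 0.04, 0.01])
home_conceded = np.array([0.35, 0.35, 0.18, 0.08, 0.03, 0.01])
away_conceded = np.array([0.25, 0.30, 0.25, 0.12, 0.06, 0.02])

views = combine_views(home_scored, away_scored, home_conceded, away_conceded)
```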
Goal-Scoring Distributions
Seven distributions for modelling discrete goal frequencies, all sharing a common interface:
import numpy as np
from pespila.distributions import PoissonDist
dist = PoissonDist()
dist.fit(frequencies) # frequencies = [f0, f1, f2, f3, f4, f5+]
dist.pmf(np.arange(6)) # probability mass function
dist.goodness_of_fit(observed) # chi-squared p-value
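To show what fitting to a binned frequency vector involves, here is a standalone sketch for the Poisson case. It is not pespila's code: the function names are hypothetical, the toy counts are made up, and treating the "5+" bin as exactly 5 goals when estimating λ is a simplifying assumption.

```python
import math
import numpy as np


def fit_poisson_rate(frequencies):
    """Estimate lambda as the mean goals per match from binned counts."""
    freqs = np.asarray(frequencies, dtype=float)
    goals = np.arange(len(freqs))  # last bin counted as 5 goals (approximation)
    return float((goals * freqs).sum() / freqs.sum())


def binned_poisson_pmf(rate, n_bins=6):
    """Poisson probabilities for 0..n_bins-2 goals plus a lumped tail bin."""
    head = np.array([math.exp(-rate) * rate**k / math.factorial(k)
                     for k in range(n_bins - 1)])
    return np.append(head, 1.0 - head.sum())


def chi_squared_statistic(observed, pmf):
    """Pearson chi-squared statistic of observed counts against a pmf."""
    observed = np.asarray(observed, dtype=float)
    expected = observed.sum() * np.asarray(pmf, dtype=float)
    return float(((observed - expected) ** 2 / expected).sum())


frequencies = [10, 15, 10, 4, 1, 0]  # toy counts for 0, 1, 2, 3, 4, 5+ goals
rate = fit_poisson_rate(frequencies)
pmf = binned_poisson_pmf(rate)
statistic = chi_squared_statistic(frequencies, pmf)
```

Turning the statistic into a p-value requires the chi-squared survival function, for example `scipy.stats.chi2.sf(statistic, df)`, with df set to the number of bins minus one minus the number of fitted parameters.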
| Distribution | Class | Parameters |
|---|---|---|
| Poisson | PoissonDist | λ |
| Zero-Inflated Poisson | ZeroInflatedPoissonDist | λ, φ |
| Negative Binomial | NegBinomDist | k, p |
| Geometric | GeometricDist | p |
| Uniform | UniformDist | a, b |
| Zero-Inflated Weibull | ZeroInflatedWeibullDist | shape, scale, φ |
| Double Poisson | DoublePoissonDist | μ, φ |
The DistributionSelector fits all distributions to a frequency vector and ranks them by goodness-of-fit.
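The ranking idea can be sketched without the library: score each candidate's binned probabilities against the observed counts and sort. Ranking by the raw chi-squared statistic (lower is better), the `rank_by_chi_squared` name, the toy counts, and the moment estimate used for the geometric candidate are assumptions here; the selector itself may rank by p-value or another criterion.

```python
import numpy as np


def rank_by_chi_squared(observed, candidate_pmfs):
    """Return (name, statistic) pairs sorted from best to worst fit."""
    observed = np.asarray(observed, dtype=float)
    total = observed.sum()
    scores = {}
    for name, pmf in candidate_pmfs.items():
        expected = total * np.asarray(pmf, dtype=float)
        scores[name] = float(((observed - expected) ** 2 / expected).sum())
    return sorted(scores.items(), key=lambda item: item[1])


observed = np.array([10, 15, 10, 4, 1, 0])  # toy counts for 0..5+ goals
mean_goals = (np.arange(6) * observed).sum() / observed.sum()

# Geometric on 0, 1, 2, ... with p chosen so its mean matches the sample mean.
p = 1.0 / (1.0 + mean_goals)
geometric = np.append(p * (1 - p) ** np.arange(5), (1 - p) ** 5)

candidates = {
    "uniform": np.full(6, 1 / 6),
    "geometric": geometric,
}
ranking = rank_by_chi_squared(observed, candidates)
```

When candidates have different numbers of fitted parameters, comparing p-values computed with the appropriate degrees of freedom is the fairer criterion than the raw statistic.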
Ensemble & Reinforcement Learning (Experimental)
- ACWS Ensemble (pespila.ensemble.adaptive.AdaptiveStackedPredictor): A LightGBM meta-learner that dynamically reweights base model predictions using match context features.
- RL-DQN Agent (pespila.rl.agent): A pure-NumPy Deep Q-Network for learning betting strategies from state vectors.
Unified Prediction Interface
For end-to-end use with the included data pipeline:
from pespila.predict import MatchPredictor
mp = MatchPredictor()
prediction = mp.predict("Dixon-Coles", "Arsenal", "Chelsea", league_id=1, season_id=32)
prediction.home_win # 45.2 (percentage)
prediction.draw # 27.1
prediction.away_win # 27.7
prediction.result # "H"
prediction.matrix # 6×6 scoreline probability matrix
Data Pipeline
An optional data pipeline downloads historical match data from football-data.co.uk into a normalized SQLite database. This is required only for the MatchPredictor interface and the Streamlit app — the individual model classes work with any NumPy array input.
from pespila.data.pipeline import DataPipeline
pipeline = DataPipeline()
pipeline.full_refresh() # download and ingest all leagues/seasons
pipeline.compute_all_matchdays() # assign matchday numbers
pipeline.compute_standings() # compute league tables
Coverage: 22 leagues across 11 countries.
Streamlit App
A companion dashboard (not included in the PyPI package) for interactive exploration:
pip install pespila[app]
streamlit run app/app.py
Features: historical match results with prediction comparison, betting P/L simulation, league tables with matchday slider, and on-demand predictions with probability heatmaps.
License
MIT