Portable machine learning models for NFL analytics
nfeloml
nfeloml provides distributable Python ports of the nflfastr models so they can be used in web services or other Python-based applications.
Features
- Expected Points (EP) - Predict the next scoring outcome and expected points for any play
- Win Probability (WP) - Predict the probability of the possession team winning the game
- Type-Safe - Full type hints and dataclass-based inputs/outputs
- Portable - Pre-trained models bundled with package, auto-loaded on import
- DataFrame-Native - Bulk predictions on entire DataFrames
Installation
pip install nfeloml
For training models (not required for inference):
pip install nfeloml[training]
This also installs nfelodcm, which is needed only for training and not for inference with the pre-trained models.
Quick Start
Expected Points - DataFrame Inference
The most common use case is enriching a DataFrame of plays with predictions:
from nfeloml import ExpectedPointsModel
import nfelodcm as dcm
## Load the model (automatically loads from package)
model = ExpectedPointsModel()
## Load some plays. As an example, this loads nflfastr play-by-play via nfelodcm
## (installed with nfeloml[training]); that data already has EPA values, but any
## DataFrame with compatible columns works
db = dcm.load(['pbp'])
plays = db['pbp'].copy()
## Add EP predictions to the entire DataFrame (optionally with EPA)
enriched = model.predict_df(plays, include_epa=True)
## Now you have: expected_points and epa (EPA auto-calculated)
print(enriched[['desc', 'expected_points', 'epa']].head())
Expected Points - Single Play
For type-safe single predictions:
from nfeloml import ExpectedPointsModel, EPFeatures
model = ExpectedPointsModel()
features = EPFeatures(
half_seconds_remaining=1800,
yardline_100=75,
home=1,
retractable=0,
dome=0,
outdoors=1,
down=1,
ydstogo=10,
era=4,
posteam_timeouts_remaining=3,
defteam_timeouts_remaining=3
)
## Simple usage - returns float
ep = model.predict(features)
print(f"Expected Points: {ep:.2f}")
## Full prediction with probabilities - returns EPPrediction object
prediction = model.predict(features, include_probabilities=True)
print(f"Expected Points: {prediction.expected_points():.2f}")
print(f"TD Probability: {prediction.touchdown:.1%}")
Win Probability - DataFrame Inference
from nfeloml import WinProbabilityModel
import nfelodcm as dcm
## Load model
model = WinProbabilityModel()
## Load plays
db = dcm.load(['pbp'])
plays = db['pbp'].copy()
## Add WP predictions
enriched = model.predict_df(plays)
## The result now has a win_probability column (note that nflfastr data already includes its own WP values)
print(enriched[['desc', 'win_probability']].head())
Win Probability - Single Play
from nfeloml import WinProbabilityModel, WPFeatures
import numpy as np
model = WinProbabilityModel()
features = WPFeatures(
receive_2h_ko=1,
home=1,
half_seconds_remaining=300,
game_seconds_remaining=2100,
diff_time_ratio=7 * np.exp(4 * (3600 - 2100) / 3600),
score_differential=7,
down=2,
ydstogo=5,
yardline_100=45,
posteam_timeouts_remaining=2,
defteam_timeouts_remaining=3
)
prediction = model.predict(features)
print(f"Win Probability: {prediction.win_probability:.1%}")
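The diff_time_ratio value in the example above is the score differential scaled by the exponential of four times the share of the game that has elapsed. A small helper makes that relationship explicit. This is a sketch only: the function name is hypothetical, and it assumes a 3600-second regulation game, as the expression in the example does.

```python
import math

def diff_time_ratio(score_differential: float, game_seconds_remaining: float) -> float:
    """Score differential scaled up as the game clock runs down.

    Mirrors the expression used in the example:
    score_differential * exp(4 * elapsed_share).
    """
    # Share of a 3600-second regulation game already played (0.0 to 1.0).
    elapsed_share = (3600 - game_seconds_remaining) / 3600
    return score_differential * math.exp(4 * elapsed_share)

# At kickoff nothing has elapsed, so the ratio equals the raw differential.
print(diff_time_ratio(7, 3600))
# The same 7-point lead with 2100 seconds left is weighted more heavily.
print(diff_time_ratio(7, 2100))
```

The same lead therefore counts for more late in the game than early, which is what lets the model treat a 7-point lead in the fourth quarter differently from one in the first.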
Calculating EPA (Expected Points Added)
EPA can be calculated automatically when generating EP predictions:
from nfeloml import ExpectedPointsModel
import nfelodcm as dcm
model = ExpectedPointsModel()
## Load data
db = dcm.load(['pbp'])
plays = db['pbp'].copy()
## Add EP and EPA in one call
plays = model.predict_df(plays, include_epa=True)
## Now you have both EP and EPA!
print(plays[['desc', 'expected_points', 'epa']].head())
Alternatively, you can calculate EPA separately on data that already has EP:
from nfeloml import calculate_epa
## If you already have expected_points in your dataframe
plays = calculate_epa(plays)
How EPA is Calculated
EPA measures the change in expected points from the start to the end of a play:
- Regular plays: EPA = EP_end - EP_start
- Scoring plays:
  - Touchdown: EP_end = 7
  - Field Goal: EP_end = 3
  - Safety: EP_end = -2 (for offense)
- Possession changes: When the next play has a different posteam, EP is negated (opponent's perspective)
- Invalid plays: Plays without required features result in null EPA
The function automatically:
- Skips to the next valid play with EP (ignoring timeouts, announcements, etc.)
- Handles scoring plays
- Accounts for possession changes
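The rules above can be sketched in plain Python. This is a minimal illustration, not the package's implementation: the Play dataclass, its field names, and epa_for_plays are hypothetical, and details such as end-of-half handling are left out.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical play record; field names are illustrative, not nfeloml's schema.
@dataclass
class Play:
    posteam: str
    ep: Optional[float]          # EP at the start of the play (None = invalid play)
    score: Optional[str] = None  # "td", "fg", "safety", or None

# End-of-play values for scoring plays, from the offense's perspective.
SCORE_VALUES = {"td": 7.0, "fg": 3.0, "safety": -2.0}

def epa_for_plays(plays: List[Play]) -> List[Optional[float]]:
    epas: List[Optional[float]] = []
    for i, play in enumerate(plays):
        if play.ep is None:
            epas.append(None)  # invalid play -> null EPA
            continue
        if play.score in SCORE_VALUES:
            ep_end = SCORE_VALUES[play.score]
        else:
            # Skip ahead to the next play that has an EP value.
            nxt = next((p for p in plays[i + 1:] if p.ep is not None), None)
            if nxt is None:
                epas.append(None)  # no following play to compare against
                continue
            # Negate EP when possession changed (opponent's perspective).
            ep_end = nxt.ep if nxt.posteam == play.posteam else -nxt.ep
        epas.append(ep_end - play.ep)
    return epas

plays = [
    Play("KC", 1.0),
    Play("KC", None),             # e.g. a timeout row with no EP
    Play("KC", 2.5),
    Play("BUF", 0.5),             # possession changed
    Play("BUF", 4.0, score="td"),
]
print(epa_for_plays(plays))  # [1.5, None, -3.0, 3.5, 3.0]
```

The third play shows the sign flip: the drive ends with the opponent holding an EP of 0.5, which is worth -0.5 to the team that gave up the ball, so EPA is -0.5 - 2.5 = -3.0.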
Model Training
The package provides a complete training pipeline. Models are trained using data from nfelodcm.
Note: Training requires the optional nfelodcm dependency:
pip install nfeloml[training]
Training Expected Points Model
from nfeloml.models.expected_points import EPTrainer, EPDataLoader, EPTrainingConfig
from nfeloml.core.types import ModelMetadata
from pathlib import Path
from datetime import datetime
## Configure training
config = EPTrainingConfig(
seasons=list(range(2000, 2024)),
validation_strategy="loso",
random_seed=2013
)
## Initialize data loader and trainer
data_loader = EPDataLoader()
trainer = EPTrainer(config, data_loader)
## Train the model
model = trainer.train()
## Evaluate
metrics = trainer.evaluate()
print(f"Calibration Error: {metrics['calibration_error']['overall']:.4f}")
## Save the trained model and its metadata to a trained_models directory next to this file
metadata = ModelMetadata(
model_name="ExpectedPoints",
version="1.0.0",
trained_date=datetime.now(),
training_seasons=config.seasons,
calibration_error=metrics['calibration_error']['overall']
)
package_dir = Path(__file__).parent / 'trained_models'
trainer.save_model(package_dir / 'ep_model.ubj', metadata)
Training Win Probability Model
from nfeloml.models.win_probability import WPTrainer, WPDataLoader, WPTrainingConfig
## Configure training
config = WPTrainingConfig(
seasons=list(range(2000, 2024)),
use_spread=False ## Set to True for spread-adjusted model
)
## Train
data_loader = WPDataLoader()
trainer = WPTrainer(config, data_loader)
model = trainer.train()
## Evaluate
metrics = trainer.evaluate()
print(f"Calibration Error: {metrics['overall']:.4f}")
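Both training examples report a calibration error. The source does not say how nfeloml computes it, so the following is only a sketch of one common binned approach: group predictions into probability bins, compare the mean prediction with the observed outcome rate in each bin, and weight each bin by its share of the data. The function name and the 0.05 bin width are assumptions.

```python
from collections import defaultdict
from typing import Sequence

def calibration_error(
    probs: Sequence[float],
    outcomes: Sequence[int],
    bin_width: float = 0.05,
) -> float:
    """Weighted mean absolute gap between predicted and observed rates per bin."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        # Assign each prediction to the nearest multiple of bin_width.
        bins[round(p / bin_width)].append((p, y))

    n = len(probs)
    error = 0.0
    for members in bins.values():
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        # Weight each bin's gap by the share of predictions it contains.
        error += abs(mean_pred - observed) * len(members) / n
    return error

# Toy data: predictions of 10% never hit, predictions of 90% always hit.
probs = [0.1, 0.1, 0.9, 0.9]
outcomes = [0, 0, 1, 1]
print(round(calibration_error(probs, outcomes), 4))  # 0.1
```

A perfectly calibrated model scores 0 under this definition; lower is better.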
Layout
nfeloml/
├── core/ # Shared abstractions
│ ├── base_model.py # Model abstraction
│ ├── base_trainer.py # Training abstraction
│ ├── base_data_loader.py # Data loading abstraction
│ └── types.py # Common types
├── models/
│ ├── expected_points/ # EP model implementation
│ └── win_probability/ # WP model implementation
└── utils/ # Utilities
└── validation.py # Data validation
Each model consists of:
- types.py - Dataclass definitions for inputs/outputs
- data_loader.py - Data fetching via nfelodcm
- trainer.py - Model training with cross-validation
- model.py - Inference interface
Data Source
Training data comes from nfelodcm, which provides:
- Automatic caching and freshness checks
- Simple interface: dcm.load(['pbp'])
Models
Expected Points (EP)
Predicts the expected points value of the next scoring play. Uses a 7-class XGBoost model to predict probabilities for:
- Touchdown (7 points)
- Opponent Touchdown (-7 points)
- Field Goal (3 points)
- Opponent Field Goal (-3 points)
- Safety (2 points)
- Opponent Safety (-2 points)
- No Score (0 points)
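Given probabilities for these seven outcomes, a single expected points value follows as the probability-weighted sum of the point values listed above. The sketch below illustrates that arithmetic; the outcome keys and the example probabilities are made up, and the package's own class labels may differ.

```python
# Point value of each next-score outcome, taken from the list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
OUTCOME_POINTS = {
    "touchdown": 7, "opp_touchdown": -7,
    "field_goal": 3, "opp_field_goal": -3,
    "safety": 2, "opp_safety": -2,
    "no_score": 0,
}

def expected_points(probs: dict) -> float:
    """Probability-weighted sum of the seven outcome values."""
    total = sum(probs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return sum(OUTCOME_POINTS[name] * p for name, p in probs.items())

# Made-up probabilities for illustration only.
probs = {
    "touchdown": 0.30, "opp_touchdown": 0.10,
    "field_goal": 0.20, "opp_field_goal": 0.05,
    "safety": 0.00, "opp_safety": 0.00,
    "no_score": 0.35,
}
# 7*0.30 - 7*0.10 + 3*0.20 - 3*0.05 = 1.85
print(round(expected_points(probs), 2))  # 1.85
```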
Features:
- Game situation (down, distance, yardline)
- Time remaining in half
- Timeouts remaining
- Home field advantage
- Stadium type (dome, outdoors, retractable)
- Era adjustments for rule changes
Win Probability (WP)
Predicts the probability that the possession team will win the game. Uses binary XGBoost classification.
Features:
- Score differential
- Time remaining (game and half)
- Game situation (down, distance, yardline)
- Timeouts remaining
- Which team receives 2nd half kickoff
- Home field advantage
- Optional: Vegas point spread (spread-adjusted model)
Development
Adding New Models
To add a new model (e.g., Completion Probability):
- Create directory: src/nfeloml/models/completion_probability/
- Implement types.py with input/output dataclasses
- Implement data_loader.py extending BaseDataLoader
- Implement trainer.py extending BaseTrainer
- Implement model.py extending BaseModel with get_model_filename() and predict_df()
- Export in __init__.py files
The abstract base classes handle common functionality like LOSO cross-validation, model persistence, and evaluation.
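As an illustration of the cross-validation scheme, here is a minimal sketch of how LOSO folds could be generated, assuming LOSO stands for leave-one-season-out: each season is held out once while the model trains on all the others. The loso_splits helper is hypothetical and not part of the package's API.

```python
from typing import Iterator, List, Tuple

def loso_splits(seasons: List[int]) -> Iterator[Tuple[int, List[int]]]:
    """Yield (held_out_season, training_seasons) pairs, one per season."""
    unique = sorted(set(seasons))
    for held_out in unique:
        # Train on every season except the one being held out.
        train = [s for s in unique if s != held_out]
        yield held_out, train

for held_out, train in loso_splits([2021, 2022, 2023]):
    print(held_out, train)
# 2021 [2022, 2023]
# 2022 [2021, 2023]
# 2023 [2021, 2022]
```

Splitting by season rather than by random rows keeps plays from the same game out of both the training and validation sets.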
File Formats
Models are stored in XGBoost's binary UBJSON format (.ubj), which is compact and fast to load. Metadata is stored as JSON alongside the model.
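A metadata sidecar of this kind can be written with the standard library alone. The sketch below is an assumption about the layout, not the package's actual code: the write_metadata helper, the choice of a same-named .json file, and the placeholder date are all hypothetical, while the field names follow the ModelMetadata example in the training section.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

def write_metadata(model_path: Path, metadata: dict) -> Path:
    """Write metadata as JSON next to the model file and return its path."""
    # Assumed convention: same file name as the model, with a .json suffix.
    sidecar = model_path.with_suffix(".json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "ep_model.ubj"
    meta = {
        "model_name": "ExpectedPoints",
        "version": "1.0.0",
        "trained_date": datetime(2024, 1, 1).isoformat(),  # placeholder date
        "training_seasons": list(range(2000, 2024)),
    }
    sidecar = write_metadata(model_path, meta)
    # Read the file back to confirm it round-trips.
    loaded = json.loads(sidecar.read_text())
    print(sidecar.name, loaded["model_name"])  # ep_model.json ExpectedPoints
```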
Credits
The models are intended to be ports of the existing nflfastr models developed by Ben Baldwin (@benbbaldwin) and thus should be credited to him and the nflfastr team.