Portable machine learning models for NFL analytics

nfeloml

nfeloml provides distributable Python ports of the nflfastr models so they can be used in web services and other Python-based applications.

Features

  • Expected Points (EP) - Predict the next scoring outcome and expected points for any play
  • Win Probability (WP) - Predict the probability of the possession team winning the game
  • Type-Safe - Full type hints and dataclass-based inputs/outputs
  • Portable - Pre-trained models are bundled with the package and loaded automatically
  • DataFrame-Native - Bulk predictions on entire DataFrames

Installation

pip install nfeloml

Quick Start

Expected Points - DataFrame Inference

The most common use case is enriching a DataFrame of plays with predictions:

from nfeloml import ExpectedPointsModel
import nfelodcm as dcm

# Load the model (the bundled model file is loaded automatically from the package)
model = ExpectedPointsModel()

# Load some plays. This example uses nflfastr play-by-play data, which already
# includes EPA values, but any DataFrame with compatible columns works.
db = dcm.load(['pbp'])
plays = db['pbp'].copy()

# Add EP predictions to the entire DataFrame (optionally with EPA)
enriched = model.predict_df(plays, include_epa=True)

# enriched now has expected_points and epa columns (EPA is calculated automatically)
print(enriched[['desc', 'expected_points', 'epa']].head())

Expected Points - Single Play

For type-safe single predictions:

from nfeloml import ExpectedPointsModel, EPFeatures

model = ExpectedPointsModel()

features = EPFeatures(
    half_seconds_remaining=1800,
    yardline_100=75,
    home=1,
    retractable=0,
    dome=0,
    outdoors=1,
    down=1,
    ydstogo=10,
    era=4,
    posteam_timeouts_remaining=3,
    defteam_timeouts_remaining=3
)

# Simple usage: returns a float
ep = model.predict(features)
print(f"Expected Points: {ep:.2f}")

# Full prediction with probabilities: returns an EPPrediction object
prediction = model.predict(features, include_probabilities=True)
print(f"Expected Points: {prediction.expected_points():.2f}")
print(f"TD Probability: {prediction.touchdown:.1%}")

Win Probability - DataFrame Inference

from nfeloml import WinProbabilityModel
import nfelodcm as dcm

# Load the model
model = WinProbabilityModel()

# Load plays
db = dcm.load(['pbp'])
plays = db['pbp'].copy()

# Add WP predictions
enriched = model.predict_df(plays)

# enriched now has a win_probability column (note that nflfastr data already includes WP values)
print(enriched[['desc', 'win_probability']].head())

Win Probability - Single Play

from nfeloml import WinProbabilityModel, WPFeatures
import numpy as np

model = WinProbabilityModel()

features = WPFeatures(
    receive_2h_ko=1,
    home=1,
    half_seconds_remaining=300,
    game_seconds_remaining=2100,
    diff_time_ratio=7 * np.exp(4 * (3600 - 2100) / 3600),
    score_differential=7,
    down=2,
    ydstogo=5,
    yardline_100=45,
    posteam_timeouts_remaining=2,
    defteam_timeouts_remaining=3
)

prediction = model.predict(features)
print(f"Win Probability: {prediction.win_probability:.1%}")
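
The diff_time_ratio value in the example above is written out inline. A small helper makes the relationship explicit. This is a sketch that only mirrors the expression shown in the example (score differential scaled by exp(4 × elapsed share of the game)); the function name is hypothetical and is not part of nfeloml.

```python
import math

GAME_SECONDS = 3600  # regulation length, as used in the example above


def diff_time_ratio(score_differential: float, game_seconds_remaining: float) -> float:
    """Scale the score differential by how much of the game has elapsed.

    Hypothetical helper (not an nfeloml function): reproduces the inline
    expression score_differential * exp(4 * elapsed_share).
    """
    elapsed_share = (GAME_SECONDS - game_seconds_remaining) / GAME_SECONDS
    return score_differential * math.exp(4 * elapsed_share)


# Same inputs as the WPFeatures example: up by 7 with 2100 seconds left
ratio = diff_time_ratio(7, 2100)
print(f"diff_time_ratio: {ratio:.2f}")
```

At kickoff the ratio equals the raw score differential, and the same lead is weighted more heavily as the game clock runs down.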

Calculating EPA (Expected Points Added)

EPA can be calculated automatically when generating EP predictions:

from nfeloml import ExpectedPointsModel
import nfelodcm as dcm

model = ExpectedPointsModel()

# Load data
db = dcm.load(['pbp'])
plays = db['pbp'].copy()

# Add EP and EPA in one call
plays = model.predict_df(plays, include_epa=True)

# plays now has both expected_points and epa columns
print(plays[['desc', 'expected_points', 'epa']].head())

Alternatively, you can calculate EPA separately on data that already has EP:

from nfeloml import calculate_epa

# If your DataFrame already has an expected_points column
plays = calculate_epa(plays)

How EPA is Calculated

EPA measures the change in expected points from the start to the end of a play:

  • Regular plays: EPA = EP_end - EP_start
  • Scoring plays:
    • Touchdown: EP_end = 7
    • Field Goal: EP_end = 3
    • Safety: EP_end = -2 (for offense)
  • Possession changes: When the next play has a different posteam, its EP is negated, because that value is expressed from the opponent's perspective
  • Invalid plays: Plays without required features result in null EPA

The function automatically:

  • Skips to the next valid play with EP (ignoring timeouts, announcements, etc.)
  • Handles scoring plays
  • Accounts for possession changes
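
The rules above can be sketched in plain Python. This is an illustration of the logic as described here, not the package's calculate_epa implementation: the score_event key is a hypothetical stand-in for however scoring plays are flagged, plays are plain dicts rather than DataFrame rows, and cases such as defensive touchdowns or the end of a half are left out.

```python
from typing import Dict, List, Optional

# End-of-play EP for scoring plays, from the offense's perspective
SCORING_EP_END = {"touchdown": 7.0, "field_goal": 3.0, "safety": -2.0}


def sketch_epa(plays: List[Dict]) -> List[Optional[float]]:
    """Return one EPA value per play (None where EPA cannot be computed)."""
    epa: List[Optional[float]] = []
    for i, play in enumerate(plays):
        ep_start = play.get("expected_points")
        if ep_start is None:
            # Invalid play (timeout, announcement, missing features): null EPA
            epa.append(None)
            continue

        score = play.get("score_event")  # hypothetical column name
        if score in SCORING_EP_END:
            ep_end = SCORING_EP_END[score]
        else:
            # Skip ahead to the next play that has an EP value
            nxt = next(
                (p for p in plays[i + 1:] if p.get("expected_points") is not None),
                None,
            )
            if nxt is None:
                epa.append(None)  # no later valid play to compare against
                continue
            ep_end = nxt["expected_points"]
            if nxt["posteam"] != play["posteam"]:
                ep_end = -ep_end  # next EP is from the opponent's perspective

        epa.append(ep_end - ep_start)
    return epa


plays = [
    {"posteam": "KC", "expected_points": 1.0},
    {"posteam": None, "expected_points": None},  # timeout: skipped
    {"posteam": "KC", "expected_points": 2.5},
    {"posteam": "BUF", "expected_points": 0.5},  # possession changed
    {"posteam": "BUF", "expected_points": 4.0, "score_event": "touchdown"},
]
print(sketch_epa(plays))  # [1.5, None, -3.0, 3.5, 3.0]
```

The third play shows the possession-change rule: the next EP (0.5) belongs to the other team, so it counts as -0.5 for the original offense and the play's EPA is -0.5 - 2.5 = -3.0.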

Model Training

The package provides a complete training pipeline. Models are trained using data from nfelodcm.

Training Expected Points Model

from nfeloml.models.expected_points import EPTrainer, EPDataLoader, EPTrainingConfig
from nfeloml.core.types import ModelMetadata
from pathlib import Path
from datetime import datetime

# Configure training
config = EPTrainingConfig(
    seasons=list(range(2000, 2024)),  # seasons 2000 through 2023
    validation_strategy="loso",  # leave-one-season-out cross-validation
    random_seed=2013
)

# Initialize data loader and trainer
data_loader = EPDataLoader()
trainer = EPTrainer(config, data_loader)

# Train the model
model = trainer.train()

# Evaluate
metrics = trainer.evaluate()
print(f"Calibration Error: {metrics['calibration_error']['overall']:.4f}")

# Save the trained model and its metadata to a trained_models directory next to this file
metadata = ModelMetadata(
    model_name="ExpectedPoints",
    version="1.0.0",
    trained_date=datetime.now(),
    training_seasons=config.seasons,
    calibration_error=metrics['calibration_error']['overall']
)

package_dir = Path(__file__).parent / 'trained_models'
trainer.save_model(package_dir / 'ep_model.ubj', metadata)

Training Win Probability Model

from nfeloml.models.win_probability import WPTrainer, WPDataLoader, WPTrainingConfig

# Configure training
config = WPTrainingConfig(
    seasons=list(range(2000, 2024)),
    use_spread=False  # Set to True for the spread-adjusted model
)

# Train
data_loader = WPDataLoader()
trainer = WPTrainer(config, data_loader)
model = trainer.train()

# Evaluate
metrics = trainer.evaluate()
print(f"Calibration Error: {metrics['overall']:.4f}")
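
Both training examples report a calibration error, but this README does not spell out how it is computed. As a rough guide, one common definition bins the predicted probabilities and takes the count-weighted mean gap between each bin's predicted and observed rate. The sketch below assumes that definition; the function name and the 0.05 bin width are illustrative choices, not taken from nfeloml.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def binned_calibration_error(
    predicted: Sequence[float], outcomes: Sequence[int], bin_width: float = 0.05
) -> float:
    """Count-weighted mean absolute gap between predicted and observed rates per bin."""
    # Group outcomes by the bin nearest to each predicted probability
    bins: Dict[int, List[int]] = defaultdict(list)
    for p, y in zip(predicted, outcomes):
        bins[round(p / bin_width)].append(y)

    total = len(predicted)
    error = 0.0
    for bin_index, ys in bins.items():
        bin_prob = bin_index * bin_width
        observed = sum(ys) / len(ys)
        error += abs(observed - bin_prob) * len(ys) / total
    return error


# Toy data: four plays predicted at 80% (three wins), two at 20% (no wins)
predicted = [0.8, 0.8, 0.8, 0.8, 0.2, 0.2]
outcomes = [1, 1, 1, 0, 0, 0]
print(f"Calibration Error: {binned_calibration_error(predicted, outcomes):.4f}")
```

A value near zero means that plays given, say, an 80% probability are won about 80% of the time.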

Layout

nfeloml/
├── core/                   # Shared abstractions
│   ├── base_model.py       # Model abstraction
│   ├── base_trainer.py     # Training abstraction
│   ├── base_data_loader.py # Data loading abstraction
│   └── types.py            # Common types
├── models/
│   ├── expected_points/    # EP model implementation
│   └── win_probability/    # WP model implementation
└── utils/                  # Utilities
    └── validation.py       # Data validation

Each model consists of:

  • types.py - Dataclass definitions for inputs/outputs
  • data_loader.py - Data fetching via nfelodcm
  • trainer.py - Model training with cross-validation
  • model.py - Inference interface

Data Source

Training data comes from nfelodcm, which provides:

  • Automatic caching and freshness checks
  • Simple interface: dcm.load(['pbp'])

Models

Expected Points (EP)

Predicts the expected points value of the next scoring play. Uses a 7-class XGBoost model to predict probabilities for:

  • Touchdown (7 points)
  • Opponent Touchdown (-7 points)
  • Field Goal (3 points)
  • Opponent Field Goal (-3 points)
  • Safety (2 points)
  • Opponent Safety (-2 points)
  • No Score (0 points)
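
Given the seven class probabilities, a single expected points value follows as the probability-weighted sum of the point values listed above. The sketch below assumes that is how the number is derived. The dictionary keys are hypothetical labels (only touchdown appears as an attribute in the examples in this README), and the probabilities are made-up illustration values, not model output.

```python
from typing import Dict

# Point values for the seven outcome classes listed above
POINT_VALUES: Dict[str, float] = {
    "touchdown": 7.0,
    "opp_touchdown": -7.0,
    "field_goal": 3.0,
    "opp_field_goal": -3.0,
    "safety": 2.0,
    "opp_safety": -2.0,
    "no_score": 0.0,
}


def expected_points(probabilities: Dict[str, float]) -> float:
    """Probability-weighted sum of point values over the outcome classes."""
    total = sum(probabilities.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return sum(POINT_VALUES[name] * p for name, p in probabilities.items())


# Illustrative probabilities only (not produced by the model)
example = {
    "touchdown": 0.35,
    "opp_touchdown": 0.15,
    "field_goal": 0.20,
    "opp_field_goal": 0.08,
    "safety": 0.01,
    "opp_safety": 0.01,
    "no_score": 0.20,
}
print(f"Expected Points: {expected_points(example):.2f}")
```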

Features:

  • Game situation (down, distance, yardline)
  • Time remaining in half
  • Timeouts remaining
  • Home field advantage
  • Stadium type (dome, outdoors, retractable)
  • Era adjustments for rule changes
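
The stadium and era features are encoded as numbers (the single-play example uses outdoors=1 and era=4). The sketch below shows one way to derive them from a season and a roof label. The era boundaries and the handling of open or closed roofs follow nflfastr's conventions as an assumption; this README does not state how nfeloml derives them, and both function names are hypothetical.

```python
from typing import Dict, Optional


def era_from_season(season: int) -> int:
    """Map a season to an era bucket.

    Assumed boundaries (nflfastr convention, not confirmed for nfeloml):
    0 = through 2001, 1 = 2002-2005, 2 = 2006-2013, 3 = 2014-2017, 4 = 2018 on.
    """
    if season <= 2001:
        return 0
    if season <= 2005:
        return 1
    if season <= 2013:
        return 2
    if season <= 2017:
        return 3
    return 4


def roof_flags(roof: Optional[str]) -> Dict[str, int]:
    """One-hot stadium flags; open, closed, or missing roofs count as retractable (assumption)."""
    model_roof = "retractable" if roof in (None, "open", "closed") else roof
    return {name: int(model_roof == name) for name in ("retractable", "dome", "outdoors")}


print(era_from_season(2023), roof_flags("outdoors"))
```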

Win Probability (WP)

Predicts the probability that the possession team will win the game. Uses binary XGBoost classification.

Features:

  • Score differential
  • Time remaining (game and half)
  • Game situation (down, distance, yardline)
  • Timeouts remaining
  • Which team receives 2nd half kickoff
  • Home field advantage
  • Optional: Vegas point spread (spread-adjusted model)

Development

Adding New Models

To add a new model (e.g., Completion Probability):

  1. Create directory: src/nfeloml/models/completion_probability/
  2. Implement types.py with input/output dataclasses
  3. Implement data_loader.py extending BaseDataLoader
  4. Implement trainer.py extending BaseTrainer
  5. Implement model.py extending BaseModel with get_model_filename() and predict_df()
  6. Export in __init__.py files
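
Step 5 can be pictured with a stand-in base class. The sketch below does not import nfeloml: BaseModelSketch is a hypothetical substitute for BaseModel whose real signatures may differ, rows are plain dicts rather than a DataFrame, and cp_model.ubj is an assumed filename patterned on ep_model.ubj.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseModelSketch(ABC):
    """Stand-in for nfeloml's BaseModel; the real base class may differ."""

    @abstractmethod
    def get_model_filename(self) -> str:
        """Name of the bundled model file to load."""

    @abstractmethod
    def predict_df(self, rows: List[Dict]) -> List[Dict]:
        """Return the input rows enriched with predictions."""


class CompletionProbabilityModel(BaseModelSketch):
    def get_model_filename(self) -> str:
        return "cp_model.ubj"  # hypothetical name, following ep_model.ubj

    def predict_df(self, rows: List[Dict]) -> List[Dict]:
        # Placeholder: a real model would score each row with the trained model
        return [{**row, "completion_probability": None} for row in rows]


model = CompletionProbabilityModel()
print(model.get_model_filename(), model.predict_df([{"down": 1}]))
```

Because both methods are abstract, a subclass that omits either one cannot be instantiated, which catches an incomplete model early.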

The abstract base classes handle common functionality such as leave-one-season-out (LOSO) cross-validation, model persistence, and evaluation.
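
For readers unfamiliar with LOSO (leave-one-season-out) cross-validation: each season is held out once for validation while the model is fit on all the others. The generator below is a minimal sketch of that splitting scheme, with a hypothetical function name; it is not the code in the base trainer.

```python
from typing import Iterator, List, Sequence, Tuple


def loso_splits(seasons: Sequence[int]) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held_out_season, train_indices, validation_indices) for each season."""
    for held_out in sorted(set(seasons)):
        train_idx = [i for i, s in enumerate(seasons) if s != held_out]
        valid_idx = [i for i, s in enumerate(seasons) if s == held_out]
        yield held_out, train_idx, valid_idx


# One season label per play (toy data)
row_seasons = [2021, 2021, 2022, 2022, 2022, 2023]
for season, train_idx, valid_idx in loso_splits(row_seasons):
    print(f"hold out {season}: train on rows {train_idx}, validate on rows {valid_idx}")
```

Holding out whole seasons keeps plays from the same games from appearing on both sides of a split.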

File Formats

Models are stored in XGBoost's binary UBJSON format (.ubj), which keeps files compact and fast to load. Metadata is stored as JSON alongside the model.
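
As an illustration of the metadata sidecar, the sketch below writes a JSON file next to a model path using only the standard library. MetadataSketch is a stand-in that mirrors the ModelMetadata fields used in the training example; the sidecar naming (ep_model.ubj to ep_model.json) and the calibration value are assumptions for the example, not the package's actual behavior or results.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime
from pathlib import Path
from typing import List


@dataclass
class MetadataSketch:
    """Stand-in mirroring the ModelMetadata fields from the training example."""
    model_name: str
    version: str
    trained_date: datetime
    training_seasons: List[int]
    calibration_error: float


def write_metadata(model_path: Path, metadata: MetadataSketch) -> Path:
    """Write metadata as JSON next to the model file (assumed naming: same stem, .json)."""
    payload = asdict(metadata)
    payload["trained_date"] = metadata.trained_date.isoformat()  # datetime is not JSON-serializable
    sidecar = model_path.with_suffix(".json")
    sidecar.write_text(json.dumps(payload, indent=2))
    return sidecar


meta = MetadataSketch(
    model_name="ExpectedPoints",
    version="1.0.0",
    trained_date=datetime(2024, 1, 1),
    training_seasons=list(range(2000, 2024)),
    calibration_error=0.01,  # placeholder value, not a real result
)

with tempfile.TemporaryDirectory() as tmp:
    sidecar = write_metadata(Path(tmp) / "ep_model.ubj", meta)
    loaded = json.loads(sidecar.read_text())

print(sidecar.name, loaded["model_name"], loaded["version"])
```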

Credits

The models are intended to be ports of the existing nflfastr models developed by Ben Baldwin (@benbbaldwin), so credit for them belongs to him and the nflfastr team.

