NFL data pipeline combining PFF grades and PFR game data for over/under analysis

nfl-data-pipeline

A pip-installable data pipeline that scrapes NFL team grades from PFF (Pro Football Focus) and game/betting data from Pro Football Reference, merges the datasets, and runs postprocessing (rolling averages, rankings) to produce a dataset for over/under analysis.

Quick Start

pip install nfl-data-pipeline

Or install from source with Poetry:

git clone https://github.com/thadhutch/nfl-data-pipeline.git
cd nfl-data-pipeline
poetry install

Features

PFF Scraping -- Selenium-based scraper for PFF team grades (requires PFF Premium)
PFR Scraping -- Proxy-rotated scraper for Pro Football Reference boxscores
Date & Team Normalization -- Standardizes dates and team names across sources
Dataset Merging -- Inner join on date + team columns
Rolling Averages -- Pre-game cumulative stat averages per team per season
Games Played Tracking -- Cumulative games played before each matchup
Feature Rankings -- Per-date rankings across all teams
CLI Interface -- nfl-pipeline command with scrape, process, and pipeline subcommands
Python API -- Import and call any step programmatically
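
The "pre-game" wording in the Rolling Averages and Games Played features means that each game row only sees stats from earlier games in the same season. The sketch below illustrates that idea with the standard library only. The function name, the row keys (season, date, team, points), and the sample values are hypothetical and are not taken from the package, which may implement these steps differently.

```python
from collections import defaultdict


def add_pregame_averages(games, stat):
    """Annotate each row with games played and the stat average BEFORE that game.

    `games` is a list of dicts with hypothetical keys "season", "date"
    (ISO strings, so sorting them is chronological), "team", and `stat`.
    """
    totals = defaultdict(float)  # (team, season) -> running sum of the stat
    counts = defaultdict(int)    # (team, season) -> games already played
    annotated = []
    for row in sorted(games, key=lambda r: r["date"]):
        key = (row["team"], row["season"])
        played = counts[key]
        # Average over earlier games only; None before a team's first game.
        average = totals[key] / played if played else None
        annotated.append({**row, "games_played": played, f"avg_{stat}": average})
        # Update the running state only after the row has been annotated,
        # so the current game never leaks into its own features.
        totals[key] += row[stat]
        counts[key] += 1
    return annotated


# Made-up sample rows for a single hypothetical team "A".
games = [
    {"season": 2024, "date": "2024-09-08", "team": "A", "points": 27},
    {"season": 2024, "date": "2024-09-15", "team": "A", "points": 26},
    {"season": 2024, "date": "2024-09-22", "team": "A", "points": 22},
]
rows = add_pregame_averages(games, "points")
for row in rows:
    print(row["date"], row["games_played"], row["avg_points"])
```

Keying the running totals on (team, season) resets the averages at the start of each season, which matches the "per team per season" description.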

Prerequisites

Python 3.12+
Google Chrome + ChromeDriver
A PFF Premium subscription (for PFF scraping)
Rotating proxies in CSV format (for PFR scraping)

Setup

# Install dependencies
poetry install

# Copy and fill in credentials
cp .env.example .env

# Add your proxies
mkdir -p proxies
# Place your proxies.csv in proxies/ (one address:port:user:password entry per line)
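
As a rough illustration of the proxy file format described above, the following sketch parses address:port:user:password lines into structured records. The Proxy class, the function names, and the http:// URL scheme are assumptions made for this example; the package's own loader in scrapers/proxies.py may work differently. The addresses come from a range reserved for documentation.

```python
from typing import NamedTuple


class Proxy(NamedTuple):
    """One proxy entry (hypothetical structure, not the package's own type)."""
    address: str
    port: int
    user: str
    password: str

    def url(self) -> str:
        # The http:// scheme is an assumption; adjust to your provider.
        return f"http://{self.user}:{self.password}@{self.address}:{self.port}"


def parse_proxy_line(line: str) -> Proxy:
    # maxsplit=3 keeps a password that itself contains ":" in one piece.
    address, port, user, password = line.strip().split(":", 3)
    return Proxy(address, int(port), user, password)


def load_proxies(text: str) -> list:
    # Skip blank lines so trailing newlines in the file are harmless.
    return [parse_proxy_line(line) for line in text.splitlines() if line.strip()]


sample = "203.0.113.10:8080:alice:s3cret\n\n203.0.113.11:8081:bob:pa:ss\n"
proxies = load_proxies(sample)
print(proxies[0].url())
```

Note that this simple split does not handle IPv6 addresses, which contain colons themselves.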

Configuration

Override defaults with environment variables:

Variable	Default	Description
`NFL_SEASONS`	`2024`	Comma-separated list of seasons for PFF scraping
`NFL_START_YEAR`	`2024`	Start year for PFR URL scraping
`NFL_END_YEAR`	`2024`	End year for PFR URL scraping
`NFL_MAX_WEEK`	`18`	Last week to scrape in the final year
`NFL_DATA_DIR`	`data`	Base directory for all data output
`NFL_PROXY_FILE`	`proxies/proxies.csv`	Path to proxy CSV file
`PFF_EMAIL`	-	PFF account email
`PFF_PASSWORD`	-	PFF account password
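
The table above can be read as "environment variable, else default". A minimal sketch of that lookup follows. The Settings class and load_settings function are hypothetical names for this example, not the contents of _config.py, and the credential variables are left out for brevity.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the documented variables."""
    seasons: list
    start_year: int
    end_year: int
    max_week: int
    data_dir: str
    proxy_file: str


def load_settings(env=None) -> Settings:
    # Accept any mapping so the function can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    raw_seasons = env.get("NFL_SEASONS", "2024")
    return Settings(
        seasons=[int(s) for s in raw_seasons.split(",") if s.strip()],
        start_year=int(env.get("NFL_START_YEAR", "2024")),
        end_year=int(env.get("NFL_END_YEAR", "2024")),
        max_week=int(env.get("NFL_MAX_WEEK", "18")),
        data_dir=env.get("NFL_DATA_DIR", "data"),
        proxy_file=env.get("NFL_PROXY_FILE", "proxies/proxies.csv"),
    )


defaults = load_settings({})
custom = load_settings({"NFL_SEASONS": "2022, 2023", "NFL_MAX_WEEK": "10"})
print(defaults)
print(custom.seasons, custom.max_week)
```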

CLI Usage

# Run the full pipeline end-to-end
nfl-pipeline pipeline

# Scrape only PFF data (scrape + parse dates + normalize names)
nfl-pipeline scrape pff

# Scrape only PFR data (URLs + game data + parse dates + normalize names)
nfl-pipeline scrape pfr

# Run all post-processing steps
nfl-pipeline process all

# Run individual processing steps
nfl-pipeline process merge
nfl-pipeline process over-under
nfl-pipeline process averages
nfl-pipeline process games-played
nfl-pipeline process rankings

# Show version
nfl-pipeline --version

Python API

import nfl_data_pipeline

# Run individual steps
nfl_data_pipeline.scrape_pff_data()
nfl_data_pipeline.collect_boxscore_urls()
nfl_data_pipeline.scrape_all_game_info()
nfl_data_pipeline.merge_datasets()
nfl_data_pipeline.process_over_under()
nfl_data_pipeline.compute_rolling_averages()
nfl_data_pipeline.add_games_played()
nfl_data_pipeline.compute_rankings()

# Or run full pipelines
from nfl_data_pipeline.pipeline import run_full_pipeline, run_pff_pipeline, run_pfr_pipeline
run_full_pipeline()
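
If you want to drive the individual steps yourself, for example to log how long each one takes, a small runner like the one below may help. It is a sketch, not part of the package: run_steps is a hypothetical helper, and calling the steps in the order they are listed above is an assumption about how they depend on each other. A stand-in object is used so the example runs without the package installed.

```python
import logging
import time
from types import SimpleNamespace

logger = logging.getLogger("nfl_steps")

# Function names as listed in the Python API section, in the same order.
STEP_NAMES = [
    "scrape_pff_data",
    "collect_boxscore_urls",
    "scrape_all_game_info",
    "merge_datasets",
    "process_over_under",
    "compute_rolling_averages",
    "add_games_played",
    "compute_rankings",
]


def run_steps(module, step_names=STEP_NAMES):
    """Call each named function on `module` in order.

    An exception from any step propagates, so later steps never run
    on top of a failed one.
    """
    completed = []
    for name in step_names:
        step = getattr(module, name)
        start = time.perf_counter()
        step()
        logger.info("%s finished in %.1fs", name, time.perf_counter() - start)
        completed.append(name)
    return completed


# Stand-in for the real package: every step just records its own name.
calls = []
stub = SimpleNamespace(**{n: (lambda n=n: calls.append(n)) for n in STEP_NAMES})
completed = run_steps(stub)
print(completed)

# With the package installed, the equivalent call would be:
#   import nfl_data_pipeline
#   run_steps(nfl_data_pipeline)
```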

Pipeline

PFF Scrape          PFR Scrape
    |                   |
    v                   v
Extract Dates      Normalize Dates
    |                   |
    v                   v
Normalize Names    Normalize Names
    |                   |
    +-------+   +-------+
            |   |
            v   v
           Merge
             |
             v
        Over/Under
             |
             v
      Rolling Averages
             |
             v
       Games Played
             |
             v
         Rankings
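
To make the Merge and Rankings stages of the diagram more concrete, here is a standard-library sketch of an inner join on date + team followed by a per-date ranking. The function names, column names, and sample values are made up for illustration. The sketch assumes each (date, team) pair appears at most once per source and that a higher value earns a better rank; the package itself may resolve duplicates and ties differently.

```python
from collections import defaultdict


def inner_join(pff_rows, pfr_rows, keys=("date", "team")):
    """Keep only rows whose key columns appear in BOTH inputs."""
    index = {tuple(row[k] for k in keys): row for row in pfr_rows}
    merged = []
    for row in pff_rows:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:  # unmatched rows are dropped (inner join)
            merged.append({**match, **row})
    return merged


def rank_by_date(rows, stat):
    """Rank teams within each date, 1 = highest value of `stat`."""
    by_date = defaultdict(list)
    for row in rows:
        by_date[row["date"]].append(row)
    ranked = []
    for date in sorted(by_date):
        ordered = sorted(by_date[date], key=lambda r: r[stat], reverse=True)
        # Ties keep their input order here; no shared ranks are assigned.
        for position, row in enumerate(ordered, start=1):
            ranked.append({**row, f"{stat}_rank": position})
    return ranked


# Made-up sample rows for two hypothetical teams.
pff = [
    {"date": "2024-09-08", "team": "A", "grade": 80.0},
    {"date": "2024-09-08", "team": "B", "grade": 70.0},
    {"date": "2024-09-15", "team": "A", "grade": 65.0},  # no PFR match
]
pfr = [
    {"date": "2024-09-08", "team": "A", "total": 47},
    {"date": "2024-09-08", "team": "B", "total": 47},
]
merged = inner_join(pff, pfr)
ranked = rank_by_date(merged, "grade")
for row in ranked:
    print(row)
```

Because the join is an inner join, a game that is missing from either source silently disappears from the output, so comparing row counts before and after the merge is a cheap sanity check.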

Project Structure

nfl-data-pipeline/
├── src/
│   └── nfl_data_pipeline/
│       ├── __init__.py              # __version__, top-level re-exports
│       ├── _config.py               # Paths, env vars, logging setup
│       ├── teams.py                 # Team name/abbreviation mappings
│       ├── cli.py                   # Click CLI entry point
│       ├── pipeline.py              # Full pipeline orchestrator
│       ├── scrapers/
│       │   ├── pff.py               # PFF grades scraper
│       │   ├── pfr.py               # PFR game data scraper
│       │   ├── pfr_urls.py          # PFR boxscore URL collector
│       │   ├── auth.py              # PFF authentication
│       │   └── proxies.py           # Shared proxy loading
│       ├── parsers/
│       │   ├── pff_dates.py         # PFF date extraction
│       │   ├── pff_teams.py         # PFF team name normalization
│       │   ├── pfr_dates.py         # PFR date normalization
│       │   └── pfr_teams.py         # PFR team name extraction
│       └── processing/
│           ├── merge.py             # Merge PFF + PFR datasets
│           ├── over_under.py        # O/U betting line extraction
│           ├── rolling_averages.py  # Rolling stat averages
│           ├── games_played.py      # Cumulative games played
│           └── rankings.py          # Feature rankings
├── tests/
├── pyproject.toml
├── Makefile
├── LICENSE
└── README.md

Make Commands

make all            # Run the full pipeline end-to-end
make pff            # Run only the PFF scraping + processing chain
make pfr            # Run only the PFR scraping + processing chain
make merge          # Merge PFF and PFR data (runs both chains first)
make rankings       # Run full postprocessing through rankings
make test           # Run the test suite
make clean          # Remove all generated data files
make dirs           # Create data directory structure

Notes

PFF scraping is fragile. It relies on XPath selectors tied to PFF's DOM structure. If PFF changes their frontend, the selectors in scrapers/pff.py will need updating.
PFR scraping requires proxies. Pro Football Reference rate-limits aggressively. Without rotating proxies, requests will be blocked.
The PFF scraper uses a real browser. It opens Chrome via Selenium, logs in with your credentials, and navigates page by page. This is slow but necessary since PFF renders data client-side.
Data files are not tracked in git. Run the pipeline to generate them, or bring your own data in the expected format.
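
The note on proxies describes rotating through a pool so that a blocked request can be retried from a different address. One common way to do that is a round-robin loop, sketched below. This is not the package's scraper code: fetch_with_rotation is a hypothetical helper, and the fetch callable is injected so that the example stays independent of any particular HTTP library.

```python
import itertools


def fetch_with_rotation(url, proxies, fetch, max_attempts=5):
    """Try `fetch(url, proxy)` with successive proxies until one succeeds.

    `fetch` is any callable that returns a response or raises OSError
    (urllib.error.URLError is a subclass of OSError).
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    last_error = None
    # cycle() wraps around the pool; range() caps the total attempts.
    for _, proxy in zip(range(max_attempts), itertools.cycle(proxies)):
        try:
            return fetch(url, proxy)
        except OSError as exc:
            last_error = exc  # remember the failure and move to the next proxy
    raise RuntimeError(f"all {max_attempts} attempts failed for {url}") from last_error


# Fake fetcher for demonstration: the first proxy is "blocked".
attempted = []


def fake_fetch(url, proxy):
    attempted.append(proxy)
    if proxy == "proxy-1":
        raise OSError("blocked")
    return f"fetched {url} via {proxy}"


result = fetch_with_rotation("https://example.com/page", ["proxy-1", "proxy-2"], fake_fetch)
print(result)
```

In a real scraper you would probably also add a delay between attempts so that retries do not make rate limiting worse.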

Release history

1.1.1 -- Feb 8, 2026
1.1.0 -- Feb 8, 2026
1.0.1 -- Feb 8, 2026 (this version)

Download files

Source Distribution

nfl_data_pipeline-1.0.1.tar.gz (20.1 kB)

Uploaded Feb 8, 2026 Source

Built Distribution

nfl_data_pipeline-1.0.1-py3-none-any.whl (27.5 kB)

Uploaded Feb 8, 2026 Python 3

File details

Details for the file nfl_data_pipeline-1.0.1.tar.gz.

File metadata

Download URL: nfl_data_pipeline-1.0.1.tar.gz
Upload date: Feb 8, 2026
Size: 20.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nfl_data_pipeline-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`92b93e00c108547d68bd8aa0bcaa6ce1ea87ea08e694d6ebc985f4c5e0ba0520`
MD5	`726b0b4fe97da875ab7cd5ad56f2adcd`
BLAKE2b-256	`6b3c8545d7c74900259fd5329bba9e37953efe36b6e96dfef20289035c975ce5`

Provenance

The following attestation bundles were made for nfl_data_pipeline-1.0.1.tar.gz:

Publisher: publish.yml on thadhutch/nfl-data-pipeline

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nfl_data_pipeline-1.0.1.tar.gz
- Subject digest: 92b93e00c108547d68bd8aa0bcaa6ce1ea87ea08e694d6ebc985f4c5e0ba0520
- Sigstore transparency entry: 927306975
- Sigstore integration time: Feb 8, 2026
Source repository:
- Permalink: thadhutch/nfl-data-pipeline@1e87423f33c33bbb9f26486131ef62518838b67b
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/thadhutch
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1e87423f33c33bbb9f26486131ef62518838b67b
- Trigger Event: release

File details

Details for the file nfl_data_pipeline-1.0.1-py3-none-any.whl.

File metadata

Download URL: nfl_data_pipeline-1.0.1-py3-none-any.whl
Upload date: Feb 8, 2026
Size: 27.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nfl_data_pipeline-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3901ec4b662a711dd412b999652dc8fcac39ae6db3b5e627fb4829c25b50d4b8`
MD5	`36fbb9edcd49007bbae720b401dc4b79`
BLAKE2b-256	`527eb51b34219832d5935994e9be5d23375a1ddd76094e4e37a302768997f851`

Provenance

The following attestation bundles were made for nfl_data_pipeline-1.0.1-py3-none-any.whl:

Publisher: publish.yml on thadhutch/nfl-data-pipeline

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nfl_data_pipeline-1.0.1-py3-none-any.whl
- Subject digest: 3901ec4b662a711dd412b999652dc8fcac39ae6db3b5e627fb4829c25b50d4b8
- Sigstore transparency entry: 927306979
- Sigstore integration time: Feb 8, 2026
Source repository:
- Permalink: thadhutch/nfl-data-pipeline@1e87423f33c33bbb9f26486131ef62518838b67b
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/thadhutch
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1e87423f33c33bbb9f26486131ef62518838b67b
- Trigger Event: release
