A comprehensive Python package for scraping and analyzing NHL data with built-in Expected Goals (xG) modeling


ScraperNHL

Scrape and analyze hockey data from 6 leagues with one unified API.

ScraperNHL provides play-by-play events, player stats, schedules, rosters, and standings for the NHL, AHL, PWHL, OHL, WHL, and QMJHL — all returned as pandas DataFrames, all from the same interface.

NHL support goes further with an advanced analytics pipeline: time-on-ice matrices, shift-level analysis, on-ice shot/Corsi/Fenwick stats, and per-60 rates.
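For readers new to these metrics: Corsi counts all shot attempts (goals, shots on goal, missed shots, and blocked shots), Fenwick counts only unblocked attempts, and a per-60 rate scales a count by ice time as count × 3600 / TOI in seconds. The sketch below illustrates those definitions in plain Python. It is not ScraperNHL's implementation; the event labels (`GOAL`, `SHOT`, `MISS`, `BLOCK`) and the `(event_type, shooting_team)` input shape are assumptions.

```python
# Illustrative event labels; ScraperNHL's DataFrames may use different names.
FENWICK_EVENTS = {"GOAL", "SHOT", "MISS"}   # unblocked shot attempts
CORSI_EVENTS = FENWICK_EVENTS | {"BLOCK"}   # all shot attempts


def shot_attempt_counts(events: list[tuple[str, str]], team: str) -> dict[str, int]:
    """Count Corsi/Fenwick for (CF/FF) and against (CA/FA) `team`.

    Each event is (event_type, shooting_team). Assumption: a blocked shot is
    credited to the team that took the attempt, not the team that blocked it.
    """
    counts = {"CF": 0, "CA": 0, "FF": 0, "FA": 0}
    for event_type, shooting_team in events:
        side = "F" if shooting_team == team else "A"
        if event_type in CORSI_EVENTS:
            counts["C" + side] += 1
        if event_type in FENWICK_EVENTS:
            counts["F" + side] += 1
    return counts


def per_60(count: int, toi_seconds: float) -> float:
    """Scale a raw count to a rate per 60 minutes (3600 seconds) of ice time."""
    return count * 3600 / toi_seconds if toi_seconds > 0 else 0.0


sample = [("SHOT", "MTL"), ("BLOCK", "MTL"), ("GOAL", "TOR"), ("MISS", "MTL"), ("HIT", "MTL")]
mtl = shot_attempt_counts(sample, "MTL")  # the HIT is ignored: not a shot attempt
cf_per_60 = per_60(mtl["CF"], toi_seconds=900)
```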

Supported Leagues

League	Key	Season ID format	Current season ID
National Hockey League	`nhl`	`YYYYYYYY`	`20252026`
American Hockey League	`ahl`	integer	`90`
Provincial Women's Hockey League	`pwhl`	integer	`8`
Ontario Hockey League	`ohl`	integer	`83`
Western Hockey League	`whl`	integer	`289`
Quebec Major Junior Hockey League	`qmjhl`	integer	`211`
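NHL season IDs concatenate the start and end years (2023-24 becomes `20232024`), while the other leagues use opaque integer IDs that are best looked up with `scrape(league, 'seasons')` rather than computed. A small sketch for building and checking NHL-style IDs; the helper names are hypothetical and not part of ScraperNHL.

```python
def nhl_season_id(start_year: int) -> int:
    """Build a YYYYYYYY NHL season ID, e.g. 2023 -> 20232024."""
    return start_year * 10000 + (start_year + 1)


def split_nhl_season_id(season_id: int) -> tuple[int, int]:
    """Split an NHL season ID into (start_year, end_year), rejecting malformed IDs."""
    start, end = divmod(season_id, 10000)
    if end != start + 1:
        raise ValueError(f"{season_id} is not a YYYYYYYY NHL season ID")
    return start, end


current = nhl_season_id(2025)               # matches the table's 20252026
start, end = split_nhl_season_id(20232024)
```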

Installation

pip install scrapernhl

From source (latest dev):

git clone https://github.com/maxtixador/scrapernhl.git
cd scrapernhl
pip install -e .

Requirements: Python 3.10+, pandas, numpy, requests, beautifulsoup4, selectolax

Two Ways to Use It

1. Functional API — one-liners for everything

from scrapernhl import scrape

# Play-by-play — works for all 6 leagues
pbp = scrape('nhl',   'pbp', game_id=2023020001)
pbp = scrape('ahl',   'pbp', game_id=1027781)
pbp = scrape('qmjhl', 'pbp', game_id=31909)
pbp = scrape('ohl',   'pbp', game_id=28150)
pbp = scrape('whl',   'pbp', game_id=1022126)
pbp = scrape('pwhl',  'pbp', game_id=210)

# Player stats
skaters = scrape('ahl',   'stats', season=90, position='skaters')
goalies = scrape('ohl',   'stats', season=83, position='goalies')
skaters = scrape('nhl',   'stats', team='MTL', season=20232024, position='skaters')  # NHL needs a team

# Schedule, roster, standings
schedule  = scrape('whl',  'schedule',  season=289)
schedule  = scrape('nhl',  'schedule',  team='MTL', season=20232024)  # NHL needs a team
roster    = scrape('nhl',  'roster',    team='MTL', season=20232024)
standings = scrape('qmjhl','standings', season=211)
standings = scrape('nhl',  'standings', season=20232024)

# Teams and seasons
teams   = scrape('nhl', 'teams')              # active NHL teams
teams   = scrape('ahl', 'teams', season=90)   # AHL teams for a season
seasons = scrape('ahl', 'seasons')

2. Object-Oriented API — more control

from scrapernhl import HockeyScraper

s = HockeyScraper('ahl')

pbp      = s.play_by_play(game_id=1027781)
skaters  = s.player_stats(season=90, position='skaters')
goalies  = s.player_stats(season=90, position='goalies')
schedule = s.schedule(season=90)               # team='all' by default for non-NHL
roster   = s.roster(team='390', season=90)     # team ID from bootstrap data
standing = s.standings(season=90)
teams    = s.teams_by_season(season=90)
seasons  = s.seasons('all')                    # 'all', 'regular', or 'playoff'

# Convenience aliases — same result, different names
s.scrape_pbp(game_id=1027781)
s.scrape_skaters()
s.scrape_goalies()
s.scrape_schedule()
s.scrape_roster(team='390')
s.scrape_standings()

# Scrape multiple games and get one concatenated DataFrame
df = s.scrape_multiple_games([1027781, 1027779])
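The README does not say how `scrape_multiple_games` handles a game that fails to download. If you need to control that yourself, a common pattern is to loop over IDs, tag each game's rows with its ID, and collect failures instead of aborting the batch. The sketch below uses plain lists of dicts and a hypothetical `fetch` callable standing in for a per-game scraper; with DataFrames you would collect the frames and combine them with `pandas.concat`.

```python
from typing import Callable, Iterable


def scrape_many(
    game_ids: Iterable[int],
    fetch: Callable[[int], list[dict]],
) -> tuple[list[dict], dict[int, str]]:
    """Fetch each game, tag its rows with `game_id`, and record failures separately."""
    rows: list[dict] = []
    failures: dict[int, str] = {}
    for game_id in game_ids:
        try:
            events = fetch(game_id)
        except Exception as exc:  # isolate the failure so the rest of the batch still runs
            failures[game_id] = repr(exc)
            continue
        rows.extend({**event, "game_id": game_id} for event in events)
    return rows, failures


def fake_fetch(game_id: int) -> list[dict]:
    """Hypothetical stand-in for a network call; game 2 simulates an error."""
    if game_id == 2:
        raise RuntimeError("game not found")
    return [{"event": "FAC"}, {"event": "SHOT"}]


rows, failures = scrape_many([1, 2, 3], fake_fetch)
```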

League Metadata (non-NHL)

Bootstrap data is fetched automatically when you create a non-NHL scraper. Use it to look up valid team IDs and season IDs before making other calls.

s = HockeyScraper('ahl')

s.teams                          # list of team dicts
s.current_season_id              # '90'
s.get_teams(include_all=False)   # excludes the "All Teams" placeholder
s.get_team_by_id('390')          # dict with id, name, team_code, logo, ...
s.get_team_by_code('ABB')
s.get_seasons('regular')         # list of season dicts; also 'playoff', 'all'
s.get_current_season()           # dict for the current season
s.get_conferences()
s.get_divisions()
s.get_positions()
s.get_league_metadata()          # league name, short_name, code, logo
s.is_playoffs_active()           # True during playoff season
s.is_bilingual()                 # True for QMJHL (has French translations)

# Raw bootstrap dict
data = s.bootstrap(season='90', page_name='scorebar')
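Since `roster()` takes a team ID while people usually think in team codes, one way to bridge the two is to index the bootstrap team list by code. The sketch below assumes each team dict carries `id` and `team_code` keys (as the `get_team_by_id` comment suggests) and uses made-up sample data; in practice `s.get_team_by_code()` may already cover this, and the helper name is hypothetical.

```python
def team_ids_by_code(teams: list[dict]) -> dict[str, str]:
    """Map upper-cased team codes to team IDs, skipping entries without a code."""
    return {
        team["team_code"].upper(): str(team["id"])
        for team in teams
        if team.get("team_code")  # e.g. an "All Teams" placeholder with no code
    }


# Made-up sample shaped like the bootstrap team dicts described above.
sample_teams = [
    {"id": "-1", "name": "All Teams", "team_code": ""},
    {"id": "101", "name": "Example City", "team_code": "exa"},
    {"id": 102, "name": "Sample Town", "team_code": "SAM"},
]

ids = team_ids_by_code(sample_teams)
roster_team = ids["EXA"]  # the value you would pass as s.roster(team=..., season=90)
```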

NHL-Specific Methods

The following methods are only available on HockeyScraper('nhl'); calling them on any other league raises NotImplementedError.

Play-by-Play Sources

nhl = HockeyScraper('nhl')

# Three different PBP sources for the same game
json_pbp = nhl.scrape_plays(2023020001)    # JSON API — fastest
html_pbp = nhl.html_pbp(2023020001)        # HTML report — includes faceoff zone, shot type
full_pbp = nhl.scrape_game(2023020001)     # Merged pipeline (HTML + JSON) — most complete

# Raw dict from the JSON API
data = nhl.get_game_data(2023020001)

# With include_tuple=True, scrape_game returns a GameResult namedtuple
# (pbp_df, shifts_df, html_pbp_df, home_team, away_team)
result = nhl.scrape_game(2023020001, include_tuple=True)
pbp, shifts, html, home, away = result
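Because `GameResult` is described as a namedtuple, its fields should also be reachable by name, which is convenient when you only need one element. The sketch below defines a stand-in using the field names listed in the comment above (an assumption about the real attribute names) and placeholder strings instead of DataFrames.

```python
from collections import namedtuple

# Stand-in for ScraperNHL's GameResult; field names are taken from the README comment.
GameResult = namedtuple(
    "GameResult", ["pbp_df", "shifts_df", "html_pbp_df", "home_team", "away_team"]
)

# Placeholder values; the real object holds DataFrames and team identifiers.
result = GameResult("pbp", "shifts", "html", "HOME", "AWAY")

pbp, shifts, html, home, away = result  # positional unpacking, as in the example above
only_shifts = result.shifts_df          # attribute access, no unpacking needed
as_dict = result._asdict()              # handy for logging or serialization
```

If those attribute names match the real class, `nhl.scrape_game(2023020001, include_tuple=True).shifts_df` would be the equivalent one-liner.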

Shifts, Stats, Standings

shifts = nhl.shifts(2023020001)

nhl.team_stats(team='MTL', season=20232024, session=2, goalies=False)
# session: 1=preseason, 2=regular season, 3=playoffs

nhl.standings_by_date('2024-01-15')
nhl.standings_by_date()           # defaults to Jan 1 of the previous year
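NHL game IDs follow a widely documented layout: four digits for the season's start year, two for the game type (01 preseason, 02 regular season, 03 playoffs, 04 all-star), and four for the game number. The game-type codes line up with the `session` values above, so a game ID is enough to derive both the `season` and `session` arguments. A minimal parser sketch; the names are hypothetical and it applies only to NHL IDs, not to the shorter IDs the other leagues use.

```python
from typing import NamedTuple

GAME_TYPES = {1: "preseason", 2: "regular season", 3: "playoffs", 4: "all-star"}


class NHLGameId(NamedTuple):
    season_start: int  # e.g. 2023 for the 2023-24 season
    game_type: int     # 1-3 match team_stats' session codes; 4 is the All-Star Game
    number: int        # sequence number within that game type


def parse_nhl_game_id(game_id: int) -> NHLGameId:
    """Split a 10-digit NHL game ID such as 2023020001 into its parts."""
    text = str(game_id)
    if len(text) != 10 or not text.isdigit():
        raise ValueError(f"expected a 10-digit NHL game ID, got {game_id!r}")
    return NHLGameId(int(text[:4]), int(text[4:6]), int(text[6:]))


def season_and_session(game_id: int) -> tuple[int, int]:
    """Derive (season, session) arguments, e.g. 2023020001 -> (20232024, 2)."""
    parsed = parse_nhl_game_id(game_id)
    season = parsed.season_start * 10000 + parsed.season_start + 1
    return season, parsed.game_type


parsed = parse_nhl_game_id(2023020001)
label = GAME_TYPES[parsed.game_type]
```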

Teams and Draft

# Three team data sources
nhl.scrape_teams(source='calendar')    # active teams from the schedule calendar
nhl.scrape_teams(source='franchise')   # franchise list with first/last season
nhl.scrape_teams(source='records')     # records API — includes logos, conference, division

# Draft
nhl.draft(year=2024, round='all')      # all rounds
nhl.draft(year=2023, round=1)          # single round
nhl.draft_records(year=2024)           # records API — more player detail
nhl.team_draft_history(franchise=1)    # all picks for one franchise (1 = NJD)

NHL Analytics Pipeline

scrape_game is the starting point. It merges HTML and JSON PBP into one enriched DataFrame with on-ice player lists, strength state, zone starts, and shot coordinates.
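The on-ice player lists that `scrape_game` attaches can be laid out in two ways, which is the choice `build_on_ice_long` and `build_on_ice_wide` expose. The sketch below shows the difference on a single event using plain dicts; the column names (`event_id`, `player`, `skater_1`, ...) are hypothetical and will not match the library's output exactly.

```python
def on_ice_long(event_id: int, team: str, players: list[str]) -> list[dict]:
    """Long (tidy) layout: one row per (event, player) pair."""
    return [{"event_id": event_id, "team": team, "player": p} for p in players]


def on_ice_wide(event_id: int, team: str, players: list[str], max_skaters: int = 6) -> dict:
    """Wide layout: one row per event with numbered columns, padded with None.

    Players beyond `max_skaters` are dropped, so pick a limit that fits your data.
    """
    row = {"event_id": event_id, "team": team}
    for i in range(max_skaters):
        row[f"skater_{i + 1}"] = players[i] if i < len(players) else None
    return row


players = ["Player A", "Player B", "Player C"]
long_rows = on_ice_long(42, "MTL", players)
wide_row = on_ice_wide(42, "MTL", players, max_skaters=4)
```

Long rows are easier to group and join on a player column (for example, counting events per player); the wide row keeps one line per event, which is easier to read or export.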

nhl = HockeyScraper('nhl')

# Step 1: Get game data
pbp    = nhl.scrape_game(2023020001)
shifts = nhl.shifts(2023020001)

# Step 2: Player-by-second matrix and strength states
matrix    = nhl.seconds_matrix(pbp, shifts)
strengths = nhl.strengths_by_second(matrix)

# Step 3: Time-on-ice by strength
toi = nhl.toi_by_strength_all(matrix, strengths)
toi = nhl.toi_by_strength_all(matrix, strengths, in_seconds=True)  # TOI reported in seconds

# Step 4: Pairwise shared TOI
teammates = nhl.shared_toi_teammates(matrix, strengths)
opponents = nhl.shared_toi_opponents(matrix, strengths)

# Step 5: On-ice shot/goal stats
player_stats = nhl.on_ice_stats(pbp)
player_stats = nhl.on_ice_stats(pbp, include_goalies=True, rates=True)  # per-60 rates

# Combination stats (e.g. all 2-player pairs for MTL)
combos = nhl.combo_on_ice_stats(pbp, focus_team='MTL', n_team=2, m_opp=0)

# Team-level aggregates by strength state
team_agg = nhl.team_strength_aggregates(pbp, rates=True)

# On-ice player columns: choose long (tidy) or wide (numbered) format
long_df = nhl.build_on_ice_long(pbp)
wide_df = nhl.build_on_ice_wide(pbp, max_skaters=6, include_goalie=True)

# Shift events table (ON/OFF events from the shifts DataFrame)
shift_events = nhl.build_shifts_events(shifts)
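To see what the seconds matrix and the TOI steps compute, here is a simplified, stdlib-only sketch of the same idea: expand shifts into the set of players on the ice each second, label each second with a strength state, then count seconds per player per strength and per pair of teammates. It is not ScraperNHL's implementation; the `(player, team, start, end)` shift tuples are an assumption, `end` is treated as exclusive, and goalies are ignored, so strengths are plain skater counts.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical shift record: (player, team, start_second, end_second), end exclusive.
Shift = tuple[str, str, int, int]


def build_seconds_matrix(shifts: list[Shift]) -> dict[int, dict[str, set[str]]]:
    """Map each game second to {team: set of players on the ice}."""
    matrix: dict[int, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for player, team, start, end in shifts:
        for second in range(start, end):
            matrix[second][team].add(player)
    return matrix


def label_strengths(matrix, team: str, opponent: str) -> dict[int, str]:
    """Label each second from `team`'s point of view, e.g. '5v4'."""
    return {
        second: f"{len(on_ice[team])}v{len(on_ice[opponent])}"
        for second, on_ice in matrix.items()
    }


def toi_per_strength(matrix, strengths: dict[int, str], team: str) -> dict[str, Counter]:
    """Seconds each player on `team` spent at each strength state."""
    toi: dict[str, Counter] = defaultdict(Counter)
    for second, on_ice in matrix.items():
        for player in on_ice[team]:
            toi[player][strengths[second]] += 1
    return toi


def teammate_shared_seconds(matrix, team: str) -> Counter:
    """Seconds each pair of teammates spent on the ice together."""
    pairs: Counter = Counter()
    for on_ice in matrix.values():
        for pair in combinations(sorted(on_ice[team]), 2):
            pairs[pair] += 1
    return pairs


shifts = [("A", "MTL", 0, 10), ("B", "MTL", 0, 5), ("X", "TOR", 0, 10), ("Y", "TOR", 0, 10)]
matrix = build_seconds_matrix(shifts)
strengths = label_strengths(matrix, "MTL", "TOR")
toi = toi_per_strength(matrix, strengths, "MTL")
pairs = teammate_shared_seconds(matrix, "MTL")
```

A real pipeline also has to handle goalies, pulled goalies, and period boundaries, which this sketch leaves out.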

Command-Line Interface

# Play-by-play
scrapernhl ahl   game 1027781              --output game.csv
scrapernhl game  2023020001               --output nhl_game.json

# Player stats (non-NHL)
scrapernhl ahl   stats --season 90 --player-type skater  --output stats.csv
scrapernhl ohl   stats --season 83 --player-type goalie  --output goalies.json

# NHL player stats (top-level command, requires team + season)
scrapernhl stats MTL 20252026            --output mtl_skaters.csv
scrapernhl stats MTL 20252026 --goalies  --output mtl_goalies.csv

# Schedule
scrapernhl whl   schedule --season 289   --output schedule.csv
scrapernhl schedule MTL 20252026         --output nhl_schedule.csv

# Standings
scrapernhl standings                     --output standings.csv
scrapernhl qmjhl standings --season 211  --output standings.json

scrapernhl --help
scrapernhl ahl --help

Important Behavior Notes

NHL player_stats and schedule require a team tricode. The NHL API serves data per-team, not league-wide. Pass team='MTL', team='TOR', etc. Non-NHL leagues default to team='all' for league-wide data.

Bootstrap data is fetched on init for non-NHL leagues. The first call to HockeyScraper('ahl') makes one network request to get teams, seasons, and configuration. Subsequent calls use the cached data.

Caching is automatic and disk-based.

Data type	Cache TTL
Play-by-play	None (always fresh)
Schedule	1 hour
Player stats	1 hour
Standings	30 minutes
Roster	24 hours
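The README states only that caching is disk-based with the TTLs above. To illustrate how a TTL disk cache of that kind can work, the sketch below stores JSON files and treats a file as fresh while its modification time is younger than the TTL. The file naming, the JSON format, the `kind` keys, and the `cached_fetch` helper are all assumptions, not ScraperNHL's actual cache.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Optional

# TTLs in seconds mirroring the table above; None means "never cache".
TTL_SECONDS: dict[str, Optional[int]] = {
    "pbp": None,
    "schedule": 3600,
    "stats": 3600,
    "standings": 1800,
    "roster": 86400,
}


def cached_fetch(
    cache_dir: Path,
    kind: str,
    key: str,
    fetch: Callable[[], Any],
    now: Callable[[], float] = time.time,
) -> Any:
    """Return cached JSON for (kind, key) while it is younger than its TTL; otherwise refetch."""
    ttl = TTL_SECONDS[kind]
    if ttl is None:  # e.g. play-by-play: always fetch fresh data
        return fetch()
    path = cache_dir / f"{kind}-{key}.json"
    if path.exists() and now() - path.stat().st_mtime < ttl:
        return json.loads(path.read_text())
    data = fetch()
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data
```

The `now` parameter exists only so that expiry can be simulated without waiting; a file-modification-time check keeps the cache stateless across Python sessions.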

Running Tests

# Integration tests — require a network connection
pytest tests/test_client.py -v

# Run only a specific class
pytest tests/test_client.py::TestNHLAnalytics -v
pytest tests/test_client.py::TestPlayByPlay -v

The 717 tests cover all 6 leagues, spanning instantiation, bootstrap accessors, play-by-play, player stats (skaters and goalies), schedules, rosters, standings, teams, seasons, batch scraping, every NHL-specific method, the full analytics pipeline, and the scrape() functional API.

Project Structure

scrapernhl/
├── __init__.py         # Public API: HockeyScraper, scrape()
├── client.py           # Unified HockeyScraper class (~900 lines)
├── config.py           # League configs, API keys, cache TTLs
├── urls.py             # URL builders for every league/endpoint
├── parsers.py          # Extract records from raw API responses
├── transform.py        # Normalize coordinates, events, times
├── enrichment.py       # Add team names, season metadata (non-NHL)
├── utils.py            # Rate limiter, disk cache, HTTP session
├── cli.py              # Click-based CLI
└── nhl/
    ├── scraper_legacy.py   # Full NHL pipeline: HTML PBP, shifts, TOI
    ├── analytics.py        # Advanced analytics (Corsi, scoring chances, zone starts)
    └── scrapers/           # Modular per-endpoint scrapers

Contributing

Bug reports and pull requests are welcome at https://github.com/maxtixador/scrapernhl.

License

MIT

Author

Max Tixador @woumaxx · @HabsBrain.com · maxtixador@gmail.com

Release history

0.3.2

Mar 7, 2026

This version

0.3.1

Mar 7, 2026

0.1.4

Jan 1, 2026

0.1.3

Dec 18, 2025

0.1.2

Dec 18, 2025

0.1.1

Dec 18, 2025

Download files

Source Distribution

scrapernhl-0.3.1.tar.gz (146.2 kB)

Uploaded Mar 7, 2026 Source

Built Distribution

scrapernhl-0.3.1-py3-none-any.whl (134.6 kB)

Uploaded Mar 7, 2026 Python 3

File details

Details for the file scrapernhl-0.3.1.tar.gz.

File metadata

Download URL: scrapernhl-0.3.1.tar.gz
Upload date: Mar 7, 2026
Size: 146.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scrapernhl-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`f64f4b13faf6f02b22427f89f678685fc69da6edc3846bb7298017723da390d2`
MD5	`bb75b7d225a20a5f7a085fb45d47435f`
BLAKE2b-256	`6b4e44abd12ed6b3474b5ec438e38c97dfc6f1afb0a91db785f017916df3e7eb`

Provenance

The following attestation bundles were made for scrapernhl-0.3.1.tar.gz:

Publisher: python-publish.yml on maxtixador/scrapernhl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scrapernhl-0.3.1.tar.gz
- Subject digest: f64f4b13faf6f02b22427f89f678685fc69da6edc3846bb7298017723da390d2
- Sigstore transparency entry: 1055798704
- Sigstore integration time: Mar 7, 2026
Source repository:
- Permalink: maxtixador/scrapernhl@a5b310825b9b0de1d804c8a7dafb75fcbeca6a51
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/maxtixador
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@a5b310825b9b0de1d804c8a7dafb75fcbeca6a51
- Trigger Event: release

File details

Details for the file scrapernhl-0.3.1-py3-none-any.whl.

File metadata

Download URL: scrapernhl-0.3.1-py3-none-any.whl
Upload date: Mar 7, 2026
Size: 134.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scrapernhl-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b9930686b1f4cf01c845772adc1a6aa35685f445ed97834ab7fa5be087c60711`
MD5	`5ae9a6325a7a50e904ffe046911bd89e`
BLAKE2b-256	`dd9dda156ad69a5f96cbeb16e563ab9e8f8aa93e19c868f8743e2bae2cf91253`

Provenance

The following attestation bundles were made for scrapernhl-0.3.1-py3-none-any.whl:

Publisher: python-publish.yml on maxtixador/scrapernhl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scrapernhl-0.3.1-py3-none-any.whl
- Subject digest: b9930686b1f4cf01c845772adc1a6aa35685f445ed97834ab7fa5be087c60711
- Sigstore transparency entry: 1055798771
- Sigstore integration time: Mar 7, 2026
Source repository:
- Permalink: maxtixador/scrapernhl@a5b310825b9b0de1d804c8a7dafb75fcbeca6a51
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/maxtixador
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@a5b310825b9b0de1d804c8a7dafb75fcbeca6a51
- Trigger Event: release
