
The TopDownHockey Scraper

Project description

TopDownHockey EliteProspects Scraper

By Patrick Bacon, made possible by the work of Marcus Sjölin and Harry Shomer.


This is a package built for scraping two data sources:

  1. The NHL's Play-by-Play Reports, which come in the form of HTML/API reports from the NHL and JSON reports from ESPN.

  2. Elite Prospects, an extremely valuable website which makes hockey data for thousands of leagues available to the public.

This package is built strictly for end users who wish to scrape data for personal use. If you are interested in using Elite Prospects data for professional purposes, I recommend looking into the Elite Prospects API.

While using the scraper, please be mindful of Elite Prospects' servers.
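One way to go easy on the servers when you need many leagues or seasons is to split the work into several smaller calls and pause between them. The sketch below makes no assumptions about the package's internals: `scrape_in_batches` is a hypothetical helper (not part of TopDownHockey_Scraper) that takes any scraping function, such as `tdhepscrape.get_skaters`, plus a list of argument tuples, and sleeps between calls.

```python
import time
from typing import Any, Callable, Iterable, List, Tuple


def scrape_in_batches(
    scrape_fn: Callable[..., Any],
    arg_batches: Iterable[Tuple[Any, ...]],
    pause_seconds: float = 5.0,
) -> List[Any]:
    """Call scrape_fn once per argument tuple, pausing between calls.

    Hypothetical helper for illustration; it is not part of the package.
    """
    results = []
    for i, args in enumerate(arg_batches):
        if i > 0:
            # Wait before every call except the first to spread out requests.
            time.sleep(pause_seconds)
        results.append(scrape_fn(*args))
    return results


# Example usage (requires the package and network access):
# import TopDownHockey_Scraper.TopDownHockey_EliteProspects_Scraper as tdhepscrape
# frames = scrape_in_batches(
#     tdhepscrape.get_skaters,
#     [("nhl", "2018-2019"), ("ahl", "2018-2019")],
#     pause_seconds=10,
# )
```

Since each call returns a dataframe, the resulting list could then be combined with `pandas.concat(frames)`. The pause length is an arbitrary choice; there is no documented rate limit it is tuned to.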

Installation


You can install the package by entering the following command in a terminal:

pip install TopDownHockey_Scraper

If you're interested in using the NHL Play-by-Play scraper, import that module with this statement in Python:

import TopDownHockey_Scraper.TopDownHockey_NHL_Scraper as tdhnhlscrape

If you're interested in using the Elite Prospects scraper, import that module with this statement in Python:

import TopDownHockey_Scraper.TopDownHockey_EliteProspects_Scraper as tdhepscrape

User-End Functions (NHL Scraper)


scrape_full_schedule(start_date, end_date)

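Returns the NHL's schedule from the API for all games between start_date and end_date. Called without arguments, as in the example below, it returns the schedule for the 2023-2024 NHL season.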

Example:

tdhnhlscrape.scrape_full_schedule()


full_scrape(game_id_list, shift = True)

Returns a dataframe containing play-by-play data for a list of game IDs.

  • game_id_list: A list of NHL game IDs.

Example:

tdhnhlscrape.full_scrape([2023020179, 2023020180, 2023020181])
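NHL game IDs like the ones above follow a fixed pattern: the first four digits are the year the season starts, the next two are the game type (`01` preseason, `02` regular season, `03` playoffs), and for regular-season games the last four digits are the game number. Assuming that pattern, a hypothetical helper (not part of the package) can build a contiguous list of regular-season IDs to pass to `full_scrape`:

```python
from typing import List


def regular_season_game_ids(season_start_year: int, first_game: int, last_game: int) -> List[int]:
    """Build regular-season NHL game IDs such as 2023020179.

    Hypothetical helper. IDs are assumed to follow the pattern
    YYYY (season start year) + 02 (regular season) + NNNN (game number).
    """
    if not 1 <= first_game <= last_game <= 9999:
        raise ValueError("expected 1 <= first_game <= last_game <= 9999")
    # YYYY * 1_000_000 leaves six digits free on the right;
    # 2 * 10_000 places the game-type code "02" before the four-digit game number.
    base = season_start_year * 1_000_000 + 2 * 10_000
    return [base + n for n in range(first_game, last_game + 1)]


print(regular_season_game_ids(2023, 179, 181))
# [2023020179, 2023020180, 2023020181]
```

The output matches the list in the example above, so `tdhnhlscrape.full_scrape(regular_season_game_ids(2023, 179, 181))` would request the same three games.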

User-End Functions (Elite Prospects Scraper)


get_skaters(leagues, seasons)

Returns a dataframe containing statistics for all skaters in a target set of league(s) and season(s).

  • leagues: One or multiple leagues. For one league, enter a string, e.g., "nhl". For multiple leagues, enter a tuple or list, e.g., ("nhl", "ahl").
  • seasons: One or multiple seasons. For one season, enter a string, e.g., "2018-2019". For multiple seasons, enter a tuple or list, e.g., ("2018-2019", "2019-2020").

Example:

tdhepscrape.get_skaters(("nhl", "ahl"), ("2018-2019", "2019-2020"))
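Because `leagues` and `seasons` each accept either a single string or a tuple/list, it can help to see which league-season combinations a call targets. The sketch below assumes, as the description reads, that the scraper covers every combination of the leagues and seasons you pass; `as_tuple` and `league_season_pairs` are hypothetical helpers, not the package's own code.

```python
from itertools import product
from typing import Iterable, List, Tuple, Union


def as_tuple(value: Union[str, Iterable[str]]) -> Tuple[str, ...]:
    """Normalize "nhl" or ("nhl", "ahl") to a tuple of strings."""
    if isinstance(value, str):
        # A bare string is one league or season, not a sequence of characters.
        return (value,)
    return tuple(value)


def league_season_pairs(
    leagues: Union[str, Iterable[str]], seasons: Union[str, Iterable[str]]
) -> List[Tuple[str, str]]:
    """List every (league, season) combination implied by the two arguments."""
    return list(product(as_tuple(leagues), as_tuple(seasons)))


print(league_season_pairs(("nhl", "ahl"), ("2018-2019", "2019-2020")))
# [('nhl', '2018-2019'), ('nhl', '2019-2020'), ('ahl', '2018-2019'), ('ahl', '2019-2020')]
```

Under that assumption, the example call above covers four league-seasons, which is worth keeping in mind when estimating how long a scrape will take.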


get_goalies(leagues, seasons)

Returns a dataframe containing statistics for all goalies in a target set of league(s) and season(s).

  • leagues: One or multiple leagues. For one league, enter a string, e.g., "nhl". For multiple leagues, enter a tuple or list, e.g., ("nhl", "ahl").
  • seasons: One or multiple seasons. For one season, enter a string, e.g., "2018-2019". For multiple seasons, enter a tuple or list, e.g., ("2018-2019", "2019-2020").

Example:

tdhepscrape.get_goalies("khl", "2015-2016")
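Seasons are written as two consecutive years joined by a hyphen, such as "2015-2016". To request a run of seasons, a hypothetical helper such as `season_range` (not part of the package) can generate the strings in that format:

```python
from typing import Tuple


def season_range(first_start_year: int, last_start_year: int) -> Tuple[str, ...]:
    """Return season strings like "2015-2016" for each start year, inclusive.

    Hypothetical helper; the "YYYY-YYYY" format follows the examples in this README.
    """
    if last_start_year < first_start_year:
        raise ValueError("last_start_year must be >= first_start_year")
    return tuple(f"{year}-{year + 1}" for year in range(first_start_year, last_start_year + 1))


print(season_range(2015, 2017))
# ('2015-2016', '2016-2017', '2017-2018')
```

For example, `tdhepscrape.get_goalies("khl", season_range(2015, 2017))` would pass three KHL seasons using the documented tuple form of `seasons`.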


get_player_information(dataframe)

Returns a dataframe containing bio information for all skaters or goalies (or both) within a target dataframe.

  • dataframe: The dataframe returned by get_skaters or get_goalies.

Example:

Say you obtain skater data for the KHL in 2020-2021 and store it as a dataframe called output. You can run this function to get bio information for every player in that scrape.

output = tdhepscrape.get_skaters("khl", "2020-2021")

tdhepscrape.get_player_information(output)


add_player_information(dataframe)

Returns a dataframe containing bio information for all skaters or goalies (or both) within a target dataframe as well as the statistics from the original dataframe.

  • dataframe: The dataframe returned by get_skaters or get_goalies.

Example:

Say you obtain skater data for the KHL in 2020-2021 and store it as a dataframe called output. You can run this function to get bio information for every player in that scrape, alongside the statistics already in output.

output = tdhepscrape.get_skaters("khl", "2020-2021")

tdhepscrape.add_player_information(output)
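The practical difference between get_player_information and add_player_information is whether the bio columns are joined back onto the statistics. The sketch below illustrates that idea as a left join using plain Python dictionaries; the column names (`player`, `gp`, `g`, `position`, `nation`) and matching on a single player key are assumptions for illustration, not the package's actual schema or join logic. With pandas, the same idea would be written as `stats_df.merge(bio_df, on="player", how="left")`.

```python
from typing import Dict, List


def add_bio(
    stats_rows: List[Dict], bio_by_player: Dict[str, Dict], key: str = "player"
) -> List[Dict]:
    """Left-join bio fields onto each statistics row (hypothetical helper).

    Rows whose player has no bio entry keep only their statistics.
    """
    merged = []
    for row in stats_rows:
        bio = bio_by_player.get(row[key], {})
        # Statistics take precedence if a column name appears in both sources.
        merged.append({**bio, **row})
    return merged


stats = [{"player": "A", "gp": 60, "g": 20}, {"player": "B", "gp": 45, "g": 5}]
bio = {"A": {"player": "A", "position": "C", "nation": "Sweden"}}
print(add_bio(stats, bio))
```

Player "A" gains `position` and `nation` while keeping `gp` and `g`; player "B", with no bio entry, is kept with statistics only, which is what a left join means here.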

Comments, Questions, and Concerns


My goal was to make this package as error-proof as possible. I believe I've accounted for every issue that could potentially throw off a scrape, but it's possible I've missed something.

If any issues arise, or you have any questions about the package, please do not hesitate to contact me on Twitter at @TopDownHockey or email me directly at patrick.s.bacon@gmail.com.


Download files


Source Distribution

topdownhockey_scraper-6.1.54.tar.gz (211.8 kB)

Uploaded Source

Built Distribution


topdownhockey_scraper-6.1.54-py3-none-any.whl (215.0 kB)

Uploaded Python 3

File details

Details for the file topdownhockey_scraper-6.1.54.tar.gz.

File metadata

  • Download URL: topdownhockey_scraper-6.1.54.tar.gz
  • Size: 211.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for topdownhockey_scraper-6.1.54.tar.gz
Algorithm Hash digest
SHA256 04b30f8e66c4bc89b21e89a27fd7f762945b6f5be5d5d37bf30889697555c85d
MD5 db028936a6c0525550678c54dae72ae0
BLAKE2b-256 1c11cf43708e582b8417600f82837f07c383898e80baba19a48d4068e71ca26f


File details

Details for the file topdownhockey_scraper-6.1.54-py3-none-any.whl.

File metadata

File hashes

Hashes for topdownhockey_scraper-6.1.54-py3-none-any.whl
Algorithm Hash digest
SHA256 9d0eddb433df77cde1000627739f9891b7b22284f62ebb6b2aba3b09f92a547e
MD5 ce43b5fa5a6c00a60dcc644232210ed5
BLAKE2b-256 e6ab425a305b5266a57c9b16e122d92f85a986ea7ad4af55a2872c3c4e706065
