
The TopDownHockey Scraper


TopDownHockey EliteProspects Scraper

By Patrick Bacon, made possible by the work of Marcus Sjölin and Harry Shomer.


This is a package built for scraping two data sources:

  1. The NHL's Play-by-Play Reports, which come in the form of HTML/API reports from the NHL and JSON reports from ESPN.

  2. Elite Prospects, an extremely valuable website which makes hockey data for thousands of leagues available to the public.

This package is strictly built for end users who wish to scrape data for personal use. If you are interested in using Elite Prospects data for professional purposes, I recommend you look into the Elite Prospects API.

While using the scraper, please be mindful of Elite Prospects' servers.
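One way to be mindful of the servers is to break a large scrape into smaller requests and pause between them. The sketch below is a hypothetical helper, `polite_map`, that is not part of TopDownHockey_Scraper. It calls a function on each item in turn and sleeps between calls. The 5-second default delay is an arbitrary assumption, not a documented limit.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def polite_map(func: Callable[[T], R], items: Iterable[T], delay_seconds: float = 5.0) -> List[R]:
    """Call func on each item in order, pausing between consecutive calls.

    Hypothetical helper (not part of TopDownHockey_Scraper): it only spaces
    requests out in time; it does not retry or cache anything.
    """
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            time.sleep(delay_seconds)  # wait before every call except the first
        results.append(func(item))
    return results
```

With the Elite Prospects module imported as `tdhepscrape` (see Installation), `polite_map(lambda season: tdhepscrape.get_skaters("nhl", season), ["2018-2019", "2019-2020"])` would request one season at a time and return a list of dataframes, which you could then combine yourself, for example with `pandas.concat`.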

Installation


You can install the package by entering the following command in a terminal:

pip install TopDownHockey_Scraper

If you're interested in using the NHL Play-by-Play scraper, import that module in Python with the following statement:

import TopDownHockey_Scraper.TopDownHockey_NHL_Scraper as tdhnhlscrape

If you're interested in using the Elite Prospects scraper, import that module in Python with the following statement:

import TopDownHockey_Scraper.TopDownHockey_EliteProspects_Scraper as tdhepscrape

User-End Functions (NHL Scraper)


scrape_full_schedule(start_date, end_date)

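Returns the NHL's schedule from the API for all games between start_date and end_date. Called without arguments, as in the example below, it returns the schedule for the 2023-2024 NHL season.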

Example:

tdhnhlscrape.scrape_full_schedule()


full_scrape(game_id_list, shift = True)

Returns a dataframe containing play-by-play data for a list of game ids.

  • game_id_list: A list of NHL game IDs.

Example:

tdhnhlscrape.full_scrape([2023020179, 2023020180, 2023020181])
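The game IDs in the example above follow the NHL's 10-digit layout: a four-digit season start year, a two-digit game type (02 for the regular season) and a four-digit game number, so 2023020179 is regular-season game 179 of 2023-2024. The hypothetical helper below (not part of the package) builds a consecutive run of regular-season IDs under that assumption. It deliberately skips playoff IDs, whose last digits encode round, series and game rather than a running count.

```python
from typing import List


def regular_season_game_ids(season_start_year: int, first_game: int, last_game: int) -> List[int]:
    """Build consecutive regular-season NHL game IDs (hypothetical helper).

    Assumes the 10-digit layout YYYYTTNNNN: season start year, game type
    (02 = regular season) and a four-digit game number, e.g. 2023020179.
    """
    if not 1 <= first_game <= last_game <= 9999:
        raise ValueError("game numbers must satisfy 1 <= first_game <= last_game <= 9999")
    base = season_start_year * 1_000_000 + 2 * 10_000  # e.g. 2023 -> 2023020000
    return [base + number for number in range(first_game, last_game + 1)]
```

Passing `regular_season_game_ids(2023, 179, 181)` to `tdhnhlscrape.full_scrape` reproduces the ID list from the example above.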

User-End Functions (Elite Prospects Scraper)


get_skaters(leagues, seasons)

Returns a dataframe containing statistics for all skaters in a target set of league(s) and season(s).

  • leagues: One or more leagues. For a single league, pass a string, e.g. "nhl". For multiple leagues, pass a tuple or list, e.g. ("nhl", "ahl").
  • seasons: One or more seasons. For a single season, pass a string, e.g. "2018-2019". For multiple seasons, pass a tuple or list, e.g. ("2018-2019", "2019-2020").

Example:

tdhepscrape.get_skaters(("nhl", "ahl"), ("2018-2019", "2019-2020"))
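Seasons are written as "start-end" strings such as "2018-2019". When you need many consecutive seasons, a small helper can generate them instead of typing each one. `season_range` below is a hypothetical convenience function, not part of the package.

```python
from typing import List


def season_range(first_start_year: int, last_start_year: int) -> List[str]:
    """Return season labels like "2018-2019" for each start year, inclusive (hypothetical helper)."""
    return [f"{year}-{year + 1}" for year in range(first_start_year, last_start_year + 1)]
```

Since lists are accepted, `tdhepscrape.get_skaters(("nhl", "ahl"), season_range(2018, 2019))` requests the same seasons as the example above.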


get_goalies(leagues, seasons)

Returns a dataframe containing statistics for all goalies in a target set of league(s) and season(s).

  • leagues: One or more leagues. For a single league, pass a string, e.g. "nhl". For multiple leagues, pass a tuple or list, e.g. ("nhl", "ahl").
  • seasons: One or more seasons. For a single season, pass a string, e.g. "2018-2019". For multiple seasons, pass a tuple or list, e.g. ("2018-2019", "2019-2020").

Example:

tdhepscrape.get_goalies("khl", "2015-2016")
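Both get_skaters and get_goalies accept either a single string or a tuple/list for each argument. The sketch below shows one way to normalize such an argument. It illustrates the calling convention only and is not the package's own code. It also shows how the number of league-season combinations grows: the skater example above covers 2 × 2 = 4 combinations, while this goalie example covers one.

```python
from typing import Sequence, Tuple, Union


def as_tuple(value: Union[str, Sequence[str]]) -> Tuple[str, ...]:
    """Normalize one league/season string, or a tuple/list of them, to a tuple."""
    if isinstance(value, str):
        # A bare string is one league or one season, not a sequence of characters.
        return (value,)
    return tuple(value)


def combination_count(leagues: Union[str, Sequence[str]], seasons: Union[str, Sequence[str]]) -> int:
    """Number of league-season pairs covered by a call."""
    return len(as_tuple(leagues)) * len(as_tuple(seasons))
```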


get_player_information(dataframe)

Returns a dataframe containing bio information for all skaters or goalies (or both) within a target dataframe.

  • dataframe: A dataframe returned by get_skaters or get_goalies.

Example:

Say you obtain skater data for the KHL in 2020-2021 and store that as a dataframe called output. You can run this function to get bio information for every player in that league's scrape.

output = tdhepscrape.get_skaters("khl", "2020-2021")

tdhepscrape.get_player_information(output)


add_player_information(dataframe)

Returns a dataframe containing bio information for all skaters or goalies (or both) within a target dataframe as well as the statistics from the original dataframe.

  • dataframe: A dataframe returned by get_skaters or get_goalies.

Example:

Say you obtain skater data for the KHL in 2020-2021 and store that as a dataframe called output. You can run this function to attach bio information to every player's statistics in that league's scrape.

output = tdhepscrape.get_skaters("khl", "2020-2021")

tdhepscrape.add_player_information(output)
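Conceptually, the difference between get_player_information and add_player_information is a join: the second keeps each statistics row and adds the player's bio columns to it. The stdlib sketch below illustrates that idea on plain dictionaries. The `player` key and all column names are hypothetical, and this is not the package's implementation, which works on dataframes and whose join keys are not documented here.

```python
from typing import Dict, List

Row = Dict[str, object]


def attach_bios(stats: List[Row], bios: List[Row], key: str = "player") -> List[Row]:
    """Left-join bio rows onto stat rows by a shared key (hypothetical column names)."""
    bio_by_key = {row[key]: row for row in bios}
    joined: List[Row] = []
    for row in stats:
        merged = dict(row)  # start from every original stat column
        # Add bio columns when a match exists; a bio column with the same
        # name as a stat column would overwrite it.
        merged.update(bio_by_key.get(row[key], {}))
        joined.append(merged)
    return joined
```

Joining on a display name is fragile because two players can share a name. A real join should use a unique player identifier.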

Comments, Questions, and Concerns


My goal was to make this package as error-proof as possible. I believe I've accounted for every issue that could potentially throw off a scrape, but it's possible I've missed something.

If any issues arise, or you have any questions about the package, please do not hesitate to contact me on Twitter at @TopDownHockey or email me directly at patrick.s.bacon@gmail.com.


Download files


Source Distribution

topdownhockey_scraper-6.1.63.tar.gz (212.6 kB)

Uploaded Source

Built Distribution


topdownhockey_scraper-6.1.63-py3-none-any.whl (215.8 kB)

Uploaded Python 3

File details

Details for the file topdownhockey_scraper-6.1.63.tar.gz.

File metadata

  • Download URL: topdownhockey_scraper-6.1.63.tar.gz
  • Size: 212.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for topdownhockey_scraper-6.1.63.tar.gz
Algorithm Hash digest
SHA256 5074e3e265fd4e5fcb8be20a28d2012ff23d72559b9aa4c1a9b4ef48346f5bde
MD5 fe47deda108d54eb151e32124e026411
BLAKE2b-256 e796318bc1061b5f9558a5da179ed432d536ac15fac98d28d009c90cc9111582


File details

Details for the file topdownhockey_scraper-6.1.63-py3-none-any.whl.


File hashes

Hashes for topdownhockey_scraper-6.1.63-py3-none-any.whl
Algorithm Hash digest
SHA256 6acc1f564c53ace781b84cf230703a98287c9072c68a42738c0a7fb080f44679
MD5 4213fd47add3c98e7d21e3c87ef02755
BLAKE2b-256 e74cd7a83bb52a09a2e76fe7a69f052346d53858a3a32975d05e246093ece64c

