Project description
This project lets you scrape play-by-play and shift data from the National Hockey League (NHL) API and website for all regular season and playoff games since the 2010-2011 season. Further testing is needed to confirm that it works for earlier seasons.
Prerequisites
You need Python installed, version 3.6.0 or later.
If you don’t have Python installed on your machine, I’d recommend installing it through the Anaconda distribution (https://www.continuum.io/downloads). Anaconda comes with many libraries pre-installed, which makes it easier to get started.
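One way to confirm that the interpreter meets the 3.6.0 minimum is a short check like the one below. This is only a sketch: the helper name `meets_minimum` is made up for illustration and is not part of this project.

```python
import sys

# Minimum version stated in the prerequisites above.
MINIMUM = (3, 6, 0)


def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if the given (or running) interpreter is at least `minimum`.

    Hypothetical helper for illustration; not part of scrape_functions.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro); tuples compare element by element.
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print("Python", sys.version.split()[0], status)
```

Running the file directly prints the interpreter version and whether it satisfies the minimum.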
How to Use
First, download this repository onto your computer.
Then open the command line or terminal and navigate to the folder that contains the code. Type “python” to open the interactive Python console.
Next, import the file that contains the scraping functions. That file is called scrape_functions.py, so type the following and press Enter:
import scrape_functions
There are three functions for scraping data (after any of them finishes running, the scraped data can be found in the folder that contains your code):
1. scrape_seasons:
This function is used to scrape on a season by season level. It takes two arguments:
‘seasons’ - List of seasons you want to scrape. Note: a given season is referred to by the first of the two years it spans, so you would refer to the 2016-2017 season as 2016.
‘if_scrape_shifts’ - Boolean indicating whether or not you want to scrape the shifts too.
# Scrapes 2015 & 2016 seasons with shifts
scrape_functions.scrape_seasons([2015, 2016], True)
# Scrapes 2016 season without shifts
scrape_functions.scrape_seasons([2016], False)
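Because a season is referred to by the first of the two years it spans, it can help to convert labels such as "2016-2017" before building the ‘seasons’ list. The sketch below assumes labels are written as two consecutive years joined by a hyphen; `season_start_year` is a hypothetical helper and is not part of scrape_functions.

```python
def season_start_year(season_label):
    """Convert a label like '2016-2017' to its start year, 2016.

    Hypothetical helper for illustration; not part of scrape_functions.
    """
    start, end = season_label.split("-")
    start_year, end_year = int(start), int(end)
    # A season spans two consecutive calendar years.
    if end_year != start_year + 1:
        raise ValueError("not a valid season label: " + season_label)
    return start_year


# Build the list in the form the 'seasons' argument expects.
seasons = [season_start_year(label) for label in ["2015-2016", "2016-2017"]]
print(seasons)
```

The resulting list can then be passed as the first argument to scrape_seasons.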
2. scrape_games:
This function is used to scrape any collection of games you want. It takes two arguments:
‘games’ - List of games you want to scrape. A game is identified by the game ID used by the NHL (ex: 2016020001). The IDs for the games on a given date can be found here: https://statsapi.web.nhl.com/api/v1/schedule?startDate=2016-10-12&endDate=2016-10-12 (adjust the start and end dates in the URL to find the game you are looking for).
‘if_scrape_shifts’ - Boolean indicating whether or not you want to scrape the shifts too.
# Scrapes first game of 2014, 2015, and 2016 seasons with shifts
scrape_functions.scrape_games([2014020001, 2015020001, 2016020001], True)
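Rather than editing the schedule link by hand, the start and end dates can be filled in programmatically. The sketch below only builds the URL from the example link above and makes no request; `schedule_url` is a hypothetical helper and is not part of scrape_functions.

```python
from urllib.parse import urlencode

# Base URL taken from the example schedule link in the text above.
SCHEDULE_URL = "https://statsapi.web.nhl.com/api/v1/schedule"


def schedule_url(start_date, end_date):
    """Build the schedule URL for a date range (dates as 'yyyy-mm-dd').

    Hypothetical helper for illustration; not part of scrape_functions.
    """
    query = urlencode({"startDate": start_date, "endDate": end_date})
    return f"{SCHEDULE_URL}?{query}"


print(schedule_url("2016-10-12", "2016-10-12"))
```

Opening the printed URL in a browser shows the schedule for those dates, where the game IDs can be looked up.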
3. scrape_date_range:
This function is used to scrape all games in a given date range. All dates must be written in the format yyyy-mm-dd (ex: ‘2016-10-20’). It takes three arguments:
‘from_date’ - Start date of the interval you want to scrape
‘to_date’ - End date of the interval you want to scrape
‘if_scrape_shifts’ - Boolean indicating whether or not you want to scrape the shifts too.
# Scrapes games between 2016-10-10 and 2016-10-20 without shifts
scrape_functions.scrape_date_range('2016-10-10', '2016-10-20', False)
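Since the dates must follow the yyyy-mm-dd format, it may be worth checking them before calling scrape_date_range. The sketch below parses both dates with the standard library and returns them in normalized form. It assumes that from_date should not come after to_date, which the text above implies but does not state; `check_date_range` is a hypothetical helper and is not part of scrape_functions.

```python
from datetime import datetime

# strptime/strftime pattern corresponding to yyyy-mm-dd.
DATE_FORMAT = "%Y-%m-%d"


def check_date_range(from_date, to_date):
    """Parse two date strings and return them normalized to yyyy-mm-dd.

    Hypothetical helper for illustration; not part of scrape_functions.
    Raises ValueError if a date cannot be parsed or the range is reversed.
    """
    start = datetime.strptime(from_date, DATE_FORMAT)
    end = datetime.strptime(to_date, DATE_FORMAT)
    # Assumption: the interval must run forward in time.
    if start > end:
        raise ValueError("from_date must not be after to_date")
    return start.strftime(DATE_FORMAT), end.strftime(DATE_FORMAT)


from_date, to_date = check_date_range("2016-10-10", "2016-10-20")
print(from_date, to_date)
```

The two returned strings can then be passed as the first two arguments to scrape_date_range.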
File details
Details for the file hockey_scraper-1.tar.gz.
File metadata
- Download URL: hockey_scraper-1.tar.gz
- Upload date:
- Size: 22.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 333532a4d0f9063e1fbad3e05dae7917efc9c5d28b7b64a0c1c1934550701ceb
MD5 | 25598147a277f76ccb375bea651b090c
BLAKE2b-256 | 09bd0cd33352a054df58ab40f3effe8cdb4752be60e8dc7c448f7a43c8d8cc42
File details
Details for the file hockey_scraper-1-py3-none-any.whl.
File metadata
- Download URL: hockey_scraper-1-py3-none-any.whl
- Upload date:
- Size: 29.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8285be66f037ca21685a1d40c66646d3e6c13965a9b482f099d58c60d14a9ecc
MD5 | 5c5f3965b6b028a3cc5ee8924c6125fe
BLAKE2b-256 | bf2a9b5a55a5b0eae727560fad8fc31b31bd962b48ab1650b183cb0e0c70bd62