A python client to query the various stats.nba.com resources
Project description
nba_dataloader
A Python client to query the various stats.nba.com resources and download the data into Delta Lake tables for further analysis. I started this project mainly to explore and get familiar with a number of technologies:
- Ray.io
- Delta Lake
- Advanced Python
A simple client capable of querying various stats.nba.com endpoints and storing the data on disk as Delta/Parquet tables. The client can query a given endpoint with multiple sets of query parameters in parallel, using Ray tasks. Documentation of the various endpoints can be found here
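The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it uses the standard library's ThreadPoolExecutor as a stand-in for Ray tasks so that it runs without extra dependencies, and `fetch_one` and `fetch_all` are hypothetical names that do not come from nba_dataloader.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_one(resource: str, query: dict) -> list[dict]:
    """Hypothetical placeholder for a single web service request.

    A real implementation would call the endpoint; this one echoes its
    inputs back as one row so the sketch runs offline.
    """
    return [{"resource": resource, **query}]


def fetch_all(resource: str, params: list[dict]) -> list[dict]:
    """Run one fetch per params dict in parallel and combine the rows."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as params.
        results = pool.map(lambda query: fetch_one(resource, query), params)
        return [row for rows in results for row in rows]


rows = fetch_all(
    "leaguedashplayerstats",
    [{"Season": "1996-97"}, {"Season": "1997-98"}],
)
print(len(rows))
```

With Ray, the same shape would typically use a function decorated with `@ray.remote` and collect the results with `ray.get`.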
PyPi project
Installation
pip install nba-dataloader
Usage
Also see notebooks for more examples
Show help
py -m nba_dataloader --help
usage: __main__.py [-h] [--params PARAMS] [--partition_by PARTITION_BY] [--mode {overwrite,append,error,ignore}]
[--location LOCATION]
resource
Downloads data from stats.nba.com and persists on disk as delta tables
positional arguments:
resource Will make a request to --> https://stats.nba.com/stats/<endpoint>
options:
-h, --help show this help message and exit
--params PARAMS A python module containing variable 'params' for the query
--partition_by PARTITION_BY
The column to partition by
--mode {overwrite,append,error,ignore}
The write mode
--location LOCATION Location to write the fetched data, defaults to tmp/
Fetch data from resource endpoint
Let's try the resource CommonTeamYears, which takes only one parameter, LeagueID
py -m nba_dataloader commonTeamYears --params my_request_params
The --params parameter accepts a Python file or module that contains a variable params: list[dict]. In the above example, the contents of my_request_params are:
params = [{
"LeagueID":"00"
}]
The parameters specified in the file are used to make a web service request to https://stats.nba.com/stats/commonTeamYears with the appropriate request headers. The response JSON is parsed and converted into a Delta table that is stored in the folder tmp/ of the directory in which the script was run. This default location can be overridden using the --location command line parameter.
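To make the mapping from a params dict to a request concrete, here is a small sketch that builds the request URL from the resource name and one dict. The base path comes from the --help text above; `build_url` is a hypothetical helper for illustration only, and the request headers the client adds are not shown.

```python
from urllib.parse import urlencode

# Base path taken from the --help text: https://stats.nba.com/stats/<endpoint>
BASE_URL = "https://stats.nba.com/stats"


def build_url(resource: str, query: dict) -> str:
    """Hypothetical helper: combine a resource name and one params dict."""
    return f"{BASE_URL}/{resource}?{urlencode(query)}"


params = [{"LeagueID": "00"}]

# One URL per dict in params.
urls = [build_url("commonTeamYears", query) for query in params]
print(urls[0])
```

Each additional dict in params would yield one more URL, and so one more request.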
Fetch data for multiple seasons from the resource "leaguedashplayerstats"
Note that params is a list of dicts. If multiple dicts are provided, each dict corresponds to a separate web service request, and each response is appended to a single Delta table.
py -m nba_dataloader leaguedashplayerstats --params multiple_season_params
Contents of multiple_season_params.py:
base_params_dict = {
"LastNGames": 0,
"LeagueID": "00",
"MeasureType": "Base",
"Month": 0,
"OpponentTeamID": 0,
"PORound": 0,
"PaceAdjust": "N",
"PerMode": "Totals",
"Period": 0,
"PlusMinus": "N",
"Rank": "N",
"SeasonType": "Regular Season",
"TeamID": 0
}
# A list (rather than a set) keeps the seasons in chronological order.
seasons = ['1996-97', '1997-98', '1998-99', '1999-00', '2000-01', '2001-02', '2002-03', '2003-04', '2004-05',
           '2005-06', '2006-07', '2007-08', '2008-09', '2009-10', '2010-11', '2011-12', '2012-13', '2013-14',
           '2014-15', '2015-16', '2016-17', '2017-18', '2018-19', '2019-20', '2020-21', '2021-22', '2022-23']
# Build one dict per season; the dict merge operator | needs Python 3.9+.
params = [base_params_dict | {'Season': season} for season in seasons]
In the above code, params is a list of dicts, built by adding a new key 'Season' to a copy of base_params_dict for each of the seasons. The resulting value of params is:
params = [
{
"LastNGames": 0,
"LeagueID": "00",
"MeasureType": "Base",
"Month": 0,
"OpponentTeamID": 0,
"PORound": 0,
"PaceAdjust": "N",
"PerMode": "Totals",
"Period": 0,
"PlusMinus": "N",
"Rank": "N",
"SeasonType": "Regular Season",
"TeamID": 0,
"Season": "1996-97" # <--- Note the new attribute season
},
{
"LastNGames": 0,
"LeagueID": "00",
"MeasureType": "Base",
"Month": 0,
"OpponentTeamID": 0,
"PORound": 0,
"PaceAdjust": "N",
"PerMode": "Totals",
"Period": 0,
"PlusMinus": "N",
"Rank": "N",
"SeasonType": "Regular Season",
"TeamID": 0,
"Season":"1997-98"
},
# ... one dict for each remaining season
]
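Typing out every season label by hand is error-prone, so the labels could also be generated. The following is a minimal sketch, assuming the 'YYYY-YY' label format shown above; `season_labels` is a hypothetical helper, and base_params_dict is shortened here for brevity.

```python
def season_labels(first_start_year: int, last_start_year: int) -> list[str]:
    """Return labels such as '1999-00' for each season start year, inclusive."""
    return [
        f"{year}-{(year + 1) % 100:02d}"
        for year in range(first_start_year, last_start_year + 1)
    ]


# Shortened for illustration; the full dict is shown earlier in this section.
base_params_dict = {"LeagueID": "00", "SeasonType": "Regular Season"}

seasons = season_labels(1996, 2022)

# One params dict per season (dict merge with | needs Python 3.9+).
params = [base_params_dict | {"Season": season} for season in seasons]
print(len(params), params[0]["Season"], params[-1]["Season"])
```

The modulo keeps the two-digit suffix correct across the century boundary.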
Hashes for nba_dataloader-1.0.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 718f13fc925f671da19f29200f62e9860d0676adf269177841956fdc96fbd898
MD5 | 6384ac97976486a936692adf78ea80ab
BLAKE2b-256 | 24eeae47299c6a1e15e8837572c66929c53949dd77a1169bd64806eeee4069ef