A python client to query the various stats.nba.com resources

nba_dataloader

Project Description

A Python client that queries the various stats.nba.com resources and downloads the data into Delta Lake tables for further analysis. I started this project mainly to explore and get familiar with a few technologies:

  • Ray.io
  • Delta Lake
  • Advanced Python

The client can query various stats.nba.com endpoints and store the data on disk as Delta/Parquet tables. It can query a given endpoint with several different sets of query parameters in parallel, using Ray tasks. Documentation of the various endpoints can be found here
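The fan-out pattern described above (one request per parameter set, run in parallel) can be sketched roughly as follows. This is not the project's code: it uses the standard library's ThreadPoolExecutor as a stand-in for Ray tasks so that it runs without extra dependencies, and fetch and fetch_all are hypothetical names that return a fake payload instead of calling stats.nba.com.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(resource: str, query: dict) -> dict:
    """Hypothetical stand-in for one web service request (no network call)."""
    return {"resource": resource, "query": query}


def fetch_all(resource: str, params: list[dict], max_workers: int = 4) -> list[dict]:
    """Run one fetch per parameter dict in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(lambda query: fetch(resource, query), params))


results = fetch_all(
    "leaguedashplayerstats",
    [{"Season": "1996-97"}, {"Season": "1997-98"}],
)
```

With Ray, the same shape would typically be a function decorated with @ray.remote whose results are collected with ray.get.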

PyPI project

Installation

pip install nba-dataloader

Usage

Also see notebooks for more examples

Show help

py -m nba_dataloader --help
usage: __main__.py [-h] [--params PARAMS] [--partition_by PARTITION_BY] [--mode {overwrite,append,error,ignore}]
                   [--location LOCATION]
                   resource

Downloads data from stats.nba.com and persists on disk as delta tables

positional arguments:
  resource              Will make a request to --> https://stats.nba.com/stats/<endpoint>

options:
  -h, --help            show this help message and exit
  --params PARAMS       A python module containing variable 'params' for the query
  --partition_by PARTITION_BY
                        The column to partition by
  --mode {overwrite,append,error,ignore}
                        The write mode
  --location LOCATION   Location to write the fetched data, defaults to tmp/

Fetch data from resource endpoint

Let's try the resource CommonTeamYears, which takes only one parameter, LeagueID.

py -m nba_dataloader commonTeamYears --params my_request_params

The --params parameter accepts a Python file or module that contains a variable params: list[dict]. In the example above, the contents of my_request_params are:

params = [{
    "LeagueID":"00"
}]
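As a rough illustration of how a module passed via --params could be read, the sketch below imports a module by name and returns its params variable. The load_params helper and the demo_request_params module name are hypothetical and only serve the example; the package's actual loading logic may differ.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_params(module_name: str) -> list[dict]:
    """Import a module by name and return its 'params' variable as a list."""
    importlib.invalidate_caches()  # make sure freshly written modules are seen
    module = importlib.import_module(module_name)
    return list(module.params)


# Demonstration: write a throwaway params module to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_request_params.py").write_text('params = [{"LeagueID": "00"}]\n')
    sys.path.insert(0, tmp)
    try:
        loaded = load_params("demo_request_params")
    finally:
        sys.path.remove(tmp)
```

Wrapping the value in list() means a lazily built params (for example a map object) is still materialized before use.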

The parameters in the file are used to make a web service request to https://stats.nba.com/stats/commonTeamYears with the appropriate request headers. The JSON response is parsed and converted into a Delta table stored in the tmp/ folder of the directory in which the script was run. This default location can be overridden with the --location command line parameter.
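To make the request step concrete, here is a minimal sketch of how a resource name and one parameter dict could be combined into a request URL. The base URL comes from the --help text; build_url is a hypothetical helper, and the sketch only builds the URL without sending a request or setting headers.

```python
from urllib.parse import urlencode

# Base URL as shown in the --help output: https://stats.nba.com/stats/<endpoint>
BASE_URL = "https://stats.nba.com/stats"


def build_url(resource: str, query: dict) -> str:
    """Join the resource name and URL-encoded query parameters."""
    return f"{BASE_URL}/{resource}?{urlencode(query)}"


url = build_url("commonTeamYears", {"LeagueID": "00"})
print(url)
```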

Fetch data for multiple seasons from the resource "leaguedashplayerstats"

Note that params is a list of dicts. If multiple dicts are provided, each dict corresponds to a separate web service request, and each response is appended to a single Delta table.

py -m nba_dataloader leaguedashplayerstats --params multiple_season_params

Contents of multiple_season_params.py is:

base_params_dict = {
        "LastNGames": 0,
        "LeagueID": "00",
        "MeasureType": "Base",
        "Month": 0,
        "OpponentTeamID": 0,
        "PORound": 0,
        "PaceAdjust": "N",
        "PerMode": "Totals",
        "Period": 0,
        "PlusMinus": "N",
        "Rank": "N",
        "SeasonType": "Regular Season",
        "TeamID": 0
}
# A list (not a set) keeps the seasons in chronological order.
seasons = ['1996-97', '1997-98', '1998-99', '1999-00', '2000-01', '2001-02', '2002-03', '2003-04', '2004-05',
           '2005-06', '2006-07', '2007-08', '2008-09', '2009-10', '2010-11', '2011-12', '2012-13', '2013-14',
           '2014-15', '2015-16', '2016-17', '2017-18', '2018-19', '2019-20', '2020-21', '2021-22', '2022-23']
# Build one request dict per season; the dict union operator | needs Python 3.9+.
params = [base_params_dict | {'Season': season} for season in seasons]

In the code above, params is a list of dicts, built by adding a 'Season' key to a copy of base_params_dict for each season.

The resulting value of params is

params = [
{
        "LastNGames": 0,
        "LeagueID": "00",
        "MeasureType": "Base",
        "Month": 0,
        "OpponentTeamID": 0,
        "PORound": 0,
        "PaceAdjust": "N",
        "PerMode": "Totals",
        "Period": 0,
        "PlusMinus": "N",
        "Rank": "N",
        "SeasonType": "Regular Season",
        "TeamID": 0,
        "Season": "1996-97" # <--- Note the new attribute season
},
{
        "LastNGames": 0,
        "LeagueID": "00",
        "MeasureType": "Base",
        "Month": 0,
        "OpponentTeamID": 0,
        "PORound": 0,
        "PaceAdjust": "N",
        "PerMode": "Totals",
        "Period": 0,
        "PlusMinus": "N",
        "Rank": "N",
        "SeasonType": "Regular Season",
        "TeamID": 0,
        "Season":"1997-98"
},
# ... one dict for each remaining season
]
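The season labels used above follow a regular pattern (start year, a hyphen, then the last two digits of the following year), so the list can also be generated rather than typed out. This is a small sketch under that assumption; season_label is a hypothetical helper and not part of nba_dataloader.

```python
def season_label(start_year: int) -> str:
    """Format a season such as 1996 -> '1996-97' or 1999 -> '1999-00'."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"


# Seasons starting in 1996 through 2022, in chronological order.
seasons = [season_label(year) for year in range(1996, 2023)]

# A shortened base dict, used here only to keep the example small.
base_params_dict = {"LeagueID": "00", "PerMode": "Totals"}
params = [{**base_params_dict, "Season": season} for season in seasons]
```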
