A Basketball Reference client that generates data by scraping the website

Project description

Basketball Reference Web Scraper

Basketball Reference is a great site (especially for a basketball stats nut like me), and hopefully they don't get too pissed off at me for creating this.

Basically, I created this repository as a utility for another project where I'm trying to estimate an NBA player's productivity as it relates to daily fantasy sports. For that project, I need box score and scheduling information, which is provided by this utility.

Here's the PyPI package.

Installing via pip

I wrote this library as an exercise for creating my first PyPI package.

If you'd like to use this library, you can install the package via pip like so:

pip install basketball_reference_web_scraper

This library requires Python 3.4+ and only supports seasons after the 1999-2000 season.

Client

You can import the client like this

# This imports the client
from basketball_reference_web_scraper import client

There are also a couple of useful enums defined in the data module, which can be imported like this

# This imports the Team enum
from basketball_reference_web_scraper.data import Team

API

This client has eight methods

  • Getting player box scores by a date (client.player_box_scores)
  • Getting team box scores by a date (client.team_box_scores)
  • Getting the schedule for a season (client.season_schedule)
  • Getting players totals for a season (client.players_season_totals)
  • Getting players advanced season statistics for a season (client.players_advanced_season_totals)
  • Getting play-by-play data for a game (client.play_by_play)
  • Getting regular season box scores for a given player and season (client.regular_season_player_box_scores)
  • Searching (client.search)

Data output

This client also supports three output types:

  • Python data types (i.e. a list of results)
  • JSON
  • CSV

In versions >=3 of this client, CSV output is written to a specified file path, while JSON output is either returned or written to a specified file path.

  • Specify an output type by setting the output_type value to OutputType.JSON or OutputType.CSV
    • By default, client methods return Python data structures (the player_box_scores method returns a list of dicts)
  • If you'd like the output to be written to a specific file, set the output_file_path variable - for CSV output, this variable must be defined
  • Specifying an output_write_option specifies how the output will be written to the specified file (OutputWriteOption.WRITE corresponds to w)
    • The default write option is OutputWriteOption.WRITE
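To make the "corresponds to w" point concrete, here is a minimal sketch of what Python's own "w" file mode does, using only the standard library rather than this client. The write_rows helper, the field names, and the placeholder rows are all made up for illustration, and the "a" (append) mode is plain Python shown for contrast, not a claim about which other write options this client offers.

```python
import csv
import os
import tempfile

FIELDS = ["name", "team"]  # hypothetical column names, for illustration only


def write_rows(path, rows, mode):
    """Write rows as CSV; mode is passed straight to open() ("w" or "a")."""
    # Only emit the header when the file starts out empty or is being overwritten.
    needs_header = mode == "w" or not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode, newline="") as output_file:
        writer = csv.DictWriter(output_file, fieldnames=FIELDS)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    with open(path, newline="") as input_file:
        return list(csv.DictReader(input_file))


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "box_scores.csv")

    # Two consecutive "w" writes: the second one replaces the first.
    write_rows(path, [{"name": "Player A", "team": "BOSTON CELTICS"}], "w")
    write_rows(path, [{"name": "Player B", "team": "BOSTON CELTICS"}], "w")
    after_write = read_rows(path)

    # An "a" write keeps what is already in the file and adds to it.
    write_rows(path, [{"name": "Player C", "team": "BOSTON CELTICS"}], "a")
    after_append = read_rows(path)
```

In other words, with the default write option, pointing two calls at the same output_file_path should leave only the second call's output in the file.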

Data parsing

  • Some pieces of data, like a player's team or the outcome of a game, are parsed into enums (the Team and Outcome enums, respectively)
  • These enums are serialized to strings when outputting to JSON or CSV, but when dealing with Python data structures, you'll see the enum values themselves.
    • Hopefully, these enums make it easier for the client user to implement team-specific logic, for example.
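As a rough sketch of the two points above, the snippet below filters rows by an enum member and then serializes them to JSON strings. It does not use this library: the Team class here is a hypothetical two-member stand-in, its string values are assumptions, the serialize_enums helper is made up, and the rows are placeholders rather than real statistics.

```python
import json
from enum import Enum


class Team(Enum):
    # Hypothetical stand-in for the library's Team enum; values are assumed.
    BOSTON_CELTICS = "BOSTON CELTICS"
    PHILADELPHIA_76ERS = "PHILADELPHIA 76ERS"


def serialize_enums(value):
    """Fallback for json.dumps: turn enum members into their string values."""
    if isinstance(value, Enum):
        return value.value
    raise TypeError("Cannot serialize {!r}".format(value))


# Placeholder rows shaped like a list of dicts; not real box score data.
rows = [
    {"name": "Player A", "team": Team.BOSTON_CELTICS},
    {"name": "Player B", "team": Team.PHILADELPHIA_76ERS},
]

# Team-specific logic on Python data structures compares enum members directly,
# so there is no string matching on team names.
celtics_rows = [row for row in rows if row["team"] is Team.BOSTON_CELTICS]

# Serializing swaps each enum member for a plain string.
celtics_json = json.dumps(celtics_rows, default=serialize_enums)
```

Comparing against Team.BOSTON_CELTICS instead of a raw string means a typo in a team name fails loudly as an AttributeError rather than silently matching nothing.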

Get player box scores by date

from basketball_reference_web_scraper import client
from basketball_reference_web_scraper.data import OutputType

# Get all player box scores for January 1st, 2017 
client.player_box_scores(day=1, month=1, year=2017)

# Get all player box scores for January 1st, 2017 in JSON format
client.player_box_scores(day=1, month=1, year=2017, output_type=OutputType.JSON)

# Output all player box scores for January 1st, 2017 in JSON format to 1_1_2017_box_scores.json
client.player_box_scores(day=1, month=1, year=2017, output_type=OutputType.JSON, output_file_path="./1_1_2017_box_scores.json")

# Output all player box scores for January 1st, 2017 in CSV format to 1_1_2017_box_scores.csv
client.player_box_scores(day=1, month=1, year=2017, output_type=OutputType.CSV, output_file_path="./1_1_2017_box_scores.csv")

Get team box scores by date

from basketball_reference_web_scraper import client
from basketball_reference_web_scraper.data import OutputType

# Get all team box scores for January 1st, 2018 
client.team_box_scores(day=1, month=1, year=2018)

# Get all team box scores for January 1st, 2018 in JSON format
client.team_box_scores(day=1, month=1, year=2018, output_type=OutputType.JSON)

# Output all team box scores for January 1st, 2018 in JSON format to 1_1_2018_box_scores.json
client.team_box_scores(day=1, month=1, year=2018, output_type=OutputType.JSON, output_file_path="./1_1_2018_box_scores.json")

# Output all team box scores for January 1st, 2018 in CSV format to 1_1_2018_box_scores.csv
client.team_box_scores(day=1, month=1, year=2018, output_type=OutputType.CSV, output_file_path="./1_1_2018_box_scores.csv")

Get season schedule

from basketball_reference_web_scraper import client
from basketball_reference_web_scraper.data import OutputType

# Get all games for the 2017-2018 season
client.season_schedule(season_end_year=2018)

# Get all games for the 2017-2018 season and output in JSON format
client.season_schedule(season_end_year=2018, output_type=OutputType.JSON)

# Output all games for the 2017-2018 season in JSON format to 2017_2018_season.json
client.season_schedule(season_end_year=2018, output_type=OutputType.JSON, output_file_path="./2017_2018_season.json")

# Output all games for the 2017-2018 season in CSV format to 2017_2018_season.csv
client.season_schedule(season_end_year=2018, output_type=OutputType.CSV, output_file_path="./2017_2018_season.csv")

Get season totals for all players

from basketball_reference_web_scraper import client
from basketball_reference_web_scraper.data import OutputType

# Get all player season totals for the 2017-2018 season
client.players_season_totals(season_end_year=2018)

# Get all player season totals for the 2017-2018 season and output in JSON format
client.players_season_totals(season_end_year=2018, output_type=OutputType.JSON)

# Output all player season totals for the 2017-2018 season in JSON format to 2017_2018_player_season_totals.json
client.players_season_totals(season_end_year=2018, output_type=OutputType.JSON, output_file_path="./2017_2018_player_season_totals.json")

# Output all player season totals for the 2017-2018 season in CSV format to 2017_2018_player_season_totals.csv
client.players_season_totals(season_end_year=2018, output_type=OutputType.CSV, output_file_path="./2017_2018_player_season_totals.csv")

Get advanced season statistics for all players

from basketball_reference_web_scraper import client
from basketball_reference_web_scraper.data import OutputType

# Get all advanced player season totals for the 2017-2018 season
client.players_advanced_season_totals(season_end_year=2018)

# Get all advanced player season totals for the 2017-2018 season and output in JSON format
client.players_advanced_season_totals(season_end_year=2018, output_type=OutputType.JSON)

# Output all advanced player season totals for the 2017-2018 season in JSON format to 2017_2018_advanced_player_season_totals.json
client.players_advanced_season_totals(season_end_year=2018, output_type=OutputType.JSON, output_file_path="./2017_2018_advanced_player_season_totals.json")

# Output all advanced player season totals for the 2017-2018 season in CSV format to 2017_2018_advanced_player_season_totals.csv
client.players_advanced_season_totals(season_end_year=2018, output_type=OutputType.CSV, output_file_path="./2017_2018_advanced_player_season_totals.csv")

Get play-by-play data for a game

This method takes the home team and the game date because Basketball Reference builds its play-by-play URLs from those two values.

Example: https://www.basketball-reference.com/boxscores/pbp/201810160BOS.html
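As a small sketch of that URL pattern, the function below rebuilds the example URL from a date and a home team abbreviation. The pattern (YYYYMMDD, a literal 0, then the abbreviation) is only inferred from the single example above, and play_by_play_url is a made-up helper, not part of this client.

```python
from datetime import date

BASE_URL = "https://www.basketball-reference.com/boxscores/pbp"


def play_by_play_url(game_date, home_team_abbreviation):
    """Build a play-by-play URL from the game date and home team abbreviation."""
    # Assumed pattern, inferred from the example URL:
    # YYYYMMDD + "0" + home team abbreviation + ".html"
    return "{}/{:%Y%m%d}0{}.html".format(BASE_URL, game_date, home_team_abbreviation)


# The Boston Celtics home game on October 16th, 2018
url = play_by_play_url(date(2018, 10, 16), "BOS")
```

Because the away team never appears in the URL, the home team and date are enough to identify the page.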

from basketball_reference_web_scraper import client
from basketball_reference_web_scraper.data import OutputType, Team

# Get play-by-play for Boston Celtics game on October 16th, 2018
client.play_by_play(home_team=Team.BOSTON_CELTICS, year=2018, month=10, day=16)

# Get play-by-play for Boston Celtics game on October 16th, 2018 and output in JSON format
client.play_by_play(home_team=Team.BOSTON_CELTICS, year=2018, month=10, day=16, output_type=OutputType.JSON)

# Output play-by-play for Boston Celtics game on October 16th, 2018 in JSON format to 2018_10_16_BOS_PBP.json
client.play_by_play(home_team=Team.BOSTON_CELTICS, year=2018, month=10, day=16, output_type=OutputType.JSON, output_file_path="./2018_10_16_BOS_PBP.json")

# Output play-by-play for Boston Celtics game on October 16th, 2018 in CSV format to 2018_10_16_BOS_PBP.csv
client.play_by_play(home_team=Team.BOSTON_CELTICS, year=2018, month=10, day=16, output_type=OutputType.CSV, output_file_path="./2018_10_16_BOS_PBP.csv")

Get regular season box scores for a player

from basketball_reference_web_scraper import client
from basketball_reference_web_scraper.data import OutputType

# Get all 2017-2018 regular season player box scores for Russell Westbrook
client.regular_season_player_box_scores(player_identifier="westbru01", season_end_year=2018)

# Get all 2017-2018 regular season player box scores for Russell Westbrook in JSON format
client.regular_season_player_box_scores(player_identifier="westbru01", season_end_year=2018, output_type=OutputType.JSON)

# Output all 2017-2018 regular season player box scores for Russell Westbrook in JSON format to 2017_2018_russell_westbrook_regular_season_box_scores.json
client.regular_season_player_box_scores(player_identifier="westbru01", season_end_year=2018, output_type=OutputType.JSON, output_file_path="./2017_2018_russell_westbrook_regular_season_box_scores.json")

# Output all 2017-2018 regular season player box scores for Russell Westbrook in CSV format to 2017_2018_russell_westbrook_regular_season_box_scores.csv
client.regular_season_player_box_scores(player_identifier="westbru01", season_end_year=2018, output_type=OutputType.CSV, output_file_path="./2017_2018_russell_westbrook_regular_season_box_scores.csv")

The player_identifier is Basketball Reference's unique identifier for each player. In the case of Russell Westbrook, his player_identifier is westbru01 (you can see this from his player page URL: https://www.basketball-reference.com/players/w/westbru01/gamelog/2020)
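If you have a player page URL handy, pulling the identifier out of it can be sketched like this. It assumes the path is shaped like /players/<first letter>/<identifier>, as in the Westbrook URL above; player_identifier_from_url is a made-up helper and not part of this client.

```python
from urllib.parse import urlparse


def player_identifier_from_url(player_url):
    """Extract the player identifier from a Basketball Reference player URL."""
    # Assumed path shape: /players/<first letter>/<identifier>[.html or /more/segments]
    segments = [segment for segment in urlparse(player_url).path.split("/") if segment]
    if len(segments) < 3 or segments[0] != "players":
        raise ValueError("Not a player URL: {}".format(player_url))
    # Drop a trailing extension such as ".html" if the identifier is the last segment.
    return segments[2].split(".")[0]


identifier = player_identifier_from_url(
    "https://www.basketball-reference.com/players/w/westbru01/gamelog/2020"
)
```

The resulting string can then be passed as the player_identifier argument shown above.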

Search

from basketball_reference_web_scraper import client
from basketball_reference_web_scraper.data import OutputType

# Get all results that match "Ko"
client.search(term="Ko")

# Get all results that match "Ko" and output in JSON format
client.search(term="Ko", output_type=OutputType.JSON)

# Output all results that match "Ko" in JSON format to ko_search.json
client.search(term="Ko", output_type=OutputType.JSON, output_file_path="./ko_search.json")

# Output all results that match "Ko" in CSV format to ko_search.csv
client.search(term="Ko", output_type=OutputType.CSV, output_file_path="./ko_search.csv")

Development

There are currently two supported major versions - V3 and V4.

Each major version has its own branch, v3 and v4 - these are the de facto "master" branches to use when making changes to that version.

master will reflect the latest major version branch.

Contributors

Thanks to @DaiJunyan, @ecallahan5, @Yotamho, and @ntsirakis for their contributions!
