CBBpy: A Python-based web scraper for NCAA basketball
Purpose
This package is designed to bridge the gap between data and analysis for NCAA D1 basketball. CBBpy can grab play-by-play, boxscore, and other game metadata for any NCAA D1 men's basketball game.
Installation and import
CBBpy requires Python >= 3.9 as well as the following packages:
- pandas>=1.4.2
- numpy>=1.22.3
- python-dateutil>=2.8.2
- pytz>=2022.1
- tqdm>=4.63.0
Install using pip:
pip install cbbpy
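Before importing the package, it can help to confirm that the environment meets the minimums listed above. The following is a minimal sketch using only the standard library. The helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and not part of CBBpy, and the comparison naively reads only the leading numeric components of each version string.

```python
import sys
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUMS = {
    "pandas": "1.4.2",
    "numpy": "1.22.3",
    "python-dateutil": "2.8.2",
    "pytz": "2022.1",
    "tqdm": "4.63.0",
}


def parse_version(text):
    """Turn '1.22.3' into (1, 22, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically, padding with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("Python >= 3.9 is required")
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems
```

Calling `check_environment()` returns a list of messages that can be printed before the first import. For pre-release or otherwise unusual version strings, a dedicated version parser would be more reliable than this simplification.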
CBBpy currently offers only a men's basketball scraper, which can be imported as follows:
import cbbpy.mens_scraper as ms
Functions available in CBBpy
NOTE: a game ID, as far as CBBpy is concerned, is a valid ESPN game ID.
ms.get_game_info(game_id: str) grabs all the metadata (game date, time, score, teams, referees, etc.) for a particular game.
ms.get_game_boxscore(game_id: str) returns a pandas DataFrame with each player's stats for a particular game.
ms.get_game_pbp(game_id: str) scrapes the play-by-play tables for a game and returns a pandas DataFrame, with each entry representing a play made during the game.
ms.get_game(game_id: str) gets all information about a game (game info, boxscore, PBP) and returns a tuple of results (game_info, boxscore, pbp).
ms.get_games_season(season: int) scrapes all game information for all games in a particular season. As an example, to scrape games for the 2020-21 season, call get_games_season(2021). Returns a tuple of 3 DataFrames, similar to get_game.
ms.get_game_ids(date: str) returns a list of all game IDs for a particular date.
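Two points from the list above can be sketched in code: the season argument names the year in which the season ends, and the IDs returned by get_game_ids can be fed one at a time into get_game. The helpers below (`season_label`, `collect_games`) are hypothetical and not part of CBBpy. The scraping function is passed in as an argument so the loop can be tried with a stub, and the broad `except` is an assumption, since the errors CBBpy may raise are not described here.

```python
def season_label(season):
    """Map the integer season argument to a label, e.g. 2021 -> '2020-21'."""
    return f"{season - 1}-{season % 100:02d}"


def collect_games(game_ids, get_game):
    """Call get_game(game_id) for each ID and group the three results.

    get_game is expected to return a (game_info, boxscore, pbp) tuple, as
    described for ms.get_game above. IDs whose scrape raises are recorded
    in the 'failed' list instead of stopping the loop.
    """
    infos, boxscores, pbps, failed = [], [], [], []
    for game_id in game_ids:
        try:
            info, boxscore, pbp = get_game(game_id)
        except Exception as exc:  # broad on purpose: error types are an unknown here
            failed.append((game_id, repr(exc)))
            continue
        infos.append(info)
        boxscores.append(boxscore)
        pbps.append(pbp)
    return infos, boxscores, pbps, failed


# Possible usage with CBBpy (not executed here, since it requires network access):
#   import cbbpy.mens_scraper as ms
#   ids = ms.get_game_ids(some_date)  # the accepted date string format is not stated above
#   infos, boxscores, pbps, failed = collect_games(ids, ms.get_game)
```

If the collected items are DataFrames, each of the three lists can then be combined with `pandas.concat`.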
Examples
Function call:
ms.get_game_info('401408636')
Returns:
| | game_id | home_team | home_id | home_rank | home_record | home_score | away_team | away_id | away_rank | away_record | away_score | home_win | num_ots | tournament | game_day | game_time | game_loc | arena | arena_capacity | attendance | tv_network | referee_1 | referee_2 | referee_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 401408636 | Kansas Jayhawks | 2305 | 1 | 34-6 | 72 | North Carolina Tar Heels | 153 | 8 | 29-10 | 69 | True | 0 | Men's Basketball Championship - National Championship | April 04, 2022 | 06:20 PM PDT | New Orleans, LA | Caesars Superdome | nan | 69,423 | TBS | Ron Groover | Terry Oglesby | Jeff Anderson |
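Several of the fields above are formatted for display rather than computation. The following is a rough sketch of converting them, assuming the values arrive as strings formatted exactly as in the table (this has not been confirmed against the actual column dtypes). The helper names are hypothetical.

```python
from datetime import datetime


def parse_game_day(text):
    """Parse a game_day value such as 'April 04, 2022' into a date.

    %B matches the full month name in the current locale, so this
    assumes an English locale.
    """
    return datetime.strptime(text, "%B %d, %Y").date()


def parse_record(text):
    """Split a record such as '34-6' into (wins, losses)."""
    wins, losses = text.split("-")
    return int(wins), int(losses)


def parse_count(value):
    """Convert a count such as '69,423' to an int; return None if it is missing."""
    try:
        return int(str(value).replace(",", ""))
    except ValueError:
        # Covers missing values such as the 'nan' shown for arena_capacity.
        return None
```

With the row shown above, `parse_record` would split the home record into wins and losses, and `parse_count` would return `None` for the missing arena capacity.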
Function call:
ms.get_game_boxscore('401408636')
Returns (partially):
| | game_id | team | player | player_id | position | starter | min | fgm | fga | 2pm | 2pa | 3pm | 3pa | ftm | fta | oreb | dreb | reb | ast | stl | blk | to | pf | pts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 401408636 | Kansas Jayhawks | J. Wilson | 4431714 | F | True | 34 | 5 | 13 | 4 | 8 | 1 | 5 | 4 | 4 | 1 | 3 | 4 | 2 | 0 | 1 | 0 | 1 | 15 |
| 1 | 401408636 | Kansas Jayhawks | D. McCormack | 4397019 | F | True | 29 | 7 | 15 | 7 | 15 | 0 | 0 | 1 | 2 | 3 | 7 | 10 | 0 | 1 | 1 | 1 | 4 | 15 |
| 2 | 401408636 | Kansas Jayhawks | D. Harris | 4431983 | G | True | 27 | 1 | 5 | 1 | 4 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 3 | 3 | 1 | 4 | 0 | 2 |
| 3 | 401408636 | Kansas Jayhawks | C. Braun | 4431767 | G | True | 40 | 6 | 14 | 6 | 13 | 0 | 1 | 0 | 0 | 1 | 11 | 12 | 3 | 0 | 0 | 1 | 3 | 12 |
| 4 | 401408636 | Kansas Jayhawks | O. Agbaji | 4397018 | G | True | 37 | 4 | 9 | 3 | 5 | 1 | 4 | 3 | 8 | 1 | 2 | 3 | 1 | 1 | 1 | 2 | 1 | 12 |
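The shooting columns are enough to cross-check the points column and to derive common rate stats. The sketch below works on plain dictionaries, one per boxscore row (for a DataFrame, `df.to_dict("records")` produces such rows), and assumes the stat columns hold numbers. The function names are hypothetical, and effective field goal percentage is the standard (FGM + 0.5 × 3PM) / FGA formula rather than anything CBBpy computes.

```python
def points_from_makes(row):
    """Recompute points from made two-pointers, three-pointers, and free throws."""
    return 2 * row["2pm"] + 3 * row["3pm"] + row["ftm"]


def effective_fg_pct(row):
    """Effective field goal percentage; None when the player took no shots."""
    if row["fga"] == 0:
        return None
    return (row["fgm"] + 0.5 * row["3pm"]) / row["fga"]


# Values copied from the J. Wilson row in the table above.
wilson = {"player": "J. Wilson", "fgm": 5, "fga": 13,
          "2pm": 4, "3pm": 1, "ftm": 4, "pts": 15}

# The recomputed total should agree with the scraped 'pts' column.
assert points_from_makes(wilson) == wilson["pts"]
```

A mismatch between `points_from_makes` and `pts` would be a quick signal that a row was scraped or parsed incorrectly.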
Function call:
ms.get_game_pbp('401408636')
Returns (partially):
| | game_id | home_team | away_team | play_team | home_score | away_score | half | secs_left_half | secs_left_reg | play_desc | play_type | scoring_play | shooter | is_assisted | assist_player |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 401408636 | Kansas Jayhawks | North Carolina Tar Heels | Kansas Jayhawks | 0 | 0 | 1 | 1200 | 2400 | Jump Ball won by Kansas | jump ball | False | | False | |
| 1 | 401408636 | Kansas Jayhawks | North Carolina Tar Heels | Kansas Jayhawks | 3 | 0 | 1 | 1179 | 2379 | Ochai Agbaji made Three Point Jumper. Assisted by Christian Braun. | jumper | True | Ochai Agbaji | True | Christian Braun |
| 2 | 401408636 | Kansas Jayhawks | North Carolina Tar Heels | North Carolina Tar Heels | 3 | 0 | 1 | 1161 | 2361 | Armando Bacot missed Jumper. | jumper | False | | False | |
| 3 | 401408636 | Kansas Jayhawks | North Carolina Tar Heels | Kansas Jayhawks | 3 | 0 | 1 | 1161 | 2361 | Christian Braun Defensive Rebound. | rebound | False | | False | |
| 4 | 401408636 | Kansas Jayhawks | North Carolina Tar Heels | Kansas Jayhawks | 5 | 0 | 1 | 1144 | 2344 | David McCormack made Jumper. Assisted by Dajuan Harris Jr.. | jumper | True | David McCormack | True | Dajuan Harris Jr |
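In the rows above, secs_left_reg equals secs_left_half plus 1200 during the first half, which is consistent with two 20-minute halves. The sketch below reproduces that relationship and formats a game clock. How the second half and overtime are encoded is an assumption here, since only first-half rows are shown, and the function names are hypothetical.

```python
HALF_SECONDS = 20 * 60  # a regulation half is 20 minutes


def secs_left_regulation(half, secs_left_half):
    """Seconds left in regulation, given the half and the seconds left in it.

    Only halves 1 and 2 are handled; overtime periods are out of scope
    for this sketch.
    """
    if half not in (1, 2):
        raise ValueError("only regulation halves (1 or 2) are supported")
    return secs_left_half + (2 - half) * HALF_SECONDS


def clock(secs_left_half):
    """Format seconds left in the half as an MM:SS game clock."""
    minutes, seconds = divmod(secs_left_half, 60)
    return f"{minutes:02d}:{seconds:02d}"


# Values from row 1 of the table above: half 1 with 1179 seconds left.
assert secs_left_regulation(1, 1179) == 2379
assert clock(1179) == "19:39"
```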
Contact
Feel free to reach out to me directly with any questions, requests, or suggestions at dnlcowan37@gmail.com.