
A tool for scraping player, schedule, and game data from a basketball website. Legal implications apply; see README.md for more information.

Project description

NBA Data Scraper

Introduction

NBA Data Scraper is a Python library for scraping player, schedule, and game shot data from Basketball Reference. It rate-limits its requests so that scraping does not overload the server with bot traffic. Any use of the scraped data should respect the website's terms of use.
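The rate limiting mentioned above can be illustrated with a minimal sketch. This is not the library's actual implementation; `RateLimiter`, `min_interval`, and the injectable `clock` and `sleep` parameters are hypothetical names chosen for this example. It only shows the general idea of enforcing a minimum delay between consecutive requests.

```python
import time


class RateLimiter:
    """Enforce a minimum gap between calls (hypothetical helper, not the library's API)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds required between two requests
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last_call = None            # time of the previous request, if any

    def wait(self):
        """Block until at least `min_interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `limiter.wait()` immediately before each HTTP request keeps the request rate at or below one per `min_interval` seconds. The right interval depends on the target site's terms of use.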

📂 Structure

nba-data-scraper/
│
├── nba-data-scraper/
│   ├── __init__.py
│   ├── utils/
│   │   ├── __init__.py
│   │   └── _logger.py
│   ├── _abstract.py
│   ├── _data_scraper.py
│   └── scraper.py
│
├── tests/
│   ├── __init__.py
│   └── ...
│
├── docs/
│   └── ...
│
├── examples/
│   └── ...
│
├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py

🔧 Installation (not available yet)

(Instructions on how to install the library, e.g., using pip or by cloning the repo)

pip install nba-data-scraper

Usage

Scrape Player Data

from nba_data_scraper import NBAScraper
nba_scraper = NBAScraper()

# Scrapes player data for the letter 'a'
player_data = nba_scraper.scrape_player_data('a')  
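Since `scrape_player_data` takes a single letter, covering every letter means calling it in a loop. The sketch below is one way to do that; `scrape_all_players` is a hypothetical helper, not part of the library, and it makes no assumption about the return type of `scrape_player_data` beyond storing whatever comes back.

```python
import string


def scrape_all_players(scraper, letters=string.ascii_lowercase):
    """Call scraper.scrape_player_data once per letter and collect the results.

    `scraper` is any object exposing scrape_player_data(letter), such as the
    NBAScraper instance created above. Returns a dict keyed by letter.
    """
    results = {}
    for letter in letters:
        results[letter] = scraper.scrape_player_data(letter)
    return results
```

For example, `scrape_all_players(nba_scraper, letters='abc')` would issue three calls, one each for 'a', 'b', and 'c'.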

Scrape Schedule Data

# Scrapes games played in a specific year and month
schedule_data = nba_scraper.scrape_schedule_data(year='2023', month='january') 

# Scrapes games played in the given lists of years and months
schedule_data = nba_scraper.scrape_schedule_data(year=['2022','2023'], month=['january','february'])

# Scrapes all games played given a start and end year
schedule_data = nba_scraper.scrape_all_schedule_data(start_year=2020, end_year=2021)
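The README does not say how a list of years is combined with a list of months. One plausible reading, assumed here and not confirmed by the source, is that every year is paired with every month. The hypothetical helper `expand_schedule_requests` sketches that interpretation; it is not a function from the library.

```python
from itertools import product


def expand_schedule_requests(year, month):
    """Return every (year, month) pair implied by scalar or list arguments.

    Assumption: lists are combined as a Cartesian product. The library may
    combine them differently.
    """
    years = [year] if isinstance(year, str) else list(year)
    months = [month] if isinstance(month, str) else list(month)
    return list(product(years, months))
```

Under this assumption, `expand_schedule_requests(['2022', '2023'], ['january', 'february'])` yields four pairs, one for each year and month combination.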

Scrape Game Data

# Scrapes game data for the games listed in a schedule

## First: scrape the schedule
schedule_data = nba_scraper.scrape_schedule_data(year='2023', month='january')

## Second: pass the returned DataFrame to the scrape_game_data method
game_data = nba_scraper.scrape_game_data(schedule_df=schedule_data)
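The two steps above can be wrapped in a small convenience function. This is only a sketch: `scrape_games_for_month` is a hypothetical name, and it relies solely on the two methods and keyword arguments shown in this README.

```python
def scrape_games_for_month(scraper, year, month):
    """Scrape a schedule, then scrape game data for the games in that schedule.

    `scraper` is any object exposing scrape_schedule_data(year=..., month=...)
    and scrape_game_data(schedule_df=...), such as an NBAScraper instance.
    Returns both results so the schedule can be reused without re-scraping.
    """
    schedule_df = scraper.scrape_schedule_data(year=year, month=month)
    game_df = scraper.scrape_game_data(schedule_df=schedule_df)
    return schedule_df, game_df
```

Returning the schedule alongside the game data avoids a second round of requests if the schedule is needed later.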

Work in Progress

  • Further documentation in the docs/ folder.
  • Additional examples in the examples/ folder.
  • Comprehensive tests in the tests/ folder.

License

See the LICENSE file for details.

Download files

Source Distribution

nba_data_scraper-1.0.0.tar.gz (9.2 kB view details)

Uploaded Source

Built Distribution

nba_data_scraper-1.0.0-py3-none-any.whl (11.1 kB view details)

Uploaded Python 3

File details

Details for the file nba_data_scraper-1.0.0.tar.gz.

File metadata

  • Download URL: nba_data_scraper-1.0.0.tar.gz
  • Upload date:
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.9.12 Windows/10

File hashes

Hashes for nba_data_scraper-1.0.0.tar.gz
Algorithm Hash digest
SHA256 e6ff2d836522b67874a823a3469a10dc07413eab8ad79476ca15d990b5c8f630
MD5 91f42d7b9dde62add86f52099f3e9959
BLAKE2b-256 8a8f717ef40abc1f8c3046ee715b5f2206116fbd8d4061c7db6f8d89d1fc9a4e

File details

Details for the file nba_data_scraper-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: nba_data_scraper-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 11.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.9.12 Windows/10

File hashes

Hashes for nba_data_scraper-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 87aff366a78b814d6438b88f7245185692f66fd7d16104486f9eaa8bbb780623
MD5 877c79c9fbd3c1a2e8aa4f58fa9a629d
BLAKE2b-256 0e825e3df96828fc3694bd2dc9d9f4d578231934cec881269ed196205e7ee6e1
