Python package for Basketball Reference that gathers data by scraping the website
basketball-reference-webscrapper
basketball-reference-webscrapper is a Python package that scrapes NBA game data from the Basketball Reference website.
Features
- Scrapes NBA game logs, schedules, and player attributes.
- Validates user inputs (data type, season, and teams) before scraping.
- Filters data by team.
- Collects and processes the data into a pandas DataFrame.
Installation
Install basketball-reference-webscrapper from PyPI with pip, which also installs its required dependencies:
pip install basketball-reference-webscrapper
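To confirm which version pip installed, the standard library's importlib.metadata can be queried. This is a minimal sketch: it assumes the installed distribution is registered under its PyPI name, basketball-reference-webscrapper, and installed_version is a hypothetical helper, not part of the package.

```python
from importlib import metadata


def installed_version(distribution="basketball-reference-webscrapper"):
    """Hypothetical helper: return the installed version string, or None if missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Prints the version string, or None if the package has not been installed yet.
print(installed_version())
```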
Usage
Importing the Package
from basketball_reference_webscrapper.data_models.feature_model import FeatureIn
from basketball_reference_webscrapper.webscrap_basketball_reference import WebScrapBasketballReference
Creating a FeatureIn Object
feature_object = FeatureIn(
url='https://www.basketball-reference.com',
season=2023,
data_type='gamelog', # 'gamelog', 'schedule', or 'player_attributes'
team='all' # 'all' or a list of team abbreviations e.g., ['BOS', 'LAL']
)
Scraping Data
scraper = WebScrapBasketballReference(feature_object)
data = scraper.webscrappe_nba_games_data()
print(data)
Input Validation
The package performs several input validations:
- Data type validation: ensures data_type is one of 'gamelog', 'schedule', or 'player_attributes'.
- Season validation: ensures season is an integer between 2000 and the current NBA season.
- Team list validation: ensures team is either 'all' or a list of valid NBA team abbreviations.
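The three checks above can be sketched in plain Python as follows. This is an illustration, not the package's own validation code: VALID_DATA_TYPES, NBA_TEAMS, current_nba_season, and validate_inputs are hypothetical names. It also assumes that a season is labeled by the year in which it ends and rolls over in October, and it only knows the 30 current franchises under Basketball Reference's abbreviations. Returning the resolved team list shows one way 'all' could be expanded for team-specific filtering.

```python
import datetime

# Hypothetical constants mirroring the checks described above.
VALID_DATA_TYPES = ("gamelog", "schedule", "player_attributes")
# The 30 current franchises, using Basketball Reference abbreviations
# (BRK, CHO, PHO). Historical teams such as SEA or NJN are not included.
NBA_TEAMS = (
    "ATL", "BOS", "BRK", "CHI", "CHO", "CLE", "DAL", "DEN", "DET", "GSW",
    "HOU", "IND", "LAC", "LAL", "MEM", "MIA", "MIL", "MIN", "NOP", "NYK",
    "OKC", "ORL", "PHI", "PHO", "POR", "SAC", "SAS", "TOR", "UTA", "WAS",
)


def current_nba_season(today=None):
    """Season label, assuming seasons are named by end year and roll over in October."""
    today = today or datetime.date.today()
    return today.year + 1 if today.month >= 10 else today.year


def validate_inputs(data_type, season, team, today=None):
    """Raise ValueError on bad inputs; otherwise return the list of teams to scrape."""
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(f"data_type must be one of {VALID_DATA_TYPES}, got {data_type!r}")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(season, int) or isinstance(season, bool):
        raise ValueError(f"season must be an integer, got {season!r}")
    latest = current_nba_season(today)
    if not 2000 <= season <= latest:
        raise ValueError(f"season must be between 2000 and {latest}, got {season}")
    if team == "all":
        return list(NBA_TEAMS)
    if not isinstance(team, list):
        raise ValueError(f"team must be 'all' or a list of abbreviations, got {team!r}")
    unknown = [abbr for abbr in team if abbr not in NBA_TEAMS]
    if unknown:
        raise ValueError(f"unknown team abbreviations: {unknown}")
    return team


print(validate_inputs("gamelog", 2023, ["BOS", "LAL"]))  # ['BOS', 'LAL']
```

Under these assumptions, passing a bare string such as 'BOS' for team raises a ValueError, because only 'all' or a list is accepted.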
Configuration
The package uses a params.yaml file to store URL patterns and other configuration settings.
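The contents of params.yaml are not reproduced here. As a rough illustration of the idea, the sketch below keeps URL templates in a plain dict and fills them in per request. The dict keys, the templates (modeled on Basketball Reference's public team game log, monthly schedule, and team season pages), and the build_url helper are all assumptions, not the package's actual configuration.

```python
# Hypothetical stand-in for params.yaml. Keys and templates are assumptions
# modeled on Basketball Reference's public page layout, not the package's config.
URL_PATTERNS = {
    "gamelog": "{base}/teams/{team}/{season}/gamelog/",
    "schedule": "{base}/leagues/NBA_{season}_games-{month}.html",
    "player_attributes": "{base}/teams/{team}/{season}.html",
}


def build_url(data_type, base, season, team=None, month=None):
    """Fill in the URL template for data_type (hypothetical helper)."""
    template = URL_PATTERNS[data_type]
    # Fail early instead of producing a URL containing the text "None".
    if "{team}" in template and team is None:
        raise ValueError(f"{data_type} URLs need a team abbreviation")
    if "{month}" in template and month is None:
        raise ValueError(f"{data_type} URLs need a month name")
    # str.format ignores keyword arguments the template does not use.
    return template.format(base=base.rstrip("/"), season=season, team=team, month=month)


print(build_url("gamelog", "https://www.basketball-reference.com", 2023, team="BOS"))
# https://www.basketball-reference.com/teams/BOS/2023/gamelog/
print(build_url("schedule", "https://www.basketball-reference.com/", 2023, month="october"))
# https://www.basketball-reference.com/leagues/NBA_2023_games-october.html
```

Keeping the templates in a configuration file, rather than hard-coding them, presumably means a change in the site's URL layout can be handled by editing the file instead of the scraping code.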
Example
from basketball_reference_webscrapper.data_models.feature_model import FeatureIn
from basketball_reference_webscrapper.webscrap_basketball_reference import WebScrapBasketballReference
# Define the feature object
feature_object = FeatureIn(
url='https://www.basketball-reference.com',
season=2023,
data_type='gamelog',
team=['BOS'] # A list with one team abbreviation (Boston Celtics)
)
# Create the scraper instance
scraper = WebScrapBasketballReference(feature_object)
# Scrape the data
data = scraper.webscrappe_nba_games_data()
# Display the data
print(data.head())
Contributing
Contributions are welcome! Please submit a pull request or create an issue to report bugs or request features.
Contact
For any questions or feedback, please contact yannick.flores1992@gmail.com.
Download files
Hashes for basketball_reference_webscrapper-0.4.2.tar.gz (source distribution)
Algorithm | Hash digest
---|---
SHA256 | 4430dc14c0e9a4348e8c4310d715fce199fc2a4b874a9ce59355117917674b15
MD5 | 9460c13e694a17c5ec1e2343b04d5682
BLAKE2b-256 | bfffdd45413a2fb1983c3c294608009bc7114ad8806654e40f76facaca61a7cf
Hashes for basketball_reference_webscrapper-0.4.2-py3-none-any.whl (built distribution)
Algorithm | Hash digest
---|---
SHA256 | 1d91bf13d16101bb610f16c94ee65362e88b900e777da1687cf969a1768790bf
MD5 | d99959345128e13aaaf6812ae91c2935
BLAKE2b-256 | 70da0c312e747964d2a4f40c66e8603a87a7914d5a1b00cf1b2a6e4c70b95a19