Python package for Basketball Reference that gathers data by scraping the website
Project description
basketball-reference-webscrapper
basketball-reference-webscrapper is a Python package that scrapes NBA game data from the Basketball Reference website.
Features
- Scrapes NBA game logs, schedules, and player attributes.
- Validates user inputs (data type, season, and team list) before scraping.
- Filters by team: either all teams or a chosen list of team abbreviations.
- Collects and processes data into a pandas DataFrame.
Installation
To install basketball-reference-webscrapper and its dependencies, use pip:
pip install basketball-reference-webscrapper
Usage
Importing the Package
from basketball_reference_webscrapper.data_models.feature_model import FeatureIn
from basketball_reference_webscrapper.webscrap_basketball_reference import WebScrapBasketballReference
Creating a FeatureIn Object
feature_object = FeatureIn(
    url='https://www.basketball-reference.com',
    season=2023,
    data_type='gamelog',  # 'gamelog', 'schedule', or 'player_attributes'
    team='all',  # 'all' or a list of team abbreviations, e.g. ['BOS', 'LAL']
)
Scraping Data
scraper = WebScrapBasketballReference(feature_object)
data = scraper.webscrappe_nba_games_data()
print(data)
Input Validation
The package performs several input validations:
- Data Type Validation: Ensures data_type is one of 'gamelog', 'schedule', or 'player_attributes'.
- Season Validation: Ensures season is an integer between 2000 and the current NBA season.
- Team List Validation: Ensures team is either 'all' or a list of valid NBA team abbreviations.
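The three rules above can be sketched in plain Python. This is an illustrative sketch only, not the package's actual validation code: the names `validate_inputs`, `current_nba_season`, `VALID_DATA_TYPES`, and `VALID_TEAMS` are hypothetical, the team set is a truncated sample, and the "current NBA season" is assumed to be the season's end year with a rollover in October.

```python
from datetime import date

# Hypothetical constants for illustration; the package's own lists may differ.
VALID_DATA_TYPES = {"gamelog", "schedule", "player_attributes"}
VALID_TEAMS = {"BOS", "LAL", "NYK", "GSW"}  # truncated sample, not the full league


def current_nba_season(today=None):
    """Return the season's end year, assuming seasons roll over in October."""
    today = today or date.today()
    return today.year + 1 if today.month >= 10 else today.year


def validate_inputs(data_type, season, team, today=None):
    """Raise ValueError if any input breaks one of the three rules."""
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(f"data_type must be one of {sorted(VALID_DATA_TYPES)}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(season, bool) or not isinstance(season, int):
        raise ValueError("season must be an integer")
    latest = current_nba_season(today)
    if not 2000 <= season <= latest:
        raise ValueError(f"season must be between 2000 and {latest}")
    if team == "all":
        return
    if not isinstance(team, list) or not team:
        raise ValueError("team must be 'all' or a non-empty list of abbreviations")
    unknown = [t for t in team if t not in VALID_TEAMS]
    if unknown:
        raise ValueError(f"unknown team abbreviations: {unknown}")
```

Under these assumptions, `validate_inputs('gamelog', 2023, ['BOS', 'LAL'])` passes silently, while a season of 1999 or a bare string such as `'BOS'` raises `ValueError`.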
Configuration
The package uses a params.yaml file to store URL patterns and other configuration values.
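To illustrate how URL patterns of this kind might be used, here is a minimal sketch that fills a pattern with a team and season. The contents of params.yaml are not shown in this README, so the keys and patterns below are assumptions modeled on the public Basketball Reference team pages, and `URL_PATTERNS` and `build_url` are hypothetical names rather than part of the package.

```python
# Hypothetical URL patterns keyed by data type. The real params.yaml keys and
# values are not documented here and may differ.
URL_PATTERNS = {
    "gamelog": "{base}/teams/{team}/{season}/gamelog/",
    "schedule": "{base}/teams/{team}/{season}_games.html",
    "player_attributes": "{base}/teams/{team}/{season}.html",
}


def build_url(base, data_type, team, season):
    """Fill the pattern for data_type with the given team and season."""
    pattern = URL_PATTERNS[data_type]
    # Strip a trailing slash so the result never contains a double slash.
    return pattern.format(base=base.rstrip("/"), team=team, season=season)


print(build_url("https://www.basketball-reference.com", "gamelog", "BOS", 2023))
```

Keeping patterns in a configuration file rather than in code means a change in the website's URL layout can be handled by editing one file.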
Example
from basketball_reference_webscrapper.data_models.feature_model import FeatureIn
from basketball_reference_webscrapper.webscrap_basketball_reference import WebScrapBasketballReference
# Define the feature object
feature_object = FeatureIn(
    url='https://www.basketball-reference.com',
    season=2023,
    data_type='gamelog',
    team=['BOS'],  # A list with one team abbreviation (Boston Celtics)
)
# Create the scraper instance
scraper = WebScrapBasketballReference(feature_object)
# Scrape the data
data = scraper.webscrappe_nba_games_data()
# Display the data
print(data.head())
Contributing
Contributions are welcome! Please submit a pull request or create an issue to report bugs or request features.
Contact
For any questions or feedback, please contact [yannick.flores1992@gmail.com].
Hashes for basketball_reference_webscrapper-0.4.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 1f23bfacb2e8a9a03e6649943789cd8d2b9b09d061981de8abbb0a12356e9b02
MD5 | 6b16ccf7b6446b9736c4dd08f0322863
BLAKE2b-256 | 1d1e33800eea4cf58a3ecd2d7cc166915b6a3f3d58e3def782b15b3da298bd62
Hashes for basketball_reference_webscrapper-0.4.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | cd0670957db8fe7d94fee720744d84e8db29d96f55ade54b8acc0923ae9b5101
MD5 | 276c52d16aef624a51b22ae3c1cb53cd
BLAKE2b-256 | fa9fbbceea5394dc507fedfa5604668867add63538fd5e1a66a84c4267bbd93b