Python package for Basketball Reference that gathers data by scraping the website
Project description
WebScrapBasketballReference
WebScrapBasketballReference is a Python package that scrapes NBA game data from the Basketball Reference website. Given a URL and a season, it returns team stats for every game of that season.
Features
- Scrapes NBA gamelogs, schedules, and player attributes.
- Validates user inputs to ensure data accuracy.
- Handles team-specific data filtering.
- Collects and processes data into a pandas DataFrame.
Installation
To install WebScrapBasketballReference, clone the repository and install the required dependencies:
git clone https://github.com/yourusername/WebScrapBasketballReference.git
cd WebScrapBasketballReference
pip install -r requirements.txt
Usage
Importing the Package
from basketball_reference_webscrapper.data_models.feature_model import FeatureIn
from basketball_reference_webscrapper.webscrap_basketball_reference import WebScrapBasketballReference
Creating a FeatureIn Object
feature_object = FeatureIn(
    url='https://www.basketball-reference.com',
    season=2023,
    data_type='gamelog',  # 'gamelog', 'schedule', or 'player_attributes'
    team='all'  # 'all' or a list of team abbreviations, e.g. ['BOS', 'LAL']
)
Scraping Data
scraper = WebScrapBasketballReference(feature_object)
data = scraper.webscrappe_nba_games_data()
print(data)
Input Validation
The package performs several input validations:
- Data Type Validation: Ensures data_type is one of 'gamelog', 'schedule', or 'player_attributes'.
- Season Validation: Ensures season is an integer between 2000 and the current NBA season.
- Team List Validation: Ensures team is either 'all' or a list of valid NBA team abbreviations.
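The three rules above can be sketched as a standalone function. This is a minimal illustration, not the package's own validation code: the names `validate_inputs`, `current_nba_season`, `VALID_TEAMS`, and `VALID_DATA_TYPES` are hypothetical, the team list is a small illustrative subset, and the rule that a new season starts in October and is named after the year it ends in is an assumption.

```python
from datetime import date

# Illustrative subset only; the package's real list of abbreviations may differ.
VALID_TEAMS = ("BOS", "LAL", "GSW", "MIA", "NYK")
VALID_DATA_TYPES = ("gamelog", "schedule", "player_attributes")


def current_nba_season(today=None):
    """Assumed convention: a season is named after the year it ends in,
    and a new season starts in October."""
    today = today or date.today()
    return today.year + 1 if today.month >= 10 else today.year


def validate_inputs(data_type, season, team, today=None):
    """Raise ValueError if any input breaks the documented rules."""
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(f"data_type must be one of {list(VALID_DATA_TYPES)}")

    # bool is a subclass of int, so reject it explicitly.
    if isinstance(season, bool) or not isinstance(season, int):
        raise ValueError("season must be an integer")
    latest = current_nba_season(today)
    if not 2000 <= season <= latest:
        raise ValueError(f"season must be between 2000 and {latest}")

    if team == "all":
        return
    if not isinstance(team, list) or not team:
        raise ValueError("team must be 'all' or a non-empty list of abbreviations")
    unknown = [t for t in team if t not in VALID_TEAMS]
    if unknown:
        raise ValueError(f"unknown team abbreviations: {unknown}")
```

Calling `validate_inputs('gamelog', 2023, ['BOS', 'LAL'])` returns silently, while an unsupported data type, an out-of-range season, or an unknown abbreviation raises `ValueError`.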
Configuration
The package uses a params.yaml file to store URL patterns and other configuration. Ensure this file is correctly set up in the basketball_reference_webscrapper directory.
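To make the idea of a stored URL pattern concrete, here is a rough sketch of how such a pattern could be filled in for one team and season. The dictionary stands in for values a configuration file might hold; its key names, the pattern strings, and the `build_url` helper are assumptions for illustration and are not taken from the package's actual params.yaml.

```python
# Hypothetical stand-in for URL patterns that a config file might hold.
# Key names and pattern strings are assumptions, not the package's real file.
URL_PATTERNS = {
    "gamelog": "{base}/teams/{team}/{season}/gamelog/",
    "schedule": "{base}/teams/{team}/{season}_games.html",
}


def build_url(base, data_type, team, season):
    """Fill the pattern for data_type with the requested team and season."""
    pattern = URL_PATTERNS[data_type]
    return pattern.format(base=base.rstrip("/"), team=team, season=season)


print(build_url("https://www.basketball-reference.com", "gamelog", "BOS", 2023))
# prints: https://www.basketball-reference.com/teams/BOS/2023/gamelog/
```

Keeping patterns in configuration rather than in code means a change to the site's URL layout only requires editing the patterns.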
Logging
Logging is configured to report on each scraping run. Use the logs to monitor progress and to debug failures.
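If the log messages do not show up in your own script, one possible way to surface them is to configure Python's standard logging module before running the scraper. This sketch assumes the package logs through the standard library and that its logger name matches the import name basketball_reference_webscrapper; neither is confirmed here.

```python
import logging

# Send INFO-level (and higher) messages from all loggers to the console.
# force=True replaces any handlers already attached to the root logger.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    force=True,
)

# Assumed logger name: raise verbosity for the scraper only.
logging.getLogger("basketball_reference_webscrapper").setLevel(logging.DEBUG)
```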
Example
from basketball_reference_webscrapper.data_models.feature_model import FeatureIn
from basketball_reference_webscrapper.webscrap_basketball_reference import WebScrapBasketballReference
# Define the feature object
feature_object = FeatureIn(
    url='https://www.basketball-reference.com',
    season=2023,
    data_type='gamelog',
    team=['BOS']  # List of team abbreviations; 'BOS' is the Boston Celtics
)
# Create the scraper instance
scraper = WebScrapBasketballReference(feature_object)
# Scrape the data
data = scraper.webscrappe_nba_games_data()
# Display the data
print(data)
Contributing
Contributions are welcome! Please submit a pull request or create an issue to report bugs or request features.
License
Contact
For any questions or feedback, please contact yannick.flores1992@gmail.com.
Hashes for basketball_reference_webscrapper-0.4.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 976d2f62c782b36d3823287cec4675d9ade76f973a3d6f15d02e6fafb0ab8d4b
MD5 | 3239af8118e7ef0f07876ff10716cdcf
BLAKE2b-256 | 263cc3db1f4d3fec04121414d731609b39a4917f4134111f0eac76edf1104c13
Hashes for basketball_reference_webscrapper-0.4.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a64eb1470ad8a0ff3b2831140fbcfe58a2bbf95d3960557e7673693f6977facb
MD5 | 6cd1701792e49d5e75d19bb2084c6922
BLAKE2b-256 | f6fabca2ad00ea5ce1517a803ef7bbe64e85c57a8fa6b07c05ff4522c2cefb38