BaskRef (Basketball Scraper)
BaskRef is a tool for scraping basketball data from the web.
The goal of this project is to provide a data collection utility for NBA basketball data. Data is scraped from https://www.basketball-reference.com and can then be saved to a CSV file for use by other tools.
About the Package
What data are we collecting?
- games & game stats (in-depth stats for each game)
- player game stats
All datasets can be collected:
- by day (all games in one day)
- by whole season (regular + playoffs)
- by playoffs
Future Collections (Not yet implemented)
- player metadata (Not Implemented)
- game logs (Not Implemented)
How to Install & Run the Package?
Install the project
pip install baskref
# optionally set the logging level (default: INFO)
export LOG_LEVEL=DEBUG # INFO, DEBUG, ERROR
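When baskref is driven from a Python script rather than the shell, the same variable can be honoured by the calling code. The snippet below is a minimal sketch of mapping `LOG_LEVEL` onto the standard `logging` module. How baskref reads the variable internally is not documented here, so `resolve_log_level` is a hypothetical helper and not part of the package.

```python
import logging
import os

# The three values mentioned above; anything else falls back to the default.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "ERROR": logging.ERROR,
}


def resolve_log_level(default: int = logging.INFO) -> int:
    """Return the logging level named by LOG_LEVEL (hypothetical helper)."""
    name = os.environ.get("LOG_LEVEL", "").strip().upper()
    return _LEVELS.get(name, default)


# Configure the root logger once, before any scraping starts.
logging.basicConfig(level=resolve_log_level())
```

Unset or unrecognised values resolve to INFO, which matches the default stated above.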
Scrape Game Data
Scrape all games for the 7th of January 2022.
baskref -t g -d 2022-01-07 -fp datasets
# python -c "from baskref import run_baskref; run_baskref()" -t g -d 2022-01-07 -fp datasets
Scrape all games for the 2006 NBA season (regular season + playoffs).
baskref -t gs -y 2006 -fp datasets
# python -c "from baskref import run_baskref; run_baskref()" -t gs -y 2006 -fp datasets
Scrape all games for the 2006 NBA playoffs.
baskref -t gp -y 2006 -fp datasets
# if you don't install the package
# python -c "from baskref import run_baskref; run_baskref()" -t gp -y 2006 -fp datasets
Scrape Game URLs only
# simply add "u" to any of the three scraping types:
# g -> gu, gs -> gsu, gp -> gpu
baskref -t gu -d 2022-01-07 -fp datasets
Scrape Player Stats Data
# simply add "pl" to any of the three scraping types:
# g -> gpl, gs -> gspl, gp -> gppl
baskref -t gpl -d 2022-01-07 -fp datasets
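The nine `-t` codes above follow a simple pattern: a scope (`g`, `gs`, `gp`) plus an optional suffix (`u` or `pl`). The sketch below spells that pattern out as a lookup table, which can be handy when building command lines from a script. The names `CollectionType`, `TYPE_CODES`, and `describe_type` are illustrative and are not part of baskref.

```python
from typing import NamedTuple


class CollectionType(NamedTuple):
    scope: str   # "day", "season" or "playoffs"
    output: str  # "games", "urls" or "player_stats"


_SCOPES = {"g": "day", "gs": "season", "gp": "playoffs"}
_OUTPUTS = {"": "games", "u": "urls", "pl": "player_stats"}

# Enumerate the codes explicitly instead of parsing prefixes:
# "gpl" is "g" + "pl", not "gp" + "l".
TYPE_CODES = {
    base + suffix: CollectionType(scope, output)
    for base, scope in _SCOPES.items()
    for suffix, output in _OUTPUTS.items()
}


def describe_type(code: str) -> CollectionType:
    """Look up a -t code and fail loudly on anything undocumented."""
    try:
        return TYPE_CODES[code]
    except KeyError:
        raise ValueError(f"unknown collection type: {code!r}") from None
```

For example, `describe_type("gspl")` reports season scope with player stats output.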
Scrape Using a Proxy
Route scraping requests through a proxy.
baskref -t g -d 2022-01-07 -fp datasets -p http://someproxy.com
How to Use the Package?
Install requirements
pip install -r requirements.txt
Data Collection Utility
This refers to the scraping functionalities.
For every collection mode, first import and initialize the classes below.
from baskref.data_collection import (
BaskRefUrlScraper,
BaskRefDataScraper,
)
url_scraper = BaskRefUrlScraper()
data_scraper = BaskRefDataScraper()
# optionally you can set a proxy
proxy_url_scraper = BaskRefUrlScraper("http://someproxy.com")
proxy_data_scraper = BaskRefDataScraper("http://someproxy.com")
BaskRefDataScraper.get_games_data returns a list of dictionaries.
Collect games for a specific day
from datetime import date
game_urls = url_scraper.get_game_urls_day(date(2022,1,7))
game_data = data_scraper.get_games_data(game_urls)
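The single-day calls above extend naturally to a span of days. The following is a rough sketch that loops over a date range and reuses `get_game_urls_day` and `get_games_data`. It assumes `get_game_urls_day` returns an iterable of URLs, and `iter_days` and `collect_date_range` are hypothetical helpers, not baskref functions.

```python
from datetime import date, timedelta
from typing import Any, Iterator, List


def iter_days(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def collect_date_range(
    url_scraper: Any, data_scraper: Any, start: date, end: date
) -> List[dict]:
    """Gather game URLs day by day, then scrape them in one call.

    Assumes get_game_urls_day(day) returns an iterable of URLs.
    """
    game_urls: List[str] = []
    for day in iter_days(start, end):
        game_urls.extend(url_scraper.get_game_urls_day(day))
    return data_scraper.get_games_data(game_urls)


# Usage with the scrapers initialized earlier:
# game_data = collect_date_range(
#     url_scraper, data_scraper, date(2022, 1, 7), date(2022, 1, 9)
# )
```

Passing the scrapers in as arguments keeps the helper easy to exercise with stand-in objects, without touching the network.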
Collect games for a specific season (regular + playoffs)
game_urls = url_scraper.get_game_urls_year(2006)
game_data = data_scraper.get_games_data(game_urls)
Collect games for a specific postseason
game_urls = url_scraper.get_game_urls_playoffs(2006)
game_data = data_scraper.get_games_data(game_urls)
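Because the season collection covers the regular season plus the playoffs, the two URL lists can be combined to isolate regular-season games. The sketch below assumes both methods return URLs as plain strings in the same form, which is not confirmed here. `regular_season_urls` is a hypothetical helper.

```python
from typing import Iterable, List


def regular_season_urls(
    season_urls: Iterable[str], playoff_urls: Iterable[str]
) -> List[str]:
    """Return season URLs that are not playoff URLs, preserving order."""
    playoffs = set(playoff_urls)
    return [url for url in season_urls if url not in playoffs]


# Usage with the scraper initialized earlier:
# season = url_scraper.get_game_urls_year(2006)
# playoffs = url_scraper.get_game_urls_playoffs(2006)
# regular = regular_season_urls(season, playoffs)
```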
Collect player stats for a specific day
from datetime import date
game_urls = url_scraper.get_game_urls_day(date(2022,1,7))
pl_stats_data = data_scraper.get_player_stats_data(game_urls)
Data Saving Package
This refers to the saving of the data.
Save a list of dictionaries to a CSV file.
import os
from baskref.data_saving.file_saver import save_file_from_list
save_path = os.path.join('datasets', 'file_name.csv')
save_file_from_list(game_data, save_path)
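For readers who want to see what saving a list of dictionaries to CSV involves, here is a standard-library sketch of the same idea. It is not baskref's implementation, and `write_rows_to_csv` is a hypothetical name. It takes the union of all keys as the header, so rows with differing keys still fit.

```python
import csv
from typing import Dict, List


def write_rows_to_csv(rows: List[Dict[str, object]], path: str) -> None:
    """Write a list of dictionaries to a CSV file with a header row."""
    if not rows:
        raise ValueError("nothing to write")
    # Union of keys in first-seen order; dict.fromkeys drops duplicates.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # restval fills in cells for keys a row does not have.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Opening the file with `newline=""` follows the `csv` module's recommendation and avoids blank lines between rows on Windows.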
How to Run Tests?
Run all tests with Pytest
pytest
Run all tests with coverage
coverage run --source=baskref -m pytest
coverage report --omit="*/test*" -m --skip-empty
Code Formatting
The code base uses black for automatic formatting. The configuration for black is stored in the pyproject.toml file.
# run black over the entire code base
black .
Linting
The code base uses pylint and mypy for code linting.
Pylint
The configuration for pylint is stored in the .pylintrc file.
# run pylint over the entire code base
pylint --recursive=y ./
MyPy
The configuration for mypy is stored in the pyproject.toml file.
# run mypy over the entire code base
mypy baskref
Bonus
Prepare project for development
- Create Virtual Environment
- You might want to use a virtual environment for executing the project.
- This step is optional (if you skip it, go straight to step 2).
Create a new virtual environment
python -m venv venv # the last argument is the path of the new virtual environment
Activate the new virtual environment
# Windows
.\venv\Scripts\activate
# Unix
source venv/bin/activate
Leaving the virtual environment
deactivate
- Install all the dev requirements
pip install -r requirements_dev.txt
# uninstall all packages (Windows)
pip freeze > unins && pip uninstall -y -r unins && del unins
# uninstall all packages (Linux)
pip freeze | xargs pip uninstall -y
- Install the pre-commit hook
pre-commit install
Prepare a new Version
This section describes some of the steps when preparing a new baskref version.
- adjust the pyproject.toml file
- version
- dependencies
- install project locally and test it
python -m build
pip install .
- publish project to test.pypi
pip install --upgrade twine
twine upload --repository testpypi dist/*
# install from test.pypi
pip install --index-url https://test.pypi.org/simple/ baskref
- publish a new version
twine upload dist/*
Contributors