BaskRef (Basketball Scraper)

BaskRef is a tool to scrape basketball data from the web.

The goal of this project is to provide a data collection utility for NBA basketball data. The collection strategy is to scrape data from https://www.basketball-reference.com. The data can then be saved to a CSV file for use by other tools.

About the Package

What data are we collecting?

  • games & game stats (in-depth stats for each game)
  • player game stats

Each dataset can be collected:

  • by day (all games in one day)
  • by whole season (regular + playoffs)
  • by playoffs

Future Collections (Not Yet Implemented)

  • player metadata
  • game logs

How to Install & Run the Package?

Install the project

pip install baskref

# optionally set the logging level (default: INFO)
export LOG_LEVEL=DEBUG # INFO, DEBUG, ERROR
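
As a rough illustration of how an environment variable such as LOG_LEVEL can be turned into a logging level, here is a minimal sketch using only the standard library. It is not taken from the baskref source; the function name resolve_log_level and the fallback to INFO for unknown values are assumptions made for this example.

```python
import logging
import os

# Levels listed in the install instructions above.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "ERROR": logging.ERROR,
}


def resolve_log_level(default: str = "INFO") -> int:
    """Hypothetical helper: map LOG_LEVEL to a logging constant.

    Unknown or missing values fall back to INFO (an assumption,
    chosen to mirror the documented default).
    """
    name = os.environ.get("LOG_LEVEL", default).upper()
    return LEVELS.get(name, logging.INFO)


# Configure the root logger with the resolved level.
logging.basicConfig(level=resolve_log_level())
```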

Scrape Game Data

Scrape all games for the 7th of January 2022.

baskref -t g -d 2022-01-07 -fp datasets
# if you don't install the package
# python -c "from baskref import run_baskref; run_baskref()" -t g -d 2022-01-07 -fp datasets

Scrape all games for the 2006 NBA season (regular season + playoffs).

baskref -t gs -y 2006 -fp datasets
# if you don't install the package
# python -c "from baskref import run_baskref; run_baskref()" -t gs -y 2006 -fp datasets

Scrape all games for the 2006 NBA playoffs.

baskref -t gp -y 2006 -fp datasets
# if you don't install the package
# python -c "from baskref import run_baskref; run_baskref()" -t gp -y 2006 -fp datasets

Scrape Game URLs only

# simply add "u" to any of the three scraping types:
# g -> gu, gs -> gsu, gp -> gpu
baskref -t gu -d 2022-01-07 -fp datasets

Scrape Player Stats Data

# simply add "pl" to any of the three scraping types:
# g -> gpl, gs -> gspl, gp -> gppl
baskref -t gpl -d 2022-01-07 -fp datasets
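
The values accepted by -t in the commands above follow a simple pattern: a base type (g, gs, gp) optionally followed by a suffix (u or pl). The sketch below only enumerates that naming scheme as described in this README. The dictionary names and the function all_type_codes are made up for illustration and are not part of baskref.

```python
# Base scraping types and suffixes as described in this README.
BASE_TYPES = {
    "g": "single day",
    "gs": "whole season",
    "gp": "playoffs",
}
SUFFIXES = {
    "": "game data",
    "u": "game URLs only",
    "pl": "player stats",
}


def all_type_codes() -> dict:
    """Hypothetical helper: build every base + suffix combination."""
    return {
        base + suffix: f"{what} ({scope})"
        for base, scope in BASE_TYPES.items()
        for suffix, what in SUFFIXES.items()
    }


for code, meaning in all_type_codes().items():
    print(f"-t {code}: {meaning}")
```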

Scrape Using a Proxy

Route the scraping requests through a proxy by passing its URL with -p.

baskref -t g -d 2022-01-07 -fp datasets -p http://someproxy.com

How to Use the Package?

Install requirements

pip install -r requirements.txt

Data Collection Utility

This covers the scraping functionality.

For any mode of collection, first import and initialize the classes below.

from baskref.data_collection import (
    BaskRefUrlScraper,
    BaskRefDataScraper,
)

url_scraper = BaskRefUrlScraper()
data_scraper = BaskRefDataScraper()

# optionally you can set a proxy
proxy_url_scraper = BaskRefUrlScraper("http://someproxy.com")
proxy_data_scraper = BaskRefDataScraper("http://someproxy.com")

BaskRefDataScraper.get_games_data returns a list of dictionaries.

Collect games for a specific day

from datetime import date

game_urls = url_scraper.get_game_urls_day(date(2022, 1, 7))
game_data = data_scraper.get_games_data(game_urls)

Collect games for a specific season (regular + playoffs)

game_urls = url_scraper.get_game_urls_year(2006)
game_data = data_scraper.get_games_data(game_urls)

Collect games for a specific postseason

game_urls = url_scraper.get_game_urls_playoffs(2006)
game_data = data_scraper.get_games_data(game_urls)

Collect player stats for a specific day

from datetime import date

game_urls = url_scraper.get_game_urls_day(date(2022, 1, 7))
pl_stats_data = data_scraper.get_player_stats_data(game_urls)
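
The per-day calls above can be combined to cover a span of several days. The following is a minimal sketch, assuming that get_game_urls_day returns an iterable of URL strings (the README only shows its result being passed to the data scraper). The helper collect_urls_for_range is hypothetical and takes the per-day fetch function as an argument so it can be tried without network access.

```python
from datetime import date, timedelta
from typing import Callable, Iterable, List


def collect_urls_for_range(
    start: date,
    end: date,
    fetch_day: Callable[[date], Iterable[str]],
) -> List[str]:
    """Hypothetical helper: gather game URLs for every day in [start, end]."""
    urls: List[str] = []
    day = start
    while day <= end:
        # fetch_day is expected to behave like url_scraper.get_game_urls_day
        urls.extend(fetch_day(day))
        day += timedelta(days=1)
    return urls


# Intended usage (requires baskref and network access):
# game_urls = collect_urls_for_range(
#     date(2022, 1, 7), date(2022, 1, 9), url_scraper.get_game_urls_day
# )
# game_data = data_scraper.get_games_data(game_urls)
```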

Data Saving Package

This package handles saving the collected data.

Save a list of dictionaries to a CSV file.

import os
from baskref.data_saving.file_saver import save_file_from_list

save_path = os.path.join('datasets', 'file_name.csv')
save_file_from_list(game_data, save_path)
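
To make the idea of saving a list of dictionaries concrete, here is a standard-library sketch of the same kind of operation. It is not the implementation of save_file_from_list; the function write_rows_to_csv and the sample keys in the usage comment are assumptions for illustration only.

```python
import csv
import os


def write_rows_to_csv(rows, path):
    """Hypothetical helper: write a list of dictionaries to a CSV file."""
    if not rows:
        return  # nothing to write

    # Create the target directory if the path contains one.
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)

    # Use the union of all keys, in first-seen order, as the header.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells


# Example with made-up keys:
# write_rows_to_csv([{"home": "A", "away": "B"}], os.path.join("datasets", "demo.csv"))
```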

How to Run Tests?

Run all tests with Pytest

pytest

Run all tests with coverage

coverage run --source=baskref -m pytest
coverage report --omit="*/test*" -m --skip-empty

Code Formatting

The code base uses black for automatic formatting. The configuration for black is stored in the pyproject.toml file.

# run black over the entire code base
black .

Linting

The code base uses pylint and mypy for code linting.

Pylint

The configuration for pylint is stored in the .pylintrc file.

# run pylint over the entire code base
pylint --recursive=y ./

MyPy

The configuration for mypy is stored in the pyproject.toml file.

# run mypy over the entire code base
mypy baskref

Bonus

Prepare project for development

  1. Create Virtual Environment
  • You might want to use a virtual environment for executing the project.
  • This is an optional step (if skipping, go straight to step 2).

Create a new virtual environment

python -m venv venv  # the last argument is the path of the virtual environment

Activate the new virtual environment

# Windows
.\venv\Scripts\activate

# Unix
source venv/bin/activate

Leaving the virtual environment

deactivate

  2. Install all the dev requirements

pip install -r requirements_dev.txt

# uninstall all packages (Windows)
pip freeze > unins && pip uninstall -y -r unins && del unins

# uninstall all packages (Linux)
pip freeze | xargs pip uninstall -y

  3. Install the pre-commit hook

pre-commit install

Prepare a new Version

This section describes some of the steps when preparing a new baskref version.

  • adjust the pyproject.toml file
    • version
    • dependencies
  • install the project locally and test it

python -m build
pip install .

  • publish the project to test.pypi

pip install --upgrade twine
twine upload --repository testpypi dist/*
# install from test.pypi
pip install --index-url https://test.pypi.org/simple/ baskref

  • publish a new version

twine upload dist/*

Contributors

  1. Dominik Zulovec Sajovic
