BaskRef (Basketball Scraper)

BaskRef is a tool for scraping basketball data from the web.

The goal of this project is to provide a data collection utility for NBA basketball data. The collection strategy is to scrape data from https://www.basketball-reference.com. The data can then be saved to a CSV file for use by another utility.

About the Package

What data are we collecting?

  • games & game stats (in-depth stats for each game)
  • player game stats

All datasets are available to be collected:

  • by day (all games in one day)
  • by whole season (regular + playoffs)
  • by playoffs

Future Collections (not yet implemented)

  • player metadata
  • game logs

How to Install & Run the Package?

Install the project

pip install baskref

# optional: set the logging level (the default is INFO)
export LOG_LEVEL=DEBUG # INFO, DEBUG, ERROR
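
How baskref reads LOG_LEVEL internally is not documented here. As a rough illustration of the usual pattern, the sketch below maps the variable to a level from the standard logging module. The helper name resolve_log_level is hypothetical and not part of baskref.

```python
import logging
import os

# Levels listed in the install instructions above.
_LEVELS = {"DEBUG": logging.DEBUG, "INFO": logging.INFO, "ERROR": logging.ERROR}


def resolve_log_level() -> int:
    """Map the LOG_LEVEL environment variable to a logging constant.

    Hypothetical helper for illustration; unknown or missing values
    fall back to INFO.
    """
    name = os.environ.get("LOG_LEVEL", "INFO").upper()
    return _LEVELS.get(name, logging.INFO)


logging.basicConfig(level=resolve_log_level())
logging.getLogger(__name__).debug("shown only when LOG_LEVEL=DEBUG")
```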

Scrape Game Data

Scrape all games for the 7th of January 2022.

baskref -t g -d 2022-01-07 -fp datasets
# if you don't install the package
# python -c "from baskref import run_baskref; run_baskref()" -t g -d 2022-01-07 -fp datasets

Scrape all games for the 2006 NBA season (regular season + playoffs).

baskref -t gs -y 2006 -fp datasets
# if you don't install the package
# python -c "from baskref import run_baskref; run_baskref()" -t gs -y 2006 -fp datasets

Scrape all games for the 2006 NBA playoffs.

baskref -t gp -y 2006 -fp datasets
# if you don't install the package
# python -c "from baskref import run_baskref; run_baskref()" -t gp -y 2006 -fp datasets

Scrape Game URLs only

# simply add "u" to any of the three scraping types:
# g -> gu, gs -> gsu, gp -> gpu
baskref -t gu -d 2022-01-07 -fp datasets

Scrape Player Stats Data

# simply add "pl" to any of the three scraping types:
# g -> gpl, gs -> gspl, gp -> gppl
baskref -t gpl -d 2022-01-07 -fp datasets
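
The naming scheme for the -t values can be summarized as a base type plus an optional suffix. The sketch below only enumerates the nine combinations described above; the helper scrape_type and the two dictionaries are illustrative names, not part of baskref.

```python
from itertools import product

# Base scraping types and suffixes as described in the CLI examples.
BASE_TYPES = {"g": "single day", "gs": "whole season", "gp": "playoffs"}
SUFFIXES = {"": "game data", "u": "game URLs only", "pl": "player stats"}


def scrape_type(base: str, suffix: str = "") -> str:
    """Compose the value passed to -t (hypothetical helper, not in baskref)."""
    if base not in BASE_TYPES or suffix not in SUFFIXES:
        raise ValueError(f"unknown type: {base!r} + {suffix!r}")
    return base + suffix


# Print every valid -t value with a short description.
for base, suffix in product(BASE_TYPES, SUFFIXES):
    print(f"{scrape_type(base, suffix):5} {BASE_TYPES[base]}, {SUFFIXES[suffix]}")
```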

Scrape Using a Proxy

Route the scraping requests through a proxy by passing its URL with -p.

baskref -t g -d 2022-01-07 -fp datasets -p http://someproxy.com

How to Use the Package?

Install requirements

pip install -r requirements.txt

Data Collection Utility

This utility provides the scraping functionality.

For any collection mode, first import and initialize the classes below.

from baskref.data_collection import (
    BaskRefUrlScraper,
    BaskRefDataScraper,
)

url_scraper = BaskRefUrlScraper()
data_scraper = BaskRefDataScraper()

# optionally you can set a proxy
proxy_url_scraper = BaskRefUrlScraper("http://someproxy.com")
proxy_data_scraper = BaskRefDataScraper("http://someproxy.com")

The BaskRefDataScraper.get_games_data returns a list of dictionaries.
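
Because the result is a plain list of dictionaries, it can be inspected with standard Python. The sketch below uses made-up records: the keys game_id, home_team, away_team, and overtime are placeholders, not the actual baskref field names, and column_names is a hypothetical helper.

```python
# Hypothetical sample standing in for the return value of get_games_data.
sample_games = [
    {"game_id": "A", "home_team": "X", "away_team": "Y"},
    {"game_id": "B", "home_team": "Z", "away_team": "X", "overtime": True},
]


def column_names(rows: list[dict]) -> list[str]:
    """Return every key that appears in any row, in first-seen order."""
    seen: dict[str, None] = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)


print(len(sample_games), "rows")
print(column_names(sample_games))
```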

Collect games for a specific day

from datetime import date

game_urls = url_scraper.get_game_urls_day(date(2022, 1, 7))
game_data = data_scraper.get_games_data(game_urls)
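
To cover several consecutive days, the single-day call can be repeated in a loop. This is a minimal sketch: daterange and collect_game_urls are hypothetical helpers, and it assumes get_game_urls_day returns an iterable of URLs.

```python
from datetime import date, timedelta
from typing import Iterator


def daterange(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def collect_game_urls(url_scraper, start: date, end: date) -> list:
    """Gather game URLs day by day using get_game_urls_day.

    Assumes get_game_urls_day returns an iterable of URLs.
    """
    urls = []
    for day in daterange(start, end):
        urls.extend(url_scraper.get_game_urls_day(day))
    return urls
```

With the scrapers initialized as above, collect_game_urls(url_scraper, date(2022, 1, 7), date(2022, 1, 9)) would gather three days of URLs, which can then be passed to data_scraper.get_games_data.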

Collect games for a specific season (regular + playoffs)

game_urls = url_scraper.get_game_urls_year(2006)
game_data = data_scraper.get_games_data(game_urls)

Collect games for a specific postseason

game_urls = url_scraper.get_game_urls_playoffs(2006)
game_data = data_scraper.get_games_data(game_urls)

Collect player stats for a specific day

from datetime import date

game_urls = url_scraper.get_game_urls_day(date(2022, 1, 7))
pl_stats_data = data_scraper.get_player_stats_data(game_urls)

Data Saving Package

This package handles saving the collected data.

Save a list of dictionaries to a CSV file.

import os
from baskref.data_saving.file_saver import save_file_from_list

save_path = os.path.join('datasets', 'file_name.csv')
save_file_from_list(game_data, save_path)
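
Since the CSV is meant to be consumed by another utility, it can help to see how such a file is read back. The sketch below uses only the standard csv module. write_rows is a stand-in for illustration and is not the implementation of save_file_from_list; load_rows is likewise a hypothetical helper. UTF-8 encoding is an assumption.

```python
import csv


def write_rows(rows: list[dict], path: str) -> None:
    """Stdlib stand-in for illustration; not baskref's save_file_from_list.

    Takes the column names from the first row, so every row is expected
    to have the same keys.
    """
    fieldnames = list(rows[0]) if rows else []
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def load_rows(path: str) -> list[dict]:
    """Read a CSV file back into a list of dictionaries (all values are str)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Note that values come back as strings, so numeric columns need to be converted by the consuming utility.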

How to Run Tests?

Run all tests with Pytest

pytest

Run all tests with coverage

coverage run --source=baskref -m pytest
coverage report --omit="*/test*" -m --skip-empty

Code Formatting

The code base uses black for automatic formatting. The configuration for black is stored in the pyproject.toml file.

# run black over the entire code base
black .

Linting

The code base uses pylint and mypy for code linting.

Pylint

The configuration for pylint is stored in the .pylintrc file.

# run pylint over the entire code base
pylint --recursive=y ./

MyPy

The configuration for mypy is stored in the pyproject.toml file.

# run mypy over the entire code base
mypy baskref

Bonus

Prepare project for development

  1. Create a virtual environment
  • You might want to use a virtual environment for executing the project.
  • This is an optional step (if skipping, go straight to step 2).

Create a new virtual environment

python -m venv venv  # the final argument is the path to the virtual environment

Activate the new virtual environment

# Windows
.\venv\Scripts\activate

# Unix
source venv/bin/activate

Leaving the virtual environment

deactivate

  2. Install all the dev requirements

pip install -r requirements_dev.txt

# uninstall all packages (Windows)
pip freeze > unins && pip uninstall -y -r unins && del unins

# uninstall all packages (Linux)
pip freeze | xargs pip uninstall -y

  3. Install the pre-commit hook

pre-commit install

Prepare a new Version

This section describes some of the steps when preparing a new baskref version.

  • adjust the pyproject.toml file
    • version
    • dependencies
  • install the project locally and test it

python -m build
pip install .

  • publish the project to test.pypi

pip install --upgrade twine
twine upload --repository testpypi dist/*

# install from test.pypi
pip install --index-url https://test.pypi.org/simple/ baskref

  • publish a new version

twine upload dist/*

Contributors

  1. Dominik Zulovec Sajovic
