BaskRef is a tool for scraping basketball data from the web.

Project description

BaskRef (Basketball Scraper)

The goal of this project is to provide a data collection utility for NBA basketball data. The data is scraped from https://www.basketball-reference.com and can then be saved to a CSV file for use by other tools.

About the Package

What data are we collecting?

  • games and game stats (in-depth statistics for each game)
  • player game stats

Each dataset can be collected:

  • by day (all games in one day)
  • by whole season (regular + playoffs)
  • by playoffs

Future Collections (Not yet implemented)

  • player metadata (Not Implemented)
  • game logs (Not Implemented)

How to Install & Run the Package?

Install the project

pip install baskref

# optionally set the logging level (default: INFO)
export LOG_LEVEL=DEBUG # INFO, DEBUG, ERROR
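
How the package reads LOG_LEVEL internally is not documented on this page. As a rough sketch, assuming the variable is read once at start-up and mapped onto Python's standard logging levels, the behaviour could look like the following. `resolve_log_level` is a hypothetical helper, not part of baskref.

```python
import logging
import os


def resolve_log_level(default: str = "INFO") -> int:
    """Map the LOG_LEVEL environment variable to a logging level constant.

    Hypothetical helper for illustration; baskref may do this differently.
    """
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    # Fall back to the default when the value is not a known level name.
    if not isinstance(level, int):
        level = getattr(logging, default)
    return level


logging.basicConfig(level=resolve_log_level())
logging.getLogger("baskref_example").debug("shown only when LOG_LEVEL=DEBUG")
```

Falling back to the default on an unrecognised value is a design choice of this sketch, so that a typo in the variable does not silence the logs.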

Scrape Game Data

Scrape all games for the 7th of January 2022.

baskref -t g -d 2022-01-07 -fp datasets
# python -c "from baskref import run_baskref; run_baskref()" -t g -d 2022-01-07 -fp datasets

Scrape all games for the 2006 NBA season (regular season + playoffs).

baskref -t gs -y 2006 -fp datasets
# python -c "from baskref import run_baskref; run_baskref()" -t gs -y 2006 -fp datasets

Scrape all games for the 2006 NBA playoffs.

baskref -t gp -y 2006 -fp datasets
# if you don't install the package
# python -c "from baskref import run_baskref; run_baskref()" -t gp -y 2006 -fp datasets
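
The three commands above share the same flags: -t for the scraping type, -d for a day, -y for a year and -fp for the output folder. The sketch below shows one way such a command line could be parsed with argparse. It is an illustration only: the parser name, the destination names, the default folder and the validation rules are assumptions, not baskref's actual implementation.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(prog="baskref-sketch")
    # g = games by day, gs = games by season, gp = games by playoffs
    parser.add_argument("-t", dest="type", choices=["g", "gs", "gp"], required=True)
    parser.add_argument("-d", dest="day", type=date.fromisoformat)
    parser.add_argument("-y", dest="year", type=int)
    # The default output folder is an assumption of this sketch.
    parser.add_argument("-fp", dest="file_path", default="datasets")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    """Parse argv and check that the flag required by the type is present."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # A day is needed for "g"; a year is needed for the season-based types.
    if args.type == "g" and args.day is None:
        parser.error("-d is required when -t is g")
    if args.type in ("gs", "gp") and args.year is None:
        parser.error("-y is required when -t is gs or gp")
    return args


print(parse(["-t", "g", "-d", "2022-01-07", "-fp", "datasets"]))
```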

Scrape Game URLs only

# simply add "u" to any of the three scraping types:
# g -> gu, gs -> gsu, gp -> gpu
baskref -t gu -d 2022-01-07 -fp datasets

Scrape Player Stats Data

# simply add "pl" to any of the three scraping types:
# g -> gpl, gs -> gspl, gp -> gppl
baskref -t gpl -d 2022-01-07 -fp datasets
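
The suffix rules give nine type codes in total: three base types, each with no suffix, "u" or "pl". A small sketch of how a code can be split back into its two parts is shown below. The function and the description strings are made up for illustration and are not part of baskref.

```python
BASE_TYPES = {"g": "day", "gs": "season", "gp": "playoffs"}
SUFFIXES = {"": "game data", "u": "game URLs only", "pl": "player stats"}


def describe_type(code: str) -> str:
    """Split a -t value into its base type and optional suffix."""
    for base, period in BASE_TYPES.items():
        suffix = code[len(base):]
        # The suffix check keeps "gpl" (g + pl) apart from "gp" + "l".
        if code.startswith(base) and suffix in SUFFIXES:
            return f"{SUFFIXES[suffix]} by {period}"
    raise ValueError(f"unknown scraping type: {code!r}")


for code in ("g", "gu", "gpl", "gs", "gsu", "gspl", "gp", "gpu", "gppl"):
    print(code, "->", describe_type(code))
```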

Scrape Using a Proxy

Route the scraping requests through a proxy by passing its URL with -p.

baskref -t g -d 2022-01-07 -fp datasets -p http://someproxy.com
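
How baskref forwards the proxy URL to its HTTP client is not described here. For readers unfamiliar with the general idea, the sketch below uses only Python's standard library (urllib, which may not be what baskref uses) to build a client that sends both HTTP and HTTPS traffic through one proxy. `build_proxied_opener` is a hypothetical name.

```python
import urllib.request


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxied_opener("http://someproxy.com")
# opener.open("https://www.basketball-reference.com") would now be routed
# through the proxy; no request is made in this sketch.
```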

How to Use the Package?

Install requirements

pip install -r requirements.txt

Data Collection Utility

This covers the scraping functionality.

For any mode of collection, first import and initialize the classes below.

from baskref.data_collection import (
    BaskRefUrlScraper,
    BaskRefDataScraper,
)

url_scraper = BaskRefUrlScraper()
data_scraper = BaskRefDataScraper()

# optionally you can set a proxy
proxy_url_scraper = BaskRefUrlScraper("http://someproxy.com")
proxy_data_scraper = BaskRefDataScraper("http://someproxy.com")

BaskRefDataScraper.get_games_data returns a list of dictionaries.

Collect games for a specific day

from datetime import date

game_urls = url_scraper.get_game_urls_day(date(2022,1,7))
game_data = data_scraper.get_games_data(game_urls)
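
The same two calls can be repeated to cover a range of days. The helper below is a minimal sketch, not part of baskref. It assumes that get_game_urls_day returns a (possibly empty) list of URLs for a day and that get_games_data returns a list, as stated above; how the real scrapers behave on days without games is an assumption.

```python
from datetime import date, timedelta


def collect_games_between(url_scraper, data_scraper, start: date, end: date) -> list:
    """Collect game data for every day from start to end (inclusive).

    Hypothetical helper; the scraper objects are passed in by the caller.
    """
    games = []
    day = start
    while day <= end:
        # Same two calls as the single-day example, repeated per day.
        game_urls = url_scraper.get_game_urls_day(day)
        if game_urls:
            games.extend(data_scraper.get_games_data(game_urls))
        day += timedelta(days=1)
    return games
```

With the scrapers initialized earlier, a call such as collect_games_between(url_scraper, data_scraper, date(2022, 1, 7), date(2022, 1, 9)) would gather three days of games into one list.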

Collect games for a specific season (regular + playoffs)

game_urls = url_scraper.get_game_urls_year(2006)
game_data = data_scraper.get_games_data(game_urls)

Collect games for a specific postseason

game_urls = url_scraper.get_game_urls_playoffs(2006)
game_data = data_scraper.get_games_data(game_urls)

Collect player stats for a specific day

from datetime import date

game_urls = url_scraper.get_game_urls_day(date(2022,1,7))
pl_stats_data = data_scraper.get_player_stats_data(game_urls)

Data Saving Package

This covers saving the collected data.

Save a list of dictionaries to a CSV file.

import os
from baskref.data_saving.file_saver import save_file_from_list

save_path = os.path.join('datasets', 'file_name.csv')
save_file_from_list(game_data, save_path)
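
For readers who want to see what writing a list of dictionaries to CSV involves, here is a standard-library sketch. It is not the implementation of save_file_from_list; `write_dicts_to_csv` is a hypothetical stand-in, and taking the union of all keys as the header is an assumption of this sketch.

```python
import csv
import os


def write_dicts_to_csv(rows: list[dict], path: str) -> None:
    """Write a list of dictionaries to a CSV file with a header row."""
    if not rows:
        return  # nothing to write
    # Union of keys in first-seen order, so rows with extra fields still fit.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    # Create the target folder if it does not exist yet.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells
```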

How to Run Tests?

Run all tests with Pytest

pytest

Run all tests with coverage

coverage run --source=baskref -m pytest
coverage report --omit="*/test*" -m --skip-empty

Code Formatting

The code base uses black for automatic formatting. The configuration for black is stored in the pyproject.toml file.

# run black over the entire code base
black .

Linting

The code base uses pylint and mypy for code linting.

Pylint

The configuration for pylint is stored in the .pylintrc file.

# run pylint over the entire code base
pylint --recursive=y ./

MyPy

The configuration for mypy is stored in the pyproject.toml file.

# run mypy over the entire code base
mypy baskref

Bonus

Prepare project for development

  1. Create a virtual environment
  • You might want to use a virtual environment for running the project.
  • This step is optional (if you skip it, go straight to step 2).

Create a new virtual environment

python -m venv venv  # the last argument is the path to the virtual environment

Activate the new virtual environment

# Windows
.\venv\Scripts\activate

# Unix
source venv/bin/activate

Leaving the virtual environment

deactivate
  2. Install all the dev requirements
pip install -r requirements_dev.txt

# uninstall all packages (Windows)
pip freeze > unins && pip uninstall -y -r unins && del unins

# uninstall all packages (Linux)
pip freeze | xargs pip uninstall -y
  3. Install the pre-commit hook
pre-commit install

Prepare a new Version

This section describes the main steps for preparing a new baskref version.

  • empty the dist folder
rm -rf dist/*
  • adjust the pyproject.toml file

    • version
    • dependencies
  • install project locally and test it

python -m build
pip install .
  • install twine
pip install --upgrade twine
  • publish project to test.pypi (optional)
twine upload --repository testpypi dist/*
# install from test.pypi
pip install --index-url https://test.pypi.org/simple/ baskref
  • publish a new version
twine upload dist/*

Contributors

  1. Dominik Zulovec Sajovic

Download files

Source Distribution

baskref-1.0.0.tar.gz (17.4 kB)

Uploaded Source

Built Distribution

baskref-1.0.0-py3-none-any.whl (18.0 kB)

Uploaded Python 3

File details

Details for the file baskref-1.0.0.tar.gz.

File metadata

  • Download URL: baskref-1.0.0.tar.gz
  • Upload date:
  • Size: 17.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.6

File hashes

Hashes for baskref-1.0.0.tar.gz
Algorithm     Hash digest
SHA256        43076bab53c186e48ef8e8e6e92dc899202d79e824d1184209fbddb4d369b9e1
MD5           0d86d1d3e0989bc4a41778dc8851da88
BLAKE2b-256   cd610be6dede00f7fe1c323054932a3078690b2e582315f0dba8332e79bab175

File details

Details for the file baskref-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: baskref-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.6

File hashes

Hashes for baskref-1.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        772578ed6db70172acf0d8a3774fee205b73c6d9ab1f40e54a65587d93f168d1
MD5           30de52e9e20cff6495db07cf105375c2
BLAKE2b-256   2e41403b98c1e8fa21e06c9073dc933b775fdaeb9b0f47ae7e85ba22938e6228

