BaskRef (Basketball Scraper)

BaskRef is a tool for scraping basketball data from the web.

The goal of this project is to provide a data collection utility for NBA basketball data. The data is scraped from https://www.basketball-reference.com and can then be saved to a CSV file for use by other tools.

About the Package

What data are we collecting?

  • games and game stats (in-depth stats for each game)
  • player game stats

Each dataset can be collected:

  • by day (all games in one day)
  • by whole season (regular + playoffs)
  • by playoffs

Future Collections (Not yet implemented)

  • player metadata (Not Implemented)
  • game logs (Not Implemented)

How to Install & Run the Package?

Install the project

pip install baskref

# optionally set the logging level (default: INFO)
export LOG_LEVEL=DEBUG # INFO, DEBUG, ERROR
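
How baskref reads LOG_LEVEL internally is not shown here. As a rough sketch of the usual pattern, an environment variable like this can be mapped onto a level from Python's logging module. The function name resolve_log_level is hypothetical and is not part of baskref.

```python
import logging
import os


def resolve_log_level(default: str = "INFO") -> int:
    """Map the LOG_LEVEL environment variable to a logging level.

    Hypothetical helper for illustration; not baskref's own code.
    Unknown names fall back to logging.INFO.
    """
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    return level if isinstance(level, int) else logging.INFO


# Configure the root logger with the resolved level.
logging.basicConfig(level=resolve_log_level())
```

With LOG_LEVEL=DEBUG exported as above, this sketch resolves to logging.DEBUG. An unset or unrecognized value resolves to logging.INFO.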

Scrape Game Data

Scrape all games for the 7th of January 2022.

baskref -t g -d 2022-01-07 -fp datasets
# alternative without installing the package:
# python -c "from baskref import run_baskref; run_baskref()" -t g -d 2022-01-07 -fp datasets

Scrape all games for the 2006 NBA season (regular season + playoffs).

baskref -t gs -y 2006 -fp datasets
# alternative without installing the package:
# python -c "from baskref import run_baskref; run_baskref()" -t gs -y 2006 -fp datasets

Scrape all games for the 2006 NBA playoffs.

baskref -t gp -y 2006 -fp datasets
# alternative without installing the package:
# python -c "from baskref import run_baskref; run_baskref()" -t gp -y 2006 -fp datasets

Scrape Game URLs only

# simply add "u" to any of the three scraping types:
# g -> gu, gs -> gsu, gp -> gpu
baskref -t gu -d 2022-01-07 -fp datasets

Scrape Player Stats Data

# simply add "pl" to any of the three scraping types:
# g -> gpl, gs -> gspl, gp -> gppl
baskref -t gpl -d 2022-01-07 -fp datasets

Scrape Using a Proxy

Pass a proxy URL with -p to route the scraping requests through a proxy.

baskref -t g -d 2022-01-07 -fp datasets -p http://someproxy.com
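
The type codes above follow a simple pattern: a base code (g, gs, gp) plus an optional suffix (u for URLs only, pl for player stats). The sketch below assembles the matching command line from Python using only the flags shown in this README (-t, -d, -y, -fp, -p). The helper build_baskref_args is hypothetical and is not part of baskref.

```python
from datetime import date
from typing import List, Optional

BASE_TYPES = {"g", "gs", "gp"}  # day, whole season, playoffs
SUFFIXES = {"", "u", "pl"}      # game data, game URLs only, player stats


def build_baskref_args(
    base: str,
    suffix: str = "",
    day: Optional[date] = None,
    year: Optional[int] = None,
    file_path: str = "datasets",
    proxy: Optional[str] = None,
) -> List[str]:
    """Build a baskref command line (hypothetical helper, not part of baskref)."""
    if base not in BASE_TYPES:
        raise ValueError(f"unknown base type: {base!r}")
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown suffix: {suffix!r}")

    args = ["baskref", "-t", base + suffix]
    if base == "g":
        # Day-based scraping takes a date (-d) in YYYY-MM-DD format.
        if day is None:
            raise ValueError("type 'g' requires a day")
        args += ["-d", day.isoformat()]
    else:
        # Season and playoff scraping take a year (-y).
        if year is None:
            raise ValueError(f"type {base!r} requires a year")
        args += ["-y", str(year)]

    args += ["-fp", file_path]
    if proxy:
        args += ["-p", proxy]
    return args


player_stats_cmd = build_baskref_args("g", "pl", day=date(2022, 1, 7))
```

The resulting list can be passed to subprocess.run(player_stats_cmd, check=True) once baskref is installed.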

How to Use the Package?

Install requirements

pip install -r requirements.txt

Data Collection Utility

This part of the package covers the scraping functionality.

For any mode of collection, first import and initialize the classes below.

from baskref.data_collection import (
    BaskRefUrlScraper,
    BaskRefDataScraper,
)

url_scraper = BaskRefUrlScraper()
data_scraper = BaskRefDataScraper()

# optionally you can set a proxy
proxy_url_scraper = BaskRefUrlScraper("http://someproxy.com")
proxy_data_scraper = BaskRefDataScraper("http://someproxy.com")

The BaskRefDataScraper.get_games_data method returns a list of dictionaries.

Collect games for a specific day

from datetime import date

game_urls = url_scraper.get_game_urls_day(date(2022,1,7))
game_data = data_scraper.get_games_data(game_urls)

Collect games for a specific season (regular + playoffs)

game_urls = url_scraper.get_game_urls_year(2006)
game_data = data_scraper.get_games_data(game_urls)

Collect games for a specific postseason

game_urls = url_scraper.get_game_urls_playoffs(2006)
game_data = data_scraper.get_games_data(game_urls)

Collect player stats for a specific day

from datetime import date

game_urls = url_scraper.get_game_urls_day(date(2022,1,7))
pl_stats_data = data_scraper.get_player_stats_data(game_urls)
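
The day-based methods above take a single date. To cover a short range of days, one option is to generate the dates first and call the scraper once per day. The sketch below shows only the date generation, which runs without network access. The helper iter_days is hypothetical, and the commented loop assumes the url_scraper and data_scraper objects initialized earlier.

```python
from datetime import date, timedelta
from typing import Iterator


def iter_days(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end, inclusive (hypothetical helper)."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


days = list(iter_days(date(2022, 1, 7), date(2022, 1, 9)))

# With the scrapers initialized as shown earlier, each day could then be
# collected and combined into one list of dictionaries:
#
# all_games = []
# for day in days:
#     game_urls = url_scraper.get_game_urls_day(day)
#     all_games.extend(data_scraper.get_games_data(game_urls))
```

For a whole season or postseason, get_game_urls_year and get_game_urls_playoffs are the more direct choice.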

Data Saving Package

This part of the package covers saving the collected data.

Save a list of dictionaries to a CSV file.

import os
from baskref.data_saving.file_saver import save_file_from_list

save_path = os.path.join('datasets', 'file_name.csv')
save_file_from_list(game_data, save_path)
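
Because the saved file is meant to be consumed by another tool, it can help to see the round trip from a list of dictionaries to CSV and back. The sketch below uses only the standard library. write_rows is a stand-in for save_file_from_list, not its actual implementation, and the column names are hypothetical, since the real keys produced by baskref are not listed here.

```python
import csv
import os
import tempfile

# Hypothetical rows; the real column names produced by baskref may differ.
sample_rows = [
    {"game_id": "A", "home_team": "X", "home_pts": 101},
    {"game_id": "B", "home_team": "Y", "home_pts": 95},
]


def write_rows(rows, path):
    """Stand-in for save_file_from_list: write a list of dicts to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    """Read a CSV file back into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "file_name.csv")
    write_rows(sample_rows, csv_path)
    loaded = read_rows(csv_path)
```

Note that csv.DictReader returns every value as a string, so numeric columns need to be converted by the consuming tool.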

How to Run Tests?

Run all tests with Pytest

pytest

Run all tests with coverage

coverage run --source=baskref -m pytest
coverage report --omit="*/test*" -m --skip-empty

Code Formatting

The code base uses black for automatic formatting. The configuration for black is stored in the pyproject.toml file.

# run black over the entire code base
black .

Linting

The code base uses pylint and mypy for code linting.

Pylint

The configuration for pylint is stored in the .pylintrc file.

# run pylint over the entire code base
pylint --recursive=y ./

MyPy

The configuration for mypy is stored in the pyproject.toml file.

# run mypy over the entire code base
mypy baskref

Bonus

Prepare project for development

  1. Create a virtual environment
  • You might want to use a virtual environment for running the project.
  • This step is optional; if you skip it, go straight to step 2.

Create a new virtual environment

python -m venv venv  # The second parameter is a path to the virtual env.

Activate the new virtual environment

# Windows
.\venv\Scripts\activate

# Unix
source venv/bin/activate

Leaving the virtual environment

deactivate

  2. Install all the dev requirements

pip install -r requirements_dev.txt

# uninstall all packages Windows
pip freeze > unins && pip uninstall -y -r unins && del unins

# uninstall all packages linux
pip freeze | xargs pip uninstall -y

  3. Install the pre-commit hook

pre-commit install

Prepare a new Version

This section describes some of the steps when preparing a new baskref version.

  • adjust the pyproject.toml file
    • version
    • dependencies
  • install the project locally and test it

python -m build
pip install .

  • publish the project to TestPyPI

pip install --upgrade twine
twine upload --repository testpypi dist/*
# install from TestPyPI
pip install --index-url https://test.pypi.org/simple/ baskref

  • publish a new version

twine upload dist/*

Contributors

  1. Dominik Zulovec Sajovic
