A Python package for scraping & analyzing sports statistics

chickenstats


About

  • Scrape & manipulate data from various NHL endpoints, leveraging chickenstats.chicken_nhl, which includes an open-source xG model for shot quality metrics
  • Augment play-by-play data & generate custom aggregations from raw CSV files downloaded from Evolving-Hockey (subscription required) with chickenstats.evolving_hockey

For more in-depth explanations, tutorials, & detailed reference materials, consult the Documentation.


Compatibility

chickenstats requires Python 3.10 or greater & runs on the latest stable versions of Linux, Mac, & Windows operating systems.
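
If you are unsure whether your interpreter meets the requirement, a check along these lines can help. This is a minimal sketch using only the standard library; `meets_minimum` and `MINIMUM_PYTHON` are illustrative names, not part of chickenstats.

```python
import sys

# Minimum version taken from the compatibility note above.
MINIMUM_PYTHON = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is too old; "
        f"chickenstats needs {MINIMUM_PYTHON[0]}.{MINIMUM_PYTHON[1]} or greater"
    )
```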


Installation

Installation is simple: install from PyPI with pip. Best practice is to develop in an isolated virtual environment (conda or otherwise), but who's a chicken to judge?

pip install chickenstats

To confirm the installation & check that you have the latest version (1.8.0):

pip show chickenstats
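
The same check can be done from inside Python, which is handy in notebooks. The sketch below relies only on the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, not a chickenstats function.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if chickenstats is not installed
print(installed_version("chickenstats"))
```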

Usage

chickenstats is structured as two underlying modules, each used with different data sources:

  • chickenstats.chicken_nhl
  • chickenstats.evolving_hockey

The modules and their outputs are largely interchangeable, with similar fields across chicken_nhl and evolving_hockey, including high-danger scoring chances, score- and venue-adjusted Fenwick, Corsi, and xG.

Feel free to use whichever module and data source you prefer. If you have questions about differences between the modules, you can find me on Bluesky at @chickenandstats.com or email me at chicken@chickenandstats.com.

Please note that chickenstats is under active development - features will continue to be added or modified over time.

chicken_nhl

chickenstats.chicken_nhl allows you to scrape play-by-play data and aggregate individual, line, and team statistics. After importing the module, scrape the schedule for game IDs, then play-by-play data for your team of choice:

from chickenstats.chicken_nhl import Season, Scraper

season = Season(2024)

schedule = season.schedule("NSH")
game_ids = schedule.loc[schedule.game_state == "OFF"].game_id.tolist()

scraper = Scraper(game_ids)

play_by_play = scraper.play_by_play
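
The game-ID filter in the example above is ordinary pandas boolean indexing. To show what it does in isolation, here is a sketch on a toy dataframe. The rows are made up for illustration; only the `game_id` and `game_state` column names and the `"OFF"` value come from the example, and `"OFF"` presumably marks completed games.

```python
import pandas as pd

# Hypothetical stand-in for a schedule dataframe (values are illustrative)
schedule = pd.DataFrame(
    {
        "game_id": [2024020001, 2024020002, 2024020003],
        "game_state": ["OFF", "OFF", "FUT"],
    }
)

# Boolean mask: True for rows whose game_state equals "OFF"
finished = schedule.game_state == "OFF"

# Keep the matching rows, select the game_id column, convert to a plain list
game_ids = schedule.loc[finished].game_id.tolist()
print(game_ids)
```

The resulting list contains only the IDs of rows that passed the filter, which is the shape the `Scraper` constructor receives in the example.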

You can then aggregate the play-by-play data for individual and on-ice statistics with one line of code:

stats = scraper.stats

It's easy to add detail to an aggregation or change its level. For example, to produce season-level statistics that account for teammates on-ice:

scraper.prep_stats(level="season", teammates=True)
stats = scraper.stats

[!TIP] The Scraper object saves the prior aggregation to the scraper.stats attribute, so it must be reset with scraper.prep_stats() before the attribute is accessed again at a different level of aggregation.
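
To make the cache-and-reset behaviour concrete, here is a toy illustration of the pattern. It is not the chickenstats implementation: `ToyScraper`, its event tuples, and its aggregation logic are all hypothetical, and only the `prep_stats` / `stats` names mirror the example above.

```python
class ToyScraper:
    """Hypothetical stand-in that mimics a cached aggregation."""

    def __init__(self, events):
        self._events = events  # list of (player, game_id, goals) tuples
        self._stats = None     # cached aggregation, empty until first use

    def prep_stats(self, level="game"):
        """Recompute the cached aggregation at the requested level."""
        totals = {}
        for player, game_id, goals in self._events:
            key = (player, game_id) if level == "game" else (player,)
            totals[key] = totals.get(key, 0) + goals
        self._stats = totals

    @property
    def stats(self):
        """Return the cached aggregation, building the default on first access."""
        if self._stats is None:
            self.prep_stats()
        return self._stats


toy = ToyScraper([("A", 1, 1), ("A", 2, 2), ("B", 1, 0)])
game_level = toy.stats          # default game-level aggregation, now cached
toy.prep_stats(level="season")  # replaces the cached result
season_level = toy.stats        # same attribute, different aggregation
```

Accessing the attribute a second time without the reset would simply return the cached game-level result, which is why the reset step matters.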

There is similar functionality for forward line / defensive pairing stats:

scraper.prep_lines(position="f")
forward_lines = scraper.lines

scraper.prep_lines(position="d", level="season")
defense_lines = scraper.lines

[!TIP] This step isn't strictly necessary for the forwards, which are the default line aggregation. Provide "d" instead of "f" for defensive line stats.

As well as for team stats:

team_stats = scraper.team_stats

For additional information on usage and functionality, consult the relevant user guide.

evolving_hockey

The chickenstats.evolving_hockey module manipulates raw CSV files downloaded from Evolving-Hockey. Using their original shifts & play-by-play data, users can add information & aggregate for individual & on-ice statistics, including high-danger shooting events, xG & adjusted xG, faceoffs, & changes.

First, prep a play-by-play dataframe using raw play-by-play and shifts CSV files from the Evolving-Hockey website:

import pandas as pd
from chickenstats.evolving_hockey import prep_pbp, prep_stats, prep_lines

raw_shifts = pd.read_csv('./raw_shifts.csv')
raw_pbp = pd.read_csv('./raw_pbp.csv')

play_by_play = prep_pbp(raw_pbp, raw_shifts)
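
Because both files have to be downloaded manually first, it can be worth confirming they exist before loading them. The following is a small sketch using the standard library; `missing_files` is a hypothetical helper, and the paths are the ones from the example above.

```python
from pathlib import Path


def missing_files(*paths):
    """Return the subset of `paths` that do not point to an existing file."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


missing = missing_files("./raw_shifts.csv", "./raw_pbp.csv")
if missing:
    print("Download these from Evolving-Hockey first:", ", ".join(missing))
```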

You can use the play_by_play dataframe in various aggregations. This will return individual game statistics, including on-ice (e.g., GF, xGF) & usage (i.e., zone starts), accounting for teammates & opposition on-ice:

individual_game = prep_stats(play_by_play, level='game', teammates=True, opposition=True)

This will return game statistics for forward-line combinations, accounting for opponents on-ice:

forward_lines = prep_lines(play_by_play, level='game', position='f', opposition=True)

For additional information on usage and functionality, consult the relevant user guide.


Help

If you need help with any aspect of chickenstats, from installation to usage, please don't hesitate to reach out! You can find me on Bluesky at @chickenandstats.com or email me at chicken@chickenandstats.com.

Please report any bugs or issues via the chickenstats issues page, where you can also post feature requests. Before doing so, please check the roadmap; there might already be plans to include your request.


Acknowledgements

chickenstats wouldn't be possible without the support & efforts of countless others. I am obviously extremely grateful, even if there are too many of you to thank individually. However, this chicken will do his best.

First & foremost is my wife - the lovely Mrs. Chicken has been patient, understanding, & supportive throughout the countless hours of development, sometimes to her detriment.

Sincere apologies to the friends & family that have put up with me since my entry into Python, programming, & data analysis in January 2021. Thank you for being excited for me & with me throughout all of this, especially when you've had to fake it...

Thank you to the hockey analytics community on (the artist formerly known as) Twitter. You're producing & reacting to cutting-edge statistical analyses, while providing a supportive, welcoming environment for newcomers. Thank y'all for everything that you do. This is by no means exhaustive, but there are a few people worth calling out specifically:

I'm also grateful to the thriving community of Python educators & open-source contributors on Twitter. Thank y'all for your knowledge & practical advice. Matt Harrison (@mharrison) deserves a special mention for his books on Pandas and XGBoost, both of which are available at his online store. Again, not exhaustive, but others worth thanking individually:

Finally, this library depends on a host of other open-source packages. chickenstats is possible because of the efforts of thousands of individuals, represented below:
