Retrieve Sports data in Python

sportsdataverse-py

See CHANGELOG.md for details.

The goal of sportsdataverse-py is to provide the community with a Python package for working with sports data, as a companion to the cfbfastR, hoopR, and wehoop R packages. Beyond making data aggregation and tidying easier, sportsdataverse-py also supports benchmarking open-source expected points and win probability metrics for American football.

Supported leagues and data sources

| League | Module | Surfaces covered |
| --- | --- | --- |
| NBA | sportsdataverse.nba | ESPN (Site v2 + Web v3 + Core v2); 118 wrappers |
| WNBA | sportsdataverse.wnba | ESPN; 124 wrappers |
| MBB (NCAA M) | sportsdataverse.mbb | ESPN + NCAA-only (bracketology, rankings, recruits); 121 wrappers |
| WBB (NCAA W) | sportsdataverse.wbb | ESPN + NCAA-only; 126 wrappers |
| CFB | sportsdataverse.cfb | ESPN + NCAA + football-only (QBR); 123 wrappers |
| NFL | sportsdataverse.nfl | ESPN + football-only (QBR); 119 wrappers |
| MLB | sportsdataverse.mlb | ESPN + MLB Stats API (statsapi.mlb.com) + Baseball Savant / Statcast; 175 wrappers |
| NHL | sportsdataverse.nhl | api-web.nhle.com/v1/ (game-feed) + NHL EDGE (player tracking) + Stats REST + Records site; 132 wrappers |
| Total | | ~1,030 wrappers |

Polars / pandas parser layer

Every wrapper returns a raw Dict by default. A parser layer turns those payloads into tidy polars (or pandas) DataFrames.

For ESPN cross-league wrappers, pass return_parsed=True to get a DataFrame directly — the raw-Dict contract is unchanged when the kwarg is omitted, so existing callers are unaffected:

from sportsdataverse.nba import espn_nba_team_roster

raw = espn_nba_team_roster(team_id=13)                          # → Dict (default)
df  = espn_nba_team_roster(team_id=13, return_parsed=True)      # → polars
pdf = espn_nba_team_roster(team_id=13,
                            return_parsed=True,
                            return_as_pandas=True)              # → pandas

For the NHL and MLB sibling-API wrappers, compose the wrapper with its parser:

from sportsdataverse.nhl import nhl_web_pbp, parse_nhl_web_pbp
df = parse_nhl_web_pbp(nhl_web_pbp(2023030417))                 # 331-row polars frame
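The two calling conventions above (an opt-in return_parsed kwarg, and explicit wrapper-plus-parser composition) can be sketched without touching the network. Everything in this sketch is hypothetical: fetch_roster, parse_roster, roster, and the payload shape are invented for illustration and are not the package's functions or ESPN's schema, and plain lists of dicts stand in for polars DataFrames.

```python
from typing import Any, Dict, List, Union

Row = Dict[str, Any]


def fetch_roster(team_id: int) -> Dict[str, Any]:
    """Hypothetical stand-in for a wrapper: returns a nested raw payload."""
    # A real wrapper would make an HTTP request; this stub returns canned data.
    return {
        "team": {"id": team_id},
        "athletes": [
            {"id": 1, "displayName": "Player A", "position": {"abbreviation": "G"}},
            {"id": 2, "displayName": "Player B", "position": {"abbreviation": "C"}},
        ],
    }


def parse_roster(payload: Dict[str, Any]) -> List[Row]:
    """Hypothetical parser: flatten the payload into one row per athlete."""
    team_id = payload["team"]["id"]
    return [
        {
            "team_id": team_id,
            "athlete_id": athlete["id"],
            "name": athlete["displayName"],
            "position": athlete["position"]["abbreviation"],
        }
        for athlete in payload.get("athletes", [])
    ]


def roster(team_id: int, return_parsed: bool = False) -> Union[Dict[str, Any], List[Row]]:
    """Opt-in parsing: omitting the kwarg keeps the raw-dict return type."""
    raw = fetch_roster(team_id)
    return parse_roster(raw) if return_parsed else raw


raw = roster(13)                             # raw dict (default)
rows = roster(13, return_parsed=True)        # parsed rows via the kwarg
composed = parse_roster(fetch_roster(13))    # same rows via explicit composition
```

Because the flag defaults to False, callers written against the raw payload keep working, while the composed form keeps fetching and parsing separately testable.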

See py.sportsdataverse.org/docs/architecture/espn-cross-league and py.sportsdataverse.org/docs/parsers/index for the full architecture + parser registry.

Installation

The package metadata lives entirely in pyproject.toml (PEP 621 [project] table). There is no setup.py source-of-truth.

Standard install (pip)

pip install sportsdataverse

With optional extras (defined in [project.optional-dependencies] in pyproject.toml):

pip install "sportsdataverse[all]"      # everything below
pip install "sportsdataverse[models]"   # extra deps for the EPA / WP model code
pip install "sportsdataverse[tests]"    # adds pytest, mypy, ruff, etc.
pip install "sportsdataverse[docs]"     # adds sphinx + sphinx-markdown-builder for the doc build
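To check which extras an installed copy actually declares, the standard library's importlib.metadata can read the Provides-Extra entries that build backends generate from [project.optional-dependencies]. This is a minimal sketch; declared_extras is a hypothetical helper name, not part of sportsdataverse.

```python
from importlib import metadata
from typing import List, Optional


def declared_extras(dist_name: str) -> Optional[List[str]]:
    """Return the extras a distribution declares, or None if it is not installed."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # get_all returns None when the distribution declares no extras at all.
    return sorted(meta.get_all("Provides-Extra") or [])


extras = declared_extras("sportsdataverse")
if extras is None:
    print("sportsdataverse is not installed in this environment")
else:
    print("declared extras:", ", ".join(extras) or "(none)")
```

The helper only reports what the installed metadata declares; it does not tell you whether the dependencies behind an extra are themselves installed.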

Modern install (uv — recommended)

uv is the fast, drop-in package manager we use day to day.

# Add to a uv-managed project:
uv add sportsdataverse

# With extras:
uv add "sportsdataverse[all]"

# Or install the latest dev snapshot from GitHub:
uv add "sportsdataverse @ git+https://github.com/sportsdataverse/sportsdataverse-py"

Conda install

Once the conda-forge feedstock is published, the package will also be available via:

conda install -c conda-forge sportsdataverse
# or
mamba install -c conda-forge sportsdataverse

Until then, conda users can build a local package from this repo:

conda install conda-build conda-verify
conda build recipe/
conda install --use-local sportsdataverse

See recipe/README.md for the full conda workflow.

Development install

For contributing or running the test suite:

git clone https://github.com/sportsdataverse/sportsdataverse-py.git
cd sportsdataverse-py

# uv (recommended) — fully resolved editable install with every extra:
uv pip install -e ".[all]"

# Plain pip works too if uv isn't available:
pip install -e ".[all]"

Note: once we add a PEP 735 [dependency-groups] block (currently the repo only ships PEP 621 [project.optional-dependencies]), uv sync --all-extras --all-groups will become the one-shot dev incantation. Until then, uv pip install -e ".[all]" is the equivalent path.

Run the test suite:

uv run pytest                       # offline tests only
SDV_PY_LIVE_TESTS=1 uv run pytest   # include live API tests (slower; hits ESPN / nflverse)
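The offline/live split can be pictured as an environment-variable gate on individual tests. The sketch below uses the standard library's unittest rather than pytest so it runs without extra dependencies; how the repository actually wires the gate is not shown here, and treating only the value "1" as enabled is an assumption. live_tests_enabled and ExampleTests are hypothetical names.

```python
import os
import unittest
from typing import Mapping


def live_tests_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """True only when SDV_PY_LIVE_TESTS is set to "1" (assumed convention)."""
    return environ.get("SDV_PY_LIVE_TESTS") == "1"


class ExampleTests(unittest.TestCase):
    def test_offline(self):
        # Runs everywhere: no network access needed.
        self.assertEqual(sum([1, 2, 3]), 6)

    @unittest.skipUnless(live_tests_enabled(), "set SDV_PY_LIVE_TESTS=1 to run live tests")
    def test_live_placeholder(self):
        # A real live test would call an external API here; this one only marks the spot.
        self.assertTrue(live_tests_enabled())


# Run the two tests programmatically; the gated one is skipped unless enabled.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A skipped test still counts as a successful run, so the default offline invocation stays green without network access.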

For deeper dev-environment detail (lint, mypy, dep-bumping workflow), see CONTRIBUTING.md.

Notes

  • Python target: 3.9–3.14.
  • DataFrame engine: polars 1.x. Most loaders accept return_as_pandas=True if you prefer pandas.
  • NFL caching: loaders cache to memory by default. Set SDV_PY_NFL_CACHE=filesystem for cross-session reuse, or SDV_PY_NFL_CACHE=off to disable. See sportsdataverse.nfl.config.update_config() for runtime control.
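The cache mode can also be set from Python instead of the shell. This is a minimal sketch: set_nfl_cache_mode is a hypothetical helper, not a package function, and it accepts only the two values named in the note above. Whether the package reads the variable at import time or at call time is not stated here, so setting it before importing sportsdataverse.nfl is the cautious assumption.

```python
import os
from typing import Optional

ENV_VAR = "SDV_PY_NFL_CACHE"
# Only the two values named in the notes; the default in-memory mode is
# selected by leaving the variable unset rather than by guessing its spelling.
DOCUMENTED_MODES = ("filesystem", "off")


def set_nfl_cache_mode(mode: Optional[str]) -> Optional[str]:
    """Set or clear SDV_PY_NFL_CACHE for the current process (hypothetical helper)."""
    if mode is None:
        # Unset the variable so the package falls back to its default behaviour.
        os.environ.pop(ENV_VAR, None)
        return None
    if mode not in DOCUMENTED_MODES:
        raise ValueError(f"unsupported cache mode {mode!r}; expected one of {DOCUMENTED_MODES}")
    os.environ[ENV_VAR] = mode
    return mode


set_nfl_cache_mode("filesystem")
print(os.environ[ENV_VAR])  # filesystem
```

For changes after import, sportsdataverse.nfl.config.update_config() is the documented runtime control.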

Examples and tutorials

Every public function ships a runnable Example: block in its docstring showing a quick-start call, common parameter combinations, and a one-line pipeline next-step. Render the API reference locally with bash create_docs.sh or browse the live docs at py.sportsdataverse.org.
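The Example: blocks can also be read straight from a Python session. The sketch below pulls that section out of a docstring with the standard library; it assumes a Google-style layout (an "Example:" header followed by an indented body), and demo_loader is a hypothetical function used only to show that layout, not a package loader.

```python
import inspect
import textwrap
from typing import Callable, Optional


def example_block(func: Callable) -> Optional[str]:
    """Return the body of the 'Example:' section of func's docstring, if any."""
    doc = inspect.getdoc(func)  # dedents the docstring; None if there is none
    if not doc:
        return None
    lines = doc.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "Example:":
            body = []
            for nxt in lines[i + 1:]:
                # The section ends at the first non-blank, non-indented line.
                if nxt.strip() and not nxt.startswith(" "):
                    break
                body.append(nxt)
            return textwrap.dedent("\n".join(body)).strip() or None
    return None


def demo_loader(season: int) -> dict:
    """Hypothetical loader used only to illustrate the docstring layout.

    Args:
        season: Season year.

    Example:
        >>> demo_loader(2023)["season"]
        2023
    """
    return {"season": season}


print(example_block(demo_loader))
```

The same helper can be pointed at any imported function, provided its docstring follows the assumed layout.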

For longer-form walkthroughs, see the intro/intermediate Jupyter notebooks under examples/notebooks/:

| Notebook | Covers |
| --- | --- |
| 01_quickstart.ipynb | Cross-sport intro: package layout, polars vs pandas, the download() retry layer |
| 02_cfb_intro.ipynb | College football PBP, schedule, teams, espn_cfb_play_participants |
| 03_nfl_intro.ipynb | NFL: nflreadpy parity surface, caching layer, current-season helpers |
| 04_nba_intro.ipynb | NBA: PBP, schedule, teams, game rosters, shot distribution |
| 05_wbb_wnba_intro.ipynb | Women's basketball: NCAA + WNBA parallels, multi-table stats |
| 06_mbb_intro.ipynb | Men's college basketball: PBP, schedule, conference standings |
| 07_nhl_intro.ipynb | NHL: PBP, schedule, teams, shot-event filter |

Companion packages

sportsdataverse-py is one corner of the broader SportsDataverse ecosystem. The R sister packages, including cfbfastR, hoopR, and wehoop, cover the same data sources with deeper sport-specific coverage.

The NFL submodule is a near drop-in replacement for nflreadpy; the broader nflverse ecosystem is the upstream data source for many of those loaders.

Citations

To cite the sportsdataverse-py Python package in publications, use:

BibTeX Citation

@misc{gilani_sdvpy_2021,
  author = {Gilani, Saiem},
  title = {sportsdataverse-py: The SportsDataverse's Python Package for Sports Data.},
  url = {https://py.sportsdataverse.org},
  year = {2021}
}
