Retrieve Sports data in Python
sportsdataverse-py 
See CHANGELOG.md for details.
The goal of sportsdataverse-py is to provide the community with a Python package for working with sports data, as a companion to the cfbfastR, hoopR, and wehoop R packages. Beyond making data aggregation and tidying easier, sportsdataverse-py also supports benchmarking open-source expected points and win probability metrics for American football.
Supported leagues and data sources
| League | Module | Surfaces covered |
|---|---|---|
| NBA | sportsdataverse.nba | ESPN (Site v2 + Web v3 + Core v2), 118 wrappers |
| WNBA | sportsdataverse.wnba | ESPN, 124 wrappers |
| MBB (NCAA M) | sportsdataverse.mbb | ESPN + NCAA-only (bracketology, rankings, recruits), 121 wrappers |
| WBB (NCAA W) | sportsdataverse.wbb | ESPN + NCAA-only, 126 wrappers |
| CFB | sportsdataverse.cfb | ESPN + NCAA + football-only (QBR), 123 wrappers |
| NFL | sportsdataverse.nfl | ESPN + football-only (QBR), 119 wrappers |
| MLB | sportsdataverse.mlb | ESPN + MLB Stats API (statsapi.mlb.com) + Baseball Savant / Statcast, 175 wrappers |
| NHL | sportsdataverse.nhl | api-web.nhle.com/v1/ (game-feed) + NHL EDGE (player tracking) + Stats REST + Records site, 132 wrappers |
| Total | | ~1,030 wrappers |
Polars / pandas parser layer
Every wrapper returns a raw Dict by default. A parser layer turns
those payloads into tidy polars (or pandas) DataFrames.
For ESPN cross-league wrappers, pass return_parsed=True to get a
DataFrame directly. The raw-Dict contract is unchanged when the
kwarg is omitted, so existing callers are unaffected:
from sportsdataverse.nba import espn_nba_team_roster
raw = espn_nba_team_roster(team_id=13) # → Dict (default)
df = espn_nba_team_roster(team_id=13, return_parsed=True) # → polars
pdf = espn_nba_team_roster(team_id=13,
                           return_parsed=True,
                           return_as_pandas=True) # → pandas
For the NHL and MLB sibling-API wrappers, compose the wrapper with its parser:
from sportsdataverse.nhl import nhl_web_pbp, parse_nhl_web_pbp
df = parse_nhl_web_pbp(nhl_web_pbp(2023030417)) # 331-row polars frame
See py.sportsdataverse.org/docs/architecture/espn-cross-league and py.sportsdataverse.org/docs/parsers/index for the full architecture + parser registry.
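The two calling patterns above (an opt-in return_parsed flag, and explicit wrapper-plus-parser composition) can be sketched with the standard library alone. Everything in this sketch is hypothetical: fetch_roster, parse_roster, and the payload shape are illustrative stand-ins, not the package's real functions or ESPN's real schema, and plain lists of dicts stand in for polars frames.

```python
from typing import Any, Dict, List, Union

Row = Dict[str, Any]

# Hypothetical payload; a real ESPN response is larger and shaped differently.
_FAKE_PAYLOAD: Dict[str, Any] = {
    "team": {"id": 13, "name": "Example Team"},
    "athletes": [
        {"id": 1, "fullName": "Player One", "position": {"abbreviation": "G"}},
        {"id": 2, "fullName": "Player Two", "position": {"abbreviation": "F"}},
    ],
}


def parse_roster(payload: Dict[str, Any]) -> List[Row]:
    """Flatten the nested payload into one flat row per athlete."""
    team_id = payload["team"]["id"]
    return [
        {
            "team_id": team_id,
            "athlete_id": athlete["id"],
            "full_name": athlete["fullName"],
            "position": athlete.get("position", {}).get("abbreviation"),
        }
        for athlete in payload.get("athletes", [])
    ]


def fetch_roster(
    team_id: int, return_parsed: bool = False
) -> Union[Dict[str, Any], List[Row]]:
    """Return the raw dict by default; parse only when the caller opts in."""
    # Stand-in for the HTTP request; team_id is ignored by this fake.
    payload = dict(_FAKE_PAYLOAD)
    return parse_roster(payload) if return_parsed else payload


raw = fetch_roster(13)                       # raw dict (default behaviour)
rows = fetch_roster(13, return_parsed=True)  # flat rows, one per athlete
composed = parse_roster(fetch_roster(13))    # explicit composition, same rows
```

Keeping the flag off by default is what lets callers written against the raw payload keep working unchanged.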
Installation
The package metadata lives entirely in pyproject.toml
(PEP 621 [project] table). There is no setup.py source-of-truth.
Standard install (pip)
pip install sportsdataverse
With optional extras (defined in [project.optional-dependencies] in
pyproject.toml):
pip install "sportsdataverse[all]" # everything below
pip install "sportsdataverse[models]" # extra deps for the EPA / WP model code
pip install "sportsdataverse[tests]" # adds pytest, mypy, ruff, etc.
pip install "sportsdataverse[docs]" # adds sphinx + sphinx-markdown-builder for the doc build
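To check which extras an installed distribution actually declares, the standard library's importlib.metadata can read its Provides-Extra metadata fields. This is a minimal sketch: the helper name declared_extras is ours, and the result depends on which version, if any, is installed in the current environment.

```python
from importlib.metadata import PackageNotFoundError, metadata
from typing import List


def declared_extras(dist_name: str) -> List[str]:
    """Return the extras a distribution declares, or [] if it is not installed."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return []
    # Each extra appears as its own "Provides-Extra" header in the metadata.
    return sorted(meta.get_all("Provides-Extra") or [])


print(declared_extras("sportsdataverse"))
```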
Modern install (uv — recommended)
uv is the fast, drop-in package manager we use day to day.
# Add to a uv-managed project:
uv add sportsdataverse
# With extras:
uv add "sportsdataverse[all]"
# Or install the latest dev snapshot from GitHub:
uv add "sportsdataverse @ git+https://github.com/sportsdataverse/sportsdataverse-py"
Conda install
Once the conda-forge feedstock is published, the package will also be available via:
conda install -c conda-forge sportsdataverse
# or
mamba install -c conda-forge sportsdataverse
Until then, conda users can build a local package from this repo:
conda install conda-build conda-verify
conda build recipe/
conda install --use-local sportsdataverse
See recipe/README.md for the full conda workflow.
Development install
For contributing or running the test suite:
git clone https://github.com/sportsdataverse/sportsdataverse-py.git
cd sportsdataverse-py
# uv (recommended) — fully resolved editable install with every extra:
uv pip install -e ".[all]"
# Plain pip works too if uv isn't available:
pip install -e ".[all]"
Note: once we add a PEP 735 [dependency-groups] block (currently the repo only ships PEP 621 [project.optional-dependencies]), uv sync --all-extras --all-groups will become the one-shot dev incantation. Until then, uv pip install -e ".[all]" is the equivalent path.
Run the test suite:
uv run pytest # offline tests only
SDV_PY_LIVE_TESTS=1 uv run pytest # include live API tests (slower; hits ESPN / nflverse)
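One way an opt-in switch like this can work is a small helper that inspects the environment variable before a live test runs. This is only a sketch: the helper name live_tests_enabled and the set of accepted values are assumptions, not the repository's actual test configuration.

```python
import os
from typing import Mapping, Optional


def live_tests_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """True when SDV_PY_LIVE_TESTS is set to a truthy value such as "1"."""
    env = os.environ if environ is None else environ
    value = env.get("SDV_PY_LIVE_TESTS", "").strip().lower()
    # Assumed truthy spellings; the real suite may accept a different set.
    return value in {"1", "true", "yes"}


print(live_tests_enabled({}))
print(live_tests_enabled({"SDV_PY_LIVE_TESTS": "1"}))
```

In a pytest suite, a check like this is commonly wired in through pytest.mark.skipif so that network-dependent tests are skipped unless the variable is set.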
For deeper dev-environment detail (lint, mypy, dep-bumping workflow), see CONTRIBUTING.md.
Notes
- Python target: 3.9–3.14.
- DataFrame engine: polars 1.x. Most loaders accept return_as_pandas=True if you prefer pandas.
- NFL caching: loaders cache to memory by default. Set SDV_PY_NFL_CACHE=filesystem for cross-session reuse, or SDV_PY_NFL_CACHE=off to disable. See sportsdataverse.nfl.config.update_config() for runtime control.
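The cache settings named above can be pictured as a small lookup on the environment variable. The sketch below is an assumption about the general shape of such logic, not the package's implementation: resolve_cache_mode is a hypothetical helper, and using "memory" as the name of the default mode and as the fallback for unrecognized values is our reading of "cache to memory by default".

```python
import os
from typing import Mapping, Optional

# "filesystem" and "off" come from the notes above; "memory" is an assumed
# label for the default behaviour.
VALID_MODES = ("memory", "filesystem", "off")


def resolve_cache_mode(environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick a cache mode from SDV_PY_NFL_CACHE, falling back to memory."""
    env = os.environ if environ is None else environ
    mode = env.get("SDV_PY_NFL_CACHE", "memory").strip().lower()
    return mode if mode in VALID_MODES else "memory"


print(resolve_cache_mode({"SDV_PY_NFL_CACHE": "filesystem"}))
```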
Examples and tutorials
Every public function ships a runnable Example: block in its docstring
showing a quick-start call, common parameter combinations, and a one-line
pipeline next-step. Render the API reference locally with
bash create_docs.sh or browse the live docs at
py.sportsdataverse.org.
For longer-form walkthroughs, see the intro/intermediate Jupyter notebooks
under examples/notebooks/:
| Notebook | Covers |
|---|---|
| 01_quickstart.ipynb | Cross-sport intro: package layout, polars vs pandas, the download() retry layer |
| 02_cfb_intro.ipynb | College football PBP, schedule, teams, espn_cfb_play_participants |
| 03_nfl_intro.ipynb | NFL: nflreadpy parity surface, caching layer, current-season helpers |
| 04_nba_intro.ipynb | NBA: PBP, schedule, teams, game rosters, shot distribution |
| 05_wbb_wnba_intro.ipynb | Women's basketball: NCAA + WNBA parallels, multi-table stats |
| 06_mbb_intro.ipynb | Men's college basketball: PBP, schedule, conference standings |
| 07_nhl_intro.ipynb | NHL: PBP, schedule, teams, shot-event filter |
Companion packages
sportsdataverse-py is one corner of the broader SportsDataverse
ecosystem. The R sister packages cover the same data sources with deeper
sport-specific coverage:
- wehoop — women's basketball (WNBA + NCAA)
- hoopR — men's basketball (NBA + NCAA)
- cfbfastR — college football
- baseballr — baseball (MLB + MiLB + NCAA)
- fastRhockey — hockey (NHL + WHL)
The NFL submodule is a near drop-in replacement for nflreadpy; the broader nflverse ecosystem is the upstream data source for many of those loaders.
Our Authors
Citations
To cite the sportsdataverse-py Python package in publications, use:
BibTeX Citation
@misc{gilani_sdvpy_2021,
  author = {Gilani, Saiem},
  title = {sportsdataverse-py: The SportsDataverse's Python Package for Sports Data.},
  url = {https://py.sportsdataverse.org},
  year = {2021}
}