A Python package for scraping baseball data.
pybaseballstats
A Python package for scraping baseball statistics from the web. Inspired by the pybaseball package by James LeDoux.
Available Sources
- Baseball Savant
  - This source provides high-quality pitch-by-pitch data for all MLB games since 2015, as well as leaderboards for various categories.
- Umpire Scorecards
  - This source provides umpire game logs and statistics for all MLB games since 2008.
- Baseball Reference
  - This source provides comprehensive, highly detailed stats for all MLB players and teams since 1871.
- Retrosheet
  - This source provides play-by-play data for all MLB games since 1871. The package currently uses this data mainly for the player_lookup function and for ejection data. I am considering adding support for the play-by-play data as well.
[!NOTE] Past versions supported Fangraphs, but I have removed that support because the site recently implemented very aggressive anti-scraping measures that make it very difficult to scrape. I may add Fangraphs support again if those measures change. For now I am focusing on the other sources, which are more reliable and easier to scrape.
Installation
pybaseballstats can be installed using pip or any other package manager (I use uv).
Examples:
uv add pybaseballstats
or:
pip install pybaseballstats
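After installing, one way to confirm that the package is visible to your interpreter is to ask the standard library for its installed version. This is a minimal sketch. The helper name `installed_version` is hypothetical and not part of pybaseballstats; only `importlib.metadata` from the standard library is used.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of pybaseballstats.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("pybaseballstats"))
```

If this prints None, the package was most likely installed into a different environment than the one running the script.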
Documentation
Usage documentation can be found in this folder. This documentation is a work in progress and will be updated as I add more functionality to the package.
General Documentation (Things of Note)
- This project uses Polars internally, so every function in this package returns a Polars DataFrame. To convert the result to a Pandas DataFrame, call the .to_pandas() method on the Polars DataFrame. For example:
import pybaseballstats.umpire_scorecards as us
df_polars = us.game_data(start_date="2023-04-01", end_date="2023-04-30")
# Convert to Pandas DataFrame
df_pandas = df_polars.to_pandas()
- The BREF functions use a singleton pattern so that only one instance of the BREF scraper is created and used throughout the lifetime of your program. This keeps you from exceeding rate limits and being blocked by BREF, which would mean a longer timeout. As a result, multiple calls to BREF functions may be a little slower than expected. This is intentional.
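The singleton behaviour described for the BREF functions can be pictured with a small sketch. This is not the package's actual implementation. The class name `Throttle`, the `min_interval` value, and the `wait` method are all hypothetical, chosen only to show how a single shared instance can space out requests.

```python
import threading
import time


class Throttle:
    """Hypothetical process-wide throttle; every call to Throttle() returns the same object."""

    _instance = None
    _lock = threading.Lock()
    min_interval = 0.05  # assumed spacing in seconds, for illustration only

    def __new__(cls):
        # Create the single shared instance on first use, then keep reusing it.
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._last_call = None
        return cls._instance

    def wait(self) -> float:
        """Block until min_interval has passed since the previous call.

        Returns the number of seconds slept.
        """
        with self._lock:
            slept = 0.0
            if self._last_call is not None:
                remaining = self.min_interval - (time.monotonic() - self._last_call)
                if remaining > 0:
                    time.sleep(remaining)
                    slept = remaining
            self._last_call = time.monotonic()
            return slept


first = Throttle()
second = Throttle()
print(first is second)  # both names refer to the one shared instance
```

Because every caller shares the same instance, back-to-back calls to `wait()` are delayed no matter where in the program they come from, which is why repeated scraper calls can feel slower than a single one.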
Contributing
Improvements and bug fixes are welcome! This project follows a branch-based development workflow to keep releases stable and active development fast.
1. Branching Strategy
We use a standard two-branch workflow:
- main (release branch)
  Heavily protected and contains only code that is currently live on PyPI. Do not push or open pull requests directly against main.
- dev (active development branch)
  This is the default branch. All ongoing development, experiments, and bug fixes happen here.
- Feature branches
  Start new work from dev (for example: feature/your-feature-name).
2. Local Development & Committing
This project uses just to automate safety checks before code is pushed.
When your changes are ready, run:
just commit "your descriptive commit message"
This command automatically:
- Runs mypy for strict type checking.
- Runs pytest with coverage tracking.
- Commits your changes and safely pushes them to your current GitHub branch.
If type checking or tests fail, the commit is automatically aborted so you can fix issues first.
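The abort-on-failure behaviour can be sketched in a few lines of Python. This is an illustration of the idea only, not the project's justfile recipe. The function name `first_failing_step` is hypothetical, and the demo uses stand-in commands so that it runs without mypy, pytest, or git.

```python
import subprocess
import sys
from typing import Optional, Sequence, Tuple

Step = Tuple[str, Sequence[str]]


def first_failing_step(steps: Sequence[Step]) -> Optional[str]:
    """Run each command in order; return the name of the first that fails, or None."""
    for name, command in steps:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            # Stop here: later steps (such as the commit) never run.
            return name
    return None


# Stand-in commands so the sketch runs anywhere. A real gate might instead use
# something like ["mypy", "."], ["pytest"], and ["git", "commit", "-m", message];
# those exact invocations are assumptions, not taken from the project.
demo_steps = [
    ("type-check", [sys.executable, "-c", "pass"]),
    ("tests", [sys.executable, "-c", "raise SystemExit(1)"]),
    ("commit", [sys.executable, "-c", "pass"]),
]

failed = first_failing_step(demo_steps)
print(failed)
```

Here the simulated test step exits with a non-zero status, so the commit step is never reached.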
3. Submitting Your Changes
Once your feature or bug fix is complete and tested locally:
- Open a pull request from your feature branch into dev.
- GitHub Actions automatically runs CI (unit tests + mypy).
- After checks pass and review is complete, your changes are merged into dev.
[!NOTE] Coverage badges in this README reflect the current state of the dev branch, giving real-time visibility into active development health.
4. Release Pipeline (Maintainers Only)
Releases are automated for security and stability:
- Open a pull request from dev to main.
- Branch protections ensure nothing enters main unless all required checks pass.
- After merge, run:
just release <version> "Release message"
This performs final validation, tags the release, and pushes it.
A GitHub Action then builds the uv package and deploys to PyPI using Trusted Publishers (tokenless publishing).
Credit and Acknowledgement
This project was directly inspired by the pybaseball package by James LeDoux. The goal of this project is to provide a similar set of functionality with continual updates and improvements, as the original pybaseball package has lagged behind with updates and some key functionality has been broken.
All of the data scraped by this package is publicly available and free to use. All credit for the data goes to the organizations from which it was scraped.