Load, analyze, and enrich retrosheet.org MLB data.

pyretrosheet

pyretrosheet is under active development and is not feature complete.

Load, analyze, and enrich retrosheet.org MLB data using Python representations.

Retrosheet provides play-by-play and other miscellaneous MLB data. At the time of writing, this includes play-by-play data for all AL and NL seasons from 1919 to 2022.

pyretrosheet provides functionality for:

  • downloading Retrosheet play-by-play data
  • parsing and loading play-by-play data into Python objects to make the data easier to understand and analyze
  • enriching data to include player and summary statistics not encoded directly by Retrosheet

pyretrosheet does not provide functionality for:

  • downloading/using MLB data from sources other than Retrosheet

See Retrosheet Data Resources for other tools that parse Retrosheet event files. At the time of writing, these resources focus on loading/dumping to other data formats like CSV and SQL databases.

Usage

pip install pyretrosheet

Load Games

By default, data downloaded from retrosheet.org is stored at ~/.pyretrosheet/data/. This location can be overridden via the data_dir argument.

import pyretrosheet

games = pyretrosheet.load_games(year=2022)

print(games[0])
"""
Game(
  id=GameID(home_team_id='SFN', date=datetime.date(2022, 4, 8), game_number=0, raw='id,SFN202204080'),
  home_team_id=SFN,
  visiting_team_id=MIA,
  num_chronological_events=150,
  earned_runs={'bleir001': 1, 'alcas001': 2, 'bassa001': 1, 'benda001': 1, 'webbl001': 1, 'dovac001': 3, 'leond003': 1},
)
"""

TODO: Add more examples
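
The raw id shown in the output above ('id,SFN202204080') appears to pack the home team, the date, and a game number into twelve characters. The following is a minimal sketch of how such a record could be split, assuming that layout. ParsedGameID and parse_game_id are hypothetical names used for illustration only and are not part of pyretrosheet.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedGameID:
    """Hypothetical stand-in for illustration; not pyretrosheet's GameID class."""

    home_team_id: str
    date: datetime.date
    game_number: int
    raw: str


def parse_game_id(raw: str) -> ParsedGameID:
    """Split an 'id' record such as 'id,SFN202204080' into its parts.

    Assumed layout of the value: 3-character team, YYYYMMDD date, 1-digit game number.
    """
    record_type, _, value = raw.partition(",")
    if record_type != "id" or len(value) != 12:
        raise ValueError(f"not a 12-character id record: {raw!r}")
    return ParsedGameID(
        home_team_id=value[:3],
        date=datetime.datetime.strptime(value[3:11], "%Y%m%d").date(),
        game_number=int(value[11]),
        raw=raw,
    )


game_id = parse_game_id("id,SFN202204080")
print(game_id.home_team_id, game_id.date, game_id.game_number)
```

The parsed fields match the GameID values printed by load_games above, which suggests the assumed layout is reasonable for this record.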

Data Availability

Retrosheet Event File Coverage

Retrosheet's Event File Spec defines the encoding for event files (play-by-play game data). The spec (as of 11/30/2023) can also be found at docs/event_file_spec.txt.

These files encode a wide range of data, and this package does not yet cover every encoding.

Contributions are welcome for any encodings not covered!

Covered

  • Loading all games from a given event file
  • Record types
    • id
    • info
    • start
    • sub
    • play
    • data
    • com
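
To make the record types above more concrete, here is a rough sketch of reading event-file lines with the standard library only. It does not use pyretrosheet and does not reflect how the package is implemented. The sample text is hand-written for illustration (the batter id 'battr001' and the player names are placeholders), and read_records is a hypothetical helper.

```python
import csv
from collections import Counter

# Hand-written sample in the event-file layout; values are illustrative only.
SAMPLE_EVENT_FILE = """id,SFN202204080
info,visteam,MIA
info,hometeam,SFN
start,webbl001,"Placeholder Name",1,0,1
play,1,0,battr001,00,,S8
badj,battr001,R
sub,dovac001,"Placeholder Name",1,0,1
com,"Illustrative comment, with a comma"
data,er,webbl001,1
data,er,dovac001,3
"""

# Record types listed as covered in this README.
COVERED = {"id", "info", "start", "sub", "play", "data", "com"}


def read_records(text):
    """Yield (record_type, fields) per non-empty line; csv handles quoted commas."""
    for row in csv.reader(text.splitlines()):
        if row:
            yield row[0], row[1:]


counts = Counter()
earned_runs = {}
uncovered = set()
for record_type, fields in read_records(SAMPLE_EVENT_FILE):
    counts[record_type] += 1
    if record_type not in COVERED:
        uncovered.add(record_type)
    elif record_type == "data" and fields[0] == "er":
        # 'data,er,<player id>,<earned runs>' records
        earned_runs[fields[1]] = int(fields[2])

print(dict(counts))
print(earned_runs)
print(uncovered)
```

Using csv.reader rather than str.split keeps quoted fields such as comments and player names intact when they contain commas.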

Not Covered

  • Record types

    • play's pitching encoding
    • radj
    • badj
    • padj
    • ladj
    • presadj
  • Miscellaneous data

    • replays
    • ejections
    • umpire changes
    • protests
    • suspensions

Enriched

pyretrosheet enriches Retrosheet data with:

  • TODO

Contributing

Makefile targets

help: Show this help.
setup: Install the package and dev dependencies into a virtualenv.
test: Run pytest on the tests dir.
test_all_data: Run pytest on all Retrosheet data.
format: Run black and isort on package and tests dirs.
lint: Run ruff and mypy on package files.
coverage: Run test coverage and update coverage badge.
bump_version: Increment patch version references in the project.
publish_to_testpypi: Publish the package to test.pypi.org.
publish_to_pypi: Publish the package to pypi.org.

Todo

Non-Trivial

  • ReadTheDocs
  • Verify enriched data with alternative sources like Baseball Reference
  • Determine top-level interface for querying data
  • Implement index of game files to easily look up games for:
    • a specific team within a year
    • a specific game
  • Stats
    • Hits (H)
    • Walks (W)
    • Hit By Pitches (HBP)
    • Sacrifice Flies (SF)
    • At Bats (AB)
    • Singles (S)
    • Doubles (D)
    • Triples (T)
    • Home Runs (HR)
    • Composite
      • Batting Average (BA)
      • Slugging Percentage (SP)
      • On Base Percentage (OBP)
    • Difficult and Needs Lots of Validation
      • Runs (R)
      • Runs Batted In (RBI)
  • Aggregate stats
    • Mean, Median, Std. Dev, Min, Max
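
The composite stats listed above follow from the counting stats by their standard definitions. As a rough sketch only (these stats are not yet implemented in pyretrosheet, and BattingLine is a hypothetical class for illustration), they could be computed like this:

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical container for counting stats; not a pyretrosheet model."""

    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitches: int
    sacrifice_flies: int

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    def batting_average(self) -> float:
        # BA = H / AB
        return self.hits / self.at_bats if self.at_bats else 0.0

    def slugging_percentage(self) -> float:
        # SP = total bases / AB
        total_bases = (
            self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.home_runs
        )
        return total_bases / self.at_bats if self.at_bats else 0.0

    def on_base_percentage(self) -> float:
        # OBP = (H + W + HBP) / (AB + W + HBP + SF)
        reached = self.hits + self.walks + self.hit_by_pitches
        chances = (
            self.at_bats + self.walks + self.hit_by_pitches + self.sacrifice_flies
        )
        return reached / chances if chances else 0.0


# Made-up numbers chosen so the arithmetic is easy to check by hand.
line = BattingLine(
    at_bats=10, singles=2, doubles=1, triples=0, home_runs=1,
    walks=2, hit_by_pitches=0, sacrifice_flies=0,
)
print(line.batting_average(), line.slugging_percentage(), line.on_base_percentage())
```

Returning 0.0 for an empty denominator is a design choice made for this sketch; a real implementation might prefer None so that "no at bats" is not confused with a .000 average.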

Trivial

  • Parse out 'info' fields into pyretrosheet.models.Game properties
  • Encode pitches from play data
  • Improve error handling for inability to retrieve Retrosheet data
  • Improve README Usage examples
  • Add CONTRIBUTING.md
  • Add interface to load stats

Retrosheet Notice

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

Credits

  • Project skeleton generated via cookiecutter https://github.com/rozelie/Python-Project-Cookiecutter
  • Thank you Retrosheet team for making your data free and publicly available!
