Predict MLB pitcher/batter behavior and outcomes using a given context
PitchPredict
Cutting-edge MLB pitch-predicting software utilizing the latest Statcast data. Open-source and free to use. Brought to you by baseball-analytica.com.
Read our technical writeup: Predicting MLB Pitch Sequences with xLSTM
Features
- Two prediction algorithms: Similarity-based (nearest neighbor) and xLSTM sequence model
- Multiple interfaces: Python API, REST API server, and CLI
- Rich predictions: Pitch type probabilities, speed/location distributions, outcome analysis
- Batted ball predictions: Outcome probabilities from exit velocity and launch angle with context-aware filtering
- Disk-backed caching: Parquet cache with incremental Statcast updates
- Statcast powered: Uses MLB's comprehensive pitch tracking data via pybaseball
Installation
Package Installation
uv pip install pitchpredict
Or with pip:
pip install pitchpredict
Requires Python 3.12 or higher. We recommend using uv for faster, more reliable package management.
Development Installation
git clone https://github.com/baseball-analytica/pitchpredict.git
cd pitchpredict
uv sync
Quick Start
Python API
import asyncio

from pitchpredict import PitchPredict


async def main():
    client = PitchPredict()

    # Resolve MLBAM IDs (cached) for pitcher/batter
    pitcher_id = await client.get_player_id_from_name("Clayton Kershaw")
    batter_id = await client.get_player_id_from_name("Aaron Judge")

    # Predict pitcher's next pitch
    result = await client.predict_pitcher(
        pitcher_id=pitcher_id,
        batter_id=batter_id,
        count_balls=0,
        count_strikes=0,
        score_bat=0,
        score_fld=0,
        game_date="2024-06-15",
        algorithm="similarity",
    )
    print(result.basic_pitch_data["pitch_type_probs"])
    # {'FF': 0.45, 'SL': 0.30, 'CU': 0.15, 'CH': 0.10}


asyncio.run(main())
Pitcher and batter IDs are MLBAM IDs; use PitchPredict.get_player_id_from_name (or the REST /players/lookup endpoint) to resolve names.
Pitcher predictions return a PredictPitcherResponse model; use attribute access or model_dump() for a dict.
Caching is enabled by default and stores data in .pitchpredict_cache. Delete the folder to refresh cached data.
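The probability mapping can be handled like any other dictionary once a prediction comes back. A minimal sketch, using the sample values from the comment in the example rather than a live call; `rank_pitch_types` is a hypothetical helper and not part of PitchPredict:

```python
def rank_pitch_types(probs: dict[str, float]) -> list[tuple[str, float]]:
    """Return (pitch_type, probability) pairs, most likely first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


# Sample values copied from the Quick Start comment, not a live prediction.
sample_probs = {"FF": 0.45, "SL": 0.30, "CU": 0.15, "CH": 0.10}

ranked = rank_pitch_types(sample_probs)
most_likely, probability = ranked[0]
print(f"Most likely pitch: {most_likely} ({probability:.0%})")
```

With a real response, the same helper would presumably take result.basic_pitch_data["pitch_type_probs"] as its argument.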
xLSTM Quick Start
For xLSTM predictions, you must pass prev_pitches (empty list allowed for cold-start):
# Run inside an async function, e.g. main() from the Python API example
result = await client.predict_pitcher(
    pitcher_id=pitcher_id,
    batter_id=batter_id,
    prev_pitches=[],  # required for xLSTM, empty list is cold-start
    game_date="2024-06-15",
    algorithm="xlstm",
)
xLSTM loads weights lazily. Weights will download automatically on first use. Alternatively, set PITCHPREDICT_XLSTM_PATH to a local checkpoint directory containing model.safetensors and config.json.
When providing history, each pitch in prev_pitches must include a pa_id (plate-appearance id).
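The full schema of a history entry is not spelled out here beyond the pa_id requirement. Assuming each entry is a dict, a pre-flight check might look like the sketch below; `find_missing_pa_ids` and the `pitch_type` key in the sample entries are hypothetical:

```python
def find_missing_pa_ids(prev_pitches: list[dict]) -> list[int]:
    """Return the indexes of history entries that lack a pa_id."""
    return [i for i, pitch in enumerate(prev_pitches) if pitch.get("pa_id") is None]


# Hypothetical entries: only the pa_id key comes from the stated requirement.
history = [
    {"pa_id": 1, "pitch_type": "FF"},
    {"pa_id": 1, "pitch_type": "SL"},
    {"pitch_type": "CU"},  # missing pa_id
]

bad_indexes = find_missing_pa_ids(history)
if bad_indexes:
    print(f"Entries missing pa_id: {bad_indexes}")
```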
CLI
Run predictions and look up players directly from the command line (no server required):
# Lookup player IDs
pitchpredict player lookup "Aaron Judge"
# Predict next pitch (names or MLBAM IDs)
pitchpredict predict pitcher "Zack Wheeler" "Juan Soto" --balls 1 --strikes 2
# Predict batter outcome given a pitch
pitchpredict predict batter "Aaron Judge" "Gerrit Cole" FF 96.5 0.15 2.85
# Predict batted-ball outcome (use --format json for machine-readable output)
pitchpredict predict batted-ball 102.3 24 --format json
Use --verbose for detailed tables, and pitchpredict cache status to inspect the local cache.
REST API Server
Start the server:
pitchpredict serve
To make a prediction, first look up the MLBAM IDs for the pitcher and batter:
curl "http://localhost:8056/players/lookup?name=Clayton%20Kershaw&fuzzy=true"
curl "http://localhost:8056/players/lookup?name=Aaron%20Judge&fuzzy=true"
Use the returned key_mlbam values in the prediction request:
curl -X POST http://localhost:8056/predict/pitcher \
  -H "Content-Type: application/json" \
  -d '{
    "pitcher_id": 477132,
    "batter_id": 592450,
    "count_balls": 0,
    "count_strikes": 0,
    "score_bat": 0,
    "score_fld": 0,
    "game_date": "2024-06-15",
    "algorithm": "similarity"
  }'
pitcher_id and batter_id are MLBAM IDs; use /players/lookup to resolve names.
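The same request can be prepared from Python without extra dependencies. This sketch only builds the request object with the standard library's urllib and does not send it; `build_predict_request` is a hypothetical helper, and the payload mirrors the curl example:

```python
import json
import urllib.request


def build_predict_request(
    payload: dict, base_url: str = "http://localhost:8056"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /predict/pitcher."""
    return urllib.request.Request(
        url=f"{base_url}/predict/pitcher",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "pitcher_id": 477132,
    "batter_id": 592450,
    "count_balls": 0,
    "count_strikes": 0,
    "score_bat": 0,
    "score_fld": 0,
    "game_date": "2024-06-15",
    "algorithm": "similarity",
}
request = build_predict_request(payload)
print(request.get_method(), request.full_url)

# With the server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```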
Predict batted ball outcomes:
curl -X POST http://localhost:8056/predict/batted-ball \
  -H "Content-Type: application/json" \
  -d '{
    "launch_speed": 95.0,
    "launch_angle": 18.0,
    "algorithm": "similarity"
  }'
Look up player IDs:
curl "http://localhost:8056/players/lookup?name=Aaron%20Judge&fuzzy=true"
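Player names contain spaces and sometimes accented characters, so the query string needs percent-encoding. A short sketch of building the lookup URL with the standard library; `lookup_url` is a hypothetical helper, and the endpoint and parameters are the ones used in the curl command:

```python
from urllib.parse import quote, urlencode


def lookup_url(
    name: str, fuzzy: bool = True, base_url: str = "http://localhost:8056"
) -> str:
    """Return the /players/lookup URL with the name percent-encoded."""
    # quote_via=quote encodes spaces as %20 (the urlencode default is "+").
    query = urlencode({"name": name, "fuzzy": str(fuzzy).lower()}, quote_via=quote)
    return f"{base_url}/players/lookup?{query}"


url = lookup_url("Aaron Judge")
print(url)
```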
Look up player metadata by MLBAM ID:
curl http://localhost:8056/players/592450
Documentation
Full documentation is available in the docs/ folder:
- Getting Started - Quick start guide
- Installation - Detailed installation instructions
- Python API Reference - PitchPredict class documentation
- REST API Reference - Server endpoints
- CLI Reference - Command-line interface
- Algorithms - Similarity and xLSTM algorithms
- Caching - Cache behavior and storage layout
Methodology
PitchPredict offers two algorithms (details in Algorithms):
Similarity Algorithm
Finds historical pitches most similar to the current game context using weighted nearest-neighbor analysis:
- Fetch all pitches thrown by the pitcher from Statcast (2015-01-01 through the requested game_date).
- Compute similarity scores across contextual features (batter ID, counts, bases, score, inning, date, fielders, rest days, strike zone) using softmaxed weights from SimilarityWeights.
- Sample the top sample_pctg (default 0.05) most similar pitches.
- Aggregate statistics and sample concrete pitches to produce predictions.
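The steps above can be sketched in plain Python. This is a toy illustration and not PitchPredict's implementation: the feature names, the exact-match similarity function, the raw weights, and the sample history are all assumptions; only the softmaxed weights and the top sample_pctg cut come from the description.

```python
import math
from collections import Counter


def softmax(raw: dict[str, float]) -> dict[str, float]:
    """Normalize raw feature weights so they are positive and sum to 1."""
    peak = max(raw.values())
    exps = {key: math.exp(value - peak) for key, value in raw.items()}
    total = sum(exps.values())
    return {key: value / total for key, value in exps.items()}


def similarity(pitch: dict, context: dict, weights: dict[str, float]) -> float:
    """Sum the weights of the context features that the historical pitch matches."""
    return sum(w for key, w in weights.items() if pitch[key] == context[key])


def predict_pitch_types(history, context, raw_weights, sample_pctg=0.05):
    """Keep the top sample_pctg most similar pitches and count their types."""
    weights = softmax(raw_weights)
    ranked = sorted(history, key=lambda p: similarity(p, context, weights), reverse=True)
    keep = max(1, round(len(ranked) * sample_pctg))
    counts = Counter(p["pitch_type"] for p in ranked[:keep])
    return {pitch_type: n / keep for pitch_type, n in counts.items()}


# Made-up history: 10 pitches match the context, 190 do not.
history = (
    [{"balls": 0, "strikes": 0, "batter_id": 1, "pitch_type": "FF"}] * 6
    + [{"balls": 0, "strikes": 0, "batter_id": 1, "pitch_type": "SL"}] * 4
    + [{"balls": 3, "strikes": 2, "batter_id": 2, "pitch_type": "CH"}] * 190
)
context = {"balls": 0, "strikes": 0, "batter_id": 1}
raw_weights = {"balls": 1.0, "strikes": 1.0, "batter_id": 2.0}

probs = predict_pitch_types(history, context, raw_weights)
print(probs)
```

In this toy data the top 5% of 200 pitches is exactly the 10 matching pitches, so the estimate reflects only the fastball/slider mix seen in that context.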
Batted ball predictions use continuous similarity scoring on exit velocity and launch angle, plus optional spray angle, bases state, outs, and date recency, then sample the top similar events for outcome probabilities and expected stats.
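As a rough illustration of continuous similarity scoring, the sketch below scores made-up events with a Gaussian kernel on exit velocity and launch angle and then tallies outcomes among the closest ones. The kernel, the scale values, the `top_n` cutoff, and the sample events are assumptions chosen for the example, not PitchPredict's actual scoring.

```python
import math
from collections import Counter


def batted_ball_similarity(event, launch_speed, launch_angle,
                           speed_scale=3.0, angle_scale=4.0):
    """Score in (0, 1]; 1.0 means identical exit velocity and launch angle."""
    ds = (event["launch_speed"] - launch_speed) / speed_scale
    da = (event["launch_angle"] - launch_angle) / angle_scale
    return math.exp(-0.5 * (ds * ds + da * da))


def outcome_probs(events, launch_speed, launch_angle, top_n=3):
    """Outcome frequencies among the top_n most similar events."""
    ranked = sorted(
        events,
        key=lambda e: batted_ball_similarity(e, launch_speed, launch_angle),
        reverse=True,
    )
    top = ranked[:top_n]
    counts = Counter(e["outcome"] for e in top)
    return {outcome: n / len(top) for outcome, n in counts.items()}


# Made-up events for illustration only.
events = [
    {"launch_speed": 95.5, "launch_angle": 17.0, "outcome": "single"},
    {"launch_speed": 94.0, "launch_angle": 19.0, "outcome": "double"},
    {"launch_speed": 96.0, "launch_angle": 18.5, "outcome": "single"},
    {"launch_speed": 70.0, "launch_angle": -10.0, "outcome": "field_out"},
    {"launch_speed": 108.0, "launch_angle": 28.0, "outcome": "home_run"},
]

probs = outcome_probs(events, launch_speed=95.0, launch_angle=18.0)
print(probs)
```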
xLSTM Algorithm
Uses an xLSTM sequence model trained on pitch sequences with a ~260-token vocabulary encoding pitch type, speed, spin, location, and result. The model consumes contextual features (player IDs, count, bases, score, inning, and more) to predict the next pitch token sequence, which is decoded back into pitch attributes and outcomes.
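To make the idea of encoding pitch attributes as tokens concrete, here is a deliberately tiny tokenizer covering only pitch type and binned speed. It is not the model's real ~260-token vocabulary: the token names, bin edges, and helper functions are invented for illustration.

```python
# Toy vocabulary: 4 pitch types plus 4 speed bins (made-up bin edges, in mph).
PITCH_TYPES = ["FF", "SL", "CU", "CH"]
SPEED_BINS = [(70, 80), (80, 90), (90, 100), (100, 110)]

VOCAB = [f"TYPE_{t}" for t in PITCH_TYPES] + [
    f"SPEED_{lo}_{hi}" for lo, hi in SPEED_BINS
]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}


def encode_pitch(pitch_type: str, speed_mph: float) -> list[int]:
    """Map a pitch to [type token id, speed-bin token id]."""
    for lo, hi in SPEED_BINS:
        if lo <= speed_mph < hi:
            return [TOKEN_TO_ID[f"TYPE_{pitch_type}"], TOKEN_TO_ID[f"SPEED_{lo}_{hi}"]]
    raise ValueError(f"speed {speed_mph} mph is outside the toy bins")


def decode_pitch(token_ids: list[int]) -> dict:
    """Turn token ids back into a pitch type and a speed range."""
    type_token, speed_token = (VOCAB[i] for i in token_ids)
    _, lo, hi = speed_token.split("_")
    return {
        "pitch_type": type_token.removeprefix("TYPE_"),
        "speed_range_mph": (int(lo), int(hi)),
    }


tokens = encode_pitch("FF", 96.5)
print(tokens, decode_pitch(tokens))
```

Binning makes decoding lossy: the round trip recovers a speed range rather than the exact 96.5 mph, which is why the description speaks of decoding tokens back into pitch attributes rather than reproducing inputs exactly.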
Acknowledgements
PitchPredict would not be possible without pybaseball, the open-source and MIT-licensed baseball data scraping library. The baseball data itself largely comes from Statcast, but Baseball-Reference and FanGraphs are sources as well.
License
MIT License - see LICENSE for details.
File details
Details for the file pitchpredict-0.5.0.tar.gz.
File metadata
- Download URL: pitchpredict-0.5.0.tar.gz
- Upload date:
- Size: 280.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19bb49393a3b4bffd03e2ae742eb9637f3a191d4e2ba743fb8ce888836727aac |
| MD5 | b1bab428ca5e50f23c7f63d20f07998e |
| BLAKE2b-256 | f5328e6f92079e9615b7d998b946c6f32301537797ffe1e13e2c287366f353a6 |
Provenance
The following attestation bundles were made for pitchpredict-0.5.0.tar.gz:
Publisher: python-publish.yml on baseball-analytica/pitchpredict
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pitchpredict-0.5.0.tar.gz
- Subject digest: 19bb49393a3b4bffd03e2ae742eb9637f3a191d4e2ba743fb8ce888836727aac
- Sigstore transparency entry: 868653632
- Sigstore integration time:
- Permalink: baseball-analytica/pitchpredict@bd8589966b31d210aa1727b0616727ac8d5de711
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/baseball-analytica
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@bd8589966b31d210aa1727b0616727ac8d5de711
- Trigger Event: release
File details
Details for the file pitchpredict-0.5.0-py3-none-any.whl.
File metadata
- Download URL: pitchpredict-0.5.0-py3-none-any.whl
- Upload date:
- Size: 80.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a28ee2f385790c08581de68f4e10c8dd2e2b1ccb1c24cfeb1729146a107f9ecd |
| MD5 | 433fdd46c955d4b65a503784036630cb |
| BLAKE2b-256 | 6028ccf21744a3a2f1cf7d86333f15d84ae18172d80a31f15df5a4a95de10b56 |
Provenance
The following attestation bundles were made for pitchpredict-0.5.0-py3-none-any.whl:
Publisher: python-publish.yml on baseball-analytica/pitchpredict
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pitchpredict-0.5.0-py3-none-any.whl
- Subject digest: a28ee2f385790c08581de68f4e10c8dd2e2b1ccb1c24cfeb1729146a107f9ecd
- Sigstore transparency entry: 868653638
- Sigstore integration time:
- Permalink: baseball-analytica/pitchpredict@bd8589966b31d210aa1727b0616727ac8d5de711
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/baseball-analytica
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@bd8589966b31d210aa1727b0616727ac8d5de711
- Trigger Event: release