
Predict MLB pitcher/batter behavior and outcomes using a given context

Project description

PitchPredict

Cutting-edge MLB pitch-predicting software utilizing the latest Statcast data. Open-source and free to use. Brought to you by baseball-analytica.com.

Read our technical writeup: Predicting MLB Pitch Sequences with xLSTM

Features

  • Two prediction algorithms: Similarity-based (nearest neighbor) and xLSTM sequence model
  • Multiple interfaces: Python API, REST API server, and CLI
  • Rich predictions: Pitch type probabilities, speed/location distributions, outcome analysis
  • Batted ball predictions: Outcome probabilities from exit velocity and launch angle with context-aware filtering
  • Disk-backed caching: Parquet cache with incremental Statcast updates
  • Statcast powered: Uses MLB's comprehensive pitch tracking data via pybaseball

Installation

Package Installation

uv pip install pitchpredict

Or with pip:

pip install pitchpredict

Requires Python 3.12 or higher. We recommend using uv for faster, more reliable package management.

Development Installation

git clone https://github.com/baseball-analytica/pitchpredict.git
cd pitchpredict
uv sync

Quick Start

Python API

import asyncio
from pitchpredict import PitchPredict

async def main():
    client = PitchPredict()

    # Resolve MLBAM IDs (cached) for pitcher/batter
    pitcher_id = await client.get_player_id_from_name("Clayton Kershaw")
    batter_id = await client.get_player_id_from_name("Aaron Judge")

    # Predict pitcher's next pitch
    result = await client.predict_pitcher(
        pitcher_id=pitcher_id,
        batter_id=batter_id,
        count_balls=0,
        count_strikes=0,
        score_bat=0,
        score_fld=0,
        game_date="2024-06-15",
        algorithm="similarity"
    )

    print(result.basic_pitch_data["pitch_type_probs"])
    # {'FF': 0.45, 'SL': 0.30, 'CU': 0.15, 'CH': 0.10}

asyncio.run(main())

Pitcher and batter IDs are MLBAM IDs; use PitchPredict.get_player_id_from_name (or the REST /players/lookup endpoint) to resolve names. Pitcher predictions return a PredictPitcherResponse model; use attribute access or model_dump() for a dict.
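As a small illustration of working with the returned probabilities, the sketch below ranks pitch types from most to least likely. It uses the example dictionary from the Quick Start output rather than a live prediction, and the helper name rank_pitch_types is illustrative, not part of PitchPredict.

```python
def rank_pitch_types(probs: dict[str, float]) -> list[tuple[str, float]]:
    """Return (pitch_type, probability) pairs, most likely first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


# Example values copied from the Quick Start output, not a live prediction.
pitch_type_probs = {"FF": 0.45, "SL": 0.30, "CU": 0.15, "CH": 0.10}

ranked = rank_pitch_types(pitch_type_probs)
most_likely, prob = ranked[0]
print(f"Most likely pitch: {most_likely} ({prob:.0%})")
```

With a real response, the same helper could be applied to result.basic_pitch_data["pitch_type_probs"].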

Caching is enabled by default and stores data in .pitchpredict_cache. Delete the folder to refresh cached data.
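Deleting the cache folder can also be scripted. The helper below is a minimal sketch that assumes the cache lives at .pitchpredict_cache relative to the current working directory; clear_cache is a hypothetical name, not a PitchPredict function.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir: str = ".pitchpredict_cache") -> bool:
    """Delete the cache folder if present; return True if something was removed."""
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

Calling clear_cache() from the directory that holds the cache removes it, so the next prediction has to fetch its data again.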

xLSTM Quick Start

For xLSTM predictions, you must pass prev_pitches (empty list allowed for cold-start):

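# Run inside an async function, reusing client, pitcher_id, and batter_id from the Python API example
result = await client.predict_pitcher(
    pitcher_id=pitcher_id,
    batter_id=batter_id,
    prev_pitches=[],  # required for xLSTM, empty list is cold-start
    game_date="2024-06-15",
    algorithm="xlstm",
)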

xLSTM loads weights lazily. Weights will download automatically on first use. Alternatively, set PITCHPREDICT_XLSTM_PATH to a local checkpoint directory containing model.safetensors and config.json.
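Before pointing PITCHPREDICT_XLSTM_PATH at a local directory, it can help to confirm that both expected files are present. The sketch below does that check and then sets the variable for the current process. The helper names are hypothetical, and it assumes the variable is read when the weights are first loaded, so it should be set before the first xLSTM prediction.

```python
import os
from pathlib import Path

REQUIRED_FILES = ("model.safetensors", "config.json")


def missing_checkpoint_files(checkpoint_dir: str) -> list[str]:
    """Return the required files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def use_local_checkpoint(checkpoint_dir: str) -> None:
    """Point PITCHPREDICT_XLSTM_PATH at checkpoint_dir after a sanity check."""
    missing = missing_checkpoint_files(checkpoint_dir)
    if missing:
        raise FileNotFoundError(f"{checkpoint_dir} is missing: {', '.join(missing)}")
    os.environ["PITCHPREDICT_XLSTM_PATH"] = str(checkpoint_dir)
```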

When providing history, each pitch in prev_pitches must include a pa_id (plate-appearance id).
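A quick pre-flight check can catch history entries without a pa_id before they reach the model. This is a sketch only: it assumes the history is held as plain dictionaries, and apart from pa_id the field shown (pitch_type) is illustrative rather than the documented schema.

```python
def pitches_missing_pa_id(prev_pitches: list[dict]) -> list[int]:
    """Return indices of history entries that lack a pa_id value."""
    return [i for i, pitch in enumerate(prev_pitches) if pitch.get("pa_id") is None]


# Hypothetical history: only pa_id is documented; pitch_type is illustrative.
history = [
    {"pa_id": 1, "pitch_type": "FF"},
    {"pa_id": 1, "pitch_type": "SL"},
    {"pitch_type": "CH"},  # missing pa_id
]
print(pitches_missing_pa_id(history))
```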

CLI

Run predictions and look up players directly from the command line (no server required):

# Lookup player IDs
pitchpredict player lookup "Aaron Judge"

# Predict next pitch (names or MLBAM IDs)
pitchpredict predict pitcher "Zack Wheeler" "Juan Soto" --balls 1 --strikes 2

# Predict batter outcome given a pitch
pitchpredict predict batter "Aaron Judge" "Gerrit Cole" FF 96.5 0.15 2.85

# Predict batted-ball outcome (use --format json for machine-readable output)
pitchpredict predict batted-ball 102.3 24 --format json

Use --verbose for detailed tables, and pitchpredict cache status to inspect the local cache.

REST API Server

Start the server:

pitchpredict serve

Look up the MLBAM IDs for the pitcher and batter:

curl "http://localhost:8056/players/lookup?name=Clayton%20Kershaw&fuzzy=true"
curl "http://localhost:8056/players/lookup?name=Aaron%20Judge&fuzzy=true"

Use the returned key_mlbam values in the prediction request:

curl -X POST http://localhost:8056/predict/pitcher \
  -H "Content-Type: application/json" \
  -d '{
    "pitcher_id": 477132,
    "batter_id": 592450,
    "count_balls": 0,
    "count_strikes": 0,
    "score_bat": 0,
    "score_fld": 0,
    "game_date": "2024-06-15",
    "algorithm": "similarity"
  }'

pitcher_id and batter_id are MLBAM IDs; use /players/lookup to resolve names.
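The same request can be assembled from Python with only the standard library. The sketch below builds the POST request without sending it; build_pitcher_request is a hypothetical helper, and the payload mirrors the curl example.

```python
import json
import urllib.request


def build_pitcher_request(
    payload: dict, base_url: str = "http://localhost:8056"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /predict/pitcher."""
    return urllib.request.Request(
        url=f"{base_url}/predict/pitcher",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "pitcher_id": 477132,
    "batter_id": 592450,
    "count_balls": 0,
    "count_strikes": 0,
    "score_bat": 0,
    "score_fld": 0,
    "game_date": "2024-06-15",
    "algorithm": "similarity",
}
request = build_pitcher_request(payload)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```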

Predict batted ball outcomes:

curl -X POST http://localhost:8056/predict/batted-ball \
  -H "Content-Type: application/json" \
  -d '{
    "launch_speed": 95.0,
    "launch_angle": 18.0,
    "algorithm": "similarity"
  }'

Lookup player IDs:

curl "http://localhost:8056/players/lookup?name=Aaron%20Judge&fuzzy=true"

Lookup player metadata by MLBAM ID:

curl http://localhost:8056/players/592450
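Player names contain spaces, so the lookup URL needs percent-encoding. The following sketch builds the same URL as the curl examples using the standard library; player_lookup_url is a hypothetical helper, and the default base URL assumes the server is running locally on port 8056 as shown.

```python
from urllib.parse import quote, urlencode


def player_lookup_url(
    name: str, fuzzy: bool = True, base_url: str = "http://localhost:8056"
) -> str:
    """Return the /players/lookup URL with the name percent-encoded."""
    # quote_via=quote encodes spaces as %20 rather than "+", matching the curl examples.
    query = urlencode({"name": name, "fuzzy": str(fuzzy).lower()}, quote_via=quote)
    return f"{base_url}/players/lookup?{query}"


print(player_lookup_url("Aaron Judge"))
```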

Documentation

Full documentation is available in the docs/ folder.

Methodology

PitchPredict offers two algorithms (details in Algorithms):

Similarity Algorithm

Finds historical pitches most similar to the current game context using weighted nearest-neighbor analysis:

  1. Fetch all pitches thrown by the pitcher from Statcast (2015-01-01 through the requested game_date).
  2. Compute similarity scores across contextual features (batter ID, counts, bases, score, inning, date, fielders, rest days, strike zone) using softmaxed weights from SimilarityWeights.
  3. Keep the most similar pitches: the top sample_pctg fraction (default 0.05, i.e. the top 5%).
  4. Aggregate statistics and sample concrete pitches to produce predictions.
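The steps above can be sketched in a few lines of plain Python. This is a toy version under stated assumptions, not PitchPredict's implementation: each feature counts as similar only on an exact match, the raw weights are made up, and the function names are hypothetical.

```python
import math
from collections import Counter


def softmax(raw: dict[str, float]) -> dict[str, float]:
    """Normalize raw weights into positive values that sum to 1."""
    peak = max(raw.values())
    exps = {k: math.exp(v - peak) for k, v in raw.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def similarity(pitch: dict, context: dict, weights: dict[str, float]) -> float:
    """Weighted share of context features that this pitch matches exactly."""
    return sum(w for feat, w in weights.items() if pitch.get(feat) == context.get(feat))


def top_similar(pitches, context, raw_weights, sample_pctg=0.05):
    """Rank pitches by similarity and keep the top sample_pctg fraction."""
    weights = softmax(raw_weights)
    ranked = sorted(pitches, key=lambda p: similarity(p, context, weights), reverse=True)
    keep = max(1, math.ceil(sample_pctg * len(ranked)))
    return ranked[:keep]


def pitch_type_probs(sample):
    """Aggregate the kept pitches into pitch-type frequencies."""
    counts = Counter(p["pitch_type"] for p in sample)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


# Made-up history: 5 pitches thrown in an 0-2 count, 95 thrown in a 3-0 count.
history = (
    [{"balls": 0, "strikes": 2, "pitch_type": "SL"}] * 4
    + [{"balls": 0, "strikes": 2, "pitch_type": "FF"}]
    + [{"balls": 3, "strikes": 0, "pitch_type": "FF"}] * 95
)
context = {"balls": 0, "strikes": 2}
raw_weights = {"balls": 1.0, "strikes": 2.0}  # hypothetical raw weights

sample = top_similar(history, context, raw_weights)
probs = pitch_type_probs(sample)
print(probs)
```

Because only the five 0-2 pitches match the context, they make up the top 5% and the resulting probabilities reflect that count alone.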

Batted ball predictions use continuous similarity scoring on exit velocity and launch angle, plus optional spray angle, bases state, outs, and date recency, then sample the top similar events for outcome probabilities and expected stats.
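Continuous similarity scoring can be illustrated with a Gaussian kernel over exit velocity and launch angle. The sketch below is an assumption-laden toy: the kernel choice, the scale parameters, the top_k cutoff, and the event data are all invented for illustration and are not taken from PitchPredict or Statcast.

```python
import math
from collections import Counter


def kernel_similarity(event, launch_speed, launch_angle, speed_scale=3.0, angle_scale=5.0):
    """Gaussian similarity in (exit velocity, launch angle) space; 1.0 means identical."""
    ds = (event["launch_speed"] - launch_speed) / speed_scale
    da = (event["launch_angle"] - launch_angle) / angle_scale
    return math.exp(-0.5 * (ds * ds + da * da))


def outcome_probs(events, launch_speed, launch_angle, top_k=3):
    """Outcome frequencies among the top_k most similar batted-ball events."""
    ranked = sorted(
        events,
        key=lambda e: kernel_similarity(e, launch_speed, launch_angle),
        reverse=True,
    )
    nearest = ranked[:top_k]
    counts = Counter(e["outcome"] for e in nearest)
    return {outcome: n / len(nearest) for outcome, n in counts.items()}


# Made-up events for illustration only.
events = [
    {"launch_speed": 103.0, "launch_angle": 25.0, "outcome": "home_run"},
    {"launch_speed": 101.5, "launch_angle": 23.0, "outcome": "home_run"},
    {"launch_speed": 102.0, "launch_angle": 27.0, "outcome": "double"},
    {"launch_speed": 85.0, "launch_angle": 45.0, "outcome": "field_out"},
    {"launch_speed": 70.0, "launch_angle": -10.0, "outcome": "field_out"},
]

probs = outcome_probs(events, launch_speed=102.3, launch_angle=24.0)
print(probs)
```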

xLSTM Algorithm

Uses an xLSTM sequence model trained on pitch sequences with a ~260-token vocabulary encoding pitch type, speed, spin, location, and result. The model consumes contextual features (player IDs, count, bases, score, inning, and more) to predict the next pitch token sequence, which is decoded back into pitch attributes and outcomes.
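To make the idea of a token vocabulary concrete, here is a toy encoder and decoder covering only pitch type and speed. It is purely illustrative: the token names, the 5 mph bins, and the four pitch types are assumptions, and the real vocabulary (about 260 tokens, also covering spin, location, and result) is not reproduced here.

```python
PITCH_TYPES = ["FF", "SL", "CU", "CH"]  # illustrative subset
SPEED_BINS = list(range(70, 105, 5))    # lower bin edges in mph: 70, 75, ..., 100

# Vocabulary: one token per pitch type, then one per speed bin.
VOCAB = [f"TYPE_{t}" for t in PITCH_TYPES] + [f"SPEED_{lo}" for lo in SPEED_BINS]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def encode_pitch(pitch_type: str, speed_mph: float) -> list[int]:
    """Encode one pitch as [type token id, speed-bin token id]."""
    clamped = min(max(speed_mph, SPEED_BINS[0]), SPEED_BINS[-1])
    lo = SPEED_BINS[int((clamped - SPEED_BINS[0]) // 5)]
    return [TOKEN_TO_ID[f"TYPE_{pitch_type}"], TOKEN_TO_ID[f"SPEED_{lo}"]]


def decode_pitch(token_ids: list[int]) -> dict:
    """Map token ids back to a pitch type and the speed range of its bin."""
    type_tok, speed_tok = (VOCAB[i] for i in token_ids)
    lo = int(speed_tok.removeprefix("SPEED_"))
    return {"pitch_type": type_tok.removeprefix("TYPE_"), "speed_range_mph": (lo, lo + 5)}


tokens = encode_pitch("FF", 96.5)
print(tokens, decode_pitch(tokens))
```

Decoding recovers a speed range rather than the exact value, which is the usual trade-off when continuous attributes are discretized into tokens.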

Acknowledgements

PitchPredict would not be possible without pybaseball, the open-source and MIT-licensed baseball data scraping library. The baseball data itself largely comes from Statcast, but Baseball-Reference and FanGraphs are sources as well.

License

MIT License - see LICENSE for details.

Download files


Source Distribution

pitchpredict-0.5.0.tar.gz (280.2 kB view details)

Uploaded Source

Built Distribution


pitchpredict-0.5.0-py3-none-any.whl (80.3 kB view details)

Uploaded Python 3

File details

Details for the file pitchpredict-0.5.0.tar.gz.

File metadata

  • Download URL: pitchpredict-0.5.0.tar.gz
  • Size: 280.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pitchpredict-0.5.0.tar.gz
Algorithm Hash digest
SHA256 19bb49393a3b4bffd03e2ae742eb9637f3a191d4e2ba743fb8ce888836727aac
MD5 b1bab428ca5e50f23c7f63d20f07998e
BLAKE2b-256 f5328e6f92079e9615b7d998b946c6f32301537797ffe1e13e2c287366f353a6


Provenance

The following attestation bundles were made for pitchpredict-0.5.0.tar.gz:

Publisher: python-publish.yml on baseball-analytica/pitchpredict

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pitchpredict-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: pitchpredict-0.5.0-py3-none-any.whl
  • Size: 80.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pitchpredict-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a28ee2f385790c08581de68f4e10c8dd2e2b1ccb1c24cfeb1729146a107f9ecd
MD5 433fdd46c955d4b65a503784036630cb
BLAKE2b-256 6028ccf21744a3a2f1cf7d86333f15d84ae18172d80a31f15df5a4a95de10b56


Provenance

The following attestation bundles were made for pitchpredict-0.5.0-py3-none-any.whl:

Publisher: python-publish.yml on baseball-analytica/pitchpredict

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
