Predicting when AL managers will remove their starting pitchers.

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- English
Programming Language

Project description

Pull the pitcher

Documentation

Predicting when MLB managers in the AL will pull their starting pitchers. This Deep, Recurrent Survival Analysis model is trained to predict the at-bat at which a pitcher is removed from the game, earning an F1-score of 0.97.

[Figure: model performance]

Installation

$ pip install pull-the-pitcher

How to get the data

The ptp library comes with two main command-line utilities. After you install ptp, both should be available directly at the command line, provided you are in the environment where ptp was installed.

Storing data in a `sqlite3` db

The first command-line utility is query_statcast, which invokes pybaseball's statcast() function to retrieve pitch-level data from Statcast and then stores that data in a sqlite3 db file. Here's an example of how you could use it.

$ query_statcast --start_dt 2019-05-07 --end_dt 2019-06-09 --output_type db --output_path /tmp
This is a large query, it may take a moment to complete
Completed sub-query from 2019-05-07 to 2019-05-12
Completed sub-query from 2019-05-13 to 2019-05-18
Completed sub-query from 2019-05-19 to 2019-05-24
Completed sub-query from 2019-05-25 to 2019-05-30
Completed sub-query from 2019-05-31 to 2019-06-05
Completed sub-query from 2019-06-06 to 2019-06-09
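The log above shows the requested range being fetched as consecutive sub-queries of at most six days each. The sketch below reproduces that chunking with the standard library only; it is an illustration, not query_statcast's own code. The helper names `split_date_range` and `query_in_chunks` are hypothetical, the six-day chunk size is inferred from the log, and `fetch` is a stand-in for a call such as `pybaseball.statcast(start_dt=..., end_dt=...)`, which returns a pandas DataFrame of pitches.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def split_date_range(start: date, end: date, chunk_days: int = 6) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (chunk_start, chunk_end) pairs that cover [start, end].

    chunk_days=6 is an assumption inferred from the query_statcast log.
    """
    if chunk_days < 1:
        raise ValueError("chunk_days must be at least 1")
    current = start
    while current <= end:
        chunk_end = min(current + timedelta(days=chunk_days - 1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)


def query_in_chunks(start: date, end: date, fetch: Callable[[str, str], T]) -> List[T]:
    """Call fetch(start_dt, end_dt) once per chunk, passing ISO 'YYYY-MM-DD' strings.

    In real use, fetch could wrap pybaseball.statcast; here it is any callable.
    """
    results = []
    for chunk_start, chunk_end in split_date_range(start, end):
        results.append(fetch(chunk_start.isoformat(), chunk_end.isoformat()))
        print(f"Completed sub-query from {chunk_start} to {chunk_end}")
    return results


if __name__ == "__main__":
    # Same range as the example command above.
    for chunk_start, chunk_end in split_date_range(date(2019, 5, 7), date(2019, 6, 9)):
        print(chunk_start, "to", chunk_end)
```

Each chunk's DataFrame could then be appended to a SQLite table with pandas' `DataFrame.to_sql(..., if_exists="append")` on a `sqlite3` connection; whether query_statcast does exactly this is not stated here.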

Preparing data for modeling

The next command-line utility is prep_data_for_modeling, which pulls data from the database created by the previous command, then performs feature engineering and various aggregations to yield clean, at-bat-level data amenable to a machine learning model. Here's an example of how you might use it.

$ prep_data_for_modeling --db_path /tmp/statcast_pitches.db --train_test_split_by start --output_path /tmp/
querying db at /tmp/statcast_pitches.db now.
In this dataset, there are 457 total games.
There are 63 'openers' in the dataset.
There are 851 total eligible game-pitcher combinations in this dataset.
Just processed 0th start.
Just processed 100th start.
Just processed 200th start.
Just processed 300th start.
Just processed 400th start.
Just processed 500th start.
Just processed 600th start.
Just processed 700th start.
Just processed 800th start.
There are 91 unique pitcher's in this dataset
['2019'] data ready for modeling and saved at /tmp/.
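To illustrate the kind of aggregation described above, here is a minimal sketch that rolls pitch-level rows up to one row per at-bat and marks each pitcher's final at-bat in a game as the removal event. It uses only the standard library `sqlite3` module. The table name `statcast_pitches` is a hypothetical stand-in (the table layout of the db written by query_statcast is not documented here), while `game_pk`, `pitcher`, `at_bat_number`, and `release_speed` are Statcast column names. Filtering down to eligible starters and excluding openers, which prep_data_for_modeling reports doing, is omitted.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical table name; the columns are real Statcast field names.
PITCHES_TABLE = "statcast_pitches"

AT_BAT_QUERY = f"""
SELECT game_pk,
       pitcher,
       at_bat_number,
       COUNT(*)           AS n_pitches,
       AVG(release_speed) AS avg_release_speed
FROM {PITCHES_TABLE}
GROUP BY game_pk, pitcher, at_bat_number
ORDER BY game_pk, pitcher, at_bat_number
"""


def at_bat_sequences(conn: sqlite3.Connection) -> Dict[Tuple[int, int], List[dict]]:
    """Group pitches into one ordered at-bat sequence per (game_pk, pitcher) appearance.

    Each at-bat gets a binary 'pulled' target: 1 for the pitcher's last at-bat
    of the game, 0 for every earlier one.
    """
    sequences: Dict[Tuple[int, int], List[dict]] = {}
    for game_pk, pitcher, ab_number, n_pitches, avg_speed in conn.execute(AT_BAT_QUERY):
        sequences.setdefault((game_pk, pitcher), []).append(
            {
                "at_bat_number": ab_number,
                "n_pitches": n_pitches,
                "avg_release_speed": avg_speed,
                "pulled": 0,
            }
        )
    for at_bats in sequences.values():
        # Rows arrive sorted by at_bat_number, so the last one is the removal at-bat.
        at_bats[-1]["pulled"] = 1
    return sequences


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {PITCHES_TABLE} "
        "(game_pk INTEGER, pitcher INTEGER, at_bat_number INTEGER, release_speed REAL)"
    )
    conn.executemany(
        f"INSERT INTO {PITCHES_TABLE} VALUES (?, ?, ?, ?)",
        [(1, 10, 1, 95.0), (1, 10, 1, 93.0), (1, 10, 3, 94.0)],  # made-up pitches
    )
    print(at_bat_sequences(conn))
```

Keying on `(game_pk, pitcher)` keeps each appearance as its own sequence, which is the unit a recurrent model would step through one at-bat at a time.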

FAQ

Should any of the data be considered "censored"?
- The great thing about baseball data is that it is comprehensive, clean, and public! So, no, none of the data is "censored" in the survival analysis sense. We know the exact at-bat at which every pitcher was removed from the game.
If none of the data is censored, why are you using survival analysis techniques?
- Well, the short answer is that they perform the best. Much of survival analysis is dedicated to modeling with both censored and uncensored data. Since none of our data is censored, we have free rein to leverage any predictive modeling technique under the sun. Here, however, predicting when a pitcher will be removed from a game fits very nicely into a time-to-event modeling framework, which is exactly what survival analysis techniques are designed to handle.
How does this approach compare to traditional survival analysis?
- Traditional survival analysis is typically framed as a regression problem: estimating the amount of time until the event of interest occurs. By contrast, the approach we employ is framed as a classification problem: predicting the probability that the event of interest (the pitcher being removed from the game) occurs at each unit of time (each at-bat).
  - While this neural network, classification-esque approach is non-traditional, it is not unheard of, as seen here and here.
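Under this classification framing, the model emits one probability per at-bat rather than a single time estimate. The minimal sketch below shows how such per-at-bat conditional probabilities (hazards) can be combined through the probability chain rule into a survival curve and a distribution over the removal at-bat. It assumes `hazards[t]` is the probability of being pulled after at-bat `t + 1` given that the pitcher lasted through at-bat `t`; the function names are hypothetical and the hazard values are made up.

```python
from typing import List, Sequence


def survival_curve(hazards: Sequence[float]) -> List[float]:
    """S(t) = prod_{i<=t} (1 - h_i): probability the pitcher is still in after at-bat t."""
    survival, curve = 1.0, []
    for h in hazards:
        survival *= 1.0 - h
        curve.append(survival)
    return curve


def removal_distribution(hazards: Sequence[float]) -> List[float]:
    """p(t) = h_t * prod_{i<t} (1 - h_i): probability the pitcher is pulled right after at-bat t."""
    still_in, probs = 1.0, []
    for h in hazards:
        probs.append(h * still_in)  # pulled now, having survived every earlier at-bat
        still_in *= 1.0 - h
    return probs


def predicted_removal_at_bat(hazards: Sequence[float]) -> int:
    """1-based index of the most probable removal at-bat (ties go to the earliest)."""
    probs = removal_distribution(hazards)
    return max(range(len(probs)), key=probs.__getitem__) + 1


if __name__ == "__main__":
    hazards = [0.1, 0.2, 0.9]  # made-up model outputs for a three-at-bat outing
    print(survival_curve(hazards))
    print(removal_distribution(hazards))
    print(predicted_removal_at_bat(hazards))
```

The removal probabilities plus the final survival value sum to 1, so the per-at-bat outputs define a proper distribution over when the pitcher is pulled, which is what lets a classification-style model answer a time-to-event question.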
How is this Deep, Recurrent Survival Analysis model different from a traditional LSTM?
1. Like other recurrent neural networks, our model predicts the conditional probability that a pitcher is pulled after each at-bat (conditioned on the game up to that point). The novelty lies in the way the model "estimates the survival rate through the probability chain rule, which captures the sequential dependency patterns between neighboring at-bats and back-propagates the gradient more efficiently." (quote from page 2 of the original paper).
2. This notion of estimating the survival rate through the probability chain rule is reinforced by the use of the event-time and event-rate loss functions. Note that although our targets are binary, we do not train this model with traditional log loss.
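The two loss terms named above can be sketched as follows, assuming the definitions from Deep Recurrent Survival Analysis: an event-time loss equal to the negative log-probability of being pulled exactly at the observed at-bat, and an event-rate loss equal to the negative log-probability of having been pulled by that at-bat. The censored-data term is left out because, as noted above, none of this data is censored. The function names and the `alpha` weighting value are illustrative assumptions, not the package's API, and its actual implementation may differ.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0)


def _check(hazards: Sequence[float], event_at_bat: int) -> None:
    if not 1 <= event_at_bat <= len(hazards):
        raise ValueError("event_at_bat must be a 1-based index into hazards")


def event_time_loss(hazards: Sequence[float], event_at_bat: int) -> float:
    """-log p(z = event_at_bat), where p(t) = h_t * prod_{i<t} (1 - h_i)."""
    _check(hazards, event_at_bat)
    t = event_at_bat - 1
    loss = -math.log(max(hazards[t], EPS))  # pulled at the observed at-bat
    for h in hazards[:t]:
        loss -= math.log(max(1.0 - h, EPS))  # survived every earlier at-bat
    return loss


def event_rate_loss(hazards: Sequence[float], event_at_bat: int) -> float:
    """-log W(event_at_bat), where W(t) = 1 - prod_{i<=t} (1 - h_i) is P(pulled by at-bat t)."""
    _check(hazards, event_at_bat)
    survival = 1.0
    for h in hazards[:event_at_bat]:
        survival *= 1.0 - h
    return -math.log(max(1.0 - survival, EPS))


def combined_loss(hazards: Sequence[float], event_at_bat: int, alpha: float = 0.25) -> float:
    """Weighted mix of the two terms; alpha=0.25 is an arbitrary illustrative value."""
    return alpha * event_time_loss(hazards, event_at_bat) + (1.0 - alpha) * event_rate_loss(
        hazards, event_at_bat
    )


if __name__ == "__main__":
    hazards = [0.1, 0.2, 0.9]  # made-up per-at-bat outputs; pitcher pulled after at-bat 3
    print(event_time_loss(hazards, 3), event_rate_loss(hazards, 3), combined_loss(hazards, 3))
```

Unlike per-at-bat log loss, both terms are computed from the chained product of hazards, so the gradient for any one at-bat depends on the predictions for the at-bats before it.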


Release history

- 0.1.14 (this version): Jul 23, 2020
- 0.1.13: Jul 23, 2020
- 0.1.12: Jul 23, 2020
- 0.1.11: Jul 23, 2020
- 0.1.10: Jul 23, 2020
- 0.1.9: Jul 22, 2020
- 0.1.8: Jul 22, 2020
- 0.1.7: Jul 22, 2020
- 0.1.6: Jul 21, 2020
- 0.1.5: Jul 9, 2020
- 0.1.4: Jul 9, 2020
- 0.1.3: Jun 17, 2020
- 0.1.2: Jun 15, 2020

Download files

Source Distribution

pull_the_pitcher-0.1.14.tar.gz (23.1 kB), uploaded Jul 23, 2020

Built Distribution

pull_the_pitcher-0.1.14-py3-none-any.whl (23.4 kB), uploaded Jul 23, 2020 (Python 3)

Hashes for pull_the_pitcher-0.1.14.tar.gz

Algorithm	Hash digest
SHA256	`a1bd4bfd9036ed66ccd820a66def0b2cbe67290e166e6bc2caf849fc72c0a0aa`
MD5	`dd6a59de6d0a18a7d786581ba3a4c944`
BLAKE2b-256	`01333b72748af39bca494c9e9139f052977a608d0940e6469bf1e706497f0ad4`

Hashes for pull_the_pitcher-0.1.14-py3-none-any.whl

Algorithm	Hash digest
SHA256	`df933adc0a60fceda1ddae2fd416e4f43431c34f0255cb623b76d622cc34aca3`
MD5	`d5d3ff63e3e1563cbd201e1c9ef8ad11`
BLAKE2b-256	`ac8aacdfa6a92e7b016f5f2f9888c656ab015049bf66202888bfad7f6c0f2500`
