
Comparing Sequential Forecasters


comparecast


Code and Python package accompanying our paper, Comparing Sequential Forecasters, published in Operations Research (2023). A preprint version is also available on arXiv.

In the paper, we derive anytime-valid and distribution-free inference procedures for comparing sequential forecasters that each make a prediction over a sequence of events. This accompanying package, named comparecast, includes implementations of:

  1. confidence sequence (CS): a sequence of confidence intervals that estimate the average score difference between forecasters A and B (which may vary over time);
  2. e-process: a sequence of evidence values (e-values) against the null hypothesis that forecaster A scores no better than forecaster B on average.

Confidence sequences can be continuously monitored without having to worry about "sampling to a foregone conclusion," which happens when using a standard, fixed-time confidence interval and repeatedly peeking at the data. E-processes also allow peeking at the data before deciding when to stop the experiment ("anytime-valid"). See here for an overview of these methods, or here for a standalone implementation of confidence sequences.
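Because a CS is valid uniformly over time, it can be checked after every new outcome and the experiment stopped as soon as a decision is reached, without inflating the error rate. The snippet below is a minimal sketch of such a monitoring rule, built on the compare_forecasts output described under Sample Usage below; it assumes (as suggested by the example output there) that lcb is the lower confidence bound on the first forecaster's average score advantage and that e_pq is the e-process against the null that the first forecaster scores no better than the second.

import comparecast as cc

# Simulated data and forecasts, as in the Sample Usage section below
data = cc.data_utils.synthetic.get_data("default", size=1000)
data = cc.forecast(data, ["k29_poly3", "laplace"], out_file="data/test.csv")

# Full sequences of confidence bounds and e-processes over time
results = cc.compare_forecasts(
    data, "k29_poly3", "laplace",
    scoring_rule="brier", alpha=0.05,
    compute_cs=True, compute_e=True,
)

# Anytime-valid decisions: stop at the first time the 95% CS excludes zero,
# or at the first time the e-process exceeds 1/alpha (Ville's inequality)
first_cs_rejection = results.loc[results["lcb"] > 0, "time"].min()
first_e_rejection = results.loc[results["e_pq"] > 1 / 0.05, "time"].min()
print(first_cs_rejection, first_e_rejection)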

The plot below shows the 95% (empirical-Bernstein) CS and the corresponding e-processes (one for each forecaster) for comparing the probability forecasts of FiveThirtyEight and the Vegas odds on all Major League Baseball games from 2010 to 2019. The average Brier score difference slightly favors the Vegas odds over time; the CS closely tracks this running average while quantifying the uncertainty in estimating it from random data.

See nb_comparecast_baseball.ipynb for the code that generated this plot.

Installation

Requires Python 3.7+.

From pip:

pip install --upgrade pip
pip install --upgrade pandas seaborn tqdm confseq
pip install --upgrade comparecast

From source:

git clone https://github.com/yjchoe/ComparingForecasters
cd ComparingForecasters

pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

Data Sources

See data/README.md.

Sample Usage

Also see experiment notebooks below.

Python

The following sample code generates synthetic data and forecasts from a few simulated forecasters, computes the CS and e-processes comparing two of them, and saves the relevant plots inside the plots/test directory.

import comparecast as cc

# Generate/retrieve synthetic data
data = cc.data_utils.synthetic.get_data("default", size=1000)

# Calculate, save, and plot the forecasts
forecasters = ["k29_poly3", "laplace", "constant_0.5"]
data = cc.forecast(data, forecasters, out_file="data/test.csv") 
cc.plot_forecasts(data, forecasters, plots_dir="plots/test")

# Compare forecasts using confidence sequences & e-values
results = cc.compare_forecasts(
    data, 
    "k29_poly3", 
    "laplace", 
    scoring_rule="brier", 
    alpha=0.05,
    compute_cs=True,
    compute_e=True,
)
# returns a pandas DataFrame
results.tail(5)
#      time       lcb       ucb         e_pq      e_qp
# 995   996  0.012868  0.072742  2025.725774  0.021684
# 996   997  0.013050  0.072879  2157.262456  0.021672
# 997   998  0.012635  0.072492  1886.687861  0.021596
# 998   999  0.012824  0.072637  2013.209084  0.021583
# 999  1000  0.012447  0.072275  1783.204679  0.021519


# Draw a comparison plot and save in plots/test/*.pdf
results, axes = cc.plot_comparison(
    data, 
    "k29_poly3", 
    "laplace", 
    scoring_rule="brier", 
    alpha=0.05,
    baselines=("h", "acs"),
    plot_e=True,
    plot_width=True,
    plots_dir="plots/test",
)
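
If an anytime-valid p-value is preferred, a standard conversion is to take the reciprocal of the running maximum of an e-process (by Ville's inequality). The short pandas computation below is a sketch of this conversion applied to the results DataFrame shown above; it is not a comparecast API call, and it again assumes that e_pq is the e-process for the null that the first forecaster scores no better than the second.

import numpy as np

# Anytime-valid p-values: reciprocal of the running maximum of each e-process,
# capped at 1. A small p_pq indicates evidence that the first forecaster is better.
p_pq = np.minimum(1.0, 1.0 / results["e_pq"].cummax())
p_qp = np.minimum(1.0, 1.0 / results["e_qp"].cummax())
print(p_pq.iloc[-1], p_qp.iloc[-1])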

Command Line Interface

(Clone this repo first.)

# Generate synthetic data and forecasts
python3 forecast.py -d default -n 1000 -f all \
    -o forecasts/test.csv -p plots/test

# Compare forecasts and plot results
python3 plot_comparisons.py -d forecasts/test.csv \
    -p k29_poly3 -q laplace --baselines h acs -o plots/test
    
# Compare 538 and vegas forecasters
python3 plot_comparisons.py -d forecasts/mlb_2010_2019.csv \
    -p fivethirtyeight -q vegas --baselines acs -o plots/test/mlb_2010_2019 \
    --ylim-scale 0.01

Experiments

Main experiments (appearing in the main paper):

Additional experiments (some appearing in the E-Companion/Appendix of our paper):

Code License

MIT

Authors

YJ Choe and Aaditya Ramdas

References

If you use parts of our work, please cite our paper as follows:

Text:

Choe, Y. J., & Ramdas, A. (2023). Comparing sequential forecasters. Operations Research. https://doi.org/10.1287/opre.2021.0792

BibTeX:

@article{choe2023comparing,
  title={Comparing sequential forecasters},
  author={Choe, Yo Joong and Ramdas, Aaditya},
  journal={Operations Research},
  year={2023},
  doi={10.1287/opre.2021.0792},
  publisher={INFORMS}
}
