pm-rank

A toolkit for scoring and ranking prediction market forecasters.

Project description

🏅 PM-RANK - An Analysis Toolkit for Prediction Markets

📝 1. Introduction

1.1: Installation

Install from PyPI (recommended):

pip install pm-rank

This gives you access to most of the basic scoring/ranking models, but not the IRT model, which requires torch>=2.0.0. To install the full version, run:

pip install "pm-rank[full]"

If you want to work on the documentation, you can install the docs version:

pip install "pm-rank[docs]"

Install from source (local build):

git clone https://github.com/listar2000/pm_rank.git
cd pm_rank
pip install .

Or, for development (editable) mode:

pip install -e .

1.2: Unified Data Interface and Concepts

Please refer to data/base.py for the actual data model implementation. We give a high-level and non-comprehensive overview in a bottom-up manner.

ForecastEvent: the most atomic unit of prediction market data. It represents a single prediction made by a forecaster for a single forecast problem.
Key Fields in ForecastEvent
- problem_id: a unique identifier for the problem
- username: a unique identifier for the forecaster
- timestamp: the timestamp of the prediction. This field is not optional, because we may want to stream the predictions in time order. If the original data does not contain this information, the current time is used as a placeholder.
- probs: the probability distribution over the options, as given by the forecaster.
  - correct_prob: the probability assigned to the correct answer.
ForecastProblem: a collection of ForecastEvents for a single forecast problem. It validates and keeps track of problem metadata such as the options and the correct option. It is also a handy way to organize the dataset, since we treat ForecastProblem as the basic unit of streaming prediction market data.

In particular, if a ForecastProblem has the odds field, we can answer questions like "how much money can an individual forecaster make?" and use the results to rank the forecasters. See model/average_return.py for more details.
Key Fields in ForecastProblem
- title: the title of the problem
- problem_id: the id of the problem
- options: the options for the problem
- correct_option: the correct option
- forecasts: the forecasts for the problem
- num_forecasters: the number of forecasters
- url: the URL of the problem
- odds (optional): the market odds for each option
ForecastChallenge: a collection of ForecastProblems. It implements two core functions for all scoring/ranking methods to use:
- get_problems -> List[ForecastProblem]: return all the problems in the challenge. Suitable for the full-analysis setting.
- stream_problems -> Iterator[List[ForecastProblem]]: return the problems in the challenge in a streaming setting. This setting simulates the real-world scenario where predictions arrive gradually. The scoring/ranking methods can also use this function to efficiently calculate the metrics at different time points (batches).
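
To make the three concepts above concrete, here is a minimal sketch of how they might fit together. It is not the code in data/base.py: the dataclass layout, the derived num_forecasters property, and the batch_size parameter are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Iterator, List, Optional


@dataclass
class ForecastEvent:
    problem_id: str
    username: str
    probs: List[float]
    correct_prob: float
    # Placeholder: defaults to the current time when the source has no timestamp.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class ForecastProblem:
    title: str
    problem_id: str
    options: List[str]
    correct_option: str
    forecasts: List[ForecastEvent]
    url: str = ""
    odds: Optional[List[float]] = None  # only needed by the market earning model

    @property
    def num_forecasters(self) -> int:
        # Assumption: derived here by counting distinct usernames.
        return len({event.username for event in self.forecasts})


@dataclass
class ForecastChallenge:
    problems: List[ForecastProblem]

    def get_problems(self) -> List[ForecastProblem]:
        # Full-analysis setting: everything at once.
        return list(self.problems)

    def stream_problems(self, batch_size: int = 100) -> Iterator[List[ForecastProblem]]:
        # Streaming setting: yield consecutive batches of problems.
        for start in range(0, len(self.problems), batch_size):
            yield self.problems[start:start + batch_size]
```

A ranking method written against this shape can consume either get_problems() for a one-shot fit or loop over stream_problems() to update its results batch by batch.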

1.3: File Structure

crawler/: contains the code to scrape prediction market data from GJO (Good Judgment Open)
data/: contains the datasets as well as the data structure code
model/: contains the scoring/ranking methods
plotting/: contains the plotting code for the analysis results
test/: contains the testing code for our data and models

📊 2. Scoring & Ranking Models

2.1: Roadmap & Todos

Models:

- Basic scoring rules: Brier score, Log score, etc.
- Market earning model: directly evaluate how much money an individual forecaster can make (requires the odds field in the ForecastProblem)
- Bradley-Terry (BT) type pairwise comparison models, including Elo rating
- Item response theory (IRT) models
- For all models, configuration files to specify hyperparameters

Diagnostics:

- Calculate the correlation between different scoring/ranking methods
- For the BT model, calculate a "graph-connectivity" metric to assess the suitability of the model
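
One possible form of the connectivity diagnostic is sketched below. It relies on a standard fact about Bradley-Terry models: the maximum-likelihood skills are finite only when the directed "who beat whom" graph is strongly connected. The function names and the (winner, loser) input format are hypothetical, and a yes/no check is only one simple choice of metric; this is not the toolkit's implementation.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def _reachable(start: str, adjacency: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search: every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def is_strongly_connected(wins: Iterable[Tuple[str, str]]) -> bool:
    """`wins` holds (winner, loser) pairs, one per pairwise comparison."""
    forward: Dict[str, Set[str]] = defaultdict(set)
    backward: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    for winner, loser in wins:
        forward[winner].add(loser)
        backward[loser].add(winner)
        nodes.update((winner, loser))
    if not nodes:
        return False
    start = next(iter(nodes))
    # Strongly connected: `start` reaches everyone, and everyone reaches `start`.
    return _reachable(start, forward) == nodes and _reachable(start, backward) == nodes
```

For example, a cycle of wins such as a over b, b over c, c over a passes the check, while a pure chain (a over b, b over c) fails it because nobody ever beats a.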

2.2: Implemented Models

Scoring Rules model/scoring_rule.py: utilize proper scoring rules to score and rank the forecasters. Some scoring rules (e.g. log) require only the probability assigned to the correct option, while others (e.g. Brier) require the full probability distribution.
Market Earning Model model/average_return.py: calculate the market earning for each forecaster. This model is only applicable when the odds field is present in the ForecastProblem. More precisely, this is a class of models with a hyperparameter risk_aversion that uniformly controls the risk-taking behavior of the forecasters. For instance, risk_aversion=0 represents risk neutrality, so a forecaster's probability distribution translates directly into betting behavior: going all-in on the most market-undervalued option. risk_aversion=1 corresponds to a log utility function.

An interesting future step, at least for LLM forecasters, is to ask each forecaster to verbalize its own risk aversion and use that value to calculate the market earning.
Generalized Bradley-Terry Model model/bradley_terry.py: implements the Generalized Bradley-Terry (GBT) model for ranking forecasters based on their pairwise performance across prediction problems. The GBT model estimates a "skill" parameter for each forecaster by comparing their probability assignments to the correct outcome, iteratively updating these skills to best explain the observed outcomes. This approach is particularly useful in settings where direct pairwise comparisons between forecasters are meaningful.
IRT (Item Response Theory) Models model/irt/: provides IRT-based models for ranking forecasters by modeling both forecaster ability and problem difficulty/discrimination. The IRT model uses probabilistic inference (via SVI or MCMC) to estimate latent skill parameters for each forecaster and latent difficulty/discrimination parameters for each problem, allowing for a nuanced ranking that accounts for the varying challenge of different prediction problems.
Weighted Brier Scoring Rule model/scoring_rule.py: once we have fit an IRT model, we can use the problem-level discrimination parameters to weight the Brier score. A simple way to do this is:

import numpy as np

# Assumes `irt_model` is already fitted, `problems` is a list of ForecastProblems,
# and BrierScoringRule has been imported from model/scoring_rule.py.
problem_discriminations, _ = irt_model.get_problem_level_parameters()
# Align the discriminations with the order of `problems`.
problem_discriminations = np.array([problem_discriminations[problem.problem_id] for problem in problems])
brier_scoring_rule = BrierScoringRule()
brier_ranking_result = brier_scoring_rule.fit(problems, problem_discriminations=problem_discriminations, include_scores=False)

Our experiment shows that this weighted metric has the highest rank correlation with the IRT model-based individual skill ranking.
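
For readers unfamiliar with the underlying quantities, the sketch below spells out the multi-class Brier score, the log score, and a weighted average of Brier scores. The function names are hypothetical, and normalizing by the sum of the weights is an assumption; the actual weighting in model/scoring_rule.py may differ.

```python
import math
from typing import List, Optional, Sequence, Tuple


def brier_score(probs: Sequence[float], correct_index: int) -> float:
    """Squared distance to the one-hot outcome; needs the full distribution. Lower is better."""
    return sum(
        (p - (1.0 if i == correct_index else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


def log_score(correct_prob: float, eps: float = 1e-12) -> float:
    """Log of the probability on the correct option only. Higher is better."""
    return math.log(max(correct_prob, eps))  # eps avoids log(0)


def weighted_mean_brier(
    forecasts: List[Tuple[Sequence[float], int]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """`forecasts` holds one (probs, correct_index) pair per problem.

    `weights` plays the role of the problem-level discriminations;
    None means every problem counts equally.
    """
    if weights is None:
        weights = [1.0] * len(forecasts)
    total = sum(weights)
    return sum(
        w * brier_score(probs, correct)
        for w, (probs, correct) in zip(weights, forecasts)
    ) / total
```

With weights in place, a poor forecast on a highly discriminating problem costs more than the same forecast on a problem that barely separates strong from weak forecasters.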

2.3: Example: Fitting Streaming Prediction Market Data

In plotting/plot_crra_risks_curves.py, we demonstrate fitting the market earning model to streaming prediction market data. The full dataset is streamed in batches of 100 problems. We then fit three market earning models at different risk-aversion levels (0, 0.5, 1). The results are shown in the following figure:

PM-RANK's modular design makes such analyses easy to conduct.
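
To illustrate what the three risk-aversion levels mean, here is a sketch of a textbook CRRA bettor. It rests on several assumptions that may not match model/average_return.py: the odds are treated as market prices that sum to 1, each share pays 1 if its option is correct, the forecaster spends a budget of 1 per problem, and utility is u(w) = w^(1-γ)/(1-γ) with γ = risk_aversion. Under those assumptions the optimal stake on option i is proportional to q_i (p_i/q_i)^(1/γ), where p is the forecast and q the price. The function names are hypothetical.

```python
from typing import List, Sequence


def crra_allocation(
    probs: Sequence[float], prices: Sequence[float], risk_aversion: float
) -> List[float]:
    """Fraction of a unit budget staked on each option."""
    ratios = [p / q for p, q in zip(probs, prices)]
    if risk_aversion == 0:
        # Risk neutral: all-in on the most undervalued option (largest p/q).
        best = max(range(len(ratios)), key=ratios.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(ratios))]
    raw = [q * r ** (1.0 / risk_aversion) for q, r in zip(prices, ratios)]
    total = sum(raw)
    return [x / total for x in raw]


def gross_return(
    probs: Sequence[float],
    prices: Sequence[float],
    correct_index: int,
    risk_aversion: float,
) -> float:
    """Final wealth per unit staked: shares held on the correct option."""
    stake = crra_allocation(probs, prices, risk_aversion)[correct_index]
    return stake / prices[correct_index]


forecast, market = [0.2, 0.8], [0.5, 0.5]
for gamma in (0, 0.5, 1):
    print(gamma, crra_allocation(forecast, market, gamma))
```

At risk_aversion=1 the stakes equal the forecaster's own probabilities, and as risk_aversion approaches 0 the allocation concentrates on the option with the largest p/q ratio, which matches the two special cases described in section 2.2.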

2.4: Example: Comparing Ranking Metrics and Plotting Correlations

To compare the different ranking metrics, see the code in plotting/plot_correlations_multiple_metrics.py. This script demonstrates how to compute all implemented ranking metrics (Brier, Market Earning, Generalized Bradley-Terry, IRT, and Weighted Brier) on a dataset and visualize the pairwise correlations between their resulting rankings. The resulting plot, which shows both Spearman and Kendall correlations between all pairs of ranking methods, is shown below:

Correlation Grid
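
For reference, the two rank correlations named above can be computed as in the sketch below. It is written in plain Python for clarity, assumes both rankings cover the same forecasters with no ties, and uses hypothetical function names; it does not claim to be what the plotting script calls.

```python
from itertools import combinations
from typing import Dict, List


def to_ranks(ordering: List[str]) -> Dict[str, int]:
    """Map each forecaster to a 1-based rank; `ordering` lists the best first."""
    return {name: position for position, name in enumerate(ordering, start=1)}


def spearman(order_a: List[str], order_b: List[str]) -> float:
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    ranks_a, ranks_b = to_ranks(order_a), to_ranks(order_b)
    n = len(ranks_a)
    squared_diffs = sum((ranks_a[name] - ranks_b[name]) ** 2 for name in ranks_a)
    return 1.0 - 6.0 * squared_diffs / (n * (n * n - 1))


def kendall(order_a: List[str], order_b: List[str]) -> float:
    """Kendall's tau: (concordant - discordant) pairs over all pairs."""
    ranks_a, ranks_b = to_ranks(order_a), to_ranks(order_b)
    balance = 0
    for x, y in combinations(ranks_a, 2):
        same_order = (ranks_a[x] - ranks_a[y]) * (ranks_b[x] - ranks_b[y]) > 0
        balance += 1 if same_order else -1
    n = len(ranks_a)
    return balance / (n * (n - 1) / 2)


brier_order = ["alice", "bob", "carol", "dave"]
irt_order = ["alice", "carol", "bob", "dave"]
print(spearman(brier_order, irt_order), kendall(brier_order, irt_order))
```

Swapping one adjacent pair, as in the example, lowers Kendall's tau more than Spearman's rho, which is one reason a plot may report both.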

🕷️ 3. Scraping Data

Step 1:

See crawler/scrape_gjo_problem_data.py for the details.

This script scrapes the problem data from the GJO website using requests and BeautifulSoup. This step gives us a metadata JSON file, e.g. data/xxx_challenge_metadata.json. Each entry in the metadata JSON file is a problem, with the following structure:

{
    "problem_id": "3940",
    "title": "Who will win the NFL Most Valuable Player Award for the 2024 season?",
    "url": "https://www.gjopen.com/questions/3940-who-will-win-the-nfl-most-valuable-player-award-for-the-2024-season",
    "metadata": {
        "status": "Closed",
        "end_date": "2025-02-07T03:40:25Z",
        "num_forecasters": 61,
        "num_forecasts": 147
    },
    "options": [
        "Josh Allen",
        "Saquon Barkley",
        "Sam Darnold",
        "Jared Goff",
        "Lamar Jackson",
        "Someone else"
    ],
    "correct_answer": "Josh Allen"
}

Step 2:

See crawler/scrape_gjo_predictions_data.py for the details.

This script scrapes the prediction data from the GJO website using Playwright. It requires additionally installing the Playwright browser binaries via playwright install. This step gives us a predictions JSON file, e.g. data/all_predictions.json. Each entry in the predictions JSON file is a prediction, with the following structure:

{
    "problem_id": "3940",
    "username": "Jonah-Neuwirth",
    "timestamp": "2025-02-06T05:44:32Z",
    "prediction": [0.2,0.0,0.0,0.0,0.8,0.0]
}

The username is a unique identifier for each forecaster. The prediction is a list of probabilities, one per option, in the order the options appear in the problem metadata.
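
The two JSON files are linked by problem_id. The sketch below shows one way to join them and recover the probability a forecaster put on the correct answer. The helper names are hypothetical, the metadata entry is trimmed to the fields used here, and the sum-to-1 check is an assumption about the data rather than something the scraper guarantees.

```python
import math
from typing import Any, Dict, List


def index_problems(metadata_entries: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Look up problems by their problem_id."""
    return {entry["problem_id"]: entry for entry in metadata_entries}


def correct_prob(prediction: Dict[str, Any], problems_by_id: Dict[str, Dict[str, Any]]) -> float:
    """Probability the forecaster assigned to the correct answer."""
    problem = problems_by_id[prediction["problem_id"]]
    probs = prediction["prediction"]
    if len(probs) != len(problem["options"]):
        raise ValueError("prediction length does not match the number of options")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-6):
        raise ValueError("probabilities do not sum to 1")
    return probs[problem["options"].index(problem["correct_answer"])]


metadata = [{
    "problem_id": "3940",
    "options": ["Josh Allen", "Saquon Barkley", "Sam Darnold",
                "Jared Goff", "Lamar Jackson", "Someone else"],
    "correct_answer": "Josh Allen",
}]
prediction = {
    "problem_id": "3940",
    "username": "Jonah-Neuwirth",
    "timestamp": "2025-02-06T05:44:32Z",
    "prediction": [0.2, 0.0, 0.0, 0.0, 0.8, 0.0],
}
print(correct_prob(prediction, index_problems(metadata)))  # 0.2
```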

🔥 Warning: please respect the rate limit of the GJO website when scraping any information. Our script does this by adding a random sleep time between requests.
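
The random-sleep idea can be sketched as follows. The function name, the delay bounds, and the injectable fetch and sleep callables are all assumptions for illustration; the crawler scripts may structure this differently.

```python
import random
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 1.0,
    max_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call `fetch` on each URL, pausing a random time between requests."""
    results = []
    for position, url in enumerate(urls):
        if position > 0:
            # Random jitter so requests are not sent at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

Passing fetch and sleep as arguments keeps the pacing logic separate from the HTTP client, so it can be exercised in tests without touching the network.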

pm-rank 0.2.2
