A PyTorch Backend Library for Choice Modelling

Project description

Torch-Choice: A PyTorch Package for Large-Scale Choice Modelling with Python


Authors: Tianyu Du, Ayush Kanodia and Susan Athey; Contact: tianyudu@stanford.edu

Acknowledgements: We would like to thank Erik Sverdrup, Charles Pebereau and Keshav Agrawal for their feedback.

torch-choice is a library for flexible, fast choice modeling with PyTorch: it provides conditional logit and nested logit models, designed for both estimation and prediction. Please check our paper and online documentation for more details. Unique features of torch-choice include:

  1. GPU support via torch for speed.
  2. No GPU? No problem! You can still leverage PyTorch's multi-core processing for a speed-up.
  3. Support for customized model specifications.
  4. Support for availability sets.
  5. Maximum likelihood estimation (MLE), optionally reporting standard errors or performing MAP inference with Bayesian priors on coefficients.
  6. Estimation via minimization of cross-entropy loss, optionally with L1/L2 regularization.
  7. Integration with PyTorch Lightning for easy training diagnostics.

What's in the package?

Overall, the torch-choice package offers the following features:

  1. The package includes a data management module called ChoiceDataset, built upon PyTorch's dataset module. Our dataset implementation allows users to easily move data between CPU and GPU. Unlike traditional long or wide formats, the ChoiceDataset offers a memory-efficient way to manage observables; see the sketch after this list.

  2. The package provides (1) a conditional logit model and (2) a nested logit model for consumer choice modeling.

  3. The package leverages GPU acceleration using PyTorch and easily scales to large datasets of millions of choice records. All models are trained using state-of-the-art optimizers in PyTorch, optimization algorithms that have been proven scalable by modern machine learning practitioners. Rest assured, the package also runs flawlessly when no GPU is available.

  4. Setting up PyTorch training pipelines can be frustrating. We provide an easy-to-use PyTorch Lightning wrapper for our models to free researchers from the hassle of setting up PyTorch optimizers and training loops.
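
As mentioned in item 1 above, data is managed through the ChoiceDataset class. Below is a minimal sketch of constructing one by hand. The toy tensors here are made up for illustration; we assume the prefix-based observable naming (`session_*`, `itemsession_*`) that appears in the quick example later in this page, and the `item_availability` argument for availability sets.

```python
import torch
from torch_choice.data import ChoiceDataset

num_records, num_items, num_users, num_sessions = 100, 4, 10, 100

dataset = ChoiceDataset(
    # index of the item chosen in each choice record.
    item_index=torch.randint(num_items, (num_records,)),
    # which user made, and which session contains, each choice record.
    user_index=torch.randint(num_users, (num_records,)),
    session_index=torch.randint(num_sessions, (num_records,)),
    # a session-level observable; the session_ prefix indicates a tensor of
    # shape (num_sessions, num_features).
    session_income=torch.randn(num_sessions, 1),
    # availability sets: a boolean mask of shape (num_sessions, num_items).
    item_availability=torch.ones(num_sessions, num_items).bool(),
)

# all tensors in the dataset move together between devices.
dataset = dataset.to("cpu")  # or .to("cuda") on a GPU machine.
```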

Installation

We offer two ways to install the package. We recommend installing from source to get the latest version and bug-fix patches, as we are actively working on this package and adding more features and examples. Please feel free to reach out to us with any questions or suggestions.

Installation from Pip

Simply run `pip install torch-choice` to install the latest stable version of the package.

Installation from Source

For those who wish to leverage the latest features, you can install torch-choice from the GitHub source.

  1. Clone the repository to your local machine or server.
  2. Install the dependencies listed in `requirements.txt`. Please make sure to install a version of PyTorch compatible with your CUDA driver; PyTorch needs to be installed before PyTorch Lightning.
  3. Run `python3 setup.py install`.
  4. Check the installation by running `python3 -c 'import torch_choice; print(torch_choice.__version__)'`.

The installation page provides more details on installation.

Quick Example: Transportation Choice Dataset

In this demonstration, we set up a minimal example of fitting a conditional logit model using our package. We also provide equivalent R code for reference, to help users translate existing R workflows to this package. We model people's choices of transportation mode using the publicly available ModeCanada dataset; for more information, see ModeCanada: Mode Choice for the Montreal-Toronto Corridor.

In this example, we estimate the utility of user $u$ choosing transport method $i$ in session $s$ as

$$ U_{uis} = \alpha_i + \beta_i \text{income}_s + \gamma \text{cost}_{is} + \delta \text{freq}_{is} + \eta \text{ovt}_{is} + \iota_i \text{ivt}_{is} + \varepsilon_{uis} $$
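
The formula string in the code below encodes this utility: each term has the form `(observable_name|variation)`, where the variation determines how a coefficient varies across items. The annotated copy below gives our reading, which can be cross-checked against the parameter counts in the model summary printed after training (`session_income[item]` has 3 parameters for 4 items, while `itemsession_ivt[item-full]` has 4):

```python
# An annotated copy of the formula used in the example below.
formula = (
    "(itemsession_cost_freq_ovt|constant)"  # gamma, delta, eta: one coefficient shared by all items
    " + (session_income|item)"              # beta_i: item-specific, one item serves as the reference level
    " + (itemsession_ivt|item-full)"        # iota_i: item-specific, no reference level (all 4 items)
    " + (intercept|item)"                   # alpha_i: item-specific intercepts, with a reference level
)
```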

```python
# load package.
import torch_choice

device = "cpu"  # choose "cuda" if using GPU, choose "mps" if using Apple Silicon.

# load data.
dataset = torch_choice.data.load_mode_canada_dataset().to(device)

# define the conditional logit model.
model = torch_choice.model.ConditionalLogitModel(
    formula='(itemsession_cost_freq_ovt|constant) + (session_income|item) + (itemsession_ivt|item-full) + (intercept|item)',
    dataset=dataset,
    num_items=4).to(device)

# fit the conditional logit model.
torch_choice.run(model, dataset, num_epochs=500, learning_rate=0.003, batch_size=-1, model_optimizer="LBFGS", device=device)
```
```
  | Name  | Type                  | Params
------------------------------------------------
0 | model | ConditionalLogitModel | 13
------------------------------------------------
13        Trainable params
0         Non-trainable params
13        Total params
0.000     Total estimated model params size (MB)
Epoch 499: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 50.89it/s, loss=1.87e+03, v_num=3]
`Trainer.fit` stopped: `max_epochs=500` reached.
Epoch 499: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 50.06it/s, loss=1.87e+03, v_num=3]
Time taken for training: 17.461677074432373
Skip testing, no test dataset is provided.
==================== model results ====================
Log-likelihood: [Training] -1874.3455810546875, [Validation] N/A, [Test] N/A

| Coefficient                           |   Estimation |   Std. Err. |    z-value |    Pr(>|z|) | Significance   |
|:--------------------------------------|-------------:|------------:|-----------:|------------:|:---------------|
| itemsession_cost_freq_ovt[constant]_0 |  -0.0336983  |  0.00709653 |  -4.74856  | 2.0487e-06  | ***            |
| itemsession_cost_freq_ovt[constant]_1 |   0.0926308  |  0.00509798 |  18.1701   | 0           | ***            |
| itemsession_cost_freq_ovt[constant]_2 |  -0.0430381  |  0.00322568 | -13.3423   | 0           | ***            |
| session_income[item]_0                |  -0.0890566  |  0.0183376  |  -4.8565   | 1.19479e-06 | ***            |
| session_income[item]_1                |  -0.0278864  |  0.00387063 |  -7.20461  | 5.82201e-13 | ***            |
| session_income[item]_2                |  -0.0380771  |  0.00408164 |  -9.32887  | 0           | ***            |
| itemsession_ivt[item-full]_0          |   0.0594989  |  0.0100751  |   5.90553  | 3.51515e-09 | ***            |
| itemsession_ivt[item-full]_1          |  -0.00684573 |  0.00444405 |  -1.54043  | 0.123457    |                |
| itemsession_ivt[item-full]_2          |  -0.006402   |  0.00189828 |  -3.37252  | 0.000744844 | ***            |
| itemsession_ivt[item-full]_3          |  -0.00144797 |  0.00118764 |  -1.2192   | 0.22277     |                |
| intercept[item]_0                     |   0.664312   |  1.28022    |   0.518904 | 0.603828    |                |
| intercept[item]_1                     |   1.79165    |  0.708119   |   2.53015  | 0.0114015   | *              |
| intercept[item]_2                     |   3.23494    |  0.623899   |   5.18504  | 2.15971e-07 | ***            |
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ConditionalLogitModel(
  (coef_dict): ModuleDict(
    (itemsession_cost_freq_ovt[constant]): Coefficient(variation=constant, num_items=4, num_users=None, num_params=3, 3 trainable parameters in total, device=cpu).
    (session_income[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
    (itemsession_ivt[item-full]): Coefficient(variation=item-full, num_items=4, num_users=None, num_params=1, 4 trainable parameters in total, device=cpu).
    (intercept[item]): Coefficient(variation=item, num_items=4, num_users=None, num_params=1, 3 trainable parameters in total, device=cpu).
  )
)
Conditional logistic discrete choice model, expects input features:

X[itemsession_cost_freq_ovt[constant]] with 3 parameters, with constant level variation.
X[session_income[item]] with 1 parameters, with item level variation.
X[itemsession_ivt[item-full]] with 1 parameters, with item-full level variation.
X[intercept[item]] with 1 parameters, with item level variation.
device=cpu
```
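
The fitted model can also be used for prediction. Below is a minimal sketch, under the assumption that calling the model on a ChoiceDataset returns one utility value per (choice record, item) pair, matching the fitted model printed above; a softmax over items then yields predicted choice probabilities.

```python
import torch

with torch.no_grad():
    utilities = model(dataset)               # assumed shape: (num_records, num_items).
    probs = torch.softmax(utilities, dim=1)  # choice probability of each transport mode.
    predicted = probs.argmax(dim=1)          # most likely mode for each record.
```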

Mode Canada with R

We include the R code for the ModeCanada example as well.

R code
```{r}
# load packages.
library("dplyr")   # needed for select(); not loaded by mlogit.
library("mlogit")

# load data.
ModeCanada <- read.csv('https://raw.githubusercontent.com/gsbDBI/torch-choice/main/tutorials/public_datasets/ModeCanada.csv')
ModeCanada <- select(ModeCanada, -X)
ModeCanada$alt <- as.factor(ModeCanada$alt)

# format data: keep only cases in which all four alternatives were available.
MC <- dfidx(ModeCanada, subset = noalt == 4)

# fit the model, with 'air' as the reference alternative.
ml.MC1 <- mlogit(choice ~ cost + freq + ovt | income | ivt, MC, reflevel='air')
summary(ml.MC1)
```

We highly recommend users go through the tutorials we have prepared to get a better understanding of what the package offers. We present multiple examples, and for each case we specify the utility form.

Reproducibility

The torch-choice package is built upon several dependencies that introduce randomness; for reproducibility, you would need to fix the random seeds of these packages:

```python
import random

import numpy as np
import torch

SEED = 12345
random.seed(SEED)        # Python's built-in RNG.
np.random.seed(SEED)     # NumPy's global RNG.
torch.manual_seed(SEED)  # PyTorch's RNG, on CPU and all CUDA devices.
torch.use_deterministic_algorithms(True)  # raise an error if a nondeterministic op is used.
```
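
One additional caveat, which comes from PyTorch itself rather than torch-choice: when training on a CUDA device, `torch.use_deterministic_algorithms(True)` requires cuBLAS to use a fixed workspace size, configured through an environment variable before any CUDA work happens.

```python
import os

# Required by PyTorch for deterministic cuBLAS kernels on CUDA; set this
# before the first CUDA operation. It is not needed for CPU-only training.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```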

Download files

Download the file for your platform.

Source Distribution

torch_choice-1.0.6.tar.gz (49.1 kB)

Built Distribution

torch_choice-1.0.6-py3-none-any.whl (48.5 kB)

File details

Details for the file torch_choice-1.0.6.tar.gz.

File metadata

  • Download URL: torch_choice-1.0.6.tar.gz
  • Size: 49.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for torch_choice-1.0.6.tar.gz

| Algorithm   | Hash digest                                                      |
|:------------|:-----------------------------------------------------------------|
| SHA256      | f5083e5785aa743d6f286edc3796e920b2e52560d6c3a3273efd3cc35d84e881 |
| MD5         | a45d2ebf0fdf811bb9fdcf729431191f                                 |
| BLAKE2b-256 | 9549a48358754f568bd47df0cf952f004157caa50ca905c097a48e9efb7fce7f |


File details

Details for the file torch_choice-1.0.6-py3-none-any.whl.

File metadata

  • Download URL: torch_choice-1.0.6-py3-none-any.whl
  • Size: 48.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for torch_choice-1.0.6-py3-none-any.whl

| Algorithm   | Hash digest                                                      |
|:------------|:-----------------------------------------------------------------|
| SHA256      | 2c1b4c74514971626aafd9b2d57c852fb6b4018065a4547b2bb3ec4e3c83cbb4 |
| MD5         | 1514862dcdfa2a407915225f4ce2a80e                                 |
| BLAKE2b-256 | b99474b6286531838e610373c903edcffd33b347abe3638cbfe2a33038844a07 |

