Large-scale choice modeling through the lens of machine learning.
Choice-Learn is a Python package designed to help you estimate discrete choice models and use them (e.g., assortment optimization plug-in). The package provides ready-to-use datasets and models from the literature. It also offers lower-level usage if you wish to customize a choice model or create your own from scratch. Choice-Learn handles data efficiently in order to limit RAM usage, which makes it particularly easy to estimate choice models on your own large datasets.
Choice-Learn uses NumPy and pandas as data backend engines and TensorFlow for models.
Table of Contents
- Introduction - Discrete Choice Modelling
- What's in there?
- Getting Started
- Installation
- Usage
- Documentation
- Contributing
- Citation
Introduction - Discrete Choice Modelling
Discrete choice models aim at explaining or predicting choices over a set of alternatives. Well-known use cases include analyzing people's choice of means of transport or product purchases in stores.
If you are new to choice modelling, you can check this resource. The different notebooks from the Getting Started section can also help you understand choice modelling and, more importantly, help you with your own use case.
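As a minimal illustration of the idea, the multinomial logit (one of the most common discrete choice models) turns a utility score per alternative into choice probabilities with a softmax. The sketch below is not part of Choice-Learn: the function name `logit_probabilities` and the utility values are made up for illustration.

```python
import math


def logit_probabilities(utilities):
    """Return multinomial logit choice probabilities for a list of utilities."""
    # Subtracting the largest utility avoids overflow and leaves the
    # probabilities unchanged.
    max_u = max(utilities)
    exp_u = [math.exp(u - max_u) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


# Hypothetical utilities for three means of transport (illustrative values only).
utilities = {"car": 1.2, "train": 0.8, "bus": -0.5}
probabilities = dict(zip(utilities, logit_probabilities(list(utilities.values()))))

for mode, p in probabilities.items():
    print(f"{mode}: {p:.3f}")
```

Alternatives with a higher utility receive a higher probability, and the probabilities sum to one over the set of alternatives.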
What's in there?
Data
- Generic dataset handling with the ChoiceDataset class [Example]
- Ready-To-Use datasets:
- SwissMetro [2]
- ModeCanada [3]
- The Train dataset [5]
- The Heating, HC & Electricity datasets from Kenneth Train described here, here and here
- Stated car preferences [9]
- The TaFeng dataset from Kaggle
- The ICDM-2013 Expedia dataset from Kaggle [6]
Model estimation
- Ready-to-use models:
- Custom modelling is made easy by subclassing the ChoiceModel class [Example]
Auxiliary tools
Getting Started
The following tutorials will help you get started with the package:
- Generic and simple introduction [notebook][doc]
- Detailed explanations of data handling depending on the data format [notebook][doc]
- A detailed example of conditional logit estimation [notebook][doc]
- Introduction to custom modelling and more complex parametrization [notebook][doc]
Installation
User installation
To install the required packages in a virtual environment, run the following command:
**pip install is not possible yet, coming soon**
pip install choice-learn
In the meantime, you can clone the repository:
git clone git@github.com:artefactory/choice-learn.git
Dependencies
Choice-Learn requires the following:
- Python (>=3.8)
- NumPy (>=1.24)
- pandas (>=1.5)
For modelling you need:
- TensorFlow (>=2.13)
Warning: If you are a Mac user with an M1 or M2 chip, importing TensorFlow might cause Python to crash. In that case, use Anaconda to install TensorFlow with:
conda install -c apple tensorflow
An optional requirement used for coefficient analysis and L-BFGS optimization is:
- TensorFlow Probability (>=0.20.1)
Finally, for pricing or assortment optimization, you need either Gurobi or OR-Tools:
- gurobipy (>=11.0.0)
- ortools (>=9.6.2534)
Once you have created your conda or pip environment (python==3.9), you can install the requirements with:
pip install choice-learn
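To see which of the dependencies listed above are available in your environment, a small standard-library check can help. This is only a sketch and not a Choice-Learn utility: the helper name `check_environment` is hypothetical, and the script only tests whether each package can be found, not whether its version meets the minimum stated above.

```python
import sys
from importlib.util import find_spec

# Import names of the packages listed above; the required/optional split
# follows this README.
REQUIRED = ["numpy", "pandas", "tensorflow"]
OPTIONAL = ["tensorflow_probability", "gurobipy", "ortools"]


def check_environment():
    """Return a dict mapping each import name to True if it can be found."""
    # find_spec locates a top-level package without importing it.
    return {name: find_spec(name) is not None for name in REQUIRED + OPTIONAL}


if sys.version_info < (3, 8):
    print("Choice-Learn requires Python >= 3.8")

status = check_environment()
for name, found in status.items():
    kind = "required" if name in REQUIRED else "optional"
    print(f"{name:<24} {kind:<9} {'found' if found else 'missing'}")
```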
Usage
Here is a short example of model parametrization to estimate a Conditional Logit on the ModeCanada dataset.
from choice_learn.data import ChoiceDataset
from choice_learn.models import ConditionalLogit

# Instantiation of a ChoiceDataset from a pandas.DataFrame.
# transport_df is assumed to be a long-format pandas.DataFrame loaded beforehand.
# You only need to specify how the file is encoded:
dataset = ChoiceDataset.from_single_long_df(df=transport_df,
                                            items_id_column="alt",
                                            choices_id_column="case",
                                            choices_column="choice",
                                            shared_features_columns=["income"],
                                            items_features_columns=["cost", "freq", "ovt", "ivt"],
                                            choice_format="item_id")

# Initialization of the model
model = ConditionalLogit()

# Creation of the different weights:
# add_coefficients adds one coefficient for each item listed in items_indexes.
# intercept and income are added for each item except the first one, which needs to be zeroed.
model.add_coefficients(feature_name="intercept",
                       items_indexes=[1, 2, 3])
model.add_coefficients(feature_name="income",
                       items_indexes=[1, 2, 3])
model.add_coefficients(feature_name="ivt",
                       items_indexes=[0, 1, 2, 3])

# add_shared_coefficient adds one coefficient that is used for all items listed in items_indexes.
# Here, the cost, freq and ovt coefficients are shared between all items.
model.add_shared_coefficient(feature_name="cost",
                             items_indexes=[0, 1, 2, 3])
model.add_shared_coefficient(feature_name="freq",
                             items_indexes=[0, 1, 2, 3])
model.add_shared_coefficient(feature_name="ovt",
                             items_indexes=[0, 1, 2, 3])

# Estimation of the coefficients, then evaluation on the same dataset
history = model.fit(dataset, get_report=True)
print("The average neg-loglikelihood is:", model.evaluate(dataset).numpy())
print(model.report)
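To make the difference between per-item and shared coefficients concrete, here is a minimal NumPy sketch of the standard conditional logit computation. It is not Choice-Learn's implementation: the data, the coefficient values and all variable names are made up for illustration, and only the cost, intercept and income terms are included to keep it short.

```python
import numpy as np

# Made-up data: 2 choice situations, 4 items (illustrative values only).
cost = np.array([[50.0, 30.0, 20.0, 60.0],
                 [55.0, 35.0, 25.0, 65.0]])  # item feature, shape (n_choices, n_items)
income = np.array([40.0, 70.0])              # shared feature, shape (n_choices,)
choices = np.array([2, 0])                   # index of the chosen item

# Hypothetical coefficient values (in practice they are estimated by fitting).
beta_cost = -0.05                                  # one coefficient shared by all items
beta_intercept = np.array([0.0, 0.3, 0.1, -0.2])   # one per item, item 0 fixed to zero
beta_income = np.array([0.0, 0.01, 0.02, 0.005])   # one per item, item 0 fixed to zero

# Utility of each item in each choice situation, shape (n_choices, n_items).
utilities = beta_intercept + income[:, None] * beta_income + beta_cost * cost

# A softmax over the items gives the choice probabilities.
shifted = utilities - utilities.max(axis=1, keepdims=True)
probabilities = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Average negative log-likelihood of the observed choices.
chosen = probabilities[np.arange(len(choices)), choices]
avg_nll = -np.log(chosen).mean()
print("Average neg-loglikelihood:", avg_nll)
```

Fixing the coefficients of the first item to zero is what makes the intercept and income terms identifiable: only utility differences between items affect the probabilities.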
Documentation
Detailed documentation for this project is available here.
TensorFlow also has extensive documentation that can help you.
Contributing
You are welcome to contribute to the project! You can help in various ways:
- raise issues
- resolve issues already opened
- develop new features
- provide additional examples of use
- fix typos, improve code quality
- develop new tests
We recommend opening an issue to discuss your ideas. More details are given here.
Citation
If you find this package or any of its features useful for your research, please cite us.
License
The use of this software is under the MIT license, with no limitation of usage, including for commercial applications.
Contributors
Special Thanks
Affiliations
Choice-Learn has been developed through a collaboration between the Artefact Research Center and the laboratory MICS from CentraleSupélec, Université Paris Saclay.
References
Papers
[1] Representing Random Utility Choice Models with Neural Networks, Aouad, A.; Désir, A. (2022)
[2] The Acceptance of Modal Innovation: The Case of Swissmetro, Bierlaire, M.; Axhausen, K. W.; Abay, G. (2001)
[3] Applications and Interpretation of Nested Logit Models of Intercity Mode Choice, Forinash, C. V.; Koppelman, F. S. (1993)
[4] The Demand for Local Telephone Service: A Fully Discrete Model of Residential Calling Patterns and Service Choices, Train, K. E.; McFadden, D. L.; Ben-Akiva, M. (1987)
[5] Estimation of Travel Choice Models with Randomly Distributed Values of Time, Ben-Akiva, M.; Bolduc, D.; Bradley, M. (1993)
[6] Personalize Expedia Hotel Searches - ICDM 2013, Ben Hamner, A.; Friedman, D.; SSA_Expedia. (2013)
[7] A Neural-embedded Discrete Choice Model: Learning Taste Representation with Strengthened Interpretability, Han, Y.; Pereira, F. C.; Ben-Akiva, M.; Zegras, C. (2020)
[8] A branch-and-cut algorithm for the latent-class logit assortment problem, Méndez-Díaz, I.; Miranda-Bront, J. J.; Vulcano, G.; Zabala, P. (2014)
[9] Stated Preferences for Car Choice in Mixed MNL Models for Discrete Response, McFadden, D.; Train, K. (2000)
[10] Modeling the Choice of Residential Location, McFadden, D. (1978)
Code and Repositories