
Python Tensor based package for discrete choice modelling.

PyCMTensor


PyCMTensor is a discrete choice modelling development tool built on deep learning libraries, enabling the development of complex models using deep neural networks. PyCMTensor is built with the Aesara package, which is similar to TensorFlow or Keras. Aesara is used as the backend library because of its hackable, open-source nature. Users of Biogeme will find the syntax of PyCMTensor familiar. PyCMTensor improves on Biogeme in situations where much more complex models are necessary, for example, integrating neural networks into discrete choice models. PyCMTensor also includes the ability to estimate models using stochastic gradient descent methods by default, e.g. Nesterov Accelerated Gradient (NAG), Adaptive Moment Estimation (Adam), or RMSProp.

Features

  • Estimate complex choice models using deep learning methods
  • Combine traditional econometric models (Multinomial Logit) with neural networks
  • Programming syntax similar to Biogeme
  • Uses tensor features found in the Aesara library

Quick start

Installation

  1. Download and install Miniconda

    Full Anaconda works fine, but Miniconda is recommended for a minimal installation. Ensure that Conda is using at least Python 3.9.

  2. Install conda dependencies:

    Dependencies are OS specific

    Windows

    conda install mkl-service conda-forge::cxx-compiler conda-forge::m2w64-toolchain
    

    Linux

    conda install mkl-service conda-forge::cxx-compiler
    

    Mac OSX

    conda install mkl-service Clang
    

    **Optional**

    Alternatively, Conda environment.yml files are provided in the environment/ directory for each operating system. For example, on Windows:

    conda env create -f environment/environment_windows.yml
    conda activate pycmtensor-dev
    
  3. Install the PyCMTensor package from PyPI via pip

    pip install -U pycmtensor==1.3.2
    

    Alternatively, the latest development version is available on GitHub. It can be installed via

    pip install -U git+https://github.com/mwong009/pycmtensor.git
    

For more information about installing, see Installation.
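After installing, it can help to confirm that the active interpreter meets the Python 3.9 requirement stated above and that the package is visible to it. The following is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and is not part of PyCMTensor.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 9), package="pycmtensor"):
    """Report whether the interpreter is recent enough and the package is importable.

    `check_environment` is a hypothetical helper written for this example.
    """
    return {
        # sys.version_info compares element-wise against a (major, minor) tuple
        "python_ok": sys.version_info >= min_version,
        # find_spec returns None when a top-level package cannot be located
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `package_found` is reported as `False`, the most likely cause is that a different Conda environment is active than the one the package was installed into.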

Usage

PyCMTensor uses syntax very similar to Biogeme, so users of Biogeme should find it familiar. Make sure you are using the correct Conda environment and that the required packages are installed.

Simple example: MNL model using the Swissmetro dataset

  1. Start an interactive session (e.g. IPython or Jupyter Notebook) and import the PyCMTensor package and pandas:

    import pycmtensor as cmt
    import pandas as pd
    

    Include the additional submodules:

    from pycmtensor.expressions import Beta # Beta class for model parameters
    from pycmtensor.models import MNL  # MNL model
    from pycmtensor.statistics import elasticities  # For calculating elasticities
    

    For a full list of submodules and their descriptions, refer to the API Reference. Using the Swissmetro dataset, we define a simple MNL model.

Note: The following is a replication of the results from Biogeme using the Adam optimization method with a constant learning rate.
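For readers unfamiliar with Adam, here is a rough illustration of the update rule with a constant learning rate, applied to a one-dimensional quadratic in plain Python. This is a sketch of the general algorithm, not PyCMTensor's optimizer code; the function name `adam_minimize` and the hyperparameter values are assumptions chosen for the example.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a scalar function with Adam, given its gradient `grad`.

    Hypothetical helper for illustration; the learning rate `lr` stays constant.
    """
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1.0 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1.0 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

In model estimation the same rule is applied per parameter to the gradient of the negative log likelihood, computed on mini-batches of the data.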

  1. Import the dataset and perform some data cleaning

    swissmetro = pd.read_csv("swissmetro.dat", sep="\t")
    db = cmt.Data(
        df=swissmetro,
        choice="CHOICE",
        drop=[swissmetro["CHOICE"] == 0],
        autoscale=True,
        autoscale_except=["ID", "ORIGIN", "DEST", "CHOICE"],
        split=0.8,
    )
    
  2. Initialize the model parameters and specify the utility functions and availability conditions

    b_cost = Beta("b_cost", 0.0, None, None, 0)
    b_time = Beta("b_time", 0.0, None, None, 0)
    asc_train = Beta("asc_train", 0.0, None, None, 0)
    asc_car = Beta("asc_car", 0.0, None, None, 0)
    asc_sm = Beta("asc_sm", 0.0, None, None, 1)
    
    U_1 = b_cost * db["TRAIN_CO"] + b_time * db["TRAIN_TT"] + asc_train
    U_2 = b_cost * db["SM_CO"] + b_time * db["SM_TT"] + asc_sm
    U_3 = b_cost * db["CAR_CO"] + b_time * db["CAR_TT"] + asc_car
    
    # specify the utility function and the availability conditions
    U = [U_1, U_2, U_3]  # utility
    AV = [db["TRAIN_AV"], db["SM_AV"], db["CAR_AV"]]  # availability
    
  3. Define the Multinomial Logit model

    mymodel = MNL(db, locals(), U, AV)
    
  4. Train the model and generate model statistics (optionally, you can also set the training hyperparameters)

    mymodel.train(db, max_steps=200, batch_size=128)  # run the model training on the dataset `db`
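To make the roles of the utilities `U` and availability conditions `AV` concrete, the sketch below computes Multinomial Logit choice probabilities and the log likelihood for a handful of observations in plain Python. It illustrates the standard MNL formula only; it is not how PyCMTensor implements the model (which builds Aesara tensor expressions), and the function names and zero-based choice indices are assumptions made for this example.

```python
import math


def mnl_probabilities(utilities, availability):
    """MNL choice probabilities for one observation.

    Unavailable alternatives (availability 0) receive probability 0.
    """
    available = [u for u, a in zip(utilities, availability) if a]
    if not available:
        raise ValueError("at least one alternative must be available")
    u_max = max(available)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - u_max) if a else 0.0
             for u, a in zip(utilities, availability)]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def log_likelihood(all_utilities, all_availability, choices):
    """Sum of log probabilities of the chosen alternatives (zero-based indices)."""
    return sum(
        math.log(mnl_probabilities(u, av)[c])
        for u, av, c in zip(all_utilities, all_availability, choices)
    )


# With all utilities at zero, every available alternative is equally likely
print(mnl_probabilities([0.0, 0.0, 0.0], [1, 1, 1]))
print(mnl_probabilities([0.0, 0.0, 0.0], [1, 1, 0]))
```

Training searches for the Beta values that maximise this log likelihood over the training split of the dataset.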
    

Results

The following functions output the parameter estimates, model statistics, and training results:

  1. Model estimates

    print(mymodel.results.beta_statistics())
    

    Output:

                  value   std err     t-test   p-value rob. std err rob. t-test rob. p-value
    asc_car   -0.665638  0.044783 -14.863615       0.0     0.176178    -3.77821     0.000158
    asc_sm          0.0         -          -         -            -           -            -
    asc_train -1.646826  0.048099 -34.238218       0.0     0.198978   -8.276443          0.0
    b_cost     0.024912   0.01943   1.282135  0.199795     0.016413    1.517851     0.129052
    b_time    -0.313186  0.049708  -6.300485       0.0     0.208239   -1.503979     0.132587
    
  2. Training results

    print(mymodel.results.model_statistics())
    

    Output:

                                              value
    Number of training samples used          8575.0
    Number of validation samples used        2143.0
    Init. log likelihood               -8874.438875
    Final log likelihood                -7513.22967
    Accuracy                                 59.26%
    Likelihood ratio test                2722.41841
    Rho square                             0.153385
    Rho square bar                         0.152822
    Akaike Information Criterion       15036.459339
    Bayesian Information Criterion      15071.74237
    Final gradient norm                    0.007164
    
  3. Correlation matrix

    print(mymodel.results.model_correlation_matrix())
    

    Output:

                 b_cost    b_time  asc_train   asc_car
    b_cost     1.000000  0.209979   0.226737 -0.028335
    b_time     0.209979  1.000000   0.731378  0.796144
    asc_train  0.226737  0.731378   1.000000  0.664478
    asc_car   -0.028335  0.796144   0.664478  1.000000
    
  4. Elasticities

    print(elasticities(mymodel, db, 0, "TRAIN_TT"))  # CHOICE:TRAIN (0) wrt TRAIN_TT
    

    Output:

    [-0.06813523 -0.01457346 -0.0555597  ... -0.03453162 -0.02809382 -0.02343637]
    
  5. Choice probability predictions

    print(mymodel.predict(db, return_choices=False))
    

    Output:

    [[0.12319342 0.54372904 0.33307754]
    [0.12267997 0.54499504 0.33232499]
    [0.12354587 0.54162143 0.3348327 ]
    ...
    [0.12801816 0.5201341  0.35184774]
    [0.1271984  0.51681635 0.35598525]
    [0.12881032 0.51856181 0.35262787]]
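The goodness-of-fit figures in the training results table can be reproduced from the two log likelihood values using the standard definitions. The sketch below does this in plain Python. The parameter count `k = 5` is an assumption inferred from the printed AIC, BIC, and Rho square bar values (it matches the five Beta objects, including the fixed `asc_sm`), not taken from the library source, and the helper name `fit_statistics` is hypothetical.

```python
import math


def fit_statistics(ll_init, ll_final, n_samples, k):
    """Standard goodness-of-fit measures from initial and final log likelihoods."""
    return {
        "lrt": -2.0 * (ll_init - ll_final),            # likelihood ratio test
        "rho2": 1.0 - ll_final / ll_init,              # rho square
        "rho2_bar": 1.0 - (ll_final - k) / ll_init,    # rho square bar
        "aic": 2.0 * k - 2.0 * ll_final,               # Akaike Information Criterion
        "bic": k * math.log(n_samples) - 2.0 * ll_final,  # Bayesian Information Criterion
    }


# Values copied from the training results table above; k = 5 is an inferred assumption
stats = fit_statistics(
    ll_init=-8874.438875,
    ll_final=-7513.22967,
    n_samples=8575,  # number of training samples used
    k=5,
)
for name, value in stats.items():
    print(f"{name:10s}{value:.6f}")
```

The computed values agree with the table to the printed precision, which suggests the BIC uses the number of training samples rather than the full dataset size.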
    

Development

To develop PyCMTensor in a local environment, e.g. to modify it or add features, you need to set up a virtual (Conda) environment and install the project requirements. Follow the instructions to install Conda (Miniconda), then start a new virtual environment with the provided environment/environment_<your OS>.yml file.

  1. Download the git project repository into a local directory
    git clone https://github.com/mwong009/pycmtensor.git
    cd pycmtensor
    

Installing the virtual environment

Windows

conda env create -f environment/environment_windows.yml

Linux

conda env create -f environment/environment_linux.yml

Mac OSX

conda env create -f environment/environment_macos.yml

Next, activate the virtual environment and install the Poetry dependency manager via pip

conda activate pycmtensor-dev
pip install poetry

Install the project and development dependencies

poetry install -E dev

Citation

Cite this software as:

@software{melvin_wong_2022_7249280,
  author       = {Melvin Wong},
  title        = {mwong009/pycmtensor: v1.3.2},
  year         = 2022,
  version      = {v1.3.2},
  doi          = {10.5281/zenodo.7249280},
  url          = {https://doi.org/10.5281/zenodo.7249280}
}
