
SearchLibrium


Automated discrete choice model search powered by Simulated Annealing, Harmony Search, and JAX-accelerated MLE.

SearchLibrium searches over model specifications — which variables to include, whether parameters should be random, which transformations to apply, and which model class to use — and returns the best converged, all-significant model according to your chosen criterion (BIC, AIC, log-likelihood, MAE, or multi-objective combinations).


Install

pip install SearchLibrium --upgrade

Requirements: Python ≥ 3.10, numpy ≥ 2.0, scipy ≥ 1.10, pandas ≥ 2.0, scikit-learn ≥ 1.3.1, statsmodels

Install in Jupyter Notebook

# Run in a notebook cell
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "SearchLibrium", "--upgrade"])

# Then import
from SearchLibrium import Parameters, call_siman
print("✓ SearchLibrium installed and ready!")

Quick start

import numpy as np
import pandas as pd
from SearchLibrium import Parameters, call_siman

df = pd.read_csv("https://raw.githubusercontent.com/zahern/HypothesisX/refs/heads/main/data/Swissmetro_final.csv")
varnames   = ["TIME", "COST", "HEADWAY", "SEATS"]
choice_set = np.unique(df["alt"]).tolist()

params = Parameters(
    criterions   = [("bic", -1)],        # minimise BIC
    df           = df,
    varnames     = varnames,
    asvarnames   = varnames,
    isvarnames   = [],
    choice_set   = choice_set,
    choices      = df["CHOICE"].values,
    alt_var      = df["alt"].values,
    choice_id    = df["custom_id"].values,
    ind_id       = df["ID"].values,
    base_alt     = "SM",
    models       = ["multinomial", "mixed_logit"],
    allow_random = True,
    p_val        = 0.05,
)

best = call_siman(params, init_sol=None, id_num=1)

A run dashboard is printed automatically at the end of every search, showing BIC, log-likelihood, AIC, MAE, variables, model type, and (if multi-objective) the full Pareto archive.


Example Notebooks

Model                                                  Notebook
Multinomial Logit — standalone fit + search            notebooks/mnl_example.ipynb
Mixed Logit — standalone fit + search                  notebooks/mixed_logit_example.ipynb
Random Regret Minimisation — standalone fit + search   notebooks/rrm_example.ipynb
Mixed Random Regret — standalone fit + search          notebooks/mixed_rrm_example.ipynb
Nested Logit — standalone fit + search                 notebooks/Data_Nest.ipynb
HPC Batch Jobs & PyPI Publishing                       notebooks/pbs_batch_jobs_guide.ipynb

How the search works

The search uses Simulated Annealing (SA) to explore the space of model specifications:

generate starting solution
  └─ for each SA temperature step
       └─ perturb current specification → guaranteed distinct from current
            ├─ fit model with JAX-accelerated MLE
            ├─ run backward elimination (remove insignificant vars, refit)
            ├─ accept if converged + Metropolis criterion satisfied
            └─ update best solution
print dashboard

Key guarantees:

  • Only converged solutions are accepted
  • Every accepted solution has all variables statistically significant (p < p_val, backward elimination)
  • Each perturbation is guaranteed to produce a genuinely different specification — a distribution-only swap (e.g. normal → lognormal) without any structural change does not count
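The Metropolis step above can be sketched in a few lines. This is an illustration of the standard acceptance rule, not SearchLibrium's internal code; the objective is assumed to be a criterion being minimised, such as BIC:

```python
import math
import random

def metropolis_accept(current_obj, candidate_obj, temperature, rng=random):
    """Standard Metropolis rule for a minimised objective (e.g. BIC).

    Improvements are always accepted; a worse candidate is accepted with
    probability exp(-delta / T), so high temperatures explore more and
    low temperatures exploit more.
    """
    delta = candidate_obj - current_obj
    if delta <= 0:  # candidate is at least as good: always accept
        return True
    return rng.random() < math.exp(-delta / temperature)
```

At a high temperature a candidate one BIC point worse is accepted roughly 90% of the time (exp(-1/10) ≈ 0.90); near the final temperature the same candidate is almost always rejected.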

Data format

Your dataframe must be in long format — one row per alternative per observation:

obs_id  alt    choice  TIME  COST  ...
1       car    1       35    12    ...
1       train  0       60    8     ...
1       bus    0       55    5     ...
2       car    0       40    14    ...
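If your data starts in wide format (one row per observation, one column per alternative-specific attribute), a pandas reshape along these lines produces the required long format. The column names below are illustrative, not a SearchLibrium API:

```python
import pandas as pd

# Hypothetical wide-format data: one row per observation.
wide = pd.DataFrame({
    "obs_id":     [1, 2],
    "choice":     ["car", "train"],   # name of the chosen alternative
    "TIME_car":   [35, 40], "TIME_train": [60, 55],
    "COST_car":   [12, 14], "COST_train": [8, 9],
})

# pd.wide_to_long expects columns named <stub><sep><alternative>.
long = pd.wide_to_long(
    wide, stubnames=["TIME", "COST"], i="obs_id", j="alt",
    sep="_", suffix=r"\w+",
).reset_index()

# One row per alternative per observation, plus a 0/1 choice flag.
long["choice"] = (long["choice"] == long["alt"]).astype(int)
print(long.sort_values(["obs_id", "alt"]))
```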

Model types

Model name              Description
"multinomial"           Multinomial Logit (MNL)
"mixed_logit"           Mixed Logit with simulation-based integration
"random_regret"         Random Regret Minimisation (RRM)
"mixed_random_regret"   Mixed-RRM with random parameters
"nested_logit"          Nested Logit (requires nests= and lambdas= kwargs)
"ordered_logit"         Ordered Logit

Search examples by model type

Multinomial Logit

params = Parameters(
    criterions = [("bic", -1)],
    df         = df,
    varnames   = ["TIME", "COST", "HEADWAY"],
    asvarnames = ["TIME", "COST", "HEADWAY"],
    isvarnames = [],
    choice_set = choice_set,
    choices    = df["CHOICE"].values,
    alt_var    = df["alt"].values,
    choice_id  = df["custom_id"].values,
    base_alt   = "SM",
    models     = ["multinomial"],
    p_val      = 0.05,
)
best = call_siman(params, init_sol=None, id_num=1)

Mixed Logit (random parameters)

params = Parameters(
    criterions   = [("bic", -1)],
    df           = df,
    varnames     = ["TIME", "COST", "HEADWAY"],
    asvarnames   = ["TIME", "COST", "HEADWAY"],
    isvarnames   = [],
    choice_set   = choice_set,
    choices      = df["CHOICE"].values,
    alt_var      = df["alt"].values,
    choice_id    = df["custom_id"].values,
    ind_id       = df["ID"].values,
    base_alt     = "SM",
    models       = ["mixed_logit"],
    allow_random = True,     # enable random parameters
    allow_bcvars = True,     # enable Box-Cox transformations
    n_draws      = 500,      # Halton draws for simulation
    p_val        = 0.05,
)
best = call_siman(params, init_sol=None, id_num=1)

Random Regret Minimisation (RRM)

params = Parameters(
    criterions = [("bic", -1)],
    df         = df,
    varnames   = ["TIME", "COST", "HEADWAY"],
    asvarnames = ["TIME", "COST", "HEADWAY"],
    isvarnames = [],
    choice_set = choice_set,
    choices    = df["CHOICE"].values,
    alt_var    = df["alt"].values,
    choice_id  = df["custom_id"].values,
    base_alt   = "SM",
    models     = ["random_regret"],
    p_val      = 0.05,
)
best = call_siman(params, init_sol=None, id_num=1)

Mixed Random Regret (regret + heterogeneity)

params = Parameters(
    criterions   = [("bic", -1)],
    df           = df,
    varnames     = ["TIME", "COST", "HEADWAY"],
    asvarnames   = ["TIME", "COST", "HEADWAY"],
    isvarnames   = [],
    choice_set   = choice_set,
    choices      = df["CHOICE"].values,
    alt_var      = df["alt"].values,
    choice_id    = df["custom_id"].values,
    ind_id       = df["ID"].values,
    base_alt     = "SM",
    models       = ["mixed_random_regret"],
    allow_random = True,
    n_draws      = 500,
    p_val        = 0.05,
)
best = call_siman(params, init_sol=None, id_num=1)

Nested Logit

nests   = {"PublicTransport": [0, 1], "Private": [2, 3]}
lambdas = {"PublicTransport": 0.8, "Private": 1.0}

params = Parameters(
    criterions = [("bic", -1)],
    df         = df,
    varnames   = ["TIME", "COST", "HEADWAY"],
    asvarnames = ["TIME", "COST", "HEADWAY"],
    choice_set = choice_set,
    choices    = df["CHOICE"].values,
    alt_var    = df["alt"].values,
    choice_id  = df["custom_id"].values,
    base_alt   = "SM",
    models     = ["nested_logit"],
    nests      = nests,
    lambdas    = lambdas,
    p_val      = 0.05,
)
best = call_siman(params, init_sol=None, id_num=1)

Multi-objective search (BIC + MAE)

params = Parameters(
    criterions   = [("bic", -1), ("mae", -1)],   # minimise both
    df           = df,
    df_test      = df_test,                        # required for MAE
    varnames     = varnames,
    asvarnames   = varnames,
    choice_set   = choice_set,
    choices      = df["CHOICE"].values,
    alt_var      = df["alt"].values,
    choice_id    = df["custom_id"].values,
    base_alt     = "SM",
    models       = ["multinomial", "mixed_logit"],
    allow_random = True,
)
best = call_siman(params, init_sol=None, id_num=1)
# Returns a Pareto-optimal solution; full archive is printed in the dashboard
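For intuition: one solution dominates another when it is no worse on every objective and strictly better on at least one, and the Pareto archive keeps exactly the non-dominated solutions. A minimal sketch of that bookkeeping (illustrative only, not the library's internals), using the same (name, sign) convention where -1 means minimise:

```python
def dominates(a, b, signs=(-1, -1)):
    """True if objective tuple `a` Pareto-dominates `b`.

    `a` and `b` hold objective values, e.g. (bic, mae); sign -1 means
    the objective is minimised, +1 maximised.
    """
    # Orient every objective so that larger is better, then compare.
    ga = [s * v for s, v in zip(signs, a)]
    gb = [s * v for s, v in zip(signs, b)]
    return (all(x >= y for x, y in zip(ga, gb))
            and any(x > y for x, y in zip(ga, gb)))

def pareto_archive(solutions, signs=(-1, -1)):
    """Keep only the non-dominated solutions."""
    return [s for s in solutions
            if not any(dominates(t, s, signs) for t in solutions if t != s)]
```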

Key parameters

Parameter      Type                  Default   Description
criterions     list of (name, sign)  required  Objectives: "bic", "aic", "loglik", "mae". Sign: -1 = minimise, +1 = maximise
models         list of str           all       Model classes to search over
allow_random   bool                  False     Enable random parameters (required for mixed models)
allow_bcvars   bool                  False     Enable Box-Cox variable transformations
allow_corvars  bool                  False     Enable correlated random parameters
p_val          float                 0.05      Significance threshold — variables with p > p_val are eliminated
all_sig        bool                  True      Enforce all-significant via backward elimination at each evaluation
n_draws        int                   1000      Halton draws for mixed-model simulation
maxiter        int                   2000      Maximum MLE iterations per model evaluation

Random parameter distributions

Code   Distribution
"n"    Normal
"ln"   Log-normal
"t"    Triangular
"tn"   Truncated normal
"u"    Uniform

SA control parameters

Pass ctrl=(tI, tF, max_temp_steps, max_iter) to call_siman:

best = call_siman(params, ctrl=(500, 0.001, 100, 20), id_num=1)

Parameter       Description
tI              Initial temperature — higher = more exploration early on
tF              Final temperature — lower = more exploitation at the end
max_temp_steps  Number of cooling steps
max_iter        Iterations evaluated at each temperature step
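The library's exact cooling rule is not spelled out here, but a common choice consistent with these four parameters is geometric cooling: the temperature is multiplied by a fixed factor alpha each step, with alpha chosen so that tI decays to tF over max_temp_steps steps. A sketch:

```python
def geometric_schedule(t_initial, t_final, n_steps):
    """Temperatures for geometric cooling: T_k = tI * alpha**k,
    with alpha chosen so the last temperature equals tF."""
    alpha = (t_final / t_initial) ** (1.0 / (n_steps - 1))
    return [t_initial * alpha ** k for k in range(n_steps)]

# e.g. the default ctrl=(500, 0.001, 100, 20) cools from 500 to 0.001
temps = geometric_schedule(500, 0.001, 100)
```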

Standalone model fitting (no search)

from SearchLibrium import MultinomialLogit, MixedLogit, RandomRegret, MixedRandomRegret

# MNL
mnl = MultinomialLogit()
mnl.setup(X, y, varnames=varnames, alts=alts, ids=ids)
mnl.fit()
mnl.summarise()

# Mixed Logit
mxl = MixedLogit()
mxl.setup(X, y, varnames=varnames, alts=alts, ids=ids, panels=panels,
          randvars={"TIME": "n", "COST": "ln"}, n_draws=500)
mxl.fit()
mxl.summarise()

# RRM
rrm = RandomRegret(df=df, short=False)
rrm.fit()
rrm.report()

# Mixed RRM
mrrm = MixedRandomRegret(df=df)
mrrm.fit()

Interpreting the dashboard

After every call_siman run a dashboard is printed:

╔══════════════════════════════════════════════════════╗
║           SEARCHLIBRIUM — RUN DASHBOARD              ║
╠══════════════════════════════════════════════════════╣
║  Model type   : mixed_logit                          ║
║  Variables    : TIME, COST, HEADWAY                  ║
║  Random params: TIME~n, COST~ln                      ║
╠══════════════════════════════════════════════════════╣
║  Log-likelihood : -312.45                            ║
║  AIC            :  634.90                            ║
║  BIC            :  658.22   ◄ best                   ║
║  MAE            :  0.1843                            ║
╠══════════════════════════════════════════════════════╣
║  Evaluations : 247   Converged : 198   Accepted : 43 ║
╚══════════════════════════════════════════════════════╝
  • Lower BIC / AIC = better fit-complexity tradeoff
  • All retained variables are statistically significant (p < p_val)
  • Random parameters indicate heterogeneity in that attribute's taste
  • RRM models suit contexts where regret-avoidance drives choice behaviour
  • For multi-objective runs the full Pareto archive is shown with one row per non-dominated solution
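For reference, the fit criteria in the dashboard follow the standard definitions: AIC = 2k − 2ℓ and BIC = k·ln(n) − 2ℓ, where k is the number of estimated parameters, n the number of observations, and ℓ the log-likelihood. In code:

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2*loglik."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*loglik."""
    return k * math.log(n) - 2 * loglik
```

Because ln(n) > 2 whenever n > 7, BIC penalises extra parameters harder than AIC, which is why a BIC-driven search tends toward sparser specifications.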

Bundled datasets

import SearchLibrium as sl
sl.main.preview_dataset()   # prints head of each dataset

Name         Description
electricity  Stated-preference electricity plan choice
travel_mode  Mode choice: air / train / bus / car
swiss_metro  Swiss Metro SP study (SM / train / car)

CLI

python -m SearchLibrium --info              # print package guide
python -m SearchLibrium --preview_datasets  # preview bundled datasets
python -m SearchLibrium --test_search       # run MNL/MXL search on travel_mode
python -m SearchLibrium --test_search_nest  # run nested logit search

Search algorithms

Both algorithms share a consistent interface through call_search:

from SearchLibrium import call_search, estimate_ctrl

# Auto-estimate hyperparameters from problem size (recommended)
best = call_search(params)                            # SA by default
best = call_search(params, algorithm='hs')            # Harmony Search

# Manual hyperparameters
best = call_search(params, ctrl=(1000, 0.001, 100, 20))           # SA
best = call_search(params, algorithm='hs',
                   ctrl=(20, 500, 0.9, 0.6, 0.85, 0.3))          # HS

# Inspect auto-estimated ctrl before running
ctrl = estimate_ctrl(params, algorithm='sa')
print(ctrl)

Simulated Annealing (call_siman / algorithm='sa')

Parameter       Meaning
tI              Initial temperature — higher → more exploration
tF              Final temperature — lower → more exploitation
max_temp_steps  Number of cooling steps
max_iter        Evaluations per cooling step

best = call_siman(params, ctrl=(1000, 0.001, 100, 20), id_num=1)

Harmony Search (call_harmony / algorithm='hs')

Parameter  Meaning
max_mem    Harmony memory size (population)
maxiter    Improvisation iterations
max_harm   Max harmony consideration rate
min_harm   Min harmony consideration rate
max_pitch  Max pitch adjustment rate
min_pitch  Min pitch adjustment rate

best = call_harmony(params, ctrl=(20, 400, 0.9, 0.6, 0.85, 0.3), id_num=1)

Auto hyperparameter estimation

If ctrl is omitted, the library estimates appropriate defaults from the problem complexity (n_vars × n_alts × n_models, doubled for random params):

from SearchLibrium import estimate_ctrl
ctrl_sa = estimate_ctrl(params, algorithm='sa')
ctrl_hs = estimate_ctrl(params, algorithm='hs')
print('SA ctrl:', ctrl_sa)
print('HS ctrl:', ctrl_hs)

Complexity buckets:

Complexity  SA tI  SA steps  SA iter/step  HS mem  HS iters
< 50        500    50        10            10      100
50–200      1000   100       15            15      300
200–600     2000   150       20            20      500
> 600       5000   250       30            25      800
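The table reads as a threshold lookup on the complexity score. A sketch of that mapping (illustrative; the library's internals may differ, and the boundary conventions at 200 and 600 are assumptions):

```python
def bucket_defaults(n_vars, n_alts, n_models, has_random=False):
    """Estimate search hyperparameters from problem complexity.

    Complexity = n_vars * n_alts * n_models, doubled when random
    parameters are allowed, then mapped through the bucket table.
    """
    c = n_vars * n_alts * n_models * (2 if has_random else 1)
    if c < 50:
        return dict(sa_tI=500, sa_steps=50, sa_iter=10, hs_mem=10, hs_iters=100)
    if c <= 200:
        return dict(sa_tI=1000, sa_steps=100, sa_iter=15, hs_mem=15, hs_iters=300)
    if c <= 600:
        return dict(sa_tI=2000, sa_steps=150, sa_iter=20, hs_mem=20, hs_iters=500)
    return dict(sa_tI=5000, sa_steps=250, sa_iter=30, hs_mem=25, hs_iters=800)
```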

License

MIT — see LICENSE for details.

Citation

If you use SearchLibrium in academic work, please cite the repository:

Ahern, Z. (2025). SearchLibrium: Automated discrete choice model search.
https://github.com/zahern/HypothesisX
