SearchLibrium
Automated discrete choice model search powered by Simulated Annealing, Harmony Search, and JAX-accelerated MLE.
SearchLibrium searches over model specifications — which variables to include, whether parameters should be random, which transformations to apply, and which model class to use — and returns the best converged, all-significant model according to your chosen criterion (BIC, AIC, log-likelihood, MAE, or multi-objective combinations).
Install
```shell
pip install SearchLibrium --upgrade
```
Requirements: Python ≥ 3.10, numpy ≥ 2.0, scipy ≥ 1.10, pandas ≥ 2.0, scikit-learn ≥ 1.3.1, statsmodels
Install in Jupyter Notebook
```python
# Run in a notebook cell
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "SearchLibrium", "--upgrade"])

# Then import
from SearchLibrium import Parameters, call_siman
print("✓ SearchLibrium installed and ready!")
```
Quick start
```python
import numpy as np
import pandas as pd
from SearchLibrium import Parameters, call_siman

df = pd.read_csv("https://raw.githubusercontent.com/zahern/HypothesisX/refs/heads/main/data/Swissmetro_final.csv")

varnames = ["TIME", "COST", "HEADWAY", "SEATS"]
choice_set = np.unique(df["alt"]).tolist()

params = Parameters(
    criterions = [("bic", -1)],  # minimise BIC
    df = df,
    varnames = varnames,
    asvarnames = varnames,
    isvarnames = [],
    choice_set = choice_set,
    choices = df["CHOICE"].values,
    alt_var = df["alt"].values,
    choice_id = df["custom_id"].values,
    ind_id = df["ID"].values,
    base_alt = "SM",
    models = ["multinomial", "mixed_logit"],
    allow_random = True,
    p_val = 0.05,
)

best = call_siman(params, init_sol=None, id_num=1)
```
A run dashboard is printed automatically at the end of every search, showing BIC, log-likelihood, AIC, MAE, variables, model type, and (if multi-objective) the full Pareto archive.
Example Notebooks
| Model | Notebook |
|---|---|
| Multinomial Logit — standalone fit + search | notebooks/mnl_example.ipynb |
| Mixed Logit — standalone fit + search | notebooks/mixed_logit_example.ipynb |
| Random Regret Minimisation — standalone fit + search | notebooks/rrm_example.ipynb |
| Mixed Random Regret — standalone fit + search | notebooks/mixed_rrm_example.ipynb |
| Nested Logit — standalone fit + search | notebooks/Data_Nest.ipynb |
| HPC Batch Jobs & PyPI Publishing | notebooks/pbs_batch_jobs_guide.ipynb |
How the search works
The search uses Simulated Annealing (SA) to explore the space of model specifications:
```text
generate starting solution
└─ for each SA temperature step
   └─ perturb current specification → guaranteed distinct from current
      ├─ fit model with JAX-accelerated MLE
      ├─ run backward elimination (remove insignificant vars, refit)
      ├─ accept if converged + Metropolis criterion satisfied
      └─ update best solution
print dashboard
```
Key guarantees:
- Only converged solutions are accepted
- Every accepted solution has all variables statistically significant (p < p_val, via backward elimination)
- Each perturbation is guaranteed to produce a genuinely different specification — a distribution-only swap (e.g. normal → lognormal) without any structural change does not count
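The Metropolis acceptance step in the loop above can be sketched as follows. This is a minimal illustration of the criterion, not the library's internal code; `delta` stands for the change in the minimised criterion (e.g. BIC) between the perturbed and current specification:

```python
import math
import random

def metropolis_accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion for a minimised objective.

    An improvement (delta <= 0) is always accepted; a worse solution is
    accepted with probability exp(-delta / temperature), so uphill moves
    become rarer as the temperature cools.
    """
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)

rng = random.Random(42)
print(metropolis_accept(-5.0, 100.0, rng))  # True: improvements always pass
```

At high temperatures almost any perturbation is accepted (exploration); near the final temperature only improvements survive (exploitation).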
Data format
Your dataframe must be in long format — one row per alternative per observation:
| obs_id | alt | choice | TIME | COST | ... |
|---|---|---|---|---|---|
| 1 | car | 1 | 35 | 12 | ... |
| 1 | train | 0 | 60 | 8 | ... |
| 1 | bus | 0 | 55 | 5 | ... |
| 2 | car | 0 | 40 | 14 | ... |
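If your data starts in wide format (one row per observation with per-alternative attribute columns), it can be reshaped to the required long format with pandas. The column names below are illustrative, not a requirement of the library:

```python
import pandas as pd

# Wide format: one row per observation, per-alternative attribute columns
wide = pd.DataFrame({
    "obs_id": [1, 2],
    "choice": ["car", "train"],
    "TIME_car": [35, 40], "TIME_train": [60, 58], "TIME_bus": [55, 50],
    "COST_car": [12, 14], "COST_train": [8, 9], "COST_bus": [5, 6],
})

# Reshape to long format: one row per alternative per observation
long = pd.wide_to_long(
    wide, stubnames=["TIME", "COST"], i="obs_id", j="alt",
    sep="_", suffix=r"\w+",
).reset_index()

# Recode the chosen alternative as a 0/1 indicator per row
long["choice"] = (long["choice"] == long["alt"]).astype(int)
print(long.sort_values(["obs_id", "alt"]))  # 6 rows: one per alt per obs
```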
Model types
| Model name | Description | JAX MLE |
|---|---|---|
| `"multinomial"` | Multinomial Logit (MNL) | ✓ |
| `"mixed_logit"` | Mixed Logit with simulation-based integration | ✓ |
| `"random_regret"` | Random Regret Minimisation (RRM) | ✓ |
| `"mixed_random_regret"` | Mixed-RRM with random parameters | ✓ |
| `"nested_logit"` | Nested Logit (requires `nests=` and `lambdas=` kwargs) | ✓ |
| `"ordered_logit"` | Ordered Logit | ✓ |
Search examples by model type
Multinomial Logit
```python
params = Parameters(
    criterions = [("bic", -1)],
    df = df,
    varnames = ["TIME", "COST", "HEADWAY"],
    asvarnames = ["TIME", "COST", "HEADWAY"],
    isvarnames = [],
    choice_set = choice_set,
    choices = df["CHOICE"].values,
    alt_var = df["alt"].values,
    choice_id = df["custom_id"].values,
    base_alt = "SM",
    models = ["multinomial"],
    p_val = 0.05,
)

best = call_siman(params, init_sol=None, id_num=1)
```
Mixed Logit (random parameters)
```python
params = Parameters(
    criterions = [("bic", -1)],
    df = df,
    varnames = ["TIME", "COST", "HEADWAY"],
    asvarnames = ["TIME", "COST", "HEADWAY"],
    isvarnames = [],
    choice_set = choice_set,
    choices = df["CHOICE"].values,
    alt_var = df["alt"].values,
    choice_id = df["custom_id"].values,
    ind_id = df["ID"].values,
    base_alt = "SM",
    models = ["mixed_logit"],
    allow_random = True,   # enable random parameters
    allow_bcvars = True,   # enable Box-Cox transformations
    n_draws = 500,         # Halton draws for simulation
    p_val = 0.05,
)

best = call_siman(params, init_sol=None, id_num=1)
```
Random Regret Minimisation (RRM)
```python
params = Parameters(
    criterions = [("bic", -1)],
    df = df,
    varnames = ["TIME", "COST", "HEADWAY"],
    asvarnames = ["TIME", "COST", "HEADWAY"],
    isvarnames = [],
    choice_set = choice_set,
    choices = df["CHOICE"].values,
    alt_var = df["alt"].values,
    choice_id = df["custom_id"].values,
    base_alt = "SM",
    models = ["random_regret"],
    p_val = 0.05,
)

best = call_siman(params, init_sol=None, id_num=1)
```
Mixed Random Regret (regret + heterogeneity)
```python
params = Parameters(
    criterions = [("bic", -1)],
    df = df,
    varnames = ["TIME", "COST", "HEADWAY"],
    asvarnames = ["TIME", "COST", "HEADWAY"],
    isvarnames = [],
    choice_set = choice_set,
    choices = df["CHOICE"].values,
    alt_var = df["alt"].values,
    choice_id = df["custom_id"].values,
    ind_id = df["ID"].values,
    base_alt = "SM",
    models = ["mixed_random_regret"],
    allow_random = True,
    n_draws = 500,
    p_val = 0.05,
)

best = call_siman(params, init_sol=None, id_num=1)
```
Nested Logit
```python
nests = {"PublicTransport": [0, 1], "Private": [2, 3]}
lambdas = {"PublicTransport": 0.8, "Private": 1.0}

params = Parameters(
    criterions = [("bic", -1)],
    df = df,
    varnames = ["TIME", "COST", "HEADWAY"],
    asvarnames = ["TIME", "COST", "HEADWAY"],
    choice_set = choice_set,
    choices = df["CHOICE"].values,
    alt_var = df["alt"].values,
    choice_id = df["custom_id"].values,
    base_alt = "SM",
    models = ["nested_logit"],
    nests = nests,
    lambdas = lambdas,
    p_val = 0.05,
)

best = call_siman(params, init_sol=None, id_num=1)
```
Multi-objective search (BIC + MAE)
```python
params = Parameters(
    criterions = [("bic", -1), ("mae", -1)],  # minimise both
    df = df,
    df_test = df_test,                        # required for MAE
    varnames = varnames,
    asvarnames = varnames,
    choice_set = choice_set,
    choices = df["CHOICE"].values,
    alt_var = df["alt"].values,
    choice_id = df["custom_id"].values,
    base_alt = "SM",
    models = ["multinomial", "mixed_logit"],
    allow_random = True,
)

best = call_siman(params, init_sol=None, id_num=1)
# Returns a Pareto-optimal solution; full archive is printed in the dashboard
```
Key parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `criterions` | list of (name, sign) | required | Objectives: `"bic"`, `"aic"`, `"loglik"`, `"mae"`. Sign: -1 = minimise, +1 = maximise |
| `models` | list of str | all | Model classes to search over |
| `allow_random` | bool | False | Enable random parameters (required for mixed models) |
| `allow_bcvars` | bool | False | Enable Box-Cox variable transformations |
| `allow_corvars` | bool | False | Enable correlated random parameters |
| `p_val` | float | 0.05 | Significance threshold — variables with p > p_val are eliminated |
| `all_sig` | bool | True | Enforce all-significant via backward elimination at each evaluation |
| `n_draws` | int | 1000 | Halton draws for mixed model simulation |
| `maxiter` | int | 2000 | Maximum MLE iterations per model evaluation |
Random parameter distributions
| Code | Distribution |
|---|---|
| `"n"` | Normal |
| `"ln"` | Log-normal |
| `"t"` | Triangular |
| `"tn"` | Truncated normal |
| `"u"` | Uniform |
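As an illustration of what these codes imply for a coefficient, a lognormal random parameter is strictly positive while a normal one can take either sign. The sketch below simulates draws directly with NumPy (it does not use the library's internals; the mean/sd values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mean, sd = -1.0, 0.5  # parameters on the underlying normal scale

# "n": normal draws can take either sign
beta_n = rng.normal(mean, sd, size=10_000)

# "ln": lognormal draws are exp(normal), hence strictly positive —
# useful for coefficients with a known sign, e.g. cost sensitivity
beta_ln = np.exp(rng.normal(mean, sd, size=10_000))

print(beta_n.min() < 0)       # True: normal draws cross zero
print((beta_ln > 0).all())    # True: lognormal draws never do
```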
SA control parameters
Pass `ctrl=(tI, tF, max_temp_steps, max_iter)` to `call_siman`:

```python
best = call_siman(params, ctrl=(500, 0.001, 100, 20), id_num=1)
```
| Parameter | Description |
|---|---|
| `tI` | Initial temperature — higher = more exploration early on |
| `tF` | Final temperature — lower = more exploitation at the end |
| `max_temp_steps` | Number of cooling steps |
| `max_iter` | Iterations evaluated at each temperature step |
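Assuming a standard geometric cooling schedule (an assumption; the library's exact schedule is not documented here), the per-step cooling factor implied by `(tI, tF, max_temp_steps)` can be computed as:

```python
def cooling_factor(t_initial: float, t_final: float, max_temp_steps: int) -> float:
    """Geometric cooling rate alpha such that tI * alpha**steps == tF."""
    return (t_final / t_initial) ** (1.0 / max_temp_steps)

t_i, t_f, steps = 500.0, 0.001, 100
alpha = cooling_factor(t_i, t_f, steps)
temps = [t_i * alpha**k for k in range(steps + 1)]

print(f"alpha = {alpha:.4f}")           # cooling rate applied each step
print(f"final temp = {temps[-1]:.4g}")  # recovers tF up to float error
```

Raising `tI` or adding cooling steps flattens the schedule, keeping acceptance of worse solutions likely for longer.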
Standalone model fitting (no search)
```python
from SearchLibrium import MultinomialLogit, MixedLogit, RandomRegret, MixedRandomRegret

# MNL
mnl = MultinomialLogit()
mnl.setup(X, y, varnames=varnames, alts=alts, ids=ids)
mnl.fit()
mnl.summarise()

# Mixed Logit
mxl = MixedLogit()
mxl.setup(X, y, varnames=varnames, alts=alts, ids=ids, panels=panels,
          randvars={"TIME": "n", "COST": "ln"}, n_draws=500)
mxl.fit()
mxl.summarise()

# RRM
rrm = RandomRegret(df=df, short=False)
rrm.fit()
rrm.report()

# Mixed RRM
mrrm = MixedRandomRegret(df=df)
mrrm.fit()
```
Interpreting the dashboard
After every call_siman run a dashboard is printed:
```text
╔══════════════════════════════════════════════════════╗
║ SEARCHLIBRIUM — RUN DASHBOARD                        ║
╠══════════════════════════════════════════════════════╣
║ Model type   : mixed_logit                           ║
║ Variables    : TIME, COST, HEADWAY                   ║
║ Random params: TIME~n, COST~ln                       ║
╠══════════════════════════════════════════════════════╣
║ Log-likelihood : -312.45                             ║
║ AIC            : 634.90                              ║
║ BIC            : 658.22  ◄ best                      ║
║ MAE            : 0.1843                              ║
╠══════════════════════════════════════════════════════╣
║ Evaluations : 247   Converged : 198   Accepted : 43  ║
╚══════════════════════════════════════════════════════╝
```
- Lower BIC / AIC = better fit-complexity tradeoff
- All retained variables are statistically significant (p < p_val)
- Random parameters indicate heterogeneity in that attribute's taste
- RRM models suit contexts where regret-avoidance drives choice behaviour
- For multi-objective runs the full Pareto archive is shown with one row per non-dominated solution
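The information criteria in the dashboard follow the standard definitions, so they are easy to sanity-check by hand. In the sketch below, `k = 5` parameters and `n = 784` observations are hypothetical values chosen for illustration, not figures reported by the library:

```python
import math

def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2*LL (lower is better)."""
    return 2 * k - 2 * loglik

def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k*ln(n) - 2*LL (lower is better)."""
    return k * math.log(n) - 2 * loglik

ll = -312.45  # log-likelihood from the dashboard above
print(aic(ll, k=5))                  # 634.9
print(round(bic(ll, k=5, n=784), 2))
```

BIC penalises extra parameters more heavily than AIC once n > 7 or so, which is why BIC-driven searches tend to select sparser specifications.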
Bundled datasets
```python
import SearchLibrium as sl
sl.main.preview_dataset()  # prints head of each dataset
```
| Name | Description |
|---|---|
| `electricity` | Stated-preference electricity plan choice |
| `travel_mode` | Mode choice: air / train / bus / car |
| `swiss_metro` | Swiss Metro SP study (SM / train / car) |
CLI
```shell
python -m SearchLibrium --info                # print package guide
python -m SearchLibrium --preview_datasets    # preview bundled datasets
python -m SearchLibrium --test_search         # run MNL/MXL search on travel_mode
python -m SearchLibrium --test_search_nest    # run nested logit search
```
Search algorithms
Both algorithms share a consistent interface through call_search:
```python
from SearchLibrium import call_search, estimate_ctrl

# Auto-estimate hyperparameters from problem size (recommended)
best = call_search(params)                   # SA by default
best = call_search(params, algorithm='hs')   # Harmony Search

# Manual hyperparameters
best = call_search(params, ctrl=(1000, 0.001, 100, 20))  # SA
best = call_search(params, algorithm='hs',
                   ctrl=(20, 500, 0.9, 0.6, 0.85, 0.3))  # HS

# Inspect auto-estimated ctrl before running
ctrl = estimate_ctrl(params, algorithm='sa')
print(ctrl)
```
Simulated Annealing (call_siman / algorithm='sa')
| Parameter | Meaning |
|---|---|
| `tI` | Initial temperature — higher → more exploration |
| `tF` | Final temperature — lower → more exploitation |
| `max_temp_steps` | Number of cooling steps |
| `max_iter` | Evaluations per cooling step |
```python
best = call_siman(params, ctrl=(1000, 0.001, 100, 20), id_num=1)
```
Harmony Search (call_harmony / algorithm='hs')
| Parameter | Meaning |
|---|---|
| `max_mem` | Harmony memory size (population) |
| `maxiter` | Improvisation iterations |
| `max_harm` | Max harmony consideration rate |
| `min_harm` | Min harmony consideration rate |
| `max_pitch` | Max pitch adjustment rate |
| `min_pitch` | Min pitch adjustment rate |
```python
best = call_harmony(params, ctrl=(20, 400, 0.9, 0.6, 0.85, 0.3), id_num=1)
```
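Since the HS `ctrl` tuple is positional, a small named wrapper (purely a convenience for your own scripts, not part of the library) makes the documented ordering explicit and harder to get wrong:

```python
from typing import NamedTuple

class HSCtrl(NamedTuple):
    """Named view of the HS ctrl tuple, in the documented order."""
    max_mem: int      # harmony memory size (population)
    maxiter: int      # improvisation iterations
    max_harm: float   # max harmony consideration rate
    min_harm: float   # min harmony consideration rate
    max_pitch: float  # max pitch adjustment rate
    min_pitch: float  # min pitch adjustment rate

ctrl = HSCtrl(20, 400, 0.9, 0.6, 0.85, 0.3)
print(ctrl.max_mem, ctrl.maxiter)  # 20 400
# A NamedTuple is still a plain tuple, so it can be passed as ctrl= directly
```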
Auto hyperparameter estimation
If ctrl is omitted, the library estimates appropriate defaults from the
problem complexity (n_vars × n_alts × n_models, doubled for random params):
```python
from SearchLibrium import estimate_ctrl

ctrl_sa = estimate_ctrl(params, algorithm='sa')
ctrl_hs = estimate_ctrl(params, algorithm='hs')
print('SA ctrl:', ctrl_sa)
print('HS ctrl:', ctrl_hs)
```
Complexity buckets:
| Complexity | SA tI | SA steps | SA iter/step | HS mem | HS iters |
|---|---|---|---|---|---|
| < 50 | 500 | 50 | 10 | 10 | 100 |
| 50–200 | 1 000 | 100 | 15 | 15 | 300 |
| 200–600 | 2 000 | 150 | 20 | 20 | 500 |
| > 600 | 5 000 | 250 | 30 | 25 | 800 |
License
MIT — see LICENSE for details.
Citation
If you use SearchLibrium in academic work, please cite the repository:
Ahern, Z. (2025). SearchLibrium: Automated discrete choice model search.
https://github.com/zahern/HypothesisX