
JAX-first hierarchical search and fitting for count, CMF, duration, and linear models.


metacountregressor

A JAX-first Python package for hierarchical model fitting and metaheuristic-driven model structure search. Supports count, CMF, duration, and linear models with random parameters, latent classes, zero-inflation, and heterogeneity in means — all with one unified API.




Install

pip install metacountregressor
pip install jax jaxlib jaxopt   # JAX backend

Quick import check:

python -c "from metacountregressor import __version__, load_example16_3_raw_data; print(__version__, load_example16_3_raw_data().shape)"

Notebooks — Start Here

The fastest way to learn the package is to open the notebooks in order. Each one builds on the previous and uses the bundled Example 16-3 crash-frequency data so you can run everything without sourcing your own dataset.

| #  | Notebook                            | What you learn                                                           |
|----|-------------------------------------|--------------------------------------------------------------------------|
| 00 | 00_quickstart.ipynb                 | Install, load data, first search run in under 10 minutes                 |
| 01 | 01_crash_frequency_search.ipynb     | Mixed Negative Binomial search — constraints, roles, re-fit              |
| 02 | 02_latent_class_fc_validation.ipynb | 2-class LC model — fit, extract class probabilities, validate against FC |
| 03 | 03_cmf_aadt_search.ipynb            | CMF model — baseline + AADT-interaction structure search                 |
| 04 | 04_linear_speed_prediction.ipynb    | Gaussian linear model search (speed prediction)                          |
| 05 | 05_batch_script_tutorial.ipynb      | Batch scripts, parallel runs, PBS/SLURM HPC job templates                |

Tip: run `jupyter lab metacountregressor/metacountregressor/templates/` to browse all notebooks together.


Quick Start

import numpy as np
from metacountregressor import (
    ExperimentBuilder,
    ModelConstraints,
    SearchOutputConfig,
    load_example16_3_model_data,
    get_help,
)

# ── 1. Load the bundled crash-frequency dataset ──────────────────────────────
df = load_example16_3_model_data()
exposure = df['LENGTH'] * df['AADT'] * 365 / 1e8
df['OFFSET'] = np.log(exposure.clip(lower=1e-9))

# ── 2. Build constraints ──────────────────────────────────────────────────────
c = (
    ModelConstraints()
    .force_include('OFFSET')
    .no_zi('LENGTH', 'CURVES', 'WIDTH', 'SLOPE')
    .no_random('URB')
    .allow_random('CURVES', distributions=['lognormal'])
)

# ── 3. Create the experiment ──────────────────────────────────────────────────
builder = ExperimentBuilder(df, id_col='ID', y_col='FREQ', offset_col='OFFSET')
builder.describe()              # print data summary
get_help('crash_frequency')     # print end-to-end workflow guide

# ── 4. Build the structure evaluator ─────────────────────────────────────────
evaluator = builder.build_evaluator(
    variables=['AADT', 'LENGTH', 'SPEED', 'CURVES', 'URB', 'AVEPRE'],
    constraints=c,
    default_roles=[0, 1, 2, 3, 5],
    max_latent_classes=1,
    R=200,
)

# ── 5. Run the search ─────────────────────────────────────────────────────────
result = builder.run(
    evaluator,
    algo='sa',          # 'sa' | 'de' | 'hs'
    max_iter=1000,
    seed=42,
    output_config=SearchOutputConfig(output_dir='results', experiment_name='demo'),
)
print('Best BIC:', result.best_score)
print('Saved to:', result.saved_to)

# ── 6. Re-fit with more draws ─────────────────────────────────────────────────
fit = builder.fit_manual_model(manual_spec=result.best_spec, model='nb', R=500)
print(fit)

What the package does

metacountregressor solves two related problems:

  1. Structure search — automatically discover which variables to include, whether each coefficient should be fixed or random, and whether the model needs latent classes, zero-inflation, or heterogeneity in means. The search is driven by metaheuristic algorithms (SA, DE, HS) that minimise BIC.

  2. Model estimation — fit the discovered (or manually specified) model structure using JAX-accelerated simulation-based maximum likelihood with Halton draws.

The same API handles crash-frequency count models, CMF (Crash Modification Factor) models, duration models, and linear (Gaussian) models.
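The search score is BIC, which penalises each extra parameter by log of the sample size, so a richer structure only wins if it improves the likelihood enough to pay for its complexity. A self-contained sketch of that trade-off (the log-likelihoods and counts below are illustrative numbers, not output from the package):

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * lnL. Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# A richer structure gains 10 log-likelihood points but adds 5 parameters;
# at n = 275 the complexity penalty outweighs the fit improvement.
simple = bic(log_likelihood=-420.0, n_params=4, n_obs=275)
richer = bic(log_likelihood=-410.0, n_params=9, n_obs=275)
```

This is why the metaheuristics can freely propose adding random parameters or latent classes: structures that do not earn their extra parameters score worse and are discarded.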


Data loaders

All loaders return a pandas.DataFrame.

from metacountregressor import (
    load_example16_3_raw_data,      # Example 16-3: original 31 columns
    load_example16_3_model_data,    # + OFFSET, FC_ENCODED, FC_LABEL
    load_example_crash_data,        # alias for load_example16_3_model_data
    load_example_duration_data,     # synthetic duration target from Ex 16-3
    load_example_linear_data,       # synthetic linear target from Ex 16-3
    load_example_platform_speed_data,           # speed relative to platform
    load_example_platform_gap_duration_data,    # time until next speeding event
    load_example_panel_data,        # panel-structure example
)

Example 16-3 columns

load_example16_3_raw_data() returns the original source columns:

| Group              | Columns                                                        |
|--------------------|----------------------------------------------------------------|
| Identifiers        | ID                                                             |
| Outcome            | FREQ                                                           |
| Geometry           | LENGTH, WIDTH, INCLANES, DECLANES, MEDWIDTH, MIMEDSH, MXMEDSH  |
| Speed / grade      | SPEED, MIGRADE, MXGRADE, MXGRDIFF, SLOPE                       |
| Traffic            | AADT, SINGLE, DOUBLE, TRAIN, PEAKHR, ADTLANE                   |
| Road class         | URB, FC, ACCESS, TANGENT, CURVES, MINRAD, GRADEBR              |
| Friction / weather | FRICTION, INTECHAG, AVEPRE, AVESNOW                            |

load_example16_3_model_data() adds OFFSET, FC_ENCODED, FC_LABEL.


ExperimentBuilder API

from metacountregressor import ExperimentBuilder

builder = ExperimentBuilder(
    df=df,
    id_col='ID',           # required — observation identifier
    y_col='FREQ',          # required — outcome variable
    offset_col='OFFSET',   # optional — log-exposure offset (count models)
    group_id_col='FC',     # optional — group/panel identifier
)

Key methods

| Method | Purpose |
|--------|---------|
| builder.describe() | Print data summary: N, outcome stats, variable types |
| builder.suggest_config(max_latent_classes=2) | Print recommended ExperimentBuilder settings |
| builder.build_evaluator(...) | Build a structure evaluator (see below) |
| builder.build_count_evaluator(...) | Shortcut for count models |
| builder.run(evaluator, algo, max_iter, seed, ...) | Run metaheuristic search |
| builder.run_search(evaluator, ...) | Alias for run() |
| builder.make_manual_spec(...) | Build a model spec dict manually |
| builder.fit_manual_model(manual_spec, model, R) | Fit a manually specified structure |
| builder.compute_latent_class_probabilities(fit, true_class_col) | Get class-membership probabilities |
| ExperimentBuilder.get_family_capabilities() | Static: list supported model families |
| ExperimentBuilder.get_search_argument_guide() | Static: full argument documentation |

build_evaluator arguments

evaluator = builder.build_evaluator(
    variables=['AADT', 'LENGTH', 'SPEED', 'CURVES'],   # candidate columns
    constraints=c,                                       # ModelConstraints object
    model_family='count',          # 'count' | 'cmf' | 'duration' | 'linear'
    default_roles=[0, 1, 2, 3, 5], # roles the search may assign
    max_latent_classes=2,          # 1 = standard, 2 = allow LC
    mode='single',                 # 'single' = minimise BIC
    R=200,                         # Halton simulation draws
    # CMF-only arguments:
    aadt_col='AADT',
    baseline_vars=['URB', 'ACCESS'],
    local_vars=['CURVES', 'WIDTH'],
    # Duration-only:
    budget_col='AADT',
)

ModelConstraints API

ModelConstraints restricts which roles and distributions each variable may take. All methods return self for chaining.

from metacountregressor import ModelConstraints

c = (
    ModelConstraints()
    .force_include('OFFSET')                          # cannot be excluded
    .force_fixed('AADT')                              # only fixed or excluded
    .no_zi('LENGTH', 'CURVES', 'SLOPE', 'WIDTH')      # cannot be ZI term
    .no_random('URB', 'GRADEBR')                      # no random parameter
    .allow_random('CURVES', distributions=['lognormal'])  # restrict distribution
    .membership_only('FC_ENCODED')                    # drives class prob only
    .allow_membership('SPEED')                        # may also enter membership
    .outcome_only('AADT')                             # no membership role
    .exclude('YEAR', 'ID')                            # removed from search
    .set_roles('WIDTH', [0, 1, 2])                    # low-level override
)

print(c)           # display all constraints
c.summary()        # same as print(c)
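The chaining above works because every constraint method mutates the object and then returns it. A minimal sketch of the same fluent-builder pattern (a toy `Constraints` class, not the real ModelConstraints implementation):

```python
class Constraints:
    """Toy fluent builder illustrating the return-self chaining pattern."""

    def __init__(self) -> None:
        self.rules: dict[str, str] = {}

    def force_include(self, *cols: str) -> "Constraints":
        for col in cols:
            self.rules[col] = "force_include"
        return self  # returning self is what makes the next .method() chain

    def no_random(self, *cols: str) -> "Constraints":
        for col in cols:
            self.rules[col] = "no_random"
        return self

c = Constraints().force_include("OFFSET").no_random("URB", "GRADEBR")
```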

Get detailed API documentation:

from metacountregressor import get_help
get_help('constraints')

Role codes

| Code | Name | Description |
|------|------|-------------|
| 0 | Excluded | Variable not in the model |
| 1 | Fixed | Same coefficient for every observation |
| 2 | Random (ind.) | Individual random effect, independent draws |
| 3 | Random (corr.) | Individual random effect, correlated with others |
| 4 | Grouped | Group-level random effect (shared within group) |
| 5 | Heterogeneity | Explains variation in random-parameter means |
| 6 | Zero inflation | Enters the zero-inflation probability equation |
| 7 | Membership only | Drives latent-class probability — not the outcome |
| 8 | Membership + fixed | Drives class membership AND has a class-specific outcome effect |

Random-parameter distributions: normal, lognormal, triangular, uniform.

get_help('roles')   # full reference with examples
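Simulation-based estimation works by mapping low-discrepancy Halton points in (0, 1) through the inverse CDF of the chosen distribution. The package's actual draw machinery is JAX-based; this pure-Python sketch only illustrates the idea for the normal, lognormal, and uniform cases:

```python
import math
from statistics import NormalDist

def halton(n: int, base: int = 2) -> list[float]:
    """First n points of the base-b van der Corput (Halton) sequence in (0, 1)."""
    seq = []
    for i in range(1, n + 1):
        f, x = 1.0, 0.0
        while i > 0:
            f /= base
            x += f * (i % base)
            i //= base
        seq.append(x)
    return seq

# Transform uniform Halton points into draws for a random parameter.
u = halton(5)
normal_draws = [NormalDist(0.0, 1.0).inv_cdf(p) for p in u]
lognormal_draws = [math.exp(d) for d in normal_draws]  # exp of a normal draw
uniform_draws = [2.0 * p - 1.0 for p in u]             # rescaled to (-1, 1)
```

Halton points fill the unit interval far more evenly than pseudo-random numbers, which is why a few hundred draws (the `R` argument) usually suffice for stable simulated likelihoods.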

Search algorithms

| Alias | Algorithm | Best for |
|-------|-----------|----------|
| 'sa' | Simulated Annealing | Robust default — escapes local minima via cooling schedule |
| 'de' | Differential Evolution | Thorough population-based search — use when SA converges early |
| 'hs' | Harmony Search | Fast initial convergence — good for a quick first pass |

# Run the same evaluator with different algorithms
result_sa = builder.run(evaluator, algo='sa', max_iter=2000, seed=42)
result_de = builder.run(evaluator, algo='de', max_iter=2000, seed=42)
result_hs = builder.run(evaluator, algo='hs', max_iter=2000, seed=42)
get_help('metaheuristics')   # full parameter reference
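SA escapes local minima by occasionally accepting a worse BIC, with a probability that shrinks as the temperature cools. A generic Metropolis acceptance sketch (illustrating the principle, not the package's internal SA implementation):

```python
import math
import random

def accept(delta: float, temp: float, rng: random.Random) -> bool:
    """Metropolis rule: always take improvements; take worsening moves
    with probability exp(-delta / temp), which decays as temp cools."""
    if delta <= 0:  # candidate lowers (improves) the score
        return True
    return rng.random() < math.exp(-delta / temp)

rng = random.Random(42)
# Early in the search (hot) a 5-point BIC increase is usually accepted;
# late in the search (cold) it is almost always rejected.
early = sum(accept(5.0, temp=100.0, rng=rng) for _ in range(1000))
late = sum(accept(5.0, temp=0.1, rng=rng) for _ in range(1000))
```

This is why SA is the robust default: the hot phase explores broadly across model structures, and the cold phase refines the best region found.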

Model families

Count models (Poisson / Negative Binomial)

evaluator = builder.build_count_evaluator(
    variables=['AADT', 'LENGTH', 'SPEED', 'CURVES', 'URB', 'AVEPRE'],
    constraints=c,
    default_roles=[0, 1, 2, 3, 5],
    max_latent_classes=1,
    R=200,
)
result = builder.run(evaluator, algo='sa', max_iter=2000, seed=42)
fit = builder.fit_manual_model(manual_spec=result.best_spec, model='nb', R=500)

Manual spec:

spec = builder.make_manual_spec(
    fixed_terms=['AADT', 'LENGTH', 'SPEED'],
    rdm_terms=['CURVES:normal'],
    rdm_cor_terms=['TANGENT:normal', 'SLOPE:lognormal'],
    hetro_in_means=['AVEPRE'],
    zi_terms=['ACCESS'],
    membership_terms=['URB'],
    dispersion=1,
    latent_classes=2,
)
fit = builder.fit_manual_model(manual_spec=spec, model='nb', R=200)

CMF models

from metacountregressor import CMFExperimentBuilder

cmf = CMFExperimentBuilder(
    df=df,
    y_col='FREQ',
    aadt_col='AADT',
    baseline_vars=['URB', 'ACCESS', 'GRADEBR', 'CURVES'],
    local_vars=['CURVES', 'WIDTH'],
)

# Route A: full JAX flexibility (random params, LC, ZI)
builder_jax, evaluator_jax, meta = cmf.build_jax_count_evaluator(
    id_col='ID', offset_col='OFFSET', constraints=c, max_latent_classes=1, R=200)
result = builder_jax.run(evaluator_jax, algo='sa', max_iter=500, seed=42)

# Route B: classic GA search (fast, two-component structure)
search = cmf.run_search(R=200)
fit = cmf.fit_best_model(search, final_R=500)
cmf.print_report(search, fit)
get_help('cmf')   # full workflow guide

Duration models

from metacountregressor import load_example_duration_data

duration_df = load_example_duration_data()
duration_builder = ExperimentBuilder(
    df=duration_df, id_col='ID', y_col='DURATION', group_id_col='FC')

evaluator = duration_builder.build_evaluator(
    variables=['WIDTH', 'CURVES', 'SLOPE', 'URB', 'FC_ENCODED'],
    model_family='duration',
    default_roles=[0, 1, 2, 3],
    max_latent_classes=1, R=200,
)
result = duration_builder.run(evaluator, algo='sa', max_iter=500, seed=42)
fit = duration_builder.fit_manual_model(manual_spec=result.best_spec,
                                        model='lognormal', R=500)

Linear models

from metacountregressor import load_example_platform_speed_data

speed_df = load_example_platform_speed_data()
speed_builder = ExperimentBuilder(
    df=speed_df, id_col='PLATFORM_ID', y_col='SPEED', offset_col=None)

evaluator = speed_builder.build_evaluator(
    variables=['DIST_TO_PLATFORM', 'POSTED_SPEED', 'APPROACH_ACCEL',
               'PLATFORM_HEIGHT', 'PLATFORM_WIDTH'],
    model_family='linear',
    default_roles=[0, 1, 2, 3],   # no ZI for linear
    max_latent_classes=1, R=200,
)
result = speed_builder.run(evaluator, algo='sa', max_iter=500, seed=42)
fit = speed_builder.fit_manual_model(manual_spec=result.best_spec,
                                     model='gaussian', R=500)

Latent class models

# 1. Constrain FC_ENCODED to drive class membership only
c = (
    ModelConstraints()
    .membership_only('FC_ENCODED')
    .force_include('OFFSET')
    .no_zi('LENGTH', 'CURVES', 'WIDTH', 'SLOPE')
    .no_random('URB', 'GRADEBR')
)

# 2. Build LC evaluator (max_latent_classes=2, include roles 7 & 8)
evaluator = builder.build_evaluator(
    variables=['URB', 'ACCESS', 'GRADEBR', 'CURVES', 'LENGTH',
               'SPEED', 'WIDTH', 'SLOPE', 'AVEPRE', 'FC_ENCODED'],
    constraints=c,
    default_roles=[0, 1, 2, 3, 5, 7, 8],
    max_latent_classes=2,
    R=150,
)

# 3. Run search
result = builder.run(evaluator, algo='sa', max_iter=500, seed=1)

# 4. Manually fit a specific structure
spec = builder.make_manual_spec(
    fixed_terms=['AADT', 'SPEED', 'LENGTH'],
    rdm_cor_terms=['CURVES:normal', 'SLOPE:normal'],
    hetro_in_means=['AVEPRE'],
    membership_terms=['URB', 'ACCESS', 'GRADEBR'],
    dispersion=1, latent_classes=2,
)
fit = builder.fit_manual_model(manual_spec=spec, model='nb', R=200)

# 5. Extract class membership probabilities
class_probs = builder.compute_latent_class_probabilities(
    fit, true_class_col='FC_ENCODED')
print(class_probs.head())

# 6. Compare predicted class vs actual FC
class_probs['predicted'] = (
    class_probs[['class_1_prob', 'class_2_prob']].to_numpy().argmax(axis=1))
agreement = (class_probs['predicted'] == class_probs['FC_ENCODED']).mean()
print(f'Agreement with FC: {agreement:.1%}')

Pre-specified reference model:

from metacountregressor import (
    load_book_latent_class_spec, describe_book_latent_class_spec)

describe_book_latent_class_spec()
spec = load_book_latent_class_spec()
fit = builder.fit_manual_model(manual_spec=spec, model='nb', R=200)
get_help('latent_class')   # full workflow guide

Output and saving results

from metacountregressor import SearchOutputConfig

output_config = SearchOutputConfig(
    output_dir='results',
    experiment_name='example16_3_count',
    search_description='NB count model search on Example 16-3',
    save_json=True,
)

result = builder.run(evaluator, algo='sa', max_iter=2000,
                     output_config=output_config)
print('Saved to:', result.saved_to)

Each saved JSON contains: experiment name, description, model family, algorithm, best BIC, and the best structural specification.

Collect results from multiple runs:

import json, pathlib

results = sorted(
    (json.loads(p.read_text()) for p in pathlib.Path('results').glob('*.json')),
    key=lambda r: r.get('best_score', float('inf')),
)
print('Best BIC:', results[0]['best_score'])
print('Algorithm:', results[0]['algorithm'])

Help system

The package includes a built-in interactive help system:

from metacountregressor import get_help

get_help()                    # list all topics
get_help('roles')             # role code table + distributions
get_help('constraints')       # ModelConstraints API
get_help('metaheuristics')    # algorithm comparison and parameters
get_help('crash_frequency')   # count model workflow
get_help('latent_class')      # latent class workflow
get_help('cmf')               # CMF workflow
get_help('linear')            # linear model workflow
get_help('duration')          # duration model workflow
get_help('batch')             # batch script and HPC guide

Running on HPC clusters

Automatic walltime detection

On PBS/Torque or SLURM, the package reads the scheduler walltime automatically and uses it as a max_time limit — the search stops cleanly before the job is killed.

| Scheduler | Environment variable | Format |
|-----------|----------------------|--------|
| PBS/Torque | PBS_WALLTIME | HH:MM:SS |
| SLURM | SLURM_TIME_LIMIT | seconds or HH:MM:SS |
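The detection amounts to reading whichever variable is set and normalising it to seconds. A stdlib sketch of that logic (the package's own detection code may differ in details):

```python
import os

def detect_walltime() -> "float | None":
    """Return the scheduler time limit in seconds, or None outside a job.

    PBS/Torque exposes PBS_WALLTIME as HH:MM:SS; SLURM's SLURM_TIME_LIMIT
    may be plain seconds or HH:MM:SS.
    """
    raw = os.environ.get("PBS_WALLTIME") or os.environ.get("SLURM_TIME_LIMIT")
    if not raw:
        return None
    if ":" in raw:
        seconds = 0
        for part in raw.split(":"):   # fold H:M:S left-to-right
            seconds = seconds * 60 + int(part)
        return float(seconds)
    return float(raw)
```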

Set manually for local testing:

result = builder.run(evaluator, algo='sa', max_iter=99999, max_time=3600)

PBS job script

#!/bin/bash
#PBS -N metacount_sa
#PBS -l nodes=1:ppn=4
#PBS -l walltime=04:00:00
#PBS -l mem=16gb
#PBS -j oe
#PBS -o logs/sa_seed42.log

module load python/3.11
cd $PBS_O_WORKDIR
source venv/bin/activate

# Walltime auto-detected from PBS_WALLTIME
python run_experiment.py sa 42 200 99999

SLURM job array

#!/bin/bash
#SBATCH --job-name=metacount
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=04:00:00
#SBATCH --mem=16G
#SBATCH --output=logs/%j.log
#SBATCH --array=1-10

module load python/3.11
source venv/bin/activate

python run_experiment.py sa $SLURM_ARRAY_TASK_ID 200 99999

See 05_batch_script_tutorial.ipynb for a complete worked example including a reusable run_experiment.py template and result-collection scripts.

get_help('batch')   # inline guide
