
JAX-first hierarchical search and fitting for count, CMF, duration, and linear models.


metacountregressor Cookbook

metacountregressor is a JAX-first package for hierarchical model fitting and metaheuristic structure search across:

  • count models
  • CMF models
  • duration models
  • linear models

This cookbook now uses the bundled Example 16-3 data from its original CSV source and keeps the original source column names.

1. Install

python -m pip install -e .
python -m pip install jax jaxlib jaxopt

Quick import check:

python -c "from metacountregressor import __version__, load_example16_3_raw_data; print(__version__, load_example16_3_raw_data().shape)"

2. Example Data In The Package

The package now exposes the Example 16-3 data directly:

from metacountregressor import load_example16_3_raw_data, load_example16_3_model_data

raw_df = load_example16_3_raw_data()
model_df = load_example16_3_model_data()

2.1 Raw data loader

load_example16_3_raw_data() returns the original CSV columns:

  • ID
  • FREQ
  • LENGTH
  • INCLANES
  • DECLANES
  • WIDTH
  • MIMEDSH
  • MXMEDSH
  • SPEED
  • URB
  • FC
  • AADT
  • SINGLE
  • DOUBLE
  • TRAIN
  • PEAKHR
  • GRADEBR
  • MIGRADE
  • MXGRADE
  • MXGRDIFF
  • TANGENT
  • CURVES
  • MINRAD
  • ACCESS
  • MEDWIDTH
  • FRICTION
  • ADTLANE
  • SLOPE
  • INTECHAG
  • AVEPRE
  • AVESNOW

2.2 Model-ready loader

load_example16_3_model_data() preserves all source columns and adds:

  • OFFSET
  • FC_ENCODED
  • FC_LABEL

Notes:

  • FC remains the original source coding from the Example 16-3 data.
  • FC_ENCODED is a clean ordered encoding of the observed FC categories for comparison experiments.
  • FC_LABEL is a readable string form like FC_1, FC_2, FC_5.
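As an illustration of what a clean ordered encoding looks like, the observed FC categories can be mapped to consecutive codes and readable labels. This is a cookbook sketch of the idea, not the package's internal implementation:

```python
# Sketch: derive an ordered encoding and readable labels from raw FC codes.
# This mirrors what FC_ENCODED / FC_LABEL provide; the package's own logic may differ.
raw_fc = [5, 1, 2, 1, 5, 2, 2]           # example observed FC values

# Map each distinct FC value to a 0-based ordered code.
categories = sorted(set(raw_fc))          # [1, 2, 5]
code_of = {fc: i for i, fc in enumerate(categories)}

fc_encoded = [code_of[fc] for fc in raw_fc]
fc_label = [f"FC_{fc}" for fc in raw_fc]

print(fc_encoded)  # [2, 0, 1, 0, 2, 1, 1]
print(fc_label)    # ['FC_5', 'FC_1', 'FC_2', 'FC_1', 'FC_5', 'FC_2', 'FC_2']
```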

3. Build The Main ExperimentBuilder

from metacountregressor import ExperimentBuilder, load_example16_3_model_data

df = load_example16_3_model_data()

builder = ExperimentBuilder(
    df=df,
    id_col="ID",
    y_col="FREQ",
    offset_col="OFFSET",
    group_id_col="FC",
)

Which arguments can be None

In ExperimentBuilder(...):

  • id_col: required; do not pass None.
  • y_col: required; do not pass None.
  • offset_col: optional; None is allowed.
  • group_id_col: optional; None is allowed.

In build_evaluator(...):

  • variables=None: use all candidate columns.
  • fixed_override=None: no variable-specific fixed-role restrictions.
  • membership_override=None: no variable-specific membership-role restrictions.
  • exclude=None: do not exclude extra columns.
  • default_roles=None: let the package choose family defaults.

In CMF helpers:

  • offset_col=None: allowed.
  • group_id_col=None: allowed.
  • variables=None: allowed.

Helpful inspection:

builder.describe()
builder.suggest_config(max_latent_classes=2)
print(builder.get_family_capabilities())
print(builder.get_search_argument_guide())

4. Main Search Arguments

Shared arguments:

  • algo: use sa, hc, de, or hs.
  • R: number of simulation draws.
  • max_iter: number of search iterations.
  • max_latent_classes: maximum number of latent classes allowed.
  • variables: candidate search columns.
  • default_roles: allowed structural roles.
  • fixed_override: restrict roles for named variables.
  • membership_override: restrict membership roles for named variables.

To save results consistently:

from metacountregressor import SearchOutputConfig

output_config = SearchOutputConfig(
    output_dir="results",
    experiment_name="example16_3_count_search",
    search_description="Count model search on Example 16-3 data",
)

5. Role Codes

Code  Meaning
0     Excluded
1     Fixed
2     Random independent
3     Random correlated
4     Grouped random
5     Heterogeneity in means
6     Zero inflation
7     Membership only
8     Membership plus fixed outcome

Random-parameter distributions:

  • normal
  • lognormal
  • triangular
  • uniform
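To keep the role codes straight, a small lookup table can validate override dictionaries before you pass them to the builder. This helper is a cookbook convenience, not part of the package API:

```python
# Convenience lookup for the role codes above (not part of the package API).
ROLE_CODES = {
    0: "excluded",
    1: "fixed",
    2: "random independent",
    3: "random correlated",
    4: "grouped random",
    5: "heterogeneity in means",
    6: "zero inflation",
    7: "membership only",
    8: "membership plus fixed outcome",
}

def check_override(override):
    """Raise if an override maps a variable to an unknown role code."""
    for var, roles in override.items():
        bad = [r for r in roles if r not in ROLE_CODES]
        if bad:
            raise ValueError(f"{var}: unknown role codes {bad}")
    return True

check_override({"AADT": [1], "URB": [7, 8]})   # passes
# check_override({"URB": [9]}) would raise ValueError
```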

6. Count Models

6.1 Count search

evaluator = builder.build_count_evaluator(
    variables=[
        "AADT",
        "LENGTH",
        "SPEED",
        "CURVES",
        "TANGENT",
        "SLOPE",
        "ACCESS",
        "URB",
        "AVEPRE",
    ],
    mode="single",
    max_latent_classes=2,
    R=200,
    default_roles=[0, 1, 2, 3, 4, 5, 6, 7, 8],
)

result = builder.run(
    evaluator=evaluator,
    algo="sa",
    max_iter=2000,
    seed=42,
    output_config=output_config,
)

6.2 Manual count model

manual_spec = builder.make_manual_spec(
    fixed_terms=["AADT", "LENGTH", "SPEED"],
    rdm_terms=["CURVES:normal"],
    rdm_cor_terms=["TANGENT:normal", "SLOPE:lognormal"],
    hetro_in_means=["AVEPRE"],
    zi_terms=["ACCESS"],
    membership_terms=["URB"],
    dispersion=1,
    latent_classes=2,
)

fit = builder.fit_manual_model(
    manual_spec=manual_spec,
    model="nb",
    R=200,
)

7. CMF Models

CMF models use:

log(mu) = baseline block + local block * log(AADT)

The default CMF route transforms the CMF design and then runs on the main JAX hierarchical architecture.
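Numerically, the CMF mean structure works out as below. This is a worked sketch with made-up coefficients chosen purely to illustrate the structure, not fitted values:

```python
import math

# Worked example of log(mu) = baseline block + local block * log(AADT),
# using illustrative (made-up) coefficients.
baseline = {"URB": 0.30, "ACCESS": -0.10}   # baseline-block coefficients
local = {"CURVES": 0.05, "SLOPE": 0.02}     # local-block coefficients
row = {"URB": 1.0, "ACCESS": 2.0, "CURVES": 3.0, "SLOPE": 1.0, "AADT": 12000.0}

baseline_block = sum(b * row[k] for k, b in baseline.items())   # 0.10
local_block = sum(b * row[k] for k, b in local.items())          # 0.17

log_mu = baseline_block + local_block * math.log(row["AADT"])
mu = math.exp(log_mu)
print(round(log_mu, 4), round(mu, 2))
```

Note that because the local block multiplies log(AADT), the model requires strictly positive AADT values (see the validation section later in this cookbook).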

7.1 CMF search

cmf_search = builder.build_evaluator(
    model_family="cmf",
    aadt_col="AADT",
    baseline_vars=["URB", "ACCESS", "GRADEBR"],
    local_vars=["CURVES", "SLOPE", "WIDTH"],
    variables=["AVEPRE", "AVESNOW", "FC_ENCODED"],
    mode="single",
    max_latent_classes=2,
    R=200,
    default_roles=[0, 1, 2, 3, 4, 5, 6, 7, 8],
)

cmf_result = builder.run_search(
    cmf_search,
    algo="sa",
    max_iter=2000,
    seed=7,
)

7.2 Manual CMF model

from metacountregressor import CMFExperimentBuilder

cmf_builder = CMFExperimentBuilder(
    df=df,
    y_col="FREQ",
    aadt_col="AADT",
    baseline_vars=["URB", "ACCESS"],
    local_vars=["CURVES", "SLOPE"],
)

manual_cmf_spec = cmf_builder.make_manual_cmf_spec(
    baseline_fixed=["URB"],
    baseline_correlated=["ACCESS"],
    local_random=["CURVES"],
    local_correlated=["SLOPE"],
    hetro_in_means=["AVEPRE"],
    zi_terms=["INTECHAG"],
    membership_terms=["FC_ENCODED"],
    dispersion=1,
    latent_classes=2,
)

cmf_fit = cmf_builder.fit_manual_cmf_model(
    id_col="ID",
    offset_col="OFFSET",
    group_id_col="FC",
    manual_spec=manual_cmf_spec,
    model="nb",
    R=200,
)

7.3 Legacy GA-CMF route

legacy_cmf = builder.build_evaluator(
    model_family="cmf",
    cmf_driver="ga",
    aadt_col="AADT",
    baseline_vars=["URB", "ACCESS"],
    local_vars=["CURVES", "SLOPE"],
)

8. Duration Models

The default duration route now uses the main JAX hierarchical architecture with a lognormal family.
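For intuition about the lognormal family, two standard identities are useful (these are general lognormal facts, independent of this package): if log(T) ~ Normal(mu, sigma^2), the median of T is exp(mu) and the mean is exp(mu + sigma^2/2):

```python
import math

# Standard lognormal identities: if log(T) ~ Normal(mu, sigma^2),
# median(T) = exp(mu) and E[T] = exp(mu + sigma^2 / 2).
mu, sigma = 1.0, 0.5

median_T = math.exp(mu)
mean_T = math.exp(mu + sigma**2 / 2)

print(round(median_T, 3), round(mean_T, 3))  # mean exceeds median: right-skewed
```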

Use the model-ready duration loader:

from metacountregressor import ExperimentBuilder, load_example_duration_data

duration_df = load_example_duration_data()
duration_builder = ExperimentBuilder(
    df=duration_df,
    id_col="ID",
    y_col="DURATION",
    offset_col=None,
    group_id_col="FC",
)

8.1 Duration search

duration_search = duration_builder.build_evaluator(
    model_family="duration",
    variables=["WIDTH", "CURVES", "SLOPE", "URB", "FC_ENCODED"],
    budget_col="AADT",
    mode="single",
    max_latent_classes=2,
    R=200,
    default_roles=[0, 1, 2, 3, 4, 5, 6, 7, 8],
)

8.2 Manual duration model

duration_spec = duration_builder.make_manual_spec(
    fixed_terms=["WIDTH"],
    rdm_terms=["CURVES:normal"],
    rdm_cor_terms=["SLOPE:normal", "URB:normal"],
    hetro_in_means=["AVEPRE"],
    membership_terms=["FC_ENCODED"],
    latent_classes=2,
)

duration_fit = duration_builder.fit_manual_model(
    manual_spec=duration_spec,
    model="lognormal",
    R=200,
)

9. Linear Models

The default linear route now uses the main JAX hierarchical architecture with a Gaussian family.

Use the model-ready linear loader:

from metacountregressor import ExperimentBuilder, load_example_linear_data

linear_df = load_example_linear_data()
linear_builder = ExperimentBuilder(
    df=linear_df,
    id_col="ID",
    y_col="LINEAR_TARGET",
    offset_col=None,
    group_id_col="FC",
)

9.1 Linear search

linear_search = linear_builder.build_evaluator(
    model_family="linear",
    variables=["WIDTH", "CURVES", "SLOPE", "URB", "FC_ENCODED"],
    mode="single",
    max_latent_classes=2,
    R=200,
    default_roles=[0, 1, 2, 3, 4, 5, 6, 7, 8],
)

9.2 Manual linear model

linear_spec = linear_builder.make_manual_spec(
    fixed_terms=["WIDTH"],
    rdm_terms=["CURVES:normal"],
    rdm_cor_terms=["SLOPE:normal", "URB:normal"],
    hetro_in_means=["AVEPRE"],
    membership_terms=["FC_ENCODED"],
    latent_classes=2,
)

linear_fit = linear_builder.fit_manual_model(
    manual_spec=linear_spec,
    model="gaussian",
    R=200,
)

10. Platform-Speed Linear Mixed Effects Example

The package now includes a synthetic example designed for linear mixed-effects style experiments around vehicle speed relative to a platform.

Load it with:

from metacountregressor import load_example_platform_speed_data, ExperimentBuilder

platform_df = load_example_platform_speed_data()

platform_builder = ExperimentBuilder(
    df=platform_df,
    id_col="PLATFORM_ID",
    y_col="RELATIVE_SPEED",
    offset_col=None,
    group_id_col="PLATFORM_TYPE",
)

Available columns include:

  • DIST_TO_PLATFORM
  • VEHICLE_SPEED
  • RELATIVE_SPEED
  • POSTED_SPEED
  • APPROACH_ACCEL
  • PLATFORM_TYPE
  • PLATFORM_HEIGHT
  • PLATFORM_WIDTH
  • AT_PLATFORM

10.1 Linear mixed-effects style search

platform_linear_search = platform_builder.build_evaluator(
    model_family="linear",
    variables=[
        "DIST_TO_PLATFORM",
        "POSTED_SPEED",
        "APPROACH_ACCEL",
        "PLATFORM_HEIGHT",
        "PLATFORM_WIDTH",
        "AT_PLATFORM",
    ],
    mode="single",
    max_latent_classes=2,
    R=200,
    default_roles=[0, 1, 2, 3, 4, 5, 7, 8],
)

This is set up to model speed relative to the platform while allowing:

  • random parameters
  • correlated random parameters
  • grouped effects
  • heterogeneity in means
  • latent classes

11. Duration Example: Time Until Another Vehicle Speeds Over The Platform

The package also includes a synthetic duration experiment for the time before another vehicle speeds over the platform.

Load it with:

from metacountregressor import load_example_platform_gap_duration_data, ExperimentBuilder

gap_df = load_example_platform_gap_duration_data()

gap_builder = ExperimentBuilder(
    df=gap_df,
    id_col="PLATFORM_ID",
    y_col="DURATION_UNTIL_NEXT_SPEEDING",
    offset_col=None,
    group_id_col=None,
)

Available columns include:

  • DURATION_UNTIL_NEXT_SPEEDING
  • PRECEDING_VEHICLE_SPEED
  • FOLLOWING_VEHICLE_SPEED
  • POSTED_SPEED
  • PLATFORM_HEIGHT
  • PLATFORM_WIDTH
  • APPROACH_VOLUME

11.1 Duration search

gap_duration_search = gap_builder.build_evaluator(
    model_family="duration",
    variables=[
        "PRECEDING_VEHICLE_SPEED",
        "FOLLOWING_VEHICLE_SPEED",
        "POSTED_SPEED",
        "PLATFORM_HEIGHT",
        "PLATFORM_WIDTH",
        "APPROACH_VOLUME",
    ],
    budget_col="POSTED_SPEED",
    mode="single",
    max_latent_classes=2,
    R=200,
    default_roles=[0, 1, 2, 3, 4, 5, 7, 8],
)

This uses the JAX hierarchical lognormal path and is intended for duration-before-speeding style analysis.

12. What Changing Search Arguments Does

Change the search algorithm

builder.run(evaluator=evaluator, algo="sa", max_iter=2000, seed=1)
builder.run(evaluator=evaluator, algo="de", max_iter=2000, seed=1)
builder.run(evaluator=evaluator, algo="hs", max_iter=2000, seed=1)

Change simulation draws

evaluator = builder.build_count_evaluator(R=500)

Higher R means:

  • slower estimation
  • more stable simulation-based fitting
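The stability gain follows the usual Monte Carlo rate: the standard error of a draw-based average shrinks like 1/sqrt(R). A quick arithmetic sketch (generic Monte Carlo behavior, not package internals):

```python
import math

# Monte Carlo standard error of a simulated average scales as s / sqrt(R),
# so going from R=200 to R=500 draws cuts the noise by sqrt(500/200).
s = 1.0  # per-draw standard deviation (illustrative)

se = {R: s / math.sqrt(R) for R in (200, 500)}
print({R: round(v, 4) for R, v in se.items()})

reduction = se[200] / se[500]
print(round(reduction, 3))  # about 1.581
```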

Restrict allowed structures

evaluator = builder.build_count_evaluator(
    variables=["AADT", "SPEED", "ACCESS"],
    default_roles=[0, 1, 2, 6],
)

Restrict specific variables

evaluator = builder.build_count_evaluator(
    variables=["AADT", "SPEED", "URB"],
    fixed_override={"AADT": [1]},
    membership_override={"URB": [7, 8]},
)

13. Consistent Run Output

from metacountregressor import SearchOutputConfig

output_config = SearchOutputConfig(
    output_dir="results",
    experiment_name="cmf_example16_3",
    search_description="CMF search on Example 16-3 data",
)

saved = builder.run_search(
    cmf_search,
    algo="sa",
    max_iter=1000,
    output_config=output_config,
)

print(saved["saved_to"])

Each saved JSON file stores:

  • experiment name
  • search description
  • family
  • algorithm
  • normalized result payload

14. Latent-Class Example: Recover Functional Class

This example is designed to see whether a latent-class model can recover the hidden FC grouping pattern without using FC itself as a direct predictor in the outcome equation.

We keep:

  • original truth column: FC
  • comparison encoding: FC_ENCODED

We do not place FC or FC_ENCODED in the outcome equation. Instead we let membership variables explain latent class probabilities.
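Class membership in this style of model is typically a softmax over linear scores built from the membership variables. A minimal sketch of that mechanic (generic latent-class arithmetic; the package's exact parameterization may differ):

```python
import math

# Generic latent-class membership: class probabilities are a softmax of
# linear scores built from the membership variables (URB, ACCESS, GRADEBR here).
def softmax(scores):
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative membership coefficients for class 1 (class 2 as reference, score 0).
coef = {"URB": 1.2, "ACCESS": -0.05, "GRADEBR": 0.4}
row = {"URB": 1.0, "ACCESS": 6.0, "GRADEBR": 0.0}

score_class1 = sum(c * row[k] for k, c in coef.items())
p1, p2 = softmax([score_class1, 0.0])
print(round(p1, 3), round(p2, 3))  # probabilities sum to 1
```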

14.1 Fit a latent-class count model

latent_spec = builder.make_manual_spec(
    fixed_terms=["AADT", "SPEED", "LENGTH"],
    rdm_cor_terms=["CURVES:normal", "SLOPE:normal"],
    hetro_in_means=["AVEPRE"],
    membership_terms=["URB", "ACCESS", "GRADEBR"],
    dispersion=1,
    latent_classes=2,
)

latent_fit = builder.fit_manual_model(
    manual_spec=latent_spec,
    model="nb",
    R=200,
)

14.2 Compute latent-class probabilities and compare to the true FC grouping

class_probs = builder.compute_latent_class_probabilities(
    latent_fit,
    true_class_col="FC_ENCODED",
)

print(class_probs.head())

Returned columns include:

  • ID
  • class_1_prob
  • class_2_prob
  • FC_ENCODED

14.3 Compare predicted class with the encoded true class

# Note: argmax returns 0-based class labels; make sure FC_ENCODED uses the
# same base (shift one side if necessary) before measuring agreement.
class_probs["predicted_class"] = (
    class_probs[["class_1_prob", "class_2_prob"]]
    .to_numpy()
    .argmax(axis=1)
)

agreement = (
    class_probs["predicted_class"].to_numpy()
    == class_probs["FC_ENCODED"].to_numpy()
).mean()

print("Agreement:", agreement)

This is the cookbook pattern for checking whether the latent-class structure is capturing the observed facility-class segmentation.
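One caveat: latent-class labels are arbitrary, so class 1 in the fit need not correspond to the first FC group, and raw agreement understates recovery when the labels are switched. A safer score takes the best agreement over label permutations (plain Python, assuming 0-based class labels on both sides):

```python
from itertools import permutations

def best_agreement(predicted, truth, n_classes=2):
    """Max share of matches over all relabelings of the predicted classes."""
    best = 0.0
    for perm in permutations(range(n_classes)):
        relabeled = [perm[p] for p in predicted]
        share = sum(r == t for r, t in zip(relabeled, truth)) / len(truth)
        best = max(best, share)
    return best

predicted = [0, 0, 1, 1, 1, 0]
truth =     [1, 1, 0, 0, 0, 1]   # perfectly recovered, but labels switched

print(best_agreement(predicted, truth))  # 1.0 despite raw agreement of 0.0
```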

15. Common Validation Errors

The package now raises clearer errors for:

  • missing columns
  • invalid family-specific arguments
  • CMF specifications missing aadt_col, baseline_vars, or local_vars
  • CMF data with non-positive AADT
  • latent-class probability requests on single-class fits
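The non-positive AADT check, for example, is easy to reproduce in your own preprocessing. This is a sketch of the idea, not the package's actual validator:

```python
# Sketch of the kind of guard the package applies for CMF data:
# log(AADT) is undefined for AADT <= 0, so such rows must be rejected
# (or cleaned) before fitting.
def check_positive_aadt(aadt_values):
    bad = [i for i, v in enumerate(aadt_values) if v <= 0]
    if bad:
        raise ValueError(f"non-positive AADT at rows {bad}")
    return True

check_positive_aadt([1200.0, 350.0, 9800.0])   # passes
# check_positive_aadt([1200.0, 0.0]) would raise ValueError
```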

16. Summary

Use these loaders when you want the real Example 16-3 data inside the package:

  • load_example16_3_raw_data()
  • load_example16_3_model_data()
  • load_example_duration_data()
  • load_example_linear_data()

Use these builder patterns:

  • count: build_count_evaluator(...)
  • CMF: build_evaluator(model_family="cmf", ...)
  • duration: build_evaluator(model_family="duration", ...)
  • linear: build_evaluator(model_family="linear", ...)
