
Metaheuristic-guided automated structure search for mixed count-data regression models


MetaCountRegressor

Tutorial also available as a Jupyter notebook

Download Example Notebook

The tutorial provides more extensive examples on how to run the code and perform experiments. Further documentation is currently in development.

Download Notebook Designed for Analysis Use Case

For an application setup, download the following GUI:

Download Application

The application involves setting up a problem instance to run the models.

The entire Git repository is available to clone.

Steps

  1. Clone Project
  2. Navigate to "metacountregressor/application_gui/dist/meta_app"
  3. Run meta_app.exe
  4. Navigate to metacountregressor/app_main.py
  5. Run app_main.py

Setup For Python Package Approach

The code below demonstrates how to set up automated optimization assisted by the harmony search algorithm. Differential evolution and simulated annealing are also referenced; change the setup accordingly to use them.

Install: Requires Python 3.10

Install metacountregressor using pip as follows:

pip install metacountregressor


```python
import pandas as pd
import numpy as np
from metacountregressor.solution import ObjectiveFunction
from metacountregressor.metaheuristics import (harmony_search,
                                               differential_evolution,
                                               simulated_annealing)
```

Basic setup.

The initial setup involves reading in the data and selecting an optimization algorithm. As the runtime progresses, new solutions will be continually evaluated. Finally, at the end of the runtime, the best solution will be identified and printed out. In the case of multiple objectives, all of the best solutions that belong to the Pareto frontier will be printed out.

# Read data from CSV file
df = pd.read_csv(
    "https://raw.githubusercontent.com/zahern/data/main/Ex-16-3.csv")
X = df
y = df['FREQ']  # Frequency of crashes
X['Offset'] = np.log(df['AADT'])  # Explicitly define the offset; no offset otherwise
# Drop Y, the selected offset term, and ID as there are no panels
X = df.drop(columns=['FREQ', 'ID', 'AADT'])

# Some example arguments; these are the defaults, so the following line is just
# for clarity. See the later arguments section for details.
arguments = {'algorithm': 'hs', 'test_percentage': 0.15, 'test_complexity': 6, 'instance_name': 1,
             'val_percentage': 0.15, 'obj_1': 'bic', '_obj_2': 'RMSE_TEST', "_max_time": 6}
# Fit the model with metacountregressor
obj_fun = ObjectiveFunction(X, y, **arguments)
# Replace with other metaheuristics if desired
results = harmony_search(obj_fun)

Arguments to feed into the Objective Function:

Note: the main arguments you may wish to change are listed below.

  • algorithm: This parameter has multiple choices for the algorithm, such as 'hs', 'sa', and 'de'. Only one choice should be defined as a string value.
  • test_percentage: This parameter represents the percentage of data held out for out-of-sample testing of the model. The value 0.15 corresponds to 15% of the data.
  • val_percentage: This parameter represents the percentage of data used to validate the model. The value 0.15 corresponds to 15% of the data.
  • test_complexity: This parameter defines the complexity level for testing. The value 6 tests all complexities. Alternatively, you can provide a list of numbers to consider different complexities. The complexities are further explained later in this document.
  • instance_name: This parameter is used to give a name to the outputs.
  • _obj_1: This parameter has multiple choices for obj_1, such as 'bic', 'aic', and 'hqic'. Only one choice should be defined as a string value.
  • _obj_2: This parameter has multiple choices for objective 2, such as 'RMSE_TEST', 'MSE_TEST', and 'MAE_TEST'.
  • _max_time: This parameter specifies the maximum number of seconds for the total estimation before stopping.
  • distribution: This parameter is a list of distributions to consider. Put all of the distribution types you want considered for random parameters into a list. The valid options include: 'Normal', 'LnNormal', 'Triangular', and 'Uniform'.
  • transformations: This parameter is a list of transformations to consider. Put all of the transformation types you want considered into a list. The valid options include: 'no', 'sqrt', 'log', and 'archsinh'.
  • method_ll: This specifies which solver is used for the lower-level maximum likelihood objective (for example, 'BFGS_2' as used in the examples below).

Example of changing the arguments:

Modify the arguments according to your preferences using the commented code as a guide.

#Solution Arguments
arguments = {
        'algorithm': 'hs', #alternatively input 'de', or 'sa'
        'is_multi': 1,
        'test_percentage': 0.2, # used in multi-objective optimisation only. Saves 20% of data for testing.
        'val_percentage': 0.2, # Saves 20% of data for validation.
        'test_complexity': 6, # Complexity level for testing (6 tests all) or a list to consider potential differences in complexity
        'instance_name': 'name', # used for creating a named folder where your models are saved into from the directory
        'distribution': ['Normal', 'LnNormal', 'Triangular', 'Uniform'],
        'model_types': [[0,1]],  # or equivalently ['POS', 'NB']
        'transformations': ['no', 'sqrt', 'archsinh'],
        'method_ll': 'BFGS_2',
        '_max_time': 10
    }
obj_fun = ObjectiveFunction(X, y, **arguments)
results = harmony_search(obj_fun)

Initial Solution Configuration

Listed below is an example of how to specify an initial solution within the framework. This initial solution will be used to calculate the fitness and considered in the objective-based search. However, as the search progresses, different hypotheses may be proposed, and alternative modeling components may completely replace the initial solution.

# Model decisions: specify the initial solution that will be optimised.
manual_fit_spec = {
    'fixed_terms': ['SINGLE', 'LENGTH'],
    'rdm_terms': ['AADT:normal'],
    'rdm_cor_terms': ['GRADEBR:normal', 'CURVES:normal'],
    'grouped_rdm': [],
    'hetro_in_means': ['ACCESS:normal', 'MINRAD:normal'],
    'transformations': ['no', 'no', 'log', 'no', 'no', 'no', 'no'],
    'dispersion': 0
}


#Search Arguments
arguments = {
    'algorithm': 'hs',
    'test_percentage': 0.2,
    'test_complexity': 6,
    'instance_name': 'name',
    'Manual_Fit': manual_fit_spec
}
obj_fun = ObjectiveFunction(X, y, **arguments)
Setup Complete...
Benchmarking test with Seed 42
--------------------------------------------------------------------------------
Log-Likelihood:  -1339.1862434675106
--------------------------------------------------------------------------------
bic: 2732.31
--------------------------------------------------------------------------------
MSE: 650856.32
+--------------------------+--------+-------+----------+----------+------------+
|          Effect          | $\tau$ | Coeff | Std. Err | z-values | Prob |z|>Z |
+==========================+========+=======+==========+==========+============+
| LENGTH                   | no     | -0.15 |   0.01   |  -12.98  | 0.00***    |
+--------------------------+--------+-------+----------+----------+------------+
| SINGLE                   | no     | -2.46 |   0.04   |  -50.00  | 0.00***    |
+--------------------------+--------+-------+----------+----------+------------+
| GRADEBR                  | log    | 4.23  |   0.10   |  42.17   | 0.00***    |
+--------------------------+--------+-------+----------+----------+------------+
| CURVES                   | no     | 0.51  |   0.01   |  34.78   | 0.00***    |
+--------------------------+--------+-------+----------+----------+------------+
|  Chol: GRADEBR (Std.     |        | 2.21  |   0.00   |  50.00   | 0.00***    |
| Dev. normal) )           |        |       |          |          |            |
+--------------------------+--------+-------+----------+----------+------------+
|  Chol: CURVES (Std. Dev. |        | -0.51 |   0.00   |  -50.00  | 0.00***    |
| normal) )                |        |       |          |          |            |
+--------------------------+--------+-------+----------+----------+------------+
|  Chol: CURVES (Std. Dev. | no     | 0.55  |   0.00   |  50.00   | 0.00***    |
| normal) . GRADEBR (Std.  |        |       |          |          |            |
| Dev.   normal )          |        |       |          |          |            |
+--------------------------+--------+-------+----------+----------+------------+
| main: MINRAD: hetro      | no     | -0.00 |   0.00   |  -44.36  | 0.00***    |
| group 0                  |        |       |          |          |            |
+--------------------------+--------+-------+----------+----------+------------+
| ACCESS: hetro group 0    |        | 0.68  |   0.09   |   7.68   | 0.00***    |
+--------------------------+--------+-------+----------+----------+------------+
| main: MINRAD: hetro      |        | -0.00 |   0.00   |  -44.86  | 0.00***    |
| group 0:normal:sd  hetro |        |       |          |          |            |
| group 0                  |        |       |          |          |            |
+--------------------------+--------+-------+----------+----------+------------+

Similarly, to return the results, feed the objective function into a metaheuristic solution algorithm. An example of this is provided below:

results = harmony_search(obj_fun)
print(results)

Notes:

Capabilities of the software include:

  • Handling of Panel Data
  • Support for Data Transformations
  • Implementation of Models with Correlated and Non-Correlated Random Parameters
  • A variety of mixing distributions for parameter estimations, including normal, lognormal, truncated normal, Lindley, gamma, triangular, and uniform distributions
  • Capability to handle heterogeneity in the means of the random parameters
  • Use of Halton draws for simulated maximum likelihood estimation
  • Support for grouped random parameters with unbalanced groups
  • Post-estimation tools for assessing goodness of fit, making predictions, and conducting out-of-sample validation
  • Multiple parameter optimization routines, such as the BFGS method
  • Comprehensive hypothesis testing using single objectives, such as in-sample BIC and log-likelihood
  • Extensive hypothesis testing using multiple objectives, such as in-sample BIC and out-of-sample MAE (mean absolute error), or in-sample AIC and out-of-sample MSPE (mean-square prediction error)
  • Features that allow analysts to pre-specify variables, interactions, and mixing distributions, among others
  • Meta-heuristic Guided Optimization, including techniques like Simulated Annealing, Harmony Search, and Differential Evolution
  • Customization of Hyper-parameters to solve problems tailored to your dataset
  • Out-of-the-box optimization capability using default metaheuristics

Interpreting the output of the model:

A regression table is produced. The following text elements are explained:

  • Std. Dev.: This column appears for effects that relate to random parameters and displays the assumed distribution next to them.
  • Chol: This term refers to a Cholesky decomposition element, showing the correlation between two random parameters. The combination of a Cholesky element with itself is equivalent to a normal random parameter.
  • hetro group: This term represents the heterogeneity group number; all contributing factors that share heterogeneity in the means with each other fall under the same numbered value.
  • $\tau$: This column displays the type of transformation that was applied to the specific contributing factor in the data.

Arguments:

The arguments fed into the solution algorithm are supplied as a dictionary keyed by the relevant names.

The following list describes the arguments available in this function. By default, all of the capabilities described are enabled unless specified otherwise as an argument. For list arguments, include all desired elements in the list to ensure the corresponding options are considered. Example code will be provided later in this guide.

  1. complexity_level: This argument accepts an integer or a list of integers between 0 and 5 (for example, [0, 2, 3] is a possible configuration). Each integer represents a hierarchy level for estimable models associated with each explanatory variable. Here is a summary of the hierarchy:

    • 0: Null model
    • 1: Simple fixed effects model
    • 2: Random parameters model
    • 3: Random correlated parameters model
    • 4: Grouped random parameters model
    • 5: Heterogeneity in the means random parameter model

    Note: For the grouped random parameters model, groupings need to be defined prior to estimation. This can be achieved by including the following key-value pair in the arguments of the ObjectiveFunction: 'group': "Enter Column Grouping in data". Replace "Enter Column Grouping in data" with the actual column grouping in your dataset.

    Similarly, for panel data, the panel column needs to be defined using the key-value pair: 'panel': "enter column string covering panels". Replace "enter column string covering panels" with the appropriate column string that represents the panel information in your dataset.
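The two key-value pairs above can be sketched as follows. The column names 'DISTRICT' and 'SITE_ID' are hypothetical placeholders for illustration only, not columns from the example dataset; substitute the grouping and panel columns from your own data.

```python
# Hypothetical column names shown for illustration; replace 'DISTRICT' and
# 'SITE_ID' with the grouping and panel columns in your own dataset.
arguments = {
    'algorithm': 'hs',
    'complexity_level': [0, 2, 3, 4],  # include level 4 (grouped random parameters)
    'group': 'DISTRICT',               # column defining the random-parameter groupings
    'panel': 'SITE_ID',                # column identifying the panels in panel data
}
# obj_fun = ObjectiveFunction(X, y, **arguments)
```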

  2. distributions: This argument accepts a list of strings where each string corresponds to a distribution. Valid options include:

    • "Normal"
    • "Uniform"
    • "LogNormal"
    • "Triangular"
    • "TruncatedNormal"
    • Any of the above, concatenated with ":" (e.g., "Normal:grouped"; requires a grouping term defined in the model)
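As a sketch, a distribution list mixing plain and grouped variants might look like the following; the grouped variant assumes a grouping term has already been defined via the 'group' argument described under complexity_level above.

```python
# Plain distributions plus one grouped variant; "Normal:grouped" requires a
# grouping term to be defined in the model (see the 'group' argument).
distributions = [
    "Normal",
    "TruncatedNormal",
    "Normal:grouped",
]
arguments = {'distribution': distributions}
```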
  3. Model: This argument specifies the model form. It can be a list of integers representing different models to test:

    • 0: Poisson
    • 1: Negative-Binomial
  4. transformations: This argument accepts a list of strings representing available transformations within the framework. Valid options include:

    • "no"
    • "square-root"
    • "logarithmic"
    • "archsinh"
    • "nil"
  5. is_multi: This argument accepts an integer indicating whether single or multiple objectives are to be tested (0 for single, 1 for multiple).

  6. test_percentage: This argument is used for multi-objective optimization. Define it as a decimal; for example, 0.2 represents 20% of the data for testing.

  7. val_percentage: This argument saves data for validation. Define it as a decimal; for example, 0.2 represents 20% of the data for validation.

  8. _max_time: This argument adds a termination time to the algorithm, specified in seconds. Note that the clock only starts after the initial population of solutions has been generated.
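Pulling these arguments together, a minimal single-objective configuration might look like the sketch below. The values are illustrative, not recommendations, and the key names follow the list above.

```python
# Illustrative single-objective configuration using the argument names above.
arguments = {
    'algorithm': 'sa',                 # simulated annealing
    'is_multi': 0,                     # single objective: no test split required
    'complexity_level': [0, 1, 2],     # restrict to simple and random-parameter models
    'distribution': ['Normal', 'Uniform'],
    'transformations': ['no', 'square-root', 'logarithmic'],
    '_max_time': 3600,                 # stop one hour after the initial population is built
}
# obj_fun = ObjectiveFunction(X, y, **arguments)
```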

Example: Assistance by Harmony Search

Let's begin by fitting very simple models and use the structure of these models to define our objectives. Then, we can conduct a more extensive search on the variables that are more frequently identified. For instance, in the case below, the complexity is level 3, indicating that we will consider, at most, randomly correlated parameters. This approach is useful for initially identifying a suitable set of contributing factors for our search.

'''Setup Data'''
df = pd.read_csv(
    "https://raw.githubusercontent.com/zahern/data/main/Ex-16-3.csv")
X = df
y = df['FREQ']  # Frequency of crashes
X['Offset'] = np.log(df['AADT'])  # Explicitly define the offset; no offset otherwise
# Drop Y, the selected offset term, and ID as there are no panels
X = df.drop(columns=['FREQ', 'ID', 'AADT'])

'''Arguments for Solution'''
arguments = {
        'is_multi': 1,  # are two objectives considered
        'test_percentage': 0.2,  # used in multi-objective optimisation only. Saves 20% of data for testing.
        'val_percentage': 0.2,  # Saves 20% of data for validation.
        'test_complexity': 3,  # for very simple models
        'obj_1': 'BIC', '_obj_2': 'RMSE_TEST',
        'instance_name': 'hs_run',  # used for creating a named folder where your models are saved
        'distribution': ['Normal'],
        'model_types': [0, 1],  # or equivalently ['POS', 'NB']
        'transformations': ['no', 'sqrt', 'archsinh'],
        '_max_time': 10000
}

'''Arguments for the Solution Algorithm'''
argument_hs = {
    '_hms': 20,    # harmony memory size
    '_mpai': 1,    # maximum pitch adjustment index
    '_par': 0.3,   # pitch adjustment rate
    '_hmcr': .5    # harmony memory consideration rate
}
obj_fun = ObjectiveFunction(X, y, **arguments)
results = harmony_search(obj_fun, None, argument_hs)
print(results)

Example: Assistance by Differential Evolution and Simulated Annealing

Similar to the above example, we only need to change the hyperparameters; the obj_fun can remain the same.

argument_de = {'_AI': 2,
            '_crossover_perc': .2,
            '_max_iter': 1000,
            '_pop_size': 25
}
de_results = differential_evolution(obj_fun, None, **argument_de)
print(de_results)


args_sa = {'alpha': .99,
        'STEPS_PER_TEMP': 10,
        'INTL_ACPT': 0.5,
        '_crossover_perc': .3,
        'MAX_ITERATIONS': 1000,
        '_num_intl_slns': 25,
}

sa_results = simulated_annealing(obj_fun, None, **args_sa)
print(sa_results)

Comparing to statsmodels

The following example illustrates how the output compares to well-known packages, including statsmodels.

# Load modules and data
import statsmodels.api as sm

data = sm.datasets.sunspots.load_pandas().data
#print(data.exog)
data_exog = data['YEAR']
data_exog = sm.add_constant(data_exog)
data_endog = data['SUNACTIVITY']

# Instantiate a negative binomial model with the default link function.
import numpy as np

nb_model = sm.NegativeBinomial(data_endog, data_exog)
nb_results = nb_model.fit()

print(nb_results.summary())


# Now let's compare this to MetaCountRegressor

# Model decisions
manual_fit_spec = {
    'fixed_terms': ['const','YEAR'],
    'rdm_terms': [],
    'rdm_cor_terms': [],
    'grouped_rdm': [],
    'hetro_in_means': [],
    'transformations': ['no', 'no'],
    'dispersion': 1 #Negative Binomial 
}


#Arguments
arguments = {
    'algorithm': 'hs',
    'test_percentage': 0,
    'test_complexity': 6,
    'instance': 'name',
    'Manual_Fit': manual_fit_spec
}
obj_fun = ObjectiveFunction(data_exog, data_endog, **arguments)
Optimization terminated successfully.
         Current function value: 4.877748
         Iterations: 22
         Function evaluations: 71
         Gradient evaluations: 70
                     NegativeBinomial Regression Results                      
==============================================================================
Dep. Variable:            SUNACTIVITY   No. Observations:                  309
Model:               NegativeBinomial   Df Residuals:                      307
Method:                           MLE   Df Model:                            1
Date:                Tue, 13 Aug 2024   Pseudo R-squ.:                0.004087
Time:                        14:13:22   Log-Likelihood:                -1507.2
converged:                       True   LL-Null:                       -1513.4
Covariance Type:            nonrobust   LLR p-value:                 0.0004363
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.2913      1.017      0.287      0.774      -1.701       2.284
YEAR           0.0019      0.001      3.546      0.000       0.001       0.003
alpha          0.7339      0.057     12.910      0.000       0.622       0.845
==============================================================================
0.1.88
Setup Complete...
Benchmarking test with Seed 42
1
--------------------------------------------------------------------------------
Log-Likelihood:  -1509.0683662284273
--------------------------------------------------------------------------------
bic: 3035.84
--------------------------------------------------------------------------------
MSE: 10000000.00
+--------+--------+-------+----------+----------+------------+
| Effect | $\tau$ | Coeff | Std. Err | z-values | Prob |z|>Z |
+========+========+=======+==========+==========+============+
| const  | no     | 0.10  |   0.25   |   0.39   | 0.70       |
+--------+--------+-------+----------+----------+------------+
| YEAR   | no     | 0.00  |   0.00   |  20.39   | 0.00***    |
+--------+--------+-------+----------+----------+------------+
| nb     |        | 1.33  |   0.00   |  50.00   | 0.00***    |
+--------+--------+-------+----------+----------+------------+

JAX-Accelerated Fitting (Advanced API)

The package includes a JAX-based back-end (jax_engine, jax_engine_n) that provides fast gradient computation via JIT compilation and automatic differentiation. The high-level ExperimentBuilder API wraps this back-end and drives a metaheuristic search over model structure — selecting which variables to include, what role they play, how many latent sub-populations exist, and what distributional assumptions to use.


Quick Setup

import pandas as pd
from metacountregressor.experiment_package import ExperimentBuilder

df = pd.read_csv("crash_data.csv")

builder = ExperimentBuilder(
    df           = df,
    id_col       = "SITE_ID",    # panel/site identifier column
    y_col        = "CRASHES",    # count outcome column
    offset_col   = "EXPOSURE",   # optional log-exposure offset
    group_id_col = "FC",         # optional grouping column (e.g. road class)
)

builder.describe()           # prints data summary, outcome stats, variable types
builder.suggest_config()     # auto-suggests roles and distributions per variable

If you do not have panel data, set id_col to any column of unique row identifiers. offset_col and group_id_col are optional.


Role Codes

Each variable in the search is assigned a role that determines how it enters the model. The search explores which role best fits each variable.

| Code | Name | Description |
|------|------|-------------|
| 0 | Excluded | Not in the model |
| 1 | Fixed | Single coefficient, constant across all observations |
| 2 | Random Independent | Individual-specific draws, independent across variables |
| 3 | Random Correlated | Individual-specific draws with joint covariance (Cholesky) |
| 4 | Grouped | Coefficient shared within a group (requires group_id_col) |
| 5 | Heterogeneity in means | Shifts the mean of a random-effect distribution |
| 6 | Zero Inflation | Enters the zero-inflation probability equation |
| 7 | Membership only | Enters the class-membership equation only (latent class models) |
| 8 | Membership + Fixed | Enters both the class-membership and outcome equations |
Roles 7 and 8 are only active when max_latent_classes > 1.


Supported Distributions (Random Effects)

When a variable is assigned a random role (2, 3, or 4), the search also selects its mixing distribution:

| Name | Code | Use case |
|------|------|----------|
| normal | 0 | General-purpose, allows negative and positive values |
| lognormal | 1 | Strictly positive effects (e.g. exposure scaling) |
| triangular | 2 | Bounded symmetric effects |

Complexity Levels

Complexity is controlled implicitly through which roles are allowed. You can restrict the search to simpler structures by limiting allowed_roles:

| Complexity | Roles allowed | Description |
|------------|---------------|-------------|
| 1 — Fixed only | [0, 1] | Simple fixed-effects Poisson/NB |
| 2 — Random independent | [0, 1, 2] | Adds uncorrelated random parameters |
| 3 — Random correlated | [0, 1, 2, 3] | Adds Cholesky-parameterised covariance |
| 4 — Grouped | [0, 1, 2, 3, 4] | Adds group-level shared effects |
| 5 — Heterogeneity in means | [0, 1, 2, 3, 4, 5] | Adds variables that shift random-effect means |
| 6 — Latent class | [0, 1, 2, 3, 5, 7, 8] | Multiple latent sub-populations |

Pass allowed_roles to build_evaluator() to limit the search space and reduce runtime.
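For instance, to cap the search at complexity level 2, the restriction might be sketched as follows. This assumes build_evaluator() accepts allowed_roles as described above; only the keyword dictionary is constructed here.

```python
# Restrict the search to complexity level 2: excluded, fixed, and independent
# random roles only. Sketch only; pass these kwargs to build_evaluator().
evaluator_kwargs = {
    'mode': 'single',
    'allowed_roles': [0, 1, 2],  # exclude / fixed / random independent
    'R': 100,                    # fewer draws for a quicker screening run
}
# evaluator = builder.build_evaluator(**evaluator_kwargs)
```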


Building the Evaluator

evaluator = builder.build_evaluator(
    variables          = None,          # list of columns to search over; None = all
    fixed_override     = {"AADT": [1]}, # force specific roles for named variables
    membership_override= {"URB": [0, 7], "SPEED": [0, 7, 8]},  # latent class roles
    exclude            = ["ID"],        # always exclude these variables
    mode               = "single",      # "single" = minimise BIC; "multi" = BIC + test RMSE
    max_latent_classes = 1,             # set > 1 to enable latent class search
    R                  = 200,           # Halton simulation draws (higher = more stable)
    default_roles      = [0, 1, 2, 3],  # default allowed roles for unlisted variables
)

mode options

  • "single" — minimises BIC. Recommended for exploratory analyses or when a single best model is wanted.
  • "multi" — minimises BIC and out-of-sample RMSE simultaneously using Pareto-front methods (NSGA-II). Returns multiple non-dominated solutions. Requires more data (20 % is held out for testing).

Running the Search

result = builder.run(evaluator, algo="sa", max_iter=3000, seed=42)

Algorithm options

| Key | Algorithm | Best for |
|-----|-----------|----------|
| "sa" | Simulated Annealing | Single-objective; recommended starting point |
| "hc" | Hill Climbing | Fast, greedy; good for small problems |
| "de" | Adaptive Differential Evolution + NSGA-II | Multi-objective (mode="multi") |
| "hs" | Dynamic Harmony Search + NSGA-II | Multi-objective (mode="multi") |

SA / HC hyperparameters

result = builder.run(
    evaluator,
    algo          = "sa",
    max_iter      = 3000,
    seed          = 0,
    mutation_rate = 0.3,   # fraction of genes mutated per step
    alpha         = 0.995, # cooling schedule multiplier (closer to 1 = slower cooling)
    min_changes   = 1,     # minimum genes to change per neighbour
    max_changes   = 3,     # maximum genes to change per neighbour
    n_starts      = 1,     # number of independent restarts
)

DE hyperparameters

result = builder.run(
    evaluator,
    algo            = "de",
    max_iter        = 2000,
    population_size = 20,  # number of candidate solutions
    F               = 0.5, # scale factor for difference vectors
    CR              = 0.7, # crossover probability
)

HS hyperparameters

result = builder.run(
    evaluator,
    algo            = "hs",
    max_iter        = 2000,
    population_size = 20,
    hmcr            = 0.9,  # harmony memory consideration rate
    par_min         = 0.1,  # pitch adjustment rate (minimum)
    par_max         = 0.9,  # pitch adjustment rate (maximum)
)

Latent Class Models

Latent class models allow the data to be explained by multiple unobserved sub-populations. The search automatically determines how many classes best fit the data.

evaluator = builder.build_evaluator(
    mode               = "single",
    max_latent_classes = 3,              # search over 1, 2, or 3 classes
    membership_override= {
        "URB":   [0, 7],    # can influence class membership only
        "SPEED": [0, 7, 8], # can influence membership and the outcome equation
    },
    R = 150,
)

result = builder.run(evaluator, algo="sa", max_iter=3000, seed=42)

The EM warm-start is applied automatically when classes > 1: a single-class solution seeds the multi-class initialisation, improving convergence stability.

What the search controls for latent classes

  • Number of classes (1 up to max_latent_classes) — selected automatically
  • Class-membership variables — which variables enter the softmax class-probability equation
  • Per-class fixed effects — estimated separately for each latent class
  • Dispersion — Poisson (0) or Negative Binomial (1) selected per run

Manual Model Specification (no search)

If you want to fit a specific structure directly without running a metaheuristic:

from metacountregressor.jax_engine_n import CountModel, build_model_from_manual_spec

manual_spec = {
    "fixed_terms"   : ["X1", "X2"],
    "rdm_terms"     : ["X3:normal", "X4:lognormal"],   # independent random
    "rdm_cor_terms" : ["X5:normal", "X6:triangular"],  # correlated random (Cholesky)
    "grouped_terms" : [],
    "hetro_in_means": ["X7"],                           # heterogeneity in means
    "dispersion"    : 1,                                # 0 = Poisson, 1 = Negative Binomial
}

data, spec = build_model_from_manual_spec(
    df         = df,
    manual_spec= manual_spec,
    id_col     = "SITE_ID",
    y_col      = "CRASHES",
    offset_col = "EXPOSURE",
    R          = 200,
)

model = CountModel(spec, data)
result = model.fit()

print(f"Log-likelihood: {model.loglik():.4f}")
print(f"BIC:            {model.bic():.4f}")
model.predict()   # returns mean predictions

Full Example: Crash-Frequency Latent Class Search

import pandas as pd
import numpy as np
from metacountregressor.experiment_package import ExperimentBuilder

# Load data
df = pd.read_csv("https://raw.githubusercontent.com/zahern/data/main/Ex-16-3.csv")
df["OFFSET"] = np.log(df["AADT"] * df["LENGTH"] * 365 / 1e8)
df = df.drop(columns=["AADT"])

builder = ExperimentBuilder(
    df           = df,
    id_col       = "ID",
    y_col        = "FREQ",
    offset_col   = "OFFSET",
    group_id_col = "FC",
)

builder.describe()
builder.suggest_config(max_latent_classes=2)

evaluator = builder.build_evaluator(
    mode               = "single",
    max_latent_classes = 2,
    membership_override= {"URBAN": [0, 7], "SPEED": [0, 7, 8]},
    exclude            = ["FC"],   # already used as group_id_col
    R                  = 150,
)

result = builder.run(evaluator, algo="sa", max_iter=1000, seed=42)

print("Best BIC:", result["best_score"])
print("Best structure:", result["best_solution"])

Simulation Draws (Halton)

The JAX engine uses scrambled Halton sequences for quasi-Monte Carlo integration over the random-effect distributions. The number of draws R controls the trade-off between speed and accuracy:

| R | Accuracy | Speed |
|---|----------|-------|
| 50 | Low — suitable for quick screening | Fast |
| 100 | Moderate — reasonable for search | Moderate |
| 200 | Good — recommended default | Moderate |
| 500+ | High — use for final model fitting | Slow |

Draws are pre-generated once per evaluator and reused across all model evaluations, ensuring consistent comparisons.
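For intuition, a minimal unscrambled Halton generator is sketched below. This is purely illustrative and is not the package's implementation, which uses scrambled sequences via the JAX engine.

```python
# Illustrative sketch of unscrambled Halton draws (quasi-Monte Carlo points
# in the unit hypercube); not the package's scrambled JAX implementation.
def van_der_corput(n, base):
    """n-th term of the van der Corput sequence in the given base."""
    q, denom = 0.0, 1.0
    while n > 0:
        denom *= base
        n, rem = divmod(n, base)
        q += rem / denom
    return q

def halton(n_draws, dim, primes=(2, 3, 5, 7, 11, 13)):
    """First n_draws points of the dim-dimensional Halton sequence."""
    return [[van_der_corput(i + 1, primes[d]) for d in range(dim)]
            for i in range(n_draws)]

draws = halton(200, 2)  # R = 200 draws for two random parameters
```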


Paper

The following tutorial is in conjunction with our latest paper. A link to the current paper can be found here: MetaCountRegressor

Contact

If you have any questions, ideas to improve MetaCountRegressor, or want to report a bug, just open a new issue in the GitHub repository.

Citing MetaCountRegressor

Please cite MetaCountRegressor as follows:

Ahern, Z., Corry, P., & Paz, A. (2024). MetaCountRegressor [Computer software]. https://pypi.org/project/metacountregressor/

Or using BibTex as follows:

@misc{Ahern2024Meta,
   author = {Zeke Ahern and Paul Corry and Alexander Paz},
   journal = {PyPI},
   title = {metacountregressor · PyPI},
   url = {https://pypi.org/project/metacountregressor/0.1.80/},
   year = {2024},
}
