Mixture modelling by Minimum Message Length (MML) using Factor Snob

About this Project

This repository is an updated version of the open-source Factor Snob program for mixture modelling by Minimum Message Length (MML). The original source code was obtained from Lloyd Allison's website at http://www.allisons.org/ll/MML/Notes/SNOB-factor/. The original program was written in C and compiled to a binary executable; the goal of this project is to make it more accessible by providing a Python interface.

I have made many changes to the code, mainly to improve readability and to fix some minor bugs I discovered. The current version produces the same output as the original, except for a couple of display bugs present in the original version (see below). Here is a summary of the changes:

  • Refactored function declarations into a central "snob.h" include file.
  • The previous version included "glob.c" within other files. "glob.c" now contains some generic and exported functions.
  • Replaced old-style C function prototypes with modern C function prototypes.
  • Removed as many global variables as possible and eliminated many side effects. The previous version relied too heavily on side effects and global variables; the current version makes variable passing more explicit. Where global variables are still used, they have been grouped into global structures, which should make it easier to eliminate them entirely in the future. There is still plenty of room for improvement.
  • Refactored many functions to replace goto statements with more modern control structures.
  • Replaced many print statements with a logging facility that can be dialed down using a debug level.
  • Added functions for printing a summary of the existing sample data.
  • Added functions for generating a JSON representation of the classification result equivalent to the data produced by the prclass command.
  • Added functions for generating class assignments for the sample items. These are also useful for classification through Python.
  • Fixed a bug in which, during printing of class information, the wrong class pointer was used to calculate one of the Factor costs. This is why the total costs for the Factor model did not match the sum of the parameter, data and variable costs. It also affected the display of the boost character in the class header information.
  • Added functions to allow loading vset and Sample data through function calls rather than from file. This allows vset and sample information to be supplied through Python ctypes calls.
  • Added functions to allow saving and loading of the population model to and from file. This allows the population model to be saved and loaded through Python ctypes calls.
  • Added a Scikit-learn style interface for fitting and predicting using the Factor Snob model. This allows the model to be used in a similar way to other Scikit-learn classification models. The features include the ability to save fitted models and to predict on new data based on previously saved models.
  • Added a Python module that can provide the data, run the Factor Snob routines and extract the results within Python, using ctypes.
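
The ctypes-related items above depend on handing Python data to C as plain arrays. The sketch below shows only the generic ctypes side of that idea: flattening rows of numbers into a contiguous C double array. It does not call the Factor Snob library. The helper name to_c_double_array is hypothetical, and the exported function names and argument layout of the real library are not described in this document.

```python
import ctypes

def to_c_double_array(rows):
    """Flatten rows of numbers (row-major) into a contiguous C double array.

    Hypothetical helper for illustration; not part of the snob package.
    """
    flat = [float(value) for row in rows for value in row]
    array_type = ctypes.c_double * len(flat)   # a fixed-length double[] type
    return array_type(*flat)

rows = [[1.0, 0.5, 2.0], [3.0, 1.5, 0.25]]
c_values = to_c_double_array(rows)
n_rows = ctypes.c_int(len(rows))               # counts are usually passed as C ints
print(len(c_values), list(c_values))
```

An array built this way can be passed to a C function that expects a double pointer, which is one common way a ctypes wrapper supplies sample data without going through a file.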

The original documentation file is included in the docs folder. It contains a lot of useful information about the original SNOB programs and may be useful for understanding some of the input and output parameters.

Any published work using this software should cite the original author's work as well as this repository. See the

The MML Book

[1] C. S. Wallace, Statistical and Inductive Inference by Minimum Message Length, Springer, 2005. ISBN-13: 978-0-387-23795-4.

Using the Python Module

The Python module can be installed using pip as follows:

pip install snob-factor

The module can be used in a similar way to other Scikit-learn classification models. The following example shows how to use the module to fit a Factor Snob model to some sample data and then use the model to predict the class of some new data.

import pandas as pd
import snob

# Load the sample data
train = pd.read_csv('train.csv')  
sfc = snob.SNOBClassifier(
    name='sst',
    attrs={                   # these are the features of the data
        'distance': 'real',   # a real-valued attribute
        'theta': 'radians',   # an angle in radians; angles are treated specially
        'phi': 'radians',     # another angle in radians
    },
    cycles=3,                 # maximum number of cycles to run
    steps=40,                 # maximum number of estimation/assignment steps to run
    moves=2,                  # maximum number of failed class shuffles before giving up each cycle
    tol=0.01,                 # minimum percentage improvement in cost to indicate convergence
    seed=1234567              # random number seed
)

# Fit the model to the sample data
sfc.fit(train)
sfc.save_model('sst.mod')     # save the model to a file for use later

classes = sfc.get_classes()   # get the class parameters for the fitted model
snob.show_classes(classes)    # display the classification summary

# get class assignments for training data
train_pred = sfc.predict()    # class assignments for the training data; note that predict is called without arguments
print(train_pred)             # train_pred is a pandas DataFrame with the class assignments

# Load some new data to predict
test = pd.read_csv('test.csv')

# Predict the class of the new data
test_pred = sfc.predict(test)
print(test_pred)
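
The tol comment in the example describes a stopping rule based on the percentage improvement in cost. As a rough illustration of that idea only, the sketch below reports convergence when the relative drop in a cost falls below a percentage threshold. The function name has_converged and the exact formula are assumptions; the Factor Snob C code may apply its test differently.

```python
def has_converged(previous_cost, current_cost, tol_percent):
    """Return True when the cost improved by less than tol_percent percent.

    Assumed formula for illustration; not taken from the Factor Snob source.
    """
    if previous_cost == 0:
        return True
    improvement = 100.0 * (previous_cost - current_cost) / abs(previous_cost)
    return improvement < tol_percent

print(has_converged(1000.0, 990.0, 0.01))    # 1% improvement: keep going
print(has_converged(1000.0, 999.95, 0.01))   # about 0.005% improvement: stop
```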

The SNOBClassifier class has a number of parameters that can be used to control the behaviour of the model. The attrs parameter is a dictionary that defines the features of the data. The keys are the names of the features and the values are the types of the features. The types can be one of the following:

  • 'real' - a real-valued feature
  • 'radians' - an angle in radians
  • 'degrees' - an angle in degrees
  • 'multi-state' - a categorical feature with more than two but preferably fewer than 20 states
  • 'binary' - a boolean or two-valued feature

Since SNOBClassifier is an unsupervised learning model, the fit() method does not require a target variable. The fit() method takes a pandas DataFrame as input. The DataFrame must contain columns for each of the features defined in the attrs parameter. The fit() method will run the Factor Snob algorithm on the data and produce a classification model for the data. After fitting, the classifier will be fully parameterized and can be used to predict the classes of new data.
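
Since fit() expects a column for every feature named in attrs, it can help to compare the two before fitting. This sketch works on a plain list of column names, such as list(train.columns); the helper name missing_columns is hypothetical.

```python
def missing_columns(columns, attrs):
    """Return the attrs feature names that are absent from columns, in attrs order."""
    present = set(columns)
    return [name for name in attrs if name not in present]

attrs = {'distance': 'real', 'theta': 'radians', 'phi': 'radians'}
columns = ['id', 'distance', 'theta']          # e.g. list(train.columns)
print(missing_columns(columns, attrs))
```

An empty result means every feature in attrs has a matching column; extra columns such as 'id' are ignored by this check.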

The get_classes() method of a fully parameterized classifier can be used to get the class parameters for the fitted model. This returns a list of dictionaries with each dictionary representing one class. The show_classes() function from the snob package can be used to display a summary of the class parameters. The show_classes() function takes the list of class dictionaries as returned by the get_classes() method.
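
Because get_classes() returns ordinary dictionaries, the result can be inspected with standard Python tools. The sketch below uses stand-in data: the keys 'id', 'size' and 'note' are invented for illustration and are not the real fields returned by get_classes(). The helper name class_keys is also hypothetical.

```python
import json

def class_keys(classes):
    """Return the sorted union of keys found across the class dictionaries."""
    keys = set()
    for cls in classes:
        keys.update(cls)
    return sorted(keys)

# Stand-in data: these keys are invented, not the real get_classes() output.
classes = [{'id': 1, 'size': 120.0}, {'id': 2, 'size': 80.0, 'note': 'example'}]
print(class_keys(classes))
print(json.dumps(classes, indent=2))
```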

The model can be saved to a file using the save_model() method, which takes a single argument: the name of the file to save the model to. A previously saved model can be used by specifying a from_file parameter when the classifier is initialized. The from_file parameter should be set to the name of the file containing the saved model. The attrs parameter is always required, even when from_file is provided, and should be the same as the one used to create the saved model.
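
Because the same attrs must be supplied again when a model is restored, one option is to store attrs in a small JSON file next to the model. This is a sketch of that convention only; save_attrs, load_attrs and the '.attrs.json' suffix are hypothetical and not part of the snob package.

```python
import json
import os
import tempfile

def save_attrs(attrs, model_path):
    """Write attrs to '<model_path>.attrs.json' and return that path."""
    attrs_path = model_path + '.attrs.json'
    with open(attrs_path, 'w', encoding='utf-8') as handle:
        json.dump(attrs, handle, indent=2)
    return attrs_path

def load_attrs(model_path):
    """Read back the attrs dictionary stored next to model_path."""
    with open(model_path + '.attrs.json', encoding='utf-8') as handle:
        return json.load(handle)

attrs = {'distance': 'real', 'theta': 'radians', 'phi': 'radians'}
model_path = os.path.join(tempfile.mkdtemp(), 'sst.mod')
save_attrs(attrs, model_path)
restored = load_attrs(model_path)
print(restored == attrs)
```

The restored dictionary can then be passed as attrs, together with from_file set to the model path, when constructing the classifier.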

The predict() method can then be used to predict the class of new data. The first time a restored model is used for prediction, the model is loaded into memory and used to fully parameterize the classifier before predictions are made. Details of the class parameters are only available after the classifier is fully parameterized.

The predict() method takes a pandas DataFrame as input. The DataFrame must contain columns for each of the features.

For example, the model above can be loaded from the saved model file and used as follows:

sfc = snob.SNOBClassifier(
    name='sst',
    attrs={                   # these are the features of the data
        'distance': 'real',   # a real-valued attribute
        'theta': 'radians',   # an angle in radians; angles are treated specially
        'phi': 'radians',     # another angle in radians
    },
    from_file='sst.mod'       # load the model from the file
)
new_data = pd.read_csv('new_data.csv')
new_pred = sfc.predict(new_data)    # No need to run fit again, the model will be loaded from the file

class_info = sfc.get_classes()      # must run predict first to fully parameterize the model
snob.show_classes(class_info)
print(new_pred)
