A Python package for stepwise estimation of latent class models with measurement and structural components. The package can also be used to fit mixture models with various observed random variables.
StepMix
For StepMixR, please refer to this repository.
A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods based on pseudolikelihood theory. Additional features include support for covariates and distal outcomes, various simulation utilities, and non-parametric bootstrapping, which allows inference in semi-supervised and unsupervised settings.
Reference
If you find StepMix useful, please consider citing our arXiv preprint:
@article{morin2023stepmix,
title={StepMix: A Python Package for Pseudo-Likelihood Estimation of Generalized Mixture Models with External Variables},
author={Morin, Sacha and Legault, Robin and Bakk, Zsuzsa and Gigu{\`e}re, Charles-{\'E}douard and de la Sablonni{\`e}re, Roxane and Lacourse, {\'E}ric},
journal={arXiv preprint arXiv:2304.03853},
year={2023}
}
Install
You can install StepMix with pip, preferably in a virtual environment:
pip install stepmix
Quickstart
A simple StepMix mixture using the continuous variables of the Iris Dataset:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import rand_score
from stepmix.stepmix import StepMix
# Load dataset in a DataFrame
data_continuous, target = load_iris(return_X_y=True, as_frame=True)
# Continuous StepMix Model with 3 latent classes
model = StepMix(n_components=3, measurement="continuous", verbose=0, random_state=123)
# Fit model and predict clusters
model.fit(data_continuous)
pred_continuous = model.predict(data_continuous)
# A Rand score close to 1 indicates good alignment between clusters and flower types
print(rand_score(pred_continuous, target))
StepMix also provides support for categorical mixtures:
# Create categorical data based on the Iris Dataset quantiles
data_categorical = data_continuous.copy()
for col in data_categorical:
    data_categorical[col] = pd.qcut(data_continuous[col], q=3).cat.codes
# Categorical StepMix Model with 3 latent classes
model = StepMix(n_components=3, measurement="categorical", verbose=0, random_state=123)
# Fit model and predict clusters
model.fit(data_categorical)
pred_categorical = model.predict(data_categorical)
# A Rand score close to 1 indicates good alignment between clusters and flower types
print(rand_score(pred_categorical, target))
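To make the Rand score used in both examples concrete, here is a minimal pure-Python sketch of the (unadjusted) Rand index: the fraction of sample pairs on which two labelings agree, either by grouping the pair together in both or by separating it in both. The function name `rand_index` and the toy labels are illustrative only and are not part of StepMix or scikit-learn.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two labelings agree (hypothetical helper)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both labelings must have the same length.")
    if len(labels_a) < 2:
        raise ValueError("At least two samples are needed to form a pair.")

    agreements = 0
    n_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        together_in_a = labels_a[i] == labels_a[j]
        together_in_b = labels_b[i] == labels_b[j]
        # A pair counts as an agreement when both labelings treat it the same way
        agreements += together_in_a == together_in_b
        n_pairs += 1
    return agreements / n_pairs


# Toy labels: same partition, different label names
clusters = [0, 0, 1, 1, 2, 2]
species = [1, 1, 0, 0, 2, 2]
print(rand_index(clusters, species))
```

Because only pair memberships are compared, the score does not depend on how clusters are numbered, which is why predicted cluster ids can be compared directly with flower types.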
Please refer to the StepMix tutorials to learn how to handle missing values and combine continuous and categorical data in the same model.
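As a rough illustration of the idea behind Full Information Maximum Likelihood, and not a description of StepMix's internals, the sketch below computes posterior class probabilities for one observation with missing entries. It assumes Gaussian features that are independent within each class, so a missing feature is integrated out simply by dropping its term from the log-likelihood. All function names and parameter values are hypothetical.

```python
import math


def observed_loglik(x, means, variances):
    """Log-density of x under independent Gaussians, using observed entries only."""
    total = 0.0
    for value, mu, var in zip(x, means, variances):
        if math.isnan(value):
            continue  # missing feature: integrated out, contributes nothing
        total += -0.5 * (math.log(2 * math.pi * var) + (value - mu) ** 2 / var)
    return total


def responsibilities(x, weights, class_means, class_variances):
    """Posterior class probabilities for one observation (E-step for a single row)."""
    log_joint = [
        math.log(w) + observed_loglik(x, m, v)
        for w, m, v in zip(weights, class_means, class_variances)
    ]
    top = max(log_joint)  # subtract the max for numerical stability
    unnormalized = [math.exp(value - top) for value in log_joint]
    normalizer = sum(unnormalized)
    return [u / normalizer for u in unnormalized]


# Hypothetical two-class model with two features
weights = [0.5, 0.5]
class_means = [[0.0, 0.0], [4.0, 4.0]]
class_variances = [[1.0, 1.0], [1.0, 1.0]]

# The second feature is missing; the row is still used instead of being dropped
print(responsibilities([0.5, float("nan")], weights, class_means, class_variances))
```

A row with every feature missing carries no information, so its posterior falls back to the class weights.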
Tutorials
Detailed tutorials are available in notebooks:
- Generalized Mixture Models with StepMix:
an in-depth look at how latent class models can be defined with StepMix. The tutorial uses the Iris Dataset as an example and covers:
- Continuous LCA models (latent profile analysis/Gaussian mixture model);
- Binary LCA models;
- Categorical LCA models;
- Mixed variables mixture models (continuous and categorical data);
- Missing values handled through Full Information Maximum Likelihood (FIML).
- Stepwise Estimation with StepMix:
a tutorial demonstrating how to define measurement and structural models. The tutorial discusses:
- LCA models with distal outcomes;
- LCA models with covariates;
- 1-step, 2-step and 3-step estimation;
- Corrections (BCH or ML) and other options for 3-step estimation.
- Model Selection:
a short tutorial discussing:
- Selecting the number of components in a mixture model (n_components);
- Comparing models with fit indices: AIC and BIC.
- Parameters, Bootstrapping and CI:
a tutorial discussing how to:
- Access StepMix parameters;
- Bootstrap StepMix estimators;
- Quickly plot confidence intervals.
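For readers new to the fit indices mentioned under Model Selection, the sketch below applies the standard definitions AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, where k is the number of free parameters, n the number of samples, and log L the maximized log-likelihood. Lower values are better for both. The log-likelihoods and parameter counts are made-up numbers for illustration, not results from any StepMix model, and the helper names are hypothetical.

```python
import math


def aic(log_likelihood, n_parameters):
    """Akaike information criterion (lower is better)."""
    return 2 * n_parameters - 2 * log_likelihood


def bic(log_likelihood, n_parameters, n_samples):
    """Bayesian information criterion (lower is better)."""
    return n_parameters * math.log(n_samples) - 2 * log_likelihood


# Hypothetical fits: n_components -> (total log-likelihood, number of free parameters)
n_samples = 150
candidates = {2: (-420.0, 17), 3: (-380.0, 26), 4: (-376.0, 35)}

scores = {
    k: {"aic": aic(ll, p), "bic": bic(ll, p, n_samples)}
    for k, (ll, p) in candidates.items()
}
for k, s in scores.items():
    print(f"n_components={k}: AIC={s['aic']:.1f}, BIC={s['bic']:.1f}")

# Pick the candidate with the lowest value of each criterion
best_by_aic = min(scores, key=lambda k: scores[k]["aic"])
best_by_bic = min(scores, key=lambda k: scores[k]["bic"])
print(best_by_aic, best_by_bic)
```

BIC penalizes each extra parameter by ln(n) rather than 2, so with more than about 7 samples it favors smaller models more strongly than AIC does.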