
A Python package for stepwise estimation of latent class models with measurement and structural components. The package can also be used to fit mixture models with various observed random variables.


StepMix


For StepMixR, please refer to this repository.

A Python package following the scikit-learn API for generalized mixture modeling. The package supports categorical data (Latent Class Analysis) and continuous data (Gaussian Mixtures/Latent Profile Analysis). StepMix can be used for both clustering and supervised learning.

Additional features include:

  • Support for missing values through Full Information Maximum Likelihood (FIML);
  • Multiple stepwise Expectation-Maximization (EM) estimation methods based on pseudolikelihood theory;
  • Covariates and distal outcomes;
  • Parametric and non-parametric bootstrapping.
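To give some intuition for the missing-value support listed above, here is a minimal sketch of the idea behind FIML for a latent class model with binary items. It assumes items are independent given the class and that values are missing at random, so a missing item simply drops out of the product. The function name `observed_log_likelihood` and the parameter values are hypothetical; this is an illustration of the principle, not StepMix's implementation.

```python
import math


def observed_log_likelihood(row, class_weights, item_probs):
    """Log-likelihood of one binary response row, skipping missing (None) items.

    class_weights[k] : prior probability of latent class k
    item_probs[k][j] : P(item j == 1 | class k)
    """
    total = 0.0
    for weight, probs in zip(class_weights, item_probs):
        joint = weight
        for value, p in zip(row, probs):
            if value is None:
                continue  # a missing item contributes a factor of 1
            joint *= p if value == 1 else 1.0 - p
        total += joint
    return math.log(total)


# Hypothetical two-class model with two binary items
weights = [0.5, 0.5]
probs = [[0.9, 0.8], [0.2, 0.1]]

complete = observed_log_likelihood([1, 1], weights, probs)
partial = observed_log_likelihood([1, None], weights, probs)
print(math.exp(complete), math.exp(partial))
```

The row with a missing second item is scored on its first item alone, so no observation has to be discarded or imputed.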

Reference

If you find StepMix useful, please consider citing our arXiv preprint:

@article{morin2023stepmix,
  title={StepMix: A Python Package for Pseudo-Likelihood Estimation of Generalized Mixture Models with External Variables},
  author={Morin, Sacha and Legault, Robin and Lalibert{\'e}, F{\'e}lix and Bakk, Zsuzsa and Gigu{\`e}re, Charles-{\'E}douard and de la Sablonni{\`e}re, Roxane and Lacourse, {\'E}ric},
  journal={arXiv preprint arXiv:2304.03853},
  year={2023}
}

Install

You can install StepMix with pip, preferably in a virtual environment:

pip install stepmix

Quickstart

The example below fits a StepMix mixture to categorical variables in a preloaded data matrix. StepMix accepts either a numpy.array or a pandas.DataFrame. Categories should be integer-encoded and 0-indexed.

from stepmix.stepmix import StepMix

# Categorical StepMix Model with 3 latent classes
model = StepMix(n_components=3, measurement="categorical")
model.fit(data)

# Allow missing values
model_nan = StepMix(n_components=3, measurement="categorical_nan")
model_nan.fit(data_nan)
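The snippets above assume the data is already integer-encoded and 0-indexed. If your raw data holds string labels, one way to prepare it is sketched below in plain Python. The helper `encode_zero_indexed` and the survey values are hypothetical and not part of StepMix; the sketch also assumes there are no missing values.

```python
def encode_zero_indexed(rows):
    """Map each column's raw labels to integer codes 0, 1, 2, ... (sorted label order)."""
    n_cols = len(rows[0])
    # One label -> code mapping per column, built from the sorted unique labels
    mappings = []
    for j in range(n_cols):
        labels = sorted({row[j] for row in rows})
        mappings.append({label: code for code, label in enumerate(labels)})
    encoded = [[mappings[j][row[j]] for j in range(n_cols)] for row in rows]
    return encoded, mappings


# Hypothetical survey answers: two categorical items per respondent
raw = [
    ["low", "yes"],
    ["high", "no"],
    ["medium", "yes"],
    ["low", "no"],
]
encoded, mappings = encode_zero_indexed(raw)
print(encoded)
print(mappings)
```

The encoded rows can then be converted to a numpy.array or pandas.DataFrame and passed to model.fit. Keeping the returned mappings makes it possible to translate fitted category indices back to the original labels.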

For binary data you can also use measurement="binary" or measurement="binary_nan". For continuous data, you can fit a Gaussian Mixture with diagonal covariances using measurement="continuous" or measurement="continuous_nan".

Set verbose=1 for detailed output.

Please refer to the StepMix tutorials to learn how to combine continuous and categorical data in the same model.

Tutorials

Detailed tutorials are available in notebooks:

  1. Generalized Mixture Models with StepMix: an in-depth look at how mixture models can be defined with StepMix. The tutorial uses the Iris Dataset as an example and covers:
    1. Gaussian Mixtures (Latent Profile Analysis);
    2. Binary Mixtures (LCA);
    3. Categorical Mixtures (LCA);
    4. Mixed Categorical and Continuous Mixtures;
    5. Missing Values through Full-Information Maximum Likelihood.
  2. Stepwise Estimation with StepMix: a tutorial demonstrating how to define measurement and structural models. The tutorial discusses:
    1. LCA models with distal outcomes;
    2. LCA models with covariates;
    3. 1-step, 2-step and 3-step estimation;
    4. Corrections (BCH or ML) and other options for 3-step estimation;
    5. Putting it All Together: A Complete Model with Missing Values.
  3. Model Selection:
    1. Selecting the number of components in a mixture model (n_components) with cross-validation;
    2. Selecting the number of components with the Parametric Bootstrapped Likelihood Ratio Test (BLRT);
    3. Fit indices: AIC, BIC and other metrics.
  4. Parameters, Bootstrapping and CI: a tutorial discussing how to:
    1. Access StepMix parameters;
    2. Bootstrap StepMix estimators;
    3. Quickly plot confidence intervals.
  5. Supervised and Semi-Supervised Learning with StepMix:
    1. Binary Classification;
    2. Multiclass Classification;
    3. Semi-Supervised Learning;
    4. Cross-Validation.
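As a small illustration of the fit indices mentioned under Model Selection, the sketch below computes AIC and BIC from their standard definitions and picks the number of components with the lowest BIC. The log-likelihood values and sample size are made up for illustration; the parameter counts assume a latent class model with four binary items, which has (K - 1) + 4K free parameters for K classes. The functions here are standalone helpers, not StepMix calls.

```python
import math


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_parameters - 2 * log_likelihood


def bic(log_likelihood, n_parameters, n_samples):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_parameters * math.log(n_samples) - 2 * log_likelihood


# Hypothetical results: n_components -> (total log-likelihood, free parameters)
candidates = {2: (-1250.0, 9), 3: (-1210.0, 14), 4: (-1205.0, 19)}
n_samples = 500

scores = {k: bic(ll, p, n_samples) for k, (ll, p) in candidates.items()}
best_k = min(scores, key=scores.get)  # lower BIC is better
print(scores)
print(best_k)
```

In this made-up example the fourth class improves the log-likelihood only slightly, so the BIC penalty for its extra parameters outweighs the gain.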
