A package for missing data imputation

Maintainers

dkatsimpokis

MissEnsemble

MissEnsemble is a generalization of the popular MissForest algorithm (Stekhoven & Bühlmann, 2012) for missing value imputation. It extends MissForest by supporting multiple ensemble methods and provides a scikit-learn compatible API. Currently supported ensemble methods:

Random Forests
XGBoost

MissEnsemble natively handles different types of input values (e.g., strings and numbers). You only need to specify which columns belong to which variable type (numerical, categorical, or ordinal).

In addition, MissEnsemble provides built-in visualization functions for convergence and imputation validation (when true values are available).
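For readers unfamiliar with MissForest, the core loop can be sketched roughly as follows. This is a simplified, numeric-only illustration built on scikit-learn's RandomForestRegressor, not MissEnsemble's actual implementation. The function name `missforest_numeric_sketch` and its arguments are hypothetical, and the sketch runs a fixed number of rounds instead of checking a stopping criterion.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def missforest_numeric_sketch(X, n_iter=5, n_estimators=50, random_state=0):
    """Hypothetical helper: iteratively impute NaNs in a 2-D float array."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)                      # True where a value is missing
    imputed = X.copy()

    # Step 1: initial guess, fill each column with its observed mean.
    col_means = np.nanmean(X, axis=0)
    imputed[mask] = np.take(col_means, np.where(mask)[1])

    # Step 2: visit columns from fewest to most missing values,
    # skipping columns that have nothing to impute.
    order = [j for j in np.argsort(mask.sum(axis=0)) if mask[:, j].any()]

    for _ in range(n_iter):
        for j in order:
            others = np.delete(np.arange(X.shape[1]), j)
            obs, mis = ~mask[:, j], mask[:, j]
            model = RandomForestRegressor(
                n_estimators=n_estimators, random_state=random_state
            )
            # Train on rows where column j is observed ...
            model.fit(imputed[obs][:, others], imputed[obs, j])
            # ... and overwrite the current guesses for the missing rows.
            imputed[mis, j] = model.predict(imputed[mis][:, others])
    return imputed
```

Observed values are never modified; only the positions that were NaN in the input are refined from round to round.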

Setup

Install from PyPI:

pip install missensemble

Usage Example

You must assign every column in your DataFrame to one of three variable types: categorical, ordinal, or numerical. This ensures the imputation method treats each variable appropriately. Example with a DataFrame of five variables:

import numpy as np
import pandas as pd
from missensemble import MissEnsemble

# Create example dataframe (100 x 5)
data = pd.DataFrame({
    "col1": np.random.choice(['A', 'B', 'C'], size=100),
    "col2": np.random.choice(['X', 'Y'], size=100),
    "col3": np.random.randint(1, 5, size=100),
    "col4": np.random.randn(100),
    "col5": np.random.randn(100)
})

# Create NAs for col1 and col4 (30 randomly chosen rows each)
for col in ['col1', 'col4']:
    na_index = data.sample(30).index
    data.loc[na_index, col] = np.nan

# Initialize the MissEnsemble class
estimator = MissEnsemble(
    categorical_vars=['col1', 'col2'],
    ordinal_vars=['col3'],
    numerical_vars=['col4', 'col5'],
)

# Fit and transform the data
imputed_data = estimator.fit_transform(data)

For an extended usage example, see the example.ipynb notebook.
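For wide DataFrames, writing the three variable lists by hand can get tedious. One possible shortcut, sketched below with plain pandas, is to draft the lists from the column dtypes and then correct them manually. The example DataFrame and its column names are made up for illustration, and this helper logic is not part of MissEnsemble.

```python
import pandas as pd

# Made-up example data: one text column, one ordered rating, one measurement.
df = pd.DataFrame({
    "city": ["Athens", "Patras", None],
    "rating": [1, 3, 2],          # ordered levels stored as integers
    "income": [31.5, None, 48.0],
})

numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()

# dtypes cannot tell an ordinal code from a true measurement, so ordinal
# columns have to be named by hand and taken out of the numeric list.
ordinal_vars = ["rating"]
numerical_vars = [c for c in numeric_cols if c not in ordinal_vars]
categorical_vars = other_cols
```

The resulting lists can then be passed as categorical_vars, ordinal_vars, and numerical_vars.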

Parameters

The MissEnsemble class accepts the following parameters:

n_iter (int): Number of iterations to perform for imputation.
categorical_vars (list of str): List of column names representing categorical variables.
ordinal_vars (list of str): List of column names representing ordinal variables.
numerical_vars (list of str): List of column names representing numerical variables.
ens_method (str, optional): Ensemble method to use for imputation. Default is 'forest'; 'xgb' is also supported.
n_estimators (int, optional): Number of estimators to use in the ensemble method. Default is 100.
tol (float, optional): Tolerance for convergence. Default is 1e-4.
random_state (int, optional): Random state for reproducibility. Default is 42.
print_criteria (bool, optional): Whether to print the imputation criteria during fitting. Default is False.

If the change in the convergence criterion is lower than tol for three rounds, the algorithm terminates early.
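The stopping rule described above could look roughly like the sketch below. It assumes that "three rounds" means three consecutive rounds and that the change is measured as an absolute difference between successive criterion values; neither detail is confirmed here. The function `should_stop` and its `patience` argument are hypothetical names.

```python
def should_stop(criteria, tol=1e-4, patience=3):
    """Hypothetical check: True once the last `patience` changes are all below tol.

    `criteria` is the list of criterion values recorded so far, one per round.
    """
    if len(criteria) <= patience:
        return False  # not enough rounds yet to observe `patience` changes
    recent = criteria[-(patience + 1):]
    changes = [abs(b - a) for a, b in zip(recent, recent[1:])]
    return all(change < tol for change in changes)
```

A fitting loop would append the criterion after every round and break as soon as `should_stop` returns True, which may happen before n_iter rounds have run.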

Requirements

MissEnsemble requires Python 3.11+ and the following packages:

numpy
pandas
scikit-learn
xgboost
seaborn
matplotlib

pip installs these dependencies automatically when you install the package.

Parameter specification of `MissEnsemble`

Supported Ensemble Methods

You can select the ensemble method using the ens_method parameter:

ens_method='forest' for Random Forests (default)
ens_method='xgb' for XGBoost
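As a rough mental model, a string option like ens_method can be thought of as selecting a base learner class. The sketch below is an assumption about how such a dispatch might be written, not MissEnsemble's code. The helper `make_base_learner` and its `task` argument are hypothetical; the estimator classes themselves are the real scikit-learn and xgboost ones.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def make_base_learner(ens_method="forest", task="regression",
                      n_estimators=100, random_state=42):
    """Hypothetical helper: map an ens_method string to an estimator object."""
    if ens_method == "forest":
        cls = RandomForestRegressor if task == "regression" else RandomForestClassifier
    elif ens_method == "xgb":
        # Imported lazily so the forest path works without xgboost installed.
        from xgboost import XGBClassifier, XGBRegressor
        cls = XGBRegressor if task == "regression" else XGBClassifier
    else:
        raise ValueError(f"Unknown ens_method: {ens_method!r}")
    return cls(n_estimators=n_estimators, random_state=random_state)
```

Under this reading, numerical variables would be imputed with a regressor and categorical variables with a classifier.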

Error Handling

Each column must be assigned to exactly one variable type: categorical, ordinal, or numerical.
If a column is assigned to multiple types or omitted, MissEnsemble will raise an error.
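To catch such problems before fitting, the same rule can be checked up front. The following is a minimal sketch of that check in plain Python. The function `check_column_assignment` is a hypothetical helper, and the error messages are illustrative rather than the ones MissEnsemble raises.

```python
from collections import Counter


def check_column_assignment(columns, categorical_vars, ordinal_vars, numerical_vars):
    """Hypothetical pre-check: every column must appear in exactly one list."""
    columns = list(columns)
    assigned = list(categorical_vars) + list(ordinal_vars) + list(numerical_vars)
    counts = Counter(assigned)

    duplicated = sorted(c for c, n in counts.items() if n > 1)
    omitted = sorted(c for c in columns if c not in counts)
    unknown = sorted(c for c in counts if c not in columns)

    if duplicated:
        raise ValueError(f"Columns assigned more than once: {duplicated}")
    if omitted:
        raise ValueError(f"Columns not assigned to any type: {omitted}")
    if unknown:
        raise ValueError(f"Assigned names that are not columns: {unknown}")
```

With a DataFrame named data, this would be called as check_column_assignment(data.columns, categorical_vars, ordinal_vars, numerical_vars).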

API Reference

The MissEnsemble class follows the scikit-learn estimator API. Public methods:

fit(X): Fit the imputer to the data.
transform(X): Impute missing values in new data.
fit_transform(X): Fit and transform in one step.
plot_criteria(plot_final=False): Visualize convergence criteria.
check_imputation_fit(var_name, true_values, error_type, plot_type): Visualize and assess imputation quality.
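The fit / transform split listed above is the usual scikit-learn contract: fit learns from one dataset, and transform applies what was learned to the same or to new data. As a stand-in that runs without MissEnsemble, the sketch below shows the same contract with scikit-learn's SimpleImputer; the arrays are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0], [3.0], [np.nan]])
X_new = np.array([[np.nan], [5.0]])

imputer = SimpleImputer(strategy="mean")
imputer.fit(X_train)                              # learns the column mean from X_train
X_new_imputed = imputer.transform(X_new)          # fills NaNs in unseen data
X_train_imputed = imputer.fit_transform(X_train)  # fit and transform in one step
```

The mean learned from X_train (2.0) is what fills the gap in X_new, which is the behavior to expect from any imputer that separates fit from transform.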

Visualization Methods

MissEnsemble provides visualizations for convergence and for imputation checks (the latter only if true values are available).

Convergence Criteria

After fitting, use the plot_criteria method to show the minimization path of the stopping criteria:

estimator.plot_criteria(plot_final=False)

which results in the following plot:

[Figure: imputation criteria]

Imputation check

The check_imputation_fit method plots how far the imputed values diverge from the true values. The following code checks the imputation of the mean texture variable (see the example.ipynb notebook):

estimator.check_imputation_fit(
    var_name='mean texture',
    true_values=data.loc[:, 'mean texture'],
    error_type='std_diff',
    plot_type='hist'
)

which results in the following plot:

[Figure: imputation check]

The method supports other divergence measures (error_type) and plot types (plot_type).
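The exact formula behind error_type='std_diff' is not spelled out here. Purely as an illustration of what a standardized divergence could look like, the sketch below divides the imputation error at the originally missing positions by the standard deviation of the true values. Both the formula and the function `standardized_differences` are assumptions, not the library's definition.

```python
import numpy as np


def standardized_differences(true_values, imputed_values, missing_mask):
    """Hypothetical metric: (imputed - true) / std(true) at the missing positions."""
    true_values = np.asarray(true_values, dtype=float)
    imputed_values = np.asarray(imputed_values, dtype=float)
    missing_mask = np.asarray(missing_mask, dtype=bool)

    # Only positions that were actually imputed carry an imputation error.
    diff = imputed_values[missing_mask] - true_values[missing_mask]
    return diff / true_values.std(ddof=1)
```

A histogram of these values centered near zero would suggest that the imputations are not systematically too high or too low.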

Contact

For questions or support, please open an issue on GitHub.

Literature

Stekhoven, D. J., & Bühlmann, P. (2012). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118.

Release history

0.2.0 (this version): Mar 19, 2026
0.1.9: Oct 15, 2025
0.1.8: Aug 11, 2025
0.1.7: Jul 26, 2025
0.1.6: Jul 25, 2025

Download files

Source Distribution

missensemble-0.2.0.tar.gz (15.7 kB)

Uploaded Mar 19, 2026 Source

Built Distribution

missensemble-0.2.0-py3-none-any.whl (15.5 kB)

Uploaded Mar 19, 2026 Python 3

File details

Details for the file missensemble-0.2.0.tar.gz.

File metadata

Download URL: missensemble-0.2.0.tar.gz
Upload date: Mar 19, 2026
Size: 15.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for missensemble-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`089eb74b89e1639d6fc69a40a1933e11cb7c49abb20fec13b1f82ad25b90bf95`
MD5	`0c39be5db95b7b8c8c3b816aec5e52c7`
BLAKE2b-256	`01fe142767a737818bad0a59c17adaf5a5871d9c1c0f12da4464ab92352a050b`

Provenance

The following attestation bundles were made for missensemble-0.2.0.tar.gz:

Publisher: release.yml on dkatsimpokis/MissEnsemble

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: missensemble-0.2.0.tar.gz
- Subject digest: 089eb74b89e1639d6fc69a40a1933e11cb7c49abb20fec13b1f82ad25b90bf95
- Sigstore transparency entry: 1133598352
- Sigstore integration time: Mar 19, 2026
Source repository:
- Permalink: dkatsimpokis/MissEnsemble@02eec2f141958b6206d446f8e1f2cd808cfa2947
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/dkatsimpokis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@02eec2f141958b6206d446f8e1f2cd808cfa2947
- Trigger Event: push

File details

Details for the file missensemble-0.2.0-py3-none-any.whl.

File metadata

Download URL: missensemble-0.2.0-py3-none-any.whl
Upload date: Mar 19, 2026
Size: 15.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for missensemble-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`334c4ad4cdaf489d64ee301a18eccea25a85d2890dcc6a0825d65ec472dacf2f`
MD5	`4c6e2eac96672cfb32412498e32398e2`
BLAKE2b-256	`b25bb45c9f3a8ada4e36d5b8fa517e837a4f1b06bbed1f2f850c3f060bc784b3`

Provenance

The following attestation bundles were made for missensemble-0.2.0-py3-none-any.whl:

Publisher: release.yml on dkatsimpokis/MissEnsemble

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: missensemble-0.2.0-py3-none-any.whl
- Subject digest: 334c4ad4cdaf489d64ee301a18eccea25a85d2890dcc6a0825d65ec472dacf2f
- Sigstore transparency entry: 1133598978
- Sigstore integration time: Mar 19, 2026
Source repository:
- Permalink: dkatsimpokis/MissEnsemble@02eec2f141958b6206d446f8e1f2cd808cfa2947
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/dkatsimpokis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@02eec2f141958b6206d446f8e1f2cd808cfa2947
- Trigger Event: push
