AURAI: A hybrid imputation model that combines mask-aware variational autoencoding, latent-neighbor correction, and adaptive feature gating to reconstruct missing data with uncertainty estimates.

AURAI – Adaptive Uncertainty-Regularized Autoencoder Imputer

Author: Abdul Mofique Siddiqui
License: MIT
Install via pip:

pip install aurai-imputer

Import it in your Python code:

from AURAI import AURAIImputer

Overview

AURAI (Adaptive Uncertainty-Regularized Autoencoder Imputer) is a hybrid imputation framework that combines:

  • A mask-aware Variational Autoencoder (VAE)
  • Latent-space nearest-neighbor refinement
  • A feature-wise adaptive gating mechanism
  • Monte-Carlo–based uncertainty estimation

AURAI supports both numerical and categorical datasets and is designed to handle data missing under:

  • MCAR (Missing Completely At Random)
  • MAR (Missing At Random)
  • MNAR (Missing Not At Random)
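
The three mechanisms differ in what the probability of a missing value depends on: nothing (MCAR), other observed columns (MAR), or the missing value itself (MNAR). As a rough illustration, the NumPy sketch below builds one mask of each kind for a synthetic income column. It is not part of AURAI; the variable names, the 20% and 40% rates, and the sigmoid form are assumptions chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
age = rng.integers(18, 70, n).astype(float)
income = age * 1200 + rng.normal(0, 5000, n)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def standardize(x):
    return (x - x.mean()) / x.std()


# MCAR: every income value has the same 20% chance of being missing.
mcar_mask = rng.random(n) < 0.2

# MAR: the chance that income is missing depends on the observed column age.
mar_mask = rng.random(n) < 0.4 * sigmoid(standardize(age))

# MNAR: the chance that income is missing depends on income itself
# (here, higher incomes are more likely to be withheld).
mnar_mask = rng.random(n) < 0.4 * sigmoid(standardize(income))


def apply_mask(values, mask):
    """Return a copy of values with masked entries set to NaN."""
    out = values.copy()
    out[mask] = np.nan
    return out


income_mcar = apply_mask(income, mcar_mask)
income_mar = apply_mask(income, mar_mask)
income_mnar = apply_mask(income, mnar_mask)

print("true mean:         ", round(income.mean()))
print("observed mean MCAR:", round(np.nanmean(income_mcar)))
print("observed mean MNAR:", round(np.nanmean(income_mnar)))
```

Under MNAR the observed mean is biased because the values most likely to be missing are systematically different from the ones that remain, which is why this case is the hardest for any imputer.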

The imputer also produces an interval for each filled value, so downstream decisions can take imputation uncertainty into account.

Installation

Install the package via pip:

pip install aurai-imputer

How It Works

  • Global VAE Module: Learns latent structure and reconstructs both numeric and categorical distributions.
  • Latent-Space KNN Module: Uses nearest neighbors in latent space to refine local predictions.
  • Adaptive Gating: Produces a learnable per-feature weight that blends global (VAE) and local (KNN) imputations.
  • Uncertainty Estimation: Monte-Carlo sampling over latent variables yields:
    • Posterior means
    • 95% confidence intervals
  • Mixed Data Support: Uses StandardScaler + OrdinalEncoder to handle mixed data seamlessly.
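
To make the interplay of the KNN and gating steps concrete, here is a minimal NumPy sketch of a latent-space neighbor estimate blended with a global estimate through a per-feature gate. It is an illustration under stated assumptions, not AURAI's implementation: the function names latent_knn_estimate and gated_blend are hypothetical, the gate logits are set by hand rather than learned, and the choice that the gate weights the global side is arbitrary.

```python
import numpy as np


def latent_knn_estimate(z_query, z_train, x_train, k=5):
    """Average the feature rows of the k nearest training points in latent space."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((z_query[:, None, :] - z_train[None, :, :]) ** 2).sum(axis=2)
    nn_idx = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest rows
    return x_train[nn_idx].mean(axis=1)     # (n_query, k, n_feat) -> (n_query, n_feat)


def gated_blend(x_global, x_local, gate_logits):
    """Blend two estimates with one sigmoid gate per feature (assumed form)."""
    gate = 1.0 / (1.0 + np.exp(-gate_logits))  # each gate lies in (0, 1)
    return gate * x_global + (1.0 - gate) * x_local


rng = np.random.default_rng(0)
z_train = rng.normal(size=(50, 4))   # stand-in latent codes
x_train = rng.normal(size=(50, 3))   # stand-in feature values
z_query = z_train[:2] + 0.01         # two query points near known codes

x_global = np.zeros((2, 3))          # stand-in for a VAE reconstruction
x_local = latent_knn_estimate(z_query, z_train, x_train, k=5)

# Feature 0 trusts the local estimate, feature 2 the global one, feature 1 both equally.
gate_logits = np.array([-10.0, 0.0, 10.0])
blended = gated_blend(x_global, x_local, gate_logits)
print(blended)
```

A per-feature gate lets smooth, globally predictable columns lean on the VAE while columns with strong local structure lean on their neighbors.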

Getting Started

1. Import the package

from AURAI import AURAIImputer

2. Initialize the imputer

imputer = AURAIImputer()

3. Fit the model

imputer.fit(df)
  • df: pandas DataFrame containing numerical and/or categorical columns

4. Impute missing values

imputed = imputer.transform(df)

Returns a NumPy array with missing values filled.

5. Impute with uncertainty intervals

mean, lower, upper = imputer.transform(df, return_intervals=True)
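
One way to use the intervals downstream is to flag imputed cells whose interval is unusually wide. The sketch below uses small stand-in arrays instead of real AURAI output, and assumes that lower and upper have the same shape as mean and align with it cell by cell. The helper name flag_uncertain_cells and the quantile threshold are hypothetical choices for this example.

```python
import numpy as np


def flag_uncertain_cells(lower, upper, missing_mask, quantile=0.9):
    """Mark imputed cells whose interval width exceeds the given quantile
    of all imputed-cell widths."""
    width = upper - lower
    threshold = np.quantile(width[missing_mask], quantile)
    return missing_mask & (width > threshold), width


# Stand-in values; in practice these would come from
# imputer.transform(df, return_intervals=True).
mean = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
half_width = np.array([[0.0, 0.5], [2.0, 0.0], [0.1, 0.0]])
lower, upper = mean - half_width, mean + half_width

# True where the original data had a missing value.
missing_mask = np.array([[False, True], [True, False], [True, False]])

flagged, width = flag_uncertain_cells(lower, upper, missing_mask, quantile=0.5)
print(np.argwhere(flagged))  # row/column positions of the widest intervals
```

Flagged cells can then be reviewed manually, excluded from a sensitive analysis, or handled with multiple imputation.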

API Reference

AURAIImputer()

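Initializes the imputer. All parameters are optional; they include the latent dimension (latent_dim), the number of Monte Carlo samples (mc_samples), and the neighbor count. Example 3 also passes faiss_enabled, verbose, min_latent_std, and min_num_std.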

.fit(df)

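Fits the model to training data. The data may already contain missing values, as in Example 3.

Parameters:

  • df: pandas DataFrame with mixed (numerical and/or categorical) features
  • epochs, batch_size, lr: optional training settings (Example 3 passes epochs=10, batch_size=128, lr=1e-3)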

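.transform(df, return_intervals=False)

Returns imputed values.

Input:

  • df: DataFrame or NumPy array with missing values

Output:

  • A NumPy array with imputed values
  • If return_intervals=True: returns (mean, lower, upper)
  • If return_df=True is also passed (as in Example 3): the first returned item is a decoded pandas DataFrame instead of an array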

.save(path)

Saves:

  • model weights
  • preprocessor
  • metadata

.load(path)

Loads a previously saved AURAI model. It is called on the class and returns a new imputer, as in Example 3: AURAIImputer.load(path).

Example Usage

Example 1: Basic Imputation

from AURAI import AURAIImputer
import pandas as pd

df = pd.read_csv("data.csv")
imputer = AURAIImputer()
imputer.fit(df)
imputed = imputer.transform(df)

Example 2: Imputation with Uncertainty

mean, lower, upper = imputer.transform(df, return_intervals=True)

Example 3: Demo

import numpy as np
import pandas as pd
import os
import shutil

from AURAI import AURAIImputer  


# ============================================
# Example demo
# ============================================
def run_demo():
    print("[Example] Running AURAIImputer quick demo...")

    # Create synthetic demo dataset
    np.random.seed(42)
    N = 400
    age = np.random.randint(18, 70, N)
    income = age * 1200 + np.random.randn(N) * 5000
    job = np.random.choice(["eng", "sales", "hr", "dev"], N)
    score = income / 800 + np.random.randn(N) * 3

    df = pd.DataFrame({
        "age": age,
        "income": income,
        "job": job,
        "score": score
    })

    # Introduce 20% missingness
    rng = np.random.default_rng(42)
    df_missing = df.mask(rng.random(df.shape) < 0.2)

    print("\nMissing% per col:\n", df_missing.isnull().mean())

    # Initialize imputer
    imputer = AURAIImputer(
        latent_dim=32,
        mc_samples=100,
        faiss_enabled=False,
        verbose=True,
        min_latent_std=1e-2,
        min_num_std=1e-2
    )

    # Fit the model
    imputer.fit(df_missing, epochs=10, batch_size=128, lr=1e-3)

    # Perform imputation with intervals and decoded DataFrame output
    final_df, lower, upper = imputer.transform(
        df_missing,
        return_intervals=True,
        return_df=True
    )

    print("\nFirst 5 rows of decoded final imputed DataFrame:")
    print(final_df.head())

    # Check interval degeneracy
    mean_arr, low_arr, high_arr = imputer.transform(df_missing, return_intervals=True)
    print("\nZero-width intervals:", np.sum(np.isclose(low_arr, high_arr)), "/", low_arr.size)

    # Save/load test
    save_dir = "aurai_demo_saved"
    if os.path.exists(save_dir):
        shutil.rmtree(save_dir)

    imputer.save(save_dir)
    imputer2 = AURAIImputer.load(save_dir)

    print("\nSave & load smoke test OK:", isinstance(imputer2, AURAIImputer))
    print("\n[Example] Demo finished.")


# Only run demo when file is executed directly
if __name__ == "__main__":
    run_demo()
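
Because the demo keeps the complete DataFrame before masking, imputations can be scored on exactly the cells that were removed. The sketch below shows the idea with a column-mean baseline on numeric data only. It does not call AURAI; the helper masked_rmse_per_column is a hypothetical name, and scoring AURAI the same way would assume the array returned by transform keeps the input's column order.

```python
import numpy as np


def masked_rmse_per_column(truth, imputed, missing_mask):
    """RMSE per column, computed only over the cells that were masked out."""
    scores = []
    for j in range(truth.shape[1]):
        rows = missing_mask[:, j]
        err = truth[rows, j] - imputed[rows, j]
        scores.append(float(np.sqrt(np.mean(err ** 2))))
    return scores


# Synthetic numeric data similar to the demo above.
rng = np.random.default_rng(42)
n = 400
age = rng.integers(18, 70, n).astype(float)
income = age * 1200 + rng.normal(0, 5000, n)
truth = np.column_stack([age, income])

# Hide 20% of the cells at random.
missing_mask = rng.random(truth.shape) < 0.2
observed = np.where(missing_mask, np.nan, truth)

# Baseline imputer: fill each missing cell with its column's observed mean.
col_means = np.nanmean(observed, axis=0)
baseline = np.where(missing_mask, col_means, observed)

rmse = masked_rmse_per_column(truth, baseline, missing_mask)
print("baseline RMSE (age, income):", rmse)
```

Any imputer worth its training time should beat this baseline on the masked cells; comparing per column avoids letting the large-scale income column dominate the score.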

Internals

  • Variational Autoencoder (VAE): Learns global structure and reconstructs numeric means, variances, and categorical logits.
  • Latent-Space Nearest Neighbor Search: Provides local refinement to improve imputation accuracy.
  • Gating Network: Learns per-feature blending weights for global + local fusion.
  • Cluster Regularization: Encourages structured and stable latent geometry.
  • Monte Carlo Sampling: Produces mean predictions and confidence intervals.
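
The Monte Carlo step can be sketched as follows: draw latent samples around each row's encoded mean, decode every sample, then summarize the decoded draws with a mean and percentile bounds. This is a generic illustration with a toy linear decoder, not AURAI's code. The function mc_interval, the decoder weights W, and the percentile-based bounds are assumptions made for the example.

```python
import numpy as np


def mc_interval(mu, sigma, decode, n_samples=100, level=0.95, seed=0):
    """Monte Carlo mean and percentile interval for decoded latent samples.

    mu, sigma: arrays of shape (n_rows, latent_dim) describing a Gaussian
    over each row's latent code. decode maps (n_rows, latent_dim) to
    (n_rows, n_features).
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples,) + mu.shape)
    z = mu + sigma * eps                       # one latent draw per sample
    draws = np.stack([decode(z_s) for z_s in z])  # (n_samples, n_rows, n_features)
    alpha = (1.0 - level) / 2.0
    mean = draws.mean(axis=0)
    lower, upper = np.quantile(draws, [alpha, 1.0 - alpha], axis=0)
    return mean, lower, upper


# Toy linear decoder standing in for a trained network.
W = np.array([[1.0, 0.0, 2.0, -1.0],
              [0.5, 1.0, 0.0, 3.0]])


def decode(z):
    return z @ W


mu = np.array([[0.0, 1.0], [2.0, -1.0], [0.5, 0.5]])
sigma = np.full_like(mu, 0.3)

mean, lower, upper = mc_interval(mu, sigma, decode, n_samples=100)
print("interval widths:\n", upper - lower)

# With zero latent spread every draw is identical, so the interval collapses.
_, lo0, hi0 = mc_interval(mu, np.zeros_like(mu), decode)
print("zero-width cells:", int(np.isclose(lo0, hi0).sum()), "/", lo0.size)
```

The second call shows why a check for zero-width intervals, like the one in Example 3, is a useful sanity test: intervals collapse whenever the sampled latent spread vanishes.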

Notes

  • Works with both numeric and categorical data.
  • Designed for MCAR, MAR, and MNAR missingness.
  • Provides uncertainty intervals for downstream tasks.
  • GPU recommended for training large datasets.

Author

Abdul Mofique Siddiqui

License

This project is licensed under the MIT License.
