DynaTab: Dynamic Feature Ordering as Neural Rewiring for High-Dimensional Tabular Data

DynaTab architecture overview (see DynaTab_Architecture.jpg).

DynaTab is a neuro-inspired deep learning model for high-dimensional tabular data that tackles the Column Permutation Problem by dynamically reordering features instead of treating them as a fixed set. It first predicts whether feature ordering will be beneficial using an intrinsic-dimensionality-based IDF/FOE criterion, then applies Dynamic Feature Ordering (DFO) to rewire feature graphs and produce a task-aware global sequence. The reordered input is processed by an order-aware fusion block combining Order-Aware Positional Embeddings (OPE), Positional Importance Gating (PIGL), and Dynamic Masked Attention (DMA) on top of a sequential backbone (Transformer, DAE, LSTM, Mamba, or DAE-MHA-LSTM). DynaTab also empirically groups tabular datasets into five complexity categories. Across 36 real-world datasets and more than 45 baselines, DynaTab achieves strong, statistically significant gains, particularly in high-dimensional low-sample-size (HDLSS) and other complex regimes, positioning dynamic feature ordering as a powerful paradigm for order-sensitive backbones in tabular deep learning.
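
For intuition, the following is a minimal, self-contained sketch of cluster-guided feature reordering on a plain NumPy matrix. It is a conceptual stand-in for what the DFO module does, not the dynatab implementation; the variance statistic and KMeans clustering used here are illustrative assumptions.

# Conceptual sketch of cluster-guided feature reordering (not the dynatab
# implementation): cluster feature columns, then order clusters and the
# features within them by a simple statistic, mirroring the idea behind DFO.
import numpy as np
from sklearn.cluster import KMeans

def toy_feature_order(X, num_clusters=2, seed=42):
    """Return a permutation of column indices: clusters sorted by mean
    feature variance (ascending), features within each cluster likewise."""
    variances = X.var(axis=0)
    # Cluster the transposed matrix so that each "sample" is one feature column.
    labels = KMeans(n_clusters=num_clusters, random_state=seed, n_init=10).fit_predict(X.T)
    cluster_order = sorted(range(num_clusters), key=lambda c: variances[labels == c].mean())
    order = []
    for c in cluster_order:
        members = np.where(labels == c)[0]
        order.extend(members[np.argsort(variances[members])])
    return np.asarray(order)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
perm = toy_feature_order(X)
X_reordered = X[:, perm]  # columns now follow the cluster-guided order
print(perm)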

Citation

Al Zadid Sultan Bin Habib, Gianfranco Doretto, and Donald A. Adjeroh. “DynaTab: Dynamic Feature Ordering as Neural Rewiring for High-Dimensional Tabular Data.” In AAAI 2026 First International Workshop on Neuro for AI & AI for Neuro: Towards Multi-Modal Natural Intelligence (NeuroAI) Workshop Proceedings (PMLR), 2026.

Bibtex:

@inproceedings{habib2026dynatab,
  title     = {{DynaTab: Dynamic Feature Ordering as Neural Rewiring for High-Dimensional Tabular Data}},
  author    = {Habib, Al Zadid Sultan Bin and Doretto, Gianfranco and Adjeroh, Donald A.},
  booktitle = {Proceedings of the AAAI 2026 First International Workshop on Neuro for AI \& AI for Neuro: Towards Multi-Modal Natural Intelligence (NeuroAI)},
  year      = {2026},
  series    = {PMLR}
}

Files and Repository Structure

Python package: dynatab/

This folder contains the core DynaTab implementation (15 Python modules):

  • __init__.py - Package initializer and high-level API exports.
  • model.py - Main DynaTab model definition and wiring of all sub-modules.
  • dfo.py - Dynamic Feature Ordering (DFO) module and clustering/graph construction.
  • ope.py - Order-Aware Positional Embedding (OPE) implementation.
  • pigl.py - Positional Importance Gating Layer (PIGL).
  • dma.py - Dynamic Masked Attention (DMA) block.
  • seqprobinary.py - Training loop / utilities for binary classification.
  • seqpromulti.py - Training loop / utilities for multiclass classification.
  • seqproregression.py - Training loop / utilities for regression.
  • preprocess.py - Data preprocessing and tabular input utilities (splits, scaling, etc.).
  • metrics.py - Evaluation metrics and helper functions.
  • estimator.py - High-level estimator wrapper for running experiments (sklearn-style API).
  • idf_analyzer.py - Intrinsic Dimensionality Factor (IDF) + FOE analyzer: “Feature Ordering – When to Use?”.
  • customloss.py - Custom loss functions used by DynaTab.
  • trainer.py - Generic training / validation loop utilities shared across tasks.

Notebooks

  • DynaTab Dataset Complexity Analysis.ipynb
    Contains the experiments for the “Feature Ordering – When to Use?” section, including IDF / FOE computation across datasets.

  • DynaTab IDF Analyzer.ipynb
    Shows how to install and import the dynatab package and how to use TabularIDFAnalyzer to compute dataset complexity metrics, with demo runs.
    The code cells illustrate how to use DynaTab to assess whether feature ordering is appropriate for a given dataset (a hedged usage sketch also follows this list).

  • DynaTab_Experiment1.ipynb
    Demonstrates how to use DynaTab for binary classification, multiclass classification, and regression, with or without Optuna-based hyperparameter tuning.

  • DynaTab_Experiment2.ipynb
    Demonstrates DynaTab on the GLI-85 HDLSS dataset for binary classification, without Optuna tuning, using Mamba or LSTM as the sequential processor backbone.

  • N.B.: The demo runs use a reduced number of epochs and Optuna trials. For a complete run, use an appropriate number of Optuna trials so the search can find the optimal hyperparameters.
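
As a starting point for the IDF workflow, here is a minimal sketch of what using TabularIDFAnalyzer might look like. Only the class name comes from the notebook description above; the default constructor and the analyze method assumed here are hypothetical, so consult DynaTab IDF Analyzer.ipynb for the actual interface.

# Hypothetical sketch: the constructor arguments and the analyze() method used
# below are assumptions, not the documented dynatab API; see the
# "DynaTab IDF Analyzer.ipynb" notebook for the real interface.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from dynatab import TabularIDFAnalyzer

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

analyzer = TabularIDFAnalyzer()   # assumed default constructor
report = analyzer.analyze(X, y)   # assumed: returns IDF / FOE complexity metrics
print(report)                     # e.g., intrinsic dimensionality and whether
                                  # feature ordering is expected to help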

Other top-level files

  • requirements.txt - Python dependencies required to run the DynaTab package and notebooks.
  • DynaTab_Architecture.jpg - High-level architecture diagram of the DynaTab framework.
  • LICENSE - MIT license for this repository.
  • README.md - Project overview, usage instructions, and citation information.
  • .gitignore - Standard Git ignore rules for Python and Jupyter projects.

Tested Environment

  • Python 3.8+
  • torch 2.5.1+cu121 (CUDA 12.1)
  • numpy 1.26.4
  • pandas 2.2.3
  • scikit-learn 1.5.2
  • matplotlib 3.10.0
  • scipy 1.11.4
  • kmeans_gpu 0.0.5

Recommended PyTorch install (GPU, CUDA 12.1)

pip install "torch==2.5.1+cu121" --index-url https://download.pytorch.org/whl/cu121

Installation

You can install DynaTab in several ways depending on your workflow.


Option 1: Clone the Repository (Recommended for Development)

git clone https://github.com/zadid6pretam/DynaTab.git
cd DynaTab
pip install -r requirements.txt
pip install -e .

Option 2: Install Directly from GitHub (No Cloning Needed)

pip install "git+https://github.com/zadid6pretam/DynaTab.git"

Option 3: Use a Virtual Environment

python -m venv dynatab-env
source dynatab-env/bin/activate  # On Windows: dynatab-env\Scripts\activate

git clone https://github.com/zadid6pretam/DynaTab.git
cd DynaTab
pip install -r requirements.txt
pip install -e .

Option 4: Local Install Without Editable Mode

git clone https://github.com/zadid6pretam/DynaTab.git
cd DynaTab
pip install -r requirements.txt
pip install .

Option 5: Install from PyPI

pip install dynatab

Example Usage

Below are minimal examples for using DynaTab on standard binary, multiclass, and regression tasks.
For full HDLSS experiments and Optuna sweeps, see the accompanying Jupyter notebooks.


1. Binary Classification (Breast Cancer)

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from dynatab import (
    DynaTabClassifier,
    DFOConfig,
    TrainConfig,
    LossConfig,
)

# -----------------------------
# Data
# -----------------------------
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)  # 0/1 labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    stratify=y,
    random_state=42,
)

# -----------------------------
# DynaTab configs
# -----------------------------
dfo_cfg = DFOConfig(
    metric="manhattan",
    num_clusters=2,
    order="ascending",
    mutation_prob=0.0,
    tolerance=1e-3,
    seed=42,
)

train_cfg = TrainConfig(
    epochs=100,
    lr=1e-3,
    batch_size=256,
    print_every=20,
)

loss_cfg = LossConfig(
    loss_mode="DFO",      # "standard" | "dispersion" | "DFO"
    lambda_disp=0.0,
    lambda_global=0.0,
)

# -----------------------------
# Model: DynaTabClassifier
# -----------------------------
clf = DynaTabClassifier(
    task="binary",
    backbone="Transformer",   # or "LSTM", "DAE", "Mamba", ...
    embedding_dim=128,
    dfo_cfg=dfo_cfg,
    train_cfg=train_cfg,
    loss_cfg=loss_cfg,
    eval_metrics=["acc"],
    device=None,              # auto-selects CUDA/CPU
    standardize=True,         # train-only impute + standardize
)

clf.fit(X_train, y_train)
metrics = clf.score(X_test, y_test, metrics=["acc"])
print(f"Test Accuracy (Breast Cancer): {metrics['acc']:.4f}")

2. Multiclass Classification (Iris)

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from dynatab import (
    DynaTabClassifier,
    DFOConfig,
    TrainConfig,
    LossConfig,
)

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)  # 3 classes: 0,1,2

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    stratify=y,
    random_state=42,
)

dfo_cfg = DFOConfig(
    metric="variance",
    num_clusters=3,
    order="descending",
    mutation_prob=0.1,
    tolerance=1e-3,
    seed=42,
)

train_cfg = TrainConfig(
    epochs=80,
    lr=1e-3,
    batch_size=64,
    print_every=20,
)

loss_cfg = LossConfig(
    loss_mode="standard",
    lambda_disp=0.0,
    lambda_global=0.0,
)

clf = DynaTabClassifier(
    task="multiclass",
    num_classes=3,
    backbone="Transformer",
    embedding_dim=64,
    dfo_cfg=dfo_cfg,
    train_cfg=train_cfg,
    loss_cfg=loss_cfg,
    eval_metrics=["acc"],
    device=None,
    standardize=True,
)

clf.fit(X_train, y_train)
metrics = clf.score(X_test, y_test, metrics=["acc"])
print(f"Test Accuracy (Iris): {metrics['acc']:.4f}")

3. Regression (Diabetes)

import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

from dynatab import (
    DynaTabRegressor,
    DFOConfig,
    TrainConfig,
    LossConfig,
)

data = load_diabetes()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
)

dfo_cfg = DFOConfig(
    metric="correlation",
    num_clusters=3,
    order="ascending",
    mutation_prob=0.1,
    tolerance=1e-3,
    seed=42,
)

train_cfg = TrainConfig(
    epochs=120,
    lr=1e-3,
    batch_size=128,
    print_every=20,
)

loss_cfg = LossConfig(
    loss_mode="standard",   # for regression we typically keep it standard
    lambda_disp=0.0,
    lambda_global=0.0,
)

reg = DynaTabRegressor(
    backbone="Transformer",
    embedding_dim=64,
    dfo_cfg=dfo_cfg,
    train_cfg=train_cfg,
    loss_cfg=loss_cfg,
    eval_metrics=["r2"],    # e.g., R^2
    device=None,
    standardize=True,
)

reg.fit(X_train, y_train)
metrics = reg.score(X_test, y_test, metrics=["r2"])
print(f"Test R² (Diabetes): {metrics['r2']:.4f}")

4. Advanced: 5-Fold CV + Optuna Hyperparameter Tuning

For full HDLSS experiments, repeated CV, and Optuna-based tuning (Transformer, LSTM, DAE, Mamba backbones) on real datasets such as AI-d_case5, ADNI_AD123, GLI-85, and others, see:

  • DynaTab_Experiment1.ipynb – Binary & multiclass classification and regression (with / without Optuna-based hyperparameter tuning).
  • DynaTab_Experiment2.ipynb – HDLSS case studies (e.g., GLI-85 with Mamba/LSTM backbones).
  • DynaTab Dataset Complexity Analysis.ipynb and DynaTab IDF Analyzer.ipynb – Intrinsic dimensionality and “when to use feature ordering” analysis.
  • You can adjust the metrics, epochs, and DFO settings to make the runs lighter or to match the paper defaults more closely. A hedged sketch of a cross-validation plus Optuna loop is shown below.
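
Below is a minimal sketch of a 5-fold stratified cross-validation loop wrapped in an Optuna objective, using only the DynaTabClassifier, DFOConfig, TrainConfig, and LossConfig interfaces shown in the examples above. The search space, the breast-cancer dataset, and the accuracy objective are illustrative assumptions rather than the paper's protocol; the notebooks contain the full experimental setup.

# Minimal sketch: 5-fold stratified CV inside an Optuna objective. The search
# space and dataset are illustrative only; see DynaTab_Experiment1.ipynb for
# the full experimental setup.
import numpy as np
import optuna
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold

from dynatab import DynaTabClassifier, DFOConfig, TrainConfig, LossConfig

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

def objective(trial):
    dfo_cfg = DFOConfig(
        metric=trial.suggest_categorical("metric", ["manhattan", "variance", "correlation"]),
        num_clusters=trial.suggest_int("num_clusters", 2, 5),
        order=trial.suggest_categorical("order", ["ascending", "descending"]),
        mutation_prob=trial.suggest_float("mutation_prob", 0.0, 0.3),
        tolerance=1e-3,
        seed=42,
    )
    train_cfg = TrainConfig(
        epochs=trial.suggest_int("epochs", 50, 150),
        lr=trial.suggest_float("lr", 1e-4, 1e-2, log=True),
        batch_size=trial.suggest_categorical("batch_size", [64, 128, 256]),
        print_every=1000,  # keep the CV loop quiet
    )
    loss_cfg = LossConfig(loss_mode="DFO", lambda_disp=0.0, lambda_global=0.0)

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = []
    for train_idx, val_idx in cv.split(X, y):
        clf = DynaTabClassifier(
            task="binary",
            backbone="Transformer",
            embedding_dim=128,
            dfo_cfg=dfo_cfg,
            train_cfg=train_cfg,
            loss_cfg=loss_cfg,
            eval_metrics=["acc"],
            device=None,
            standardize=True,
        )
        clf.fit(X.iloc[train_idx], y.iloc[train_idx])
        scores.append(clf.score(X.iloc[val_idx], y.iloc[val_idx], metrics=["acc"])["acc"])
    return float(np.mean(scores))

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=5)   # demo-sized; increase for a real search
print("Best params:", study.best_params)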

Previous Work: TabSeq

DynaTab builds on our earlier work on feature ordering for tabular data, TabSeq.

If you are interested in:

  • MHA-DAE-guided sequential tabular models,
  • Cluster-guided feature ordering, and
  • Baseline comparison to classical ML and other deep models,

please also refer to the TabSeq repository and its accompanying paper as the foundational precursor to DynaTab.

Download files

Source Distribution

  • dynatab-0.1.0.tar.gz (41.9 kB)

Built Distribution

  • dynatab-0.1.0-py3-none-any.whl (49.3 kB)

File details: dynatab-0.1.0.tar.gz

  • Size: 41.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

Hashes for dynatab-0.1.0.tar.gz:

  • SHA256: d38b27d65ac01c7f64272be65bdcaba3fbd23052cf21c196595ce173f45e81b2
  • MD5: a1fee03cd40942247631a60e6aca1e28
  • BLAKE2b-256: 45aa66c61278f0f9528e5d7a794821543dcdd6417bb59e57160f46418eed46c7

File details: dynatab-0.1.0-py3-none-any.whl

  • Size: 49.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

Hashes for dynatab-0.1.0-py3-none-any.whl:

  • SHA256: 5c3456ce4f603c3a45a85539f40a583288cd0f4f30ab5dfca786d64708200f0d
  • MD5: 04f33378b9fc304cc7b57511e50e9a71
  • BLAKE2b-256: 09738a2c99cb5d129bda3259ce0d0f310df31c0a7d9e56088892cec281e93c80
