Feature Ordering Module from TabSeq (ICPR 2024)

Project description

TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering

TabSeq is a cutting-edge framework designed to bridge the gap in applying deep learning to tabular datasets, which often have heterogeneous features and no inherent sequential structure. By leveraging feature ordering, TabSeq organizes features to maximize their relevance and interactions, significantly improving a model's ability to learn from tabular data.

The framework incorporates:

  • Clustering to group features with similar characteristics during feature ordering (a minimal sketch of this step follows the list).
  • Multi-Head Attention (MHA) to prioritize essential feature interactions.
  • Denoising Autoencoder (DAE) to reduce redundancy and reconstruct noisy inputs.
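
For a concrete feel of the ordering step, here is a minimal sketch of the clustering-plus-variance ordering described in the default configuration further down (KMeans on the transposed feature matrix, intra-cluster sort by descending variance). It omits the weighted global integration, and order_features is an illustrative helper, not part of the package API:

import numpy as np
from sklearn.cluster import KMeans

def order_features(X, num_clusters=5, seed=0):
    """Cluster the columns of X and order them by descending variance
    within each cluster (a simplified version of TabSeq's ordering)."""
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X.T)        # cluster features, not samples
    variances = X.var(axis=0)
    order = []
    for c in range(num_clusters):
        members = np.where(labels == c)[0]
        order.extend(members[np.argsort(-variances[members])])  # highest variance first
    return X[:, order], np.array(order)

X = np.random.rand(40, 80)
X_ordered, feature_order = order_features(X)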

TabSeq has demonstrated strong performance across several real-world datasets, outperforming traditional methods. Its modular design and adaptability make it a powerful tool for both binary and multi-class classification tasks, addressing challenges in health informatics, financial modeling, and more.

Explore the potential of TabSeq and see how it transforms deep learning on tabular data.

Files

  • TabSeq_arxiv.pdf: Research paper (pre-print) describing the framework.
  • binary.py: Implementation for binary classification tasks.
  • multiclass.py: Implementation for multi-class classification tasks.

Requirements

  • Python 3.8+
  • numpy, pandas, scikit-learn, tensorflow, networkx

Citation

Al Zadid Sultan Bin Habib, Kesheng Wang, Mary-Anne Hartley, Gianfranco Doretto, and Donald A. Adjeroh. "TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering." In International Conference on Pattern Recognition (ICPR), 2024, pp. 418–434. Springer.

BibTeX:

@inproceedings{habib2024tabseq,
  title={TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering},
  author={Habib, Al Zadid Sultan Bin and Wang, Kesheng and Hartley, Mary-Anne and Doretto, Gianfranco and Adjeroh, Donald A.},
  booktitle={International Conference on Pattern Recognition},
  pages={418--434},
  year={2024},
  organization={Springer}
}

Installation

You can install TabSeq in multiple ways depending on your use case:


Option 1: Clone the Repository (Recommended for Development)

git clone https://github.com/zadid6pretam/TabSeq.git
cd TabSeq
pip install -r requirements.txt
pip install -e .

Option 2: Install via pip from GitHub (No Cloning Needed)

pip install git+https://github.com/zadid6pretam/TabSeq.git

Option 3: Install in a Virtual Environment

python -m venv tabseq-env
source tabseq-env/bin/activate  # On Windows: tabseq-env\Scripts\activate
git clone https://github.com/zadid6pretam/TabSeq.git
cd TabSeq
pip install -r requirements.txt
pip install -e .

Option 4: Manual Install Using setup.py

git clone https://github.com/zadid6pretam/TabSeq.git
cd TabSeq
pip install .

Option 5: Install from PyPI

pip install TabSeq

Example Usage

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tabseq.binary import train_binary_model
from tabseq.multiclass import train_multiclass_model

# Generate synthetic dataset
X = np.random.rand(40, 80)                   # 40 samples, 80 features
y_binary = np.random.randint(0, 2, 40)       # Binary labels (0 or 1)
y_multiclass = np.random.randint(0, 3, 40)   # Multiclass labels (0, 1, 2)

# Scale features
X_scaled = pd.DataFrame(StandardScaler().fit_transform(X))

# Split into train, valid, test
X_train, X_temp, y_train_b, y_temp_b = train_test_split(X_scaled, y_binary, test_size=0.4, stratify=y_binary)
X_valid, X_test, y_valid_b, y_test_b = train_test_split(X_temp, y_temp_b, test_size=0.5, stratify=y_temp_b)

X_train_m, X_temp_m, y_train_m, y_temp_m = train_test_split(X_scaled, y_multiclass, test_size=0.4, stratify=y_multiclass)
X_valid_m, X_test_m, y_valid_m, y_test_m = train_test_split(X_temp_m, y_temp_m, test_size=0.5, stratify=y_temp_m)

# Run TabSeq for Binary Classification
train_binary_model(X_train, X_valid, X_test, y_train_b, y_valid_b, y_test_b)

# Run TabSeq for Multi-Class Classification (use the multiclass split so features and labels stay aligned)
train_multiclass_model(X_train_m, X_valid_m, X_test_m, y_train_m, y_valid_m, y_test_m, num_classes=3)

Default Parameter Values for Binary Classification

# =======================================================
# TabSeq Default Configuration Parameters (Binary Version)
# =======================================================
# Feature Ordering:
# - num_clusters: 5 (KMeans clustering is applied to the transpose of the feature matrix)
# - Intra-cluster ordering: Features sorted in descending order of variance
# - Global ordering: Integrated from local orderings using variance-based random weights

# Autoencoder (Denoising with Attention):
# - Noise: Gaussian noise with std = 0.1 added before training, clipped to [0, 1]
# - Attention Heads: 4
# - Attention Head Dimension (dk): 64
# - Dropout Rate in Attention: 0.1
# - Epochs: 50
# - Batch Size: 32
# - Loss Function: Mean Squared Error
# - Optimizer: Adam
# - EarlyStopping: patience = 5, monitor = 'val_loss', restore_best_weights = True

# Classifier:
# - Architecture: [Dense(128, relu) → BN → Dropout(0.5) → Dense(64, relu) → BN → Dropout(0.5) → Dense(1, sigmoid)]
# - Epochs: 50
# - Batch Size: 32
# - Loss Function: Binary Crossentropy
# - Metric: Accuracy
# - EarlyStopping: patience = 5, monitor = 'val_loss', restore_best_weights = True
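
As a reference point, the noise injection and the classifier head described above translate directly into Keras. The following is a minimal sketch using the listed defaults (layer sizes, dropout rates, loss, callbacks); the variable names and data shapes are illustrative, not part of the package API:

import numpy as np
from tensorflow.keras import layers, models, callbacks

def build_binary_classifier(input_dim):
    # Dense(128, relu) -> BN -> Dropout(0.5) -> Dense(64, relu) -> BN -> Dropout(0.5) -> Dense(1, sigmoid)
    model = models.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(64, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Gaussian noise (std = 0.1) clipped to [0, 1], as used for the denoising autoencoder input
X_train = np.random.rand(32, 80).astype("float32")
X_noisy = np.clip(X_train + np.random.normal(0.0, 0.1, X_train.shape), 0.0, 1.0)

early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
# model.fit(X_train, y_train, validation_data=(X_valid, y_valid),
#           epochs=50, batch_size=32, callbacks=[early_stop])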

Default Parameter Values for Multiclass Classification

# ===============================================
# TabSeq Default Configuration (Multiclass Version)
# ===============================================

# Feature Ordering:
# - num_clusters: 5 (KMeans clustering on transposed feature matrix)
# - Intra-cluster ordering: Features sorted by descending variance
# - Global ordering: Weighted integration of local orderings based on random-scaled variances

# Denoising Autoencoder with Multihead Attention:
# - Noise: Gaussian noise with std = 0.1, clipped to [0, 1]
# - Attention Heads: 4
# - Head Dimension (dk): 64
# - Dropout Rate in Attention: 0.1
# - Encoder: Dense(128 → 64), BatchNorm, Dropout(0.2)
# - Decoder: Dense(input_dim, sigmoid)
# - Epochs: 50
# - Batch Size: 32
# - Loss Function: Mean Squared Error
# - Optimizer: Adam
# - EarlyStopping: patience = 5, monitor = 'val_loss'

# Classifier:
# - Architecture: [Dense(128, relu) → BN → Dropout(0.5) → Dense(64, relu) → BN → Dropout(0.5) → Dense(num_classes, softmax)]
# - Loss Function: Categorical Crossentropy
# - Metric: Accuracy
# - Epochs: 50
# - Batch Size: 32
# - EarlyStopping: patience = 5, monitor = 'val_loss'

# Evaluation:
# - AUC: macro-average, using one-vs-rest (ovr)
# - Classification report: includes precision, recall, F1 for each class
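
The evaluation described above maps directly onto scikit-learn. A minimal sketch with synthetic predictions (variable names are illustrative):

import numpy as np
from sklearn.metrics import roc_auc_score, classification_report

# y_test: integer class labels; y_prob: softmax probabilities, one column per class
y_test = np.random.randint(0, 3, 30)
y_prob = np.random.dirichlet(np.ones(3), size=30)

# Macro-averaged one-vs-rest AUC, matching the defaults above
auc = roc_auc_score(y_test, y_prob, multi_class="ovr", average="macro")
print(f"macro OvR AUC: {auc:.3f}")

# Per-class precision, recall, and F1
print(classification_report(y_test, y_prob.argmax(axis=1)))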
