Package for Implementing MaCoDE

MaCoDE (accepted to AAAI 2025!)

MaCoDE is a novel distributional learning method that redefines the consecutive multi-class classification task of Masked Language Modeling (MLM) as histogram-based non-parametric conditional density estimation.

For a detailed explanation of the method, see our paper! (link)
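To make the phrase "histogram-based non-parametric conditional density estimation" concrete, here is a minimal NumPy sketch of the general idea: a continuous column is discretized into bins, a classifier assigns a probability to each bin, and those probabilities are read back as a piecewise-constant density. This is an illustration under stated assumptions, not the package's implementation. The quantile-based bin edges, the helper names, and the use of marginal bin frequencies in place of a model's conditional predictions are all assumptions made for this example.

```python
import numpy as np

def quantile_bin_edges(values, bins):
    """Return bins + 1 edges placed at empirical quantiles of values (assumed binning scheme)."""
    return np.quantile(values, np.linspace(0.0, 1.0, bins + 1))

def to_bin_labels(values, edges):
    """Map each value to the index (0..bins-1) of the bin that contains it."""
    # Only interior edges are searched, so the minimum and maximum land in the first and last bin.
    labels = np.searchsorted(edges[1:-1], values, side="right")
    return np.clip(labels, 0, len(edges) - 2)

def histogram_density(probs, edges):
    """Turn bin probabilities into a piecewise-constant density (probability / bin width)."""
    return probs / np.diff(edges)

def sample_from_bins(probs, edges, n, rng):
    """Draw a bin according to probs, then a uniform point inside that bin."""
    idx = rng.choice(len(probs), size=n, p=probs)
    return rng.uniform(edges[idx], edges[idx + 1])

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                      # toy continuous column
edges = quantile_bin_edges(x, bins=10)
labels = to_bin_labels(x, edges)               # classification targets for this column

# Stand-in for a model's predicted bin probabilities: the marginal bin frequencies.
probs = np.bincount(labels, minlength=10) / len(labels)
density = histogram_density(probs, edges)
samples = sample_from_bins(probs, edges, n=5, rng=rng)
```

In a masked-modeling setup, `probs` would instead come from a model that predicts the bin of a masked column given the unmasked ones, which is what makes the estimated density conditional.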

1. Installation

Install using pip:

pip install macode

2. Usage

from macode import macode
macode.MaCoDE # MaCoDE model

Example

import warnings
warnings.filterwarnings('ignore')

"""device setting"""
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

"""load dataset and specify column types"""
import pandas as pd
data = pd.read_csv('./whitewine.csv', delimiter=";")
columns = list(data.columns)
columns.remove("quality")
assert data.isna().sum().sum() == 0
continuous_features = columns
categorical_features = ["quality"]
integer_features = []

### the target column should be the last column
data = data[continuous_features + categorical_features] 
# len(data)

"""training, test, synthetic datasets"""
data[categorical_features] = data[categorical_features].apply(
    lambda col: col.astype('category').cat.codes + 1) # pre-processing

train = data.iloc[:4000]
test = data.iloc[4000:]
train = train.reset_index(drop=True)
test = test.reset_index(drop=True)

"""MaCoDE"""
from macode import macode

# use a separate name for the model so the imported `macode` module is not shadowed
model = macode.MaCoDE(
    data=train, # the observed tabular dataset
    continuous_features=continuous_features, # the list of continuous columns of data
    categorical_features=categorical_features, # the list of categorical columns of data
    integer_features=integer_features, # the list of integer-type columns of data

    seed=42, # seed for repeatable results
    bins=10, # the number of bins for discretization
    dim_transformer=512, # the embedding size (input dimension size of transformer)
    num_transformer_heads=8, # the number of heads in transformer
    num_transformer_layer=1, # the number of layers in transformer

    epochs=10, # the number of epochs (for quick checking)
    batch_size=1024, # the batch size
    lr=0.001, # learning rate
    device=device,
)

"""training"""
model.train()

"""generate synthetic data"""
syndata = model.generate_data(n=len(train), tau=1.)
syndata

"""Evaluate Synthetic Data Quality"""
from synthetic_eval import evaluation

target = "quality"
results = evaluation.evaluate(
    syndata, train, test, 
    target, continuous_features, categorical_features, device
)

"""print results"""
for x, y in results._asdict().items():
    print(f"{x}: {y:.3f}")
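The pre-processing step in the example replaces each categorical column with 1-based integer codes, which discards the original labels. If you want to report synthetic rows in the original label space, one option is to keep a code-to-label mapping alongside the encoded data. The sketch below uses only pandas; `encode_categories` and `decode_categories` are hypothetical helpers written for this illustration and are not part of the `macode` package.

```python
import pandas as pd

def encode_categories(df, columns):
    """Replace each listed column with 1-based integer codes.

    Returns the encoded copy and, per column, a dict mapping code -> original label.
    """
    encoded = df.copy()
    mappings = {}
    for col in columns:
        cat = df[col].astype("category")
        encoded[col] = cat.cat.codes + 1  # same 1-based convention as the example above
        mappings[col] = {i + 1: label for i, label in enumerate(cat.cat.categories)}
    return encoded, mappings

def decode_categories(df, mappings):
    """Map 1-based codes back to the original labels."""
    decoded = df.copy()
    for col, mapping in mappings.items():
        decoded[col] = decoded[col].map(mapping)
    return decoded

# Toy frame standing in for the wine data (values are illustrative only).
toy = pd.DataFrame({"alcohol": [9.4, 10.1, 12.8], "quality": [5, 7, 5]})
encoded, mappings = encode_categories(toy, ["quality"])
restored = decode_categories(encoded, mappings)
```

Note that pandas assigns code -1 to missing values, which the `+ 1` shift would turn into 0; the example avoids this by asserting that the data contains no missing values before encoding.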

3. Citation

If you use this code or package, please cite our associated paper. (The final camera-ready version of the manuscript will be available soon.)

@inproceedings{an2025masked,
  title={Masked Language Modeling Becomes Conditional Density Estimation for Tabular Data Synthesis},
  author={An, Seunghwan and Woo, Gyeongdong and Lim, Jaesung and Kim, ChangHyun and Hong, Sungchul and Jeon, Jong-June},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={15},
  pages={15356--15364},
  year={2025}
}
