
Synthetic data generation using KAN-enhanced CTGAN/TVAE architectures

Project description

Kolmogorov–Arnold Networks for Tabular Data Synthesis

KAN_synth is an open-source Python package for generating high-fidelity synthetic tabular data using Kolmogorov–Arnold Networks (KANs). It extends the original CTGAN and TVAE models from the Synthetic Data Vault (SDV) by replacing their MLP-based architectures with KAN-based components.
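For readers new to KANs, the sketch below illustrates what replacing an MLP layer with a KAN layer means. Instead of a fixed activation after a linear map, each input-output edge carries its own learnable univariate function. Here that function is a SiLU base term plus a weighted sum of B-spline basis functions on a uniform grid, which is the formulation used by common B-spline KAN implementations. `grid_size` and `spline_order` are meant to echo the `grid_size_*`/`spline_order_*` arguments in the usage example. However, `KANLayerSketch`, its random initialization, and the fixed `grid_range` are illustrative assumptions, not KAN_synth's actual code.

```python
import numpy as np


def bspline_basis(x, grid, order):
    """Evaluate all B-spline basis functions of degree `order` at points x.

    x: shape (n,); grid: extended knot vector of shape (m,).
    Returns shape (n, m - order - 1), built with the Cox-de Boor recursion.
    """
    x = x[:, None]
    # Degree-0 basis: indicator of each knot interval [t_i, t_{i+1}).
    basis = ((x >= grid[:-1]) & (x < grid[1:])).astype(float)
    for k in range(1, order + 1):
        left = (x - grid[:-(k + 1)]) / (grid[k:-1] - grid[:-(k + 1)])
        right = (grid[k + 1:] - x) / (grid[k + 1:] - grid[1:-k])
        basis = left * basis[:, :-1] + right * basis[:, 1:]
    return basis


class KANLayerSketch:
    """Hypothetical KAN layer: y_j = sum_i [w_base[j, i] * silu(x_i) + sum_k coef[j, i, k] * B_k(x_i)]."""

    def __init__(self, in_dim, out_dim, grid_size=5, spline_order=3,
                 grid_range=(-1.0, 1.0), seed=0):
        rng = np.random.default_rng(seed)
        lo, hi = grid_range
        h = (hi - lo) / grid_size
        # Uniform knots over grid_range, extended by `spline_order` knots on each side.
        self.grid = lo + h * np.arange(-spline_order, grid_size + spline_order + 1)
        self.order = spline_order
        n_basis = grid_size + spline_order  # spline coefficients per edge
        self.w_base = rng.normal(0.0, 0.1, size=(out_dim, in_dim))
        self.coef = rng.normal(0.0, 0.1, size=(out_dim, in_dim, n_basis))

    def __call__(self, x):
        # x: (batch, in_dim) -> (batch, out_dim)
        batch, in_dim = x.shape
        silu = x / (1.0 + np.exp(-x))
        base_out = silu @ self.w_base.T
        # Basis values for every input feature: (batch, in_dim, n_basis)
        basis = bspline_basis(x.reshape(-1), self.grid, self.order).reshape(batch, in_dim, -1)
        spline_out = np.einsum("bik,oik->bo", basis, self.coef)
        return base_out + spline_out


layer = KANLayerSketch(in_dim=4, out_dim=8, grid_size=5, spline_order=3)
x = np.random.default_rng(1).uniform(-1.0, 1.0, size=(16, 4))
print(layer(x).shape)  # (16, 8)
```

Each output is a sum over inputs of learned 1-D functions, so every edge carries `grid_size + spline_order` spline coefficients. Inputs far outside `grid_range` receive no spline contribution in this sketch.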

This project was developed as part of an undergraduate thesis, both for research and for practical evaluation of KAN-based generative models, in particular to compare them with traditional MLP-based GANs and VAEs on tabular data synthesis tasks.

Overview

The KAN_synth repository is structured into several components:

Models (models/)

KAN-based generative models for tabular data:

  • KAN_CTGAN: Fully KAN-based implementation of CTGAN.
  • HYBRID_KAN_CTGAN: CTGAN with only the first layer replaced by a KAN layer.
  • Disc_KAN_CTGAN: CTGAN with a KAN-based discriminator only.
  • Gen_KAN_CTGAN: CTGAN with a KAN-based generator only.
  • KAN_TVAE: Fully KAN-based implementation of TVAE.
  • HYBRID_KAN_TVAE: TVAE where one intermediate layer is replaced by a KAN block.

Evaluation Scripts (benchmarks/)

One-off (non-reusable) scripts used to:

  • Train and evaluate each model on various datasets.
  • Measure similarity to the real data and ML utility.
  • Aggregate and visualize results.
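A common way to measure ML utility, and one plausible reading of what these scripts do, is "train on synthetic, test on real" (TSTR). A model is fitted on the synthetic table and scored on held-out real data with the regression metrics listed under Utilities. The sketch below assumes a numeric regression target and uses ordinary least squares via NumPy purely for illustration. The function names and toy data are hypothetical, not taken from the benchmark scripts.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 of predictions against the true target."""
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


def tstr_regression(X_synth, y_synth, X_real_test, y_real_test):
    """Train on synthetic data, evaluate on real held-out data."""
    coef = fit_linear(X_synth, y_synth)
    return regression_metrics(y_real_test, predict_linear(coef, X_real_test))


# Toy stand-ins: both tables come from y = 2*x0 - x1 + noise.
# In a real run, X_syn/y_syn would come from a synthesizer's sample() output.
rng = np.random.default_rng(0)


def make_toy_data(n):
    X = rng.normal(size=(n, 2))
    return X, 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=n)


X_syn, y_syn = make_toy_data(500)
X_real, y_real = make_toy_data(200)
scores = tstr_regression(X_syn, y_syn, X_real, y_real)
print(scores)
```

Comparing these scores with the same model trained on real data (train on real, test on real) shows how much utility is lost by using the synthetic table instead.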

Data Generation Scripts (data_gen/)

Standalone scripts for generating synthetic datasets using each model for internal benchmarking and experimentation purposes.

Utilities (utilities/)

Functions to compute:

  • Overall similarity between real and synthetic datasets
  • Model evaluation metrics (MAE, RMSE, R² for regression; Accuracy, F1, etc. for classification)
  • Visualizations of ML utility scores across models
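The exact similarity formula used by the utilities is not documented here. As an assumption, a simple and widely used per-column approach is to score each continuous column by 1 minus the two-sample Kolmogorov-Smirnov statistic and each discrete column by 1 minus the total variation distance between category frequencies, then average the scores. This mirrors the KSComplement and TVComplement metrics in SDMetrics. `column_similarity_score` below is a hypothetical illustration, not the package's own function.

```python
import numpy as np
import pandas as pd


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    points = np.concatenate([a, b])
    # Empirical CDFs are step functions, so the max gap occurs at a sample point.
    cdf_a = np.searchsorted(a, points, side="right") / len(a)
    cdf_b = np.searchsorted(b, points, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def tv_distance(a, b):
    """Total variation distance between category frequency distributions."""
    p = pd.Series(a).value_counts(normalize=True)
    q = pd.Series(b).value_counts(normalize=True)
    p, q = p.align(q, fill_value=0.0)  # categories missing on one side count as 0
    return float(0.5 * np.abs(p - q).sum())


def column_similarity_score(real, synthetic, discrete_columns):
    """Average per-column similarity in [0, 1]; 1 means identical marginals."""
    scores = {}
    for col in real.columns:
        if col in discrete_columns:
            scores[col] = 1.0 - tv_distance(real[col], synthetic[col])
        else:
            scores[col] = 1.0 - ks_statistic(real[col], synthetic[col])
    return float(np.mean(list(scores.values()))), scores


real = pd.DataFrame({"age": [20.0, 30.0, 40.0, 50.0], "city": ["A", "A", "B", "C"]})
synth = pd.DataFrame({"age": [20.0, 30.0, 40.0, 60.0], "city": ["A", "B", "B", "C"]})
overall, per_column = column_similarity_score(real, synth, discrete_columns=["city"])
print(overall, per_column)  # 0.75 {'age': 0.75, 'city': 0.75}
```

This score only compares one-column marginals; it does not capture correlations between columns, which usually need a separate pairwise metric.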

Tests (test/)

Unit tests, written with pytest, covering model importability, synthetic data generation routines, and ML pipeline sanity checks.

Full model descriptions can be found in the accompanying thesis (not yet published).

Installation

The package is published on PyPI and can be installed with pip:

pip install kan-synth

All required packages are listed in requirements.txt. To install them manually, for example when working from a source checkout, run:

pip install -r requirements.txt

Since these models are based on the original CTGAN and TVAE architectures, the same data preprocessing principles apply:

  • Continuous columns must be represented as floats.
  • Discrete columns must be represented as integers or strings.
  • The input dataset should not contain any missing values.
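A minimal pandas sketch of preparing a DataFrame to meet these requirements before calling `fit`. It assumes you already know which columns are discrete. The helper name `prepare_for_ctgan` is hypothetical, and dropping incomplete rows is only one option (imputation is another).

```python
import pandas as pd


def prepare_for_ctgan(df, discrete_columns):
    """Return a copy of df that satisfies the CTGAN/TVAE input requirements."""
    # No missing values: here, rows containing any NaN are simply dropped.
    out = df.dropna().reset_index(drop=True)
    for col in out.columns:
        if col in discrete_columns:
            # Discrete columns as strings (integers would also be accepted).
            out[col] = out[col].astype(str)
        else:
            # Continuous columns as floats.
            out[col] = out[col].astype(float)
    return out


raw = pd.DataFrame({
    "age": [25, 32, None, 47],
    "income": [40000, 52000, 61000, None],
    "city": ["Rome", "Milan", "Rome", "Turin"],
})
clean = prepare_for_ctgan(raw, discrete_columns=["city"])
print(clean.dtypes)
```

Casting to string happens after `dropna()`, so missing values are never turned into the literal string "nan".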

For additional details, refer to the original CTGAN repository: https://github.com/sdv-dev/CTGAN

Local Development Installation

If you'd like to install the package locally for development or experimentation:

git clone https://github.com/cris1618/KAN_synth.git
cd KAN_synth
pip install -e .

The -e flag installs the package in "editable" mode, meaning any changes you make to the code will immediately affect the installed version without needing to reinstall.

Usage Example

Here's a minimal working example to train a KAN-based synthesizer and generate synthetic tabular data using the KAN_CTGAN model.

from KAN_synth import KAN_CTGAN
import pandas as pd

# Load real tabular data (pandas DataFrame)
df = pd.read_csv("your_dataset.csv")

# Define the discrete columns (if any)
discrete_columns = ["column_a", "column_b"]  # Modify as needed

# Initialize the synthesizer
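synthesizer = KAN_CTGAN(
    epochs=100,
    verbose=True,
    # KAN spline settings for the generator
    grid_size_gen=5,
    spline_order_gen=3,
    # KAN spline settings for the discriminator ("desc" suffix)
    grid_size_desc=5,
    spline_order_desc=3
)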

# Train the model on your data
synthesizer.fit(df, discrete_columns)

# Sample 1000 synthetic rows
synthetic_data = synthesizer.sample(1000)

# Save or explore the results
synthetic_data.to_csv("synthetic_output.csv", index=False)

You can replace KAN_CTGAN with any of the other available models:

  • HYBRID_KAN_CTGAN
  • Disc_KAN_CTGAN
  • Gen_KAN_CTGAN
  • KAN_TVAE
  • HYBRID_KAN_TVAE

Each supports similar training and sampling APIs compatible with the original CTGAN and TVAE interfaces.



Download files


Source Distribution

kan_synth-0.1.0.tar.gz (41.8 kB)

Uploaded Source

Built Distribution


kan_synth-0.1.0-py3-none-any.whl (66.2 kB)

Uploaded Python 3

File details

Details for the file kan_synth-0.1.0.tar.gz.

File metadata

  • Download URL: kan_synth-0.1.0.tar.gz
  • Upload date:
  • Size: 41.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for kan_synth-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a93d89fc3a9377e680384677a2eaa9c55f01586d6a42feea48f6f4e680e5857c
MD5 69552afa54e979c2fc6891315f55c7a5
BLAKE2b-256 959569920de355d1c2e8578aaf42d7a32caa93532dcfb51e0ab1787049d58ee6


File details

Details for the file kan_synth-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: kan_synth-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 66.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for kan_synth-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c4bb4e26dc2ec935f17a34ca7d0fa3ea891765061943bbf10bd792a4ba7d45a5
MD5 558711c63fa956f17d0b82b519aa1c2f
BLAKE2b-256 797689b181851f5e4374175c2b9a186d17823390dba0bbf80a78044f2b2256d9

