Synthetic data generation using KAN-enhanced CTGAN/TVAE architectures

Kolmogorov–Arnold Networks for Tabular Data Synthesis

KAN_synth is an open-source Python package for generating high-fidelity synthetic tabular data using Kolmogorov–Arnold Networks (KANs). It extends the original CTGAN and TVAE models from the Synthetic Data Vault (SDV) by replacing their MLP-based architectures with KAN-based components.

This project was developed as part of an undergraduate thesis, for research and practical evaluation of KAN-based generative models, in particular to compare them with traditional GANs and VAEs on tabular data synthesis tasks.

Overview

The KAN_synth repository is structured into several components:

Models (models/)

KAN-based generative models for tabular data:

  • KAN_CTGAN: Fully KAN-based implementation of CTGAN.
  • HYBRID_KAN_CTGAN: CTGAN with only the first layer replaced by KAN.
  • Disc_KAN_CTGAN: CTGAN with a KAN-based discriminator only.
  • Gen_KAN_CTGAN: CTGAN with a KAN-based generator only.
  • KAN_TVAE: Full KAN-based variant of TVAE.
  • HYBRID_KAN_TVAE: TVAE where one intermediate layer is replaced by a KAN block.

Evaluation Scripts (benchmarks/)

Non-reusable scripts developed to:

  • Train and evaluate each model on various datasets.
  • Measure similarity and ML utility.
  • Aggregate and visualize results.

Data Generation Scripts (data_gen/)

Standalone scripts for generating synthetic datasets using each model for internal benchmarking and experimentation purposes.

Utilities (utilities/)

Functions to compute:

  • Overall similarity between real and synthetic datasets
  • Model evaluation metrics (MAE, RMSE, R² for regression; Accuracy, F1, etc. for classification)
  • Visualizations of ML utility scores across models
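The metrics named above can be illustrated with a small NumPy sketch. This is not the code in utilities/; the function names `regression_metrics` and `classification_metrics` are hypothetical, and the F1 shown is the binary case only. The predictions would typically come from a model whose utility is being assessed.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R² for a regression target (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = float(np.sum(err ** 2))                        # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total sum of squares
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R2": 1.0 - ss_res / ss_tot,
    }

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy and binary F1 for the chosen positive label (illustrative helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    denom = 2 * tp + fp + fn
    return {
        "Accuracy": float(np.mean(y_true == y_pred)),
        "F1": 2 * tp / denom if denom else 0.0,
    }

reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
clf = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Computing the same metrics for a model trained on real data and for one trained on synthetic data gives a like-for-like comparison of utility.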

Tests (test/)

Unit tests for model importability, synthetic generation routines, and ML pipeline sanity checks using pytest.

Full model descriptions will be available in the accompanying thesis (not yet published).

Installation

The package is published on PyPI and can be installed via pip:

pip install KAN-synth

The required packages are listed in requirements.txt. From a clone of the repository, they can be installed with:

pip install -r requirements.txt

Since these models are based on the original CTGAN and TVAE architectures, the same data preprocessing principles apply:

  • Continuous columns must be represented as floats.
  • Discrete columns must be represented as integers or strings.
  • The input dataset should not contain any missing values.

For additional details, refer to the original CTGAN repository: https://github.com/sdv-dev/CTGAN
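The three requirements above can be met with a short pandas step before training. The sketch below is one possible approach, not part of KAN_synth: the helper name `prepare_for_synthesis` and the example columns are hypothetical, and it drops incomplete rows, although imputing the missing values would work as well.

```python
import pandas as pd

def prepare_for_synthesis(df, discrete_columns):
    """Return a copy of df with no missing values and consistent dtypes."""
    # Drop rows with missing values (imputation is an alternative).
    out = df.dropna().reset_index(drop=True)
    for col in out.columns:
        if col in discrete_columns:
            # Discrete columns as strings (integers are also accepted).
            out[col] = out[col].astype(str)
        else:
            # Continuous columns as floats.
            out[col] = out[col].astype(float)
    return out

# Hypothetical example data with two incomplete rows.
raw = pd.DataFrame({
    "age": [34, 51, None, 29],
    "income": [42000, 58500, 61000, 39900],
    "city": ["Rome", "Oslo", "Lima", None],
})
clean = prepare_for_synthesis(raw, discrete_columns=["city"])
```

The resulting frame can be passed to `fit` together with the same list of discrete columns.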

Local Development Installation

If you'd like to install the package locally for development or experimentation:

git clone https://github.com/cris1618/KAN_synth.git
cd KAN_synth
pip install -e .

The -e flag installs the package in "editable" mode, meaning any changes you make to the code will immediately affect the installed version without needing to reinstall.

Usage Example

Here's a minimal working example to train a KAN-based synthesizer and generate synthetic tabular data using the KAN_CTGAN model.

from KAN_synth import KAN_CTGAN
import pandas as pd

# Load real tabular data (pandas DataFrame)
df = pd.read_csv("your_dataset.csv")

# Define the discrete columns (if any)
discrete_columns = ["column_a", "column_b"]  # Modify as needed

# Initialize the synthesizer
synthesizer = KAN_CTGAN(
    epochs=100, 
    verbose=True, 
    grid_size_gen=5,  
    spline_order_gen=3,
    grid_size_desc=5,
    spline_order_desc=3
)

# Train the model on your data
synthesizer.fit(df, discrete_columns)

# Sample 1000 synthetic rows
synthetic_data = synthesizer.sample(1000)

# Save or explore the results
synthetic_data.to_csv("synthetic_output.csv", index=False)

You can replace KAN_CTGAN with any of the other available models, such as:

  • HYBRID_KAN_CTGAN
  • Disc_KAN_CTGAN
  • Gen_KAN_CTGAN
  • KAN_TVAE
  • HYBRID_KAN_TVAE

Each supports similar training and sampling APIs compatible with the original CTGAN and TVAE interfaces.
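Because the models share the same fit/sample shape, switching between them can be reduced to passing a different class. The sketch below assumes exactly that shared shape. `fit_and_sample` is a hypothetical helper, and `BootstrapSynthesizer` is a stand-in that simply resamples the real rows so the snippet runs without KAN_synth installed; it is not one of the package's models.

```python
import pandas as pd

def fit_and_sample(synth_cls, df, discrete_columns, n_rows, **init_kwargs):
    """Build synth_cls, train it on df, and return n_rows synthetic rows."""
    synthesizer = synth_cls(**init_kwargs)
    synthesizer.fit(df, discrete_columns)
    return synthesizer.sample(n_rows)

class BootstrapSynthesizer:
    """Stand-in with the same fit/sample shape. NOT part of KAN_synth."""

    def __init__(self, seed=0):
        self.seed = seed

    def fit(self, df, discrete_columns=()):
        # A real model would train here; the stand-in just stores the data.
        self._df = df.reset_index(drop=True)

    def sample(self, n):
        # Resample stored rows with replacement.
        rows = self._df.sample(n=n, replace=True, random_state=self.seed)
        return rows.reset_index(drop=True)

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "c": ["a", "b", "a"]})
synthetic = fit_and_sample(BootstrapSynthesizer, df, ["c"], 5, seed=1)
```

With the package installed, the same helper could be called as `fit_and_sample(KAN_TVAE, df, discrete_columns, 1000, epochs=100)`, assuming `KAN_TVAE` accepts an `epochs` argument as the original TVAE does. Model-specific arguments such as `grid_size_gen` may not apply to every class.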
