Synthetic data generation using KAN-enhanced CTGAN/TVAE architectures
Kolmogorov–Arnold Networks for Tabular Data Synthesis
KAN_synth is an open-source Python package for generating high-fidelity synthetic tabular data using Kolmogorov–Arnold Networks (KANs). It extends the original CTGAN and TVAE models from the Synthetic Data Vault (SDV) by replacing their MLP-based architectures with KAN-based components.
This project was developed as part of an undergraduate thesis, both for research on KAN-based generative models and for their practical evaluation, particularly in comparison with traditional GANs and VAEs on tabular data synthesis tasks.
Overview
The KAN_synth repository is structured into several components:
Models (models/)
KAN-based generative models for tabular data:
- KAN_CTGAN: Fully KAN-based implementation of CTGAN.
- HYBRID_KAN_CTGAN: CTGAN with only the first layer replaced by KAN.
- Disc_KAN_CTGAN: CTGAN with a KAN-based discriminator only.
- Gen_KAN_CTGAN: CTGAN with a KAN-based generator only.
- KAN_TVAE: Full KAN-based variant of TVAE.
- HYBRID_KAN_TVAE: TVAE where one intermediate layer is replaced by a KAN block.
Evaluation Scripts (benchmarks/)
Non-reusable scripts developed to:
- Train and evaluate each model on various datasets.
- Measure similarity and ML utility.
- Aggregate and visualize results.
Data Generation Scripts (data_gen/)
Standalone scripts for generating synthetic datasets using each model for internal benchmarking and experimentation purposes.
Utilities (utilities/)
Functions to compute:
- Overall similarity between real and synthetic datasets
- Model evaluation metrics (MAE, RMSE, R² for regression; Accuracy, F1, etc. for classification)
- Visualizations of ML utility scores across models
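To make the listed metrics concrete, here is a minimal pure-Python sketch of how they are conventionally defined. The function names `regression_metrics` and `classification_metrics` are hypothetical and are not the functions shipped in utilities/; the F1 shown is the binary variant for a single positive label, which is an assumption about the setting.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for two equal-length sequences of numbers."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R² is undefined when the target is constant, so report NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": math.sqrt(ss_res / n), "r2": r2}


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy and binary F1 with respect to the `positive` label."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}
```

In an ML utility comparison, these scores would typically be computed for a model trained on real data and for one trained on synthetic data, then compared side by side.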
Tests (test/)
Unit tests for model importability, synthetic generation routines, and ML pipeline sanity checks using pytest.
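As an illustration of what an importability check can look like, the sketch below reports which model classes cannot be found on the package. The helper `missing_attributes` is hypothetical and is not taken from test/; it also assumes that every model is exposed at the top level of `KAN_synth`, which this description only confirms for `KAN_CTGAN`.

```python
import importlib

# Model names as listed in the Models section.
MODEL_NAMES = [
    "KAN_CTGAN",
    "HYBRID_KAN_CTGAN",
    "Disc_KAN_CTGAN",
    "Gen_KAN_CTGAN",
    "KAN_TVAE",
    "HYBRID_KAN_TVAE",
]


def missing_attributes(module_name, names):
    """Return the names that cannot be found on the module `module_name`.

    If the module itself cannot be imported, every name is reported missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(names)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    # An empty list means every model class could be imported.
    print(missing_attributes("KAN_synth", MODEL_NAMES))
```

Inside a pytest suite, the same helper could back a single assertion that the returned list is empty.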
The full model descriptions can be found in the thesis work at: [Not yet published].
Installation
The package is published on PyPI and can be installed via pip:
pip install KAN-synth
All required packages are listed in requirements.txt and can be installed with:
pip install -r requirements.txt
Since these models are based on the original CTGAN and TVAE architectures, the same data preprocessing principles apply:
- Continuous columns must be represented as floats.
- Discrete columns must be represented as integers or strings.
- The input dataset should not contain any missing values.
For additional details, refer to the original CTGAN repository: https://github.com/sdv-dev/CTGAN
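The three preprocessing rules above can be applied with a few lines of pandas. This is a minimal sketch, not part of the package: the helper name `prepare_for_synthesis` and the example columns are hypothetical, and dropping incomplete rows is only one way to remove missing values (imputation may suit your data better).

```python
import pandas as pd


def prepare_for_synthesis(df, discrete_columns):
    """Return a cleaned copy of `df` that follows the three rules above."""
    # Rule 3: no missing values. Here incomplete rows are simply dropped.
    clean = df.dropna().reset_index(drop=True)
    for column in clean.columns:
        if column in discrete_columns:
            # Rule 2: discrete columns as strings (integers are also allowed).
            clean[column] = clean[column].astype(str)
        else:
            # Rule 1: continuous columns as floats.
            clean[column] = clean[column].astype(float)
    return clean


# Hypothetical example data with one missing value in each column.
raw = pd.DataFrame(
    {
        "age": [25, 40, None, 31],
        "city": ["Rome", "Oslo", "Lima", None],
    }
)
clean = prepare_for_synthesis(raw, discrete_columns=["city"])
print(clean.dtypes)
```

The cleaned DataFrame and the same list of discrete column names can then be passed to a synthesizer's fit method.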
Local Development Installation
If you'd like to install the package locally for development or experimentation:
git clone https://github.com/cris1618/KAN_synth.git
cd KAN_synth
pip install -e .
The -e flag installs the package in "editable" mode, meaning any changes you make to the code will immediately affect the installed version without needing to reinstall.
Usage Example
Here's a minimal working example to train a KAN-based synthesizer and generate synthetic tabular data using the KAN_CTGAN model.
from KAN_synth import KAN_CTGAN
import pandas as pd
# Load real tabular data (pandas DataFrame)
df = pd.read_csv("your_dataset.csv")
# Define the discrete columns (if any)
discrete_columns = ["column_a", "column_b"] # Modify as needed
# Initialize the synthesizer
synthesizer = KAN_CTGAN(
    epochs=100,
    verbose=True,
    grid_size_gen=5,
    spline_order_gen=3,
    grid_size_desc=5,
    spline_order_desc=3,
)
# Train the model on your data
synthesizer.fit(df, discrete_columns)
# Sample 1000 synthetic rows
synthetic_data = synthesizer.sample(1000)
# Save or explore the results
synthetic_data.to_csv("synthetic_output.csv", index=False)
You can replace KAN_CTGAN with any of the other available models, such as:
- HYBRID_KAN_CTGAN
- Disc_KAN_CTGAN
- Gen_KAN_CTGAN
- KAN_TVAE
- HYBRID_KAN_TVAE
Each supports similar training and sampling APIs compatible with the original CTGAN and TVAE interfaces.