
Generating Realistic Tabular Data using Large Language Models

Project description

Generation of Realistic Tabular data with pretrained Transformer-based language models


Our GReaT framework leverages pretrained Transformer-based large language models to synthesize realistic tabular data. New samples can be generated with just a few lines of code through an easy-to-use API.
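Language models operate on text, so a table row has to be turned into a token sequence before a model can be fine-tuned on it. The sketch below illustrates one way to do this: each row is written as "column is value" pairs, optionally in a shuffled column order, and generated text is parsed back into a row. The format, the shuffling step, and the helper names `encode_row` and `decode_text` are illustrative assumptions, not the be_great API; the encoding used inside the package may differ.

```python
import random
from typing import Optional

# Hypothetical helpers for illustration only; they are not part of be_great.


def encode_row(row: dict, rng: Optional[random.Random] = None) -> str:
    """Serialize one row as text, e.g. "MedInc is 8.3252, HouseAge is 41.0".

    If an RNG is given, the column order is shuffled so that a model trained
    on many such sentences does not depend on one fixed feature order.
    """
    items = list(row.items())
    if rng is not None:
        rng.shuffle(items)
    return ", ".join(f"{col} is {val}" for col, val in items)


def decode_text(text: str) -> dict:
    """Parse "column is value" pairs back into a dict of string values.

    Fragments without " is " (e.g. malformed model output) are skipped.
    Assumes column names and values contain neither ", " nor " is ".
    """
    row = {}
    for part in text.split(", "):
        col, sep, val = part.partition(" is ")
        if sep:
            row[col.strip()] = val.strip()
    return row


row = {"MedInc": 8.3252, "HouseAge": 41.0, "Latitude": 37.88}
print(encode_row(row))  # MedInc is 8.3252, HouseAge is 41.0, Latitude is 37.88
# Shuffled order on the way in; parsing recovers the same columns either way.
print(decode_text(encode_row(row, random.Random(0))))
```

Because `decode_text` keys on column names rather than positions, the shuffled and unshuffled encodings decode to the same row, which is what makes a random feature order workable in this sketch.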

GReaT Installation

The GReaT framework can be installed with pip:

pip install be-great

GReaT Quickstart

The example below shows how to use GReaT to generate synthetic tabular data for the California Housing dataset.

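from be_great import GReaT
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset as a pandas DataFrame
data = fetch_california_housing(as_frame=True).frame

# Fine-tune the pretrained distilgpt2 language model on the table for 50 epochs
model = GReaT(llm='distilgpt2', epochs=50)
model.fit(data)

# Generate 100 synthetic rows
synthetic_data = model.sample(n_samples=100)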
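Before relying on the synthetic rows, it can help to check that they resemble the original table. The sketch below compares per-column means, scaled by the real column's standard deviation, so a gap near 0 means the synthetic column is centred like the real one. `column_mean_gaps` is a hypothetical helper written for illustration, not part of be_great, and it only checks first moments, not correlations between columns.

```python
import statistics
from typing import Dict, List

# Hypothetical fidelity check for illustration only; not part of be_great.


def column_mean_gaps(real: List[dict], synthetic: List[dict]) -> Dict[str, float]:
    """Return |mean_synthetic - mean_real| / std_real for each real column.

    Values are converted with float(), so numeric strings are accepted.
    Columns that never appear in the synthetic rows are skipped.
    """
    if not real:
        return {}
    gaps = {}
    for col in real[0]:
        real_vals = [float(r[col]) for r in real]
        syn_vals = [float(s[col]) for s in synthetic if col in s]
        if not syn_vals:
            continue  # column never generated; skip rather than fail
        spread = statistics.pstdev(real_vals) or 1.0  # avoid division by zero
        gap = abs(statistics.fmean(syn_vals) - statistics.fmean(real_vals))
        gaps[col] = gap / spread
    return gaps


real = [{"a": 1.0, "b": 10.0}, {"a": 3.0, "b": 10.0}]
synthetic = [{"a": 2.0, "b": 12.0}, {"a": 2.0}]
print(column_mean_gaps(real, synthetic))  # {'a': 0.0, 'b': 2.0}
```

Assuming `model.sample` returns a pandas DataFrame, both tables from the quickstart can be passed in as lists of row dicts, e.g. `column_mean_gaps(data.to_dict("records"), synthetic_data.to_dict("records"))`.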

GReaT Citation

If you use GReaT, please link or cite our work:

@article{
}

GReaT Acknowledgements

We sincerely thank the HuggingFace 🤗 framework.



Download files


Source Distribution

be_great-0.0.2.tar.gz (11.9 kB)

Built Distribution

be_great-0.0.2-py3-none-any.whl (12.6 kB, Python 3)
