Generating Realistic Tabular Data using Large Language Models
Generation of Realistic Tabular Data with Pretrained Transformer-Based Language Models
Our GReaT framework uses pretrained Transformer-based large language models to synthesize realistic tabular data. New samples can be generated with just a few lines of code through an easy-to-use API.
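In the GReaT paper, each table row is serialized into a short sentence of "feature is value" pairs, with the feature order randomly permuted during training, so that a language model can be fine-tuned on the resulting text and new rows can be parsed back out of generated text. The following is a minimal, standard-library-only sketch of that encoding idea. The helper names `encode_row` and `decode_sentence` are hypothetical and not part of the be-great API, and the library's actual preprocessing may differ in details such as separators and type handling.

```python
import random
from typing import Dict, Optional

def encode_row(row: Dict[str, object], shuffle: bool = True,
               rng: Optional[random.Random] = None) -> str:
    """Serialize one table row as a comma-separated 'feature is value' sentence.

    Hypothetical helper illustrating the textual encoding; not the be-great API.
    """
    items = list(row.items())
    if shuffle:
        # Random feature-order permutation, so the model does not rely on a fixed column order
        (rng or random).shuffle(items)
    return ", ".join(f"{name} is {value}" for name, value in items)

def decode_sentence(text: str) -> Dict[str, str]:
    """Parse a generated sentence back into a row of string values.

    Assumes feature names and values never contain ', ' or ' is '.
    """
    row: Dict[str, str] = {}
    for part in text.split(", "):
        name, sep, value = part.partition(" is ")
        if sep:  # skip malformed fragments that have no ' is ' separator
            row[name.strip()] = value.strip()
    return row
```

For instance, `encode_row(data.iloc[0].to_dict())` on the California Housing frame from the Quickstart would yield one such sentence for the first row. Decoded values come back as strings, so numeric columns still need to be converted after parsing.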
GReaT Installation
The GReaT framework can be easily installed with pip:
pip install be-great
GReaT Quickstart
In the example below, we show how the GReaT approach is used to generate synthetic tabular data for the California Housing dataset.
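from be_great import GReaT
from sklearn.datasets import fetch_california_housing
# Load the California Housing dataset as a pandas DataFrame (features plus target column)
data = fetch_california_housing(as_frame=True).frame
# Fine-tune the pretrained distilgpt2 model on the table for 50 epochs
model = GReaT(llm='distilgpt2', epochs=50)
model.fit(data)
# Generate 100 new synthetic rows
synthetic_data = model.sample(n_samples=100)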
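Before using sampled rows, a quick plausibility check can be helpful. The sketch below shows one simple check, not part of the be-great API: for each column present in both tables, it reports the fraction of synthetic values that fall outside the [min, max] range observed in the training data. The function name `out_of_range_fraction` is hypothetical, and it assumes both tables are passed as plain column-to-values mappings of numbers.

```python
from typing import Dict, Sequence

def out_of_range_fraction(real: Dict[str, Sequence[float]],
                          synthetic: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Per shared column, the fraction of synthetic values outside the real [min, max].

    Hypothetical helper; assumes all values are numeric.
    """
    result: Dict[str, float] = {}
    for col, real_values in real.items():
        syn_values = synthetic.get(col)
        # Skip columns missing from the synthetic table or empty on either side
        if not real_values or not syn_values:
            continue
        lo, hi = min(real_values), max(real_values)
        outside = sum(1 for v in syn_values if v < lo or v > hi)
        result[col] = outside / len(syn_values)
    return result
```

With the Quickstart objects, and assuming `model.sample` returns a pandas DataFrame, this could be called as `out_of_range_fraction(data.to_dict("list"), synthetic_data.to_dict("list"))`. A value near zero only indicates that samples stay within the observed ranges; it says nothing about whether correlations between columns are preserved.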
GReaT Citation
If you use GReaT, please link to or cite our work:
@article{
}
GReaT Acknowledgements
We sincerely thank the HuggingFace 🤗 framework.