Generating Realistic Tabular Data using Large Language Models

Project description

Generation of Realistic Tabular data with pretrained Transformer-based language models

Our GReaT framework leverages pretrained Transformer-based large language models to synthesize realistic tabular data. New samples can be generated with just a few lines of code through an easy-to-use API. See our publication for more details.
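The publication describes converting each table row into text, one "column is value" clause per feature, with the feature order randomly permuted before the language model is fine-tuned. The sketch below illustrates that idea in plain Python. `encode_row` and `decode_row` are hypothetical helpers written for this example, not functions exported by be_great, and the parser assumes that column names and values never contain ", " or " is ".

```python
import random
from typing import Dict, Optional


def encode_row(row: Dict[str, object], shuffle: bool = True,
               rng: Optional[random.Random] = None) -> str:
    """Turn one table row into a sentence of "<column> is <value>" clauses.

    Simplified illustration of the textual encoding described in the GReaT
    paper; this is NOT be_great's internal implementation.
    """
    items = list(row.items())
    if shuffle:
        # Randomly permute the feature order so that no fixed column
        # ordering is baked into the training text.
        shuffler = rng if rng is not None else random.Random()
        shuffler.shuffle(items)
    return ", ".join(f"{col} is {val}" for col, val in items)


def decode_row(text: str) -> Dict[str, str]:
    """Parse a "<column> is <value>, ..." sentence back into a dict of strings.

    Assumes column names and values never contain ", " or " is ".
    """
    parsed: Dict[str, str] = {}
    for clause in text.split(", "):
        col, sep, val = clause.partition(" is ")
        if sep:  # ignore clauses that do not follow the pattern
            parsed[col.strip()] = val.strip()
    return parsed
```

For example, `encode_row({"MedInc": 8.3252, "HouseAge": 41.0}, shuffle=False)` yields `"MedInc is 8.3252, HouseAge is 41.0"`. Decoding returns every value as a string, so converting values back to their column types is left to the caller.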

GReaT Installation

The GReaT framework can be installed with pip and requires Python 3.9 or later:

pip install be-great
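Because the package requires Python 3.9 or later, a script can check the interpreter before importing it. This is a minimal standard-library sketch; the names `MIN_PYTHON` and `python_is_supported` are illustrative and not part of be_great.

```python
import sys
from typing import Sequence

# Minimum interpreter version given in the installation notes.
MIN_PYTHON = (3, 9)


def python_is_supported(version: Sequence[int] = sys.version_info,
                        minimum: Sequence[int] = MIN_PYTHON) -> bool:
    """Return True if the (major, minor) part of `version` meets `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


if not python_is_supported():
    sys.exit(f"be-great needs Python >= 3.9, found {sys.version.split()[0]}")
```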

GReaT Quickstart

In the example below, we show how the GReaT approach is used to generate synthetic tabular data for the California Housing dataset.

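from be_great import GReaT
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset as a pandas DataFrame (features plus target column)
data = fetch_california_housing(as_frame=True).frame

# Fine-tune the pretrained distilgpt2 model on the table
model = GReaT(llm='distilgpt2', batch_size=32, epochs=50)
model.fit(data)

# Draw 100 synthetic rows
synthetic_data = model.sample(n_samples=100)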
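A quick sanity check after sampling is to compare the column means of the real and synthetic tables. The helper `compare_numeric_means` below is hypothetical (it is not part of be_great) and uses only pandas. It coerces synthetic values to numbers in case they come back as text; that coercion is an assumption about the sampled output, not documented behavior.

```python
import pandas as pd


def compare_numeric_means(real: pd.DataFrame, synthetic: pd.DataFrame) -> pd.DataFrame:
    """Side-by-side means of the numeric columns present in both frames."""
    # Only numeric columns of the real data that also appear in the synthetic data.
    cols = real.select_dtypes("number").columns.intersection(synthetic.columns)
    # Coerce synthetic values to numbers; unparsable entries become NaN.
    synth_numeric = synthetic[cols].apply(pd.to_numeric, errors="coerce")
    summary = pd.DataFrame({
        "real_mean": real[cols].mean(),
        "synthetic_mean": synth_numeric.mean(),
    })
    summary["abs_diff"] = (summary["real_mean"] - summary["synthetic_mean"]).abs()
    return summary
```

In the quickstart, this would be called as `compare_numeric_means(data, synthetic_data)`. Matching means are only a coarse signal; they say nothing about correlations between columns.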

GReaT Citation

If you use GReaT, please link or cite our work:

@inproceedings{borisov2023language,
  title={Language Models are Realistic Tabular Data Generators},
  author={Vadim Borisov and Kathrin Sessler and Tobias Leemann and Martin Pawelczyk and Gjergji Kasneci},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=cEygmQNOeI}
}

GReaT Acknowledgements

We sincerely thank the Hugging Face 🤗 framework.

Download files

Source Distribution

be_great-0.0.4.tar.gz (12.5 kB)

Uploaded Source

Built Distribution

be_great-0.0.4-py3-none-any.whl (13.0 kB)

Uploaded Python 3

File details

Details for the file be_great-0.0.4.tar.gz.

File metadata

  • Download URL: be_great-0.0.4.tar.gz
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for be_great-0.0.4.tar.gz
  • SHA256: 7b374748d1fb8c2f44af1bf80968ad62b9005d8ff8103a799537c18ea51e47f7
  • MD5: 1ace324fe0c8a0fa121a624862246bce
  • BLAKE2b-256: 30188572cfb7b320663995b59dcc2464b522da294f7637258d79f5125ab04b09
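Each published SHA256 digest can be checked against a downloaded file before installing it. This is a minimal standard-library sketch; `sha256_of_file` and `matches_published_hash` are illustrative names, not part of pip or be_great.

```python
import hashlib


def sha256_of_file(path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read until fh.read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_sha256: str) -> bool:
    """Compare a local file with a SHA256 digest copied from the project page."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, `matches_published_hash("be_great-0.0.4.tar.gz", "7b374748d1fb8c2f44af1bf80968ad62b9005d8ff8103a799537c18ea51e47f7")` should return True for an intact copy of the source distribution in the current directory.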

File details

Details for the file be_great-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: be_great-0.0.4-py3-none-any.whl
  • Size: 13.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for be_great-0.0.4-py3-none-any.whl
  • SHA256: d70c1ae47b9df7974a0b57829d8a0c1cd1b2d13e62b9015879f1f1e4ac488812
  • MD5: 5ac7f4200304d5e9f2a1b4e581dbf499
  • BLAKE2b-256: c9bbb2632f706e21dc6914c9ffbbd365343f0107178f30e605df1c6ccedfe489
