Data wrangling framework using LLM-based code generation

Project description

Following Narayan et al., we use the same set of benchmark datasets. Clone the repository and download the data with:

git clone git@github.com:effyli/efficient_llm_data_wrangling.git
cd efficient_llm_data_wrangling
mkdir -p data
wget https://fm-data-tasks.s3.us-west-1.amazonaws.com/datasets.tar.gz -P data
tar xvf data/datasets.tar.gz -C data/
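
For environments without wget or tar, the download and extraction steps can be approximated in Python. This is a minimal sketch using only the standard library; the helper names `download` and `extract` are illustrative and are not part of the project.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the wget command in this README.
DATASETS_URL = "https://fm-data-tasks.s3.us-west-1.amazonaws.com/datasets.tar.gz"


def download(url: str, dest_dir: Path) -> Path:
    """Download `url` into `dest_dir` and return the local file path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if not target.exists():  # skip the download if the archive is already present
        urllib.request.urlretrieve(url, str(target))
    return target


def extract(archive: Path, dest_dir: Path) -> None:
    """Unpack a tar archive into `dest_dir`, like `tar xf archive -C dest_dir`."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe archive members.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

Calling `extract(download(DATASETS_URL, Path("data")), Path("data"))` from the repository root should leave the benchmark files under data/datasets, matching the shell commands above.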

Before running the script, set the DATASET_PATH environment variable:

export DATASET_PATH="$PWD/data/datasets"
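
How the script consumes DATASET_PATH is not documented here, so the following is only a sketch of how a caller might check the variable and derive a task directory from it. The function `resolve_task_dir` is hypothetical, and the example task path is copied from the run command in this README.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_task_dir(task: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return <DATASET_PATH>/<task>, failing early if the variable is unset."""
    env = os.environ if env is None else env
    root = env.get("DATASET_PATH")
    if not root:
        raise RuntimeError("DATASET_PATH is not set; export it before running.")
    return Path(root) / task
```

For example, `resolve_task_dir("data_transformation/benchmark-bing-query-logs-unit")` joins the task name onto whatever DATASET_PATH currently points to; it does not check that the directory exists.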

To start a job (for example, bing-query-logs-unit without the data router, generating Python code), run:

python src/run_wrangler.py --data_dir %your_data_directory/data_transformation/benchmark-bing-query-logs-unit% --num_trials 3 --seed 42 --k 3 --d 0 --num_iter 5 --llm llama3.2 --lang python
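
When launching several jobs, it can help to assemble the same command programmatically. The sketch below only rebuilds the argument list shown above; `build_command` and `EXAMPLE_OPTIONS` are illustrative names, the flag values are copied from the example, and nothing here is part of the project's own API.

```python
import shlex
import sys
from typing import Dict, List, Union

# Flag values copied from the example command in this README.
EXAMPLE_OPTIONS: Dict[str, Union[str, int]] = {
    "num_trials": 3,
    "seed": 42,
    "k": 3,
    "d": 0,
    "num_iter": 5,
    "llm": "llama3.2",
    "lang": "python",
}


def build_command(data_dir: str, options: Dict[str, Union[str, int]]) -> List[str]:
    """Build the argument list for src/run_wrangler.py without running it."""
    cmd = [sys.executable, "src/run_wrangler.py", "--data_dir", data_dir]
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


def describe(cmd: List[str]) -> str:
    """Return a shell-quoted string for logging or copy-pasting."""
    return shlex.join(cmd)
```

Passing the resulting list to `subprocess.run(cmd, check=True)` from the repository root would then start the job; building a list rather than a single string avoids quoting problems when the data directory contains spaces.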

Download files

Source Distribution

tableswift-0.1.0.tar.gz (82.1 kB)

Uploaded Source

Built Distribution

tableswift-0.1.0-py3-none-any.whl (88.9 kB)

Uploaded Python 3

File details

Details for the file tableswift-0.1.0.tar.gz.

File metadata

  • Download URL: tableswift-0.1.0.tar.gz
  • Upload date:
  • Size: 82.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for tableswift-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ffd48c53b11f9f771e381c109c963f59c14a9a5a72010bfef715e5e2839e355b
MD5 5a4ba07ec799c5e073304abd29aa3ef5
BLAKE2b-256 b7fe8673c79282e0b6f643eddc722c90ae1440521caad7982f4a99f09f39d378

File details

Details for the file tableswift-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tableswift-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 88.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for tableswift-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8185360e67c4717256c7658b4d899dc50935fa62f2cf2a3fbd4aa34a5bc74f7d
MD5 d98f98667aca643f1a772d786ac0c548
BLAKE2b-256 3dc85d18006a87688949a698dbb3d37ba8c03934d94c6a7900c8741b403d9a52
