Data wrangling framework with LLM using code generation

Project description

Following Narayan et al., we use the same set of benchmark datasets. To clone the repository and download the data, run:

git clone git@github.com:effyli/efficient_llm_data_wrangling.git
cd efficient_llm_data_wrangling
mkdir data/
wget https://fm-data-tasks.s3.us-west-1.amazonaws.com/datasets.tar.gz -P data
tar xvf data/datasets.tar.gz -C data/
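
The same download-and-unpack step can be sketched in Python for environments without wget or tar. This is a minimal sketch, not part of the project: the function names `extract_archive` and `download_datasets` are hypothetical, and the returned `data/datasets` path assumes the archive unpacks into a top-level `datasets` directory, as the DATASET_PATH export in these instructions suggests.

```python
import os
import tarfile
import urllib.request

# URL taken from the wget command above.
DATASETS_URL = "https://fm-data-tasks.s3.us-west-1.amazonaws.com/datasets.tar.gz"


def extract_archive(archive_path, data_dir):
    """Unpack a .tar.gz archive into data_dir (like `tar xvf ... -C data/`)."""
    os.makedirs(data_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(data_dir, filter="data")
        else:
            tar.extractall(data_dir)


def download_datasets(data_dir="data", url=DATASETS_URL):
    """Download the benchmark archive into data_dir and unpack it there."""
    os.makedirs(data_dir, exist_ok=True)
    archive_path = os.path.join(data_dir, "datasets.tar.gz")
    urllib.request.urlretrieve(url, archive_path)
    extract_archive(archive_path, data_dir)
    # Assumption: the archive contains a top-level "datasets" directory.
    return os.path.join(data_dir, "datasets")
```

Calling `download_datasets()` from the repository root would mirror the three shell commands above.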

To run the script, first set the DATASET_PATH environment variable:

export DATASET_PATH="$PWD/data/datasets"
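
To check that the variable points where you expect before launching a job, a small helper can resolve a benchmark directory from it. This is a sketch under assumptions: `dataset_dir` is a hypothetical helper, not part of the project, and the `<task>/<benchmark>` layout under DATASET_PATH is inferred from the example path in the run command.

```python
import os
from pathlib import Path


def dataset_dir(task, name, env=None):
    """Return <DATASET_PATH>/<task>/<name>, failing loudly if the variable is unset."""
    env = os.environ if env is None else env
    root = env.get("DATASET_PATH")
    if not root:
        raise RuntimeError("DATASET_PATH is not set; export it first")
    # Assumed layout: one sub-directory per task, one per benchmark inside it.
    return Path(root) / task / name
```

For example, `dataset_dir("data_transformation", "benchmark-bing-query-logs-unit")` would give a path you could pass as `--data_dir`.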

To start a job, for example on bing-query-logs-unit without the data router and with Python as the generated language, run:

python src/run_wrangler.py --data_dir %your_data_directory/data_transformation/benchmark-bing-query-logs-unit% --num_trials 3 --seed 42 --k 3 --d 0 --num_iter 5 --llm llama3.2 --lang python
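
If you want to launch the same job from Python, for instance to loop over several benchmarks, the command line can be assembled as an argument list. This is a minimal sketch: `build_wrangler_command` is a hypothetical helper, and it only reuses the flags and default values shown in the example above without assuming anything further about what each flag does.

```python
import sys


def build_wrangler_command(data_dir, num_trials=3, seed=42, k=3, d=0,
                           num_iter=5, llm="llama3.2", lang="python"):
    """Build the argv list for src/run_wrangler.py with the flags shown above."""
    return [
        sys.executable, "src/run_wrangler.py",
        "--data_dir", str(data_dir),
        "--num_trials", str(num_trials),
        "--seed", str(seed),
        "--k", str(k),
        "--d", str(d),
        "--num_iter", str(num_iter),
        "--llm", llm,
        "--lang", lang,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that the relative path `src/run_wrangler.py` resolves.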

Download files

Source Distribution

tableswift-0.1.2.tar.gz (85.8 kB)

Uploaded Source

Built Distribution

tableswift-0.1.2-py3-none-any.whl (93.6 kB)

Uploaded Python 3

File details

Details for the file tableswift-0.1.2.tar.gz.

File metadata

  • Download URL: tableswift-0.1.2.tar.gz
  • Upload date:
  • Size: 85.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for tableswift-0.1.2.tar.gz
Algorithm Hash digest
SHA256 9e5010703d38695d42be1cb241cd3d72909e80947ddeafac207c52efe8ea4ef1
MD5 694b3b469213a022fc5f80a79e1f5357
BLAKE2b-256 33c635af0c619e77988209a22cdf5db29e63002a5798f92c56776aadeac01dbd
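
A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch: `sha256_of` and `matches_expected` are hypothetical helper names, and the expected value is the SHA256 digest listed for tableswift-0.1.2.tar.gz.

```python
import hashlib

# SHA256 digest listed above for tableswift-0.1.2.tar.gz.
EXPECTED_SHA256 = "9e5010703d38695d42be1cb241cd3d72909e80947ddeafac207c52efe8ea4ef1"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

The same helpers work for the wheel if you pass its listed SHA256 digest as `expected`.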

File details

Details for the file tableswift-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: tableswift-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 93.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for tableswift-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c04487f3ed6eb5f63175520c6d597e05a6367b1b91cb72c8518b974201414aa2
MD5 21b2533550d97d0ae6554e99ce726380
BLAKE2b-256 cca6ec83d69fb02c50625981813afb88ead6e8058722f12058908781f9bf57ce
