
Data wrangling framework with LLMs using code generation

Project description

Following the work of Narayan et al., we use the same set of benchmark datasets. You can clone the repository and download the data with:

git clone git@github.com:effyli/efficient_llm_data_wrangling.git
cd efficient_llm_data_wrangling
mkdir data/
wget https://fm-data-tasks.s3.us-west-1.amazonaws.com/datasets.tar.gz -P data
tar xvf data/datasets.tar.gz -C data/
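On machines without wget, the same download and extraction can be done with Python's standard library. This is a minimal sketch, not part of the project: the function names are illustrative, and it assumes (as the DATASET_PATH setting implies) that the archive unpacks into a datasets/ folder.

```python
import tarfile
import urllib.request
from pathlib import Path

DATASETS_URL = "https://fm-data-tasks.s3.us-west-1.amazonaws.com/datasets.tar.gz"


def extract_archive(archive: Path, dest: Path) -> None:
    """Unpack a tar archive into dest, like `tar xvf archive -C dest`."""
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(archive, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons can reject unsafe members (absolute paths, "..", etc.).
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def download_datasets(data_dir: Path = Path("data"), url: str = DATASETS_URL) -> Path:
    """Download the benchmark archive into data_dir (once) and extract it there.

    Returns the assumed dataset root, data_dir / "datasets".
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    archive = data_dir / "datasets.tar.gz"
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    extract_archive(archive, data_dir)
    return data_dir / "datasets"


if __name__ == "__main__":
    print(download_datasets())
```

The archive is only fetched if it is not already present, so re-running the script just re-extracts the local copy.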

To run the script, first set the DATASET_PATH environment variable:

export DATASET_PATH="$PWD/data/datasets"
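A Python script that depends on this variable would typically read it through os.environ. The sketch below uses a hypothetical helper, resolve_dataset_dir, to show the idea and to fail loudly when the variable was not exported. It is an assumption, not a description of how src/run_wrangler.py reads the variable, and the data_transformation/benchmark-bing-query-logs-unit subpath is borrowed from the example command below.

```python
import os
from pathlib import Path


def resolve_dataset_dir(*parts: str, env_var: str = "DATASET_PATH") -> Path:
    """Hypothetical helper: join path parts onto the directory named by env_var.

    Raises a clear error if the variable is unset, instead of silently
    falling back to the current working directory.
    """
    root = os.environ.get(env_var)
    if not root:
        raise RuntimeError(f"{env_var} is not set; run: export {env_var}=...")
    return Path(root).joinpath(*parts)


if __name__ == "__main__":
    path = resolve_dataset_dir("data_transformation", "benchmark-bing-query-logs-unit")
    print(path, "exists" if path.is_dir() else "is missing")
```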

To start a job (e.g., on the bing-query-logs-unit benchmark, without the data router, generating Python code), run:

python src/run_wrangler.py --data_dir %your_data_directory/data_transformation/benchmark-bing-query-logs-unit%  --num_trials 3  --seed 42 --k 3 --d 0 --num_iter 5 --llm llama3.2 --lang python
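The same invocation can be assembled from Python, which is convenient when scripting several runs. This is a minimal sketch: build_command is a hypothetical helper, the flag names and default values are copied verbatim from the command above, and filling in the data directory from DATASET_PATH/data_transformation/... assumes the extracted archive uses that layout. The meaning of flags such as --k and --d is not described here, so the sketch only passes them through.

```python
import os
import shlex
import subprocess
import sys
from pathlib import Path


def build_command(data_dir: Path, *, num_trials: int = 3, seed: int = 42, k: int = 3,
                  d: int = 0, num_iter: int = 5, llm: str = "llama3.2",
                  lang: str = "python") -> list[str]:
    """Hypothetical helper: assemble the run_wrangler.py call with the example's flags."""
    return [
        sys.executable, "src/run_wrangler.py",
        "--data_dir", str(data_dir),
        "--num_trials", str(num_trials),
        "--seed", str(seed),
        "--k", str(k),
        "--d", str(d),
        "--num_iter", str(num_iter),
        "--llm", llm,
        "--lang", lang,
    ]


if __name__ == "__main__":
    # Assumes DATASET_PATH is exported as above and that the benchmark lives
    # under data_transformation/ inside it (an assumption about the layout).
    data_dir = (Path(os.environ["DATASET_PATH"])
                / "data_transformation" / "benchmark-bing-query-logs-unit")
    cmd = build_command(data_dir)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)  # run from the repository root
```

Changing a single keyword, e.g. build_command(data_dir, seed=7), produces a new run without re-typing the other flags.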

Download files

Source Distribution

tableswift-0.1.3.tar.gz (85.6 kB)

Uploaded Source

Built Distribution

tableswift-0.1.3-py3-none-any.whl (93.6 kB)

Uploaded Python 3

File details

Details for the file tableswift-0.1.3.tar.gz.

File metadata

  • Download URL: tableswift-0.1.3.tar.gz
  • Size: 85.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for tableswift-0.1.3.tar.gz
Algorithm     Hash digest
SHA256        53a6a660533aeedba374968155f6ee379248a18dd3554edac5e4a93b513d2c3b
MD5           8ac109ca794342f06ee0d07e055bf39d
BLAKE2b-256   6b09b32af0ff08d72be944b0a069fdf968db1d42916c6efec86cd40e21b7d648

File details

Details for the file tableswift-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: tableswift-0.1.3-py3-none-any.whl
  • Size: 93.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for tableswift-0.1.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        6f5f3be24c4df265d441217c88eca5618531b0f999a4812ed5b48a1c1d44f47d
MD5           b52072cc9fcf716a93afbb3f8560a820
BLAKE2b-256   811aaf081bfbae146b1687477930caded916902633d4d3876a53a20751d0569a
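To check a downloaded file against these digests, a short sketch using only the standard library's hashlib can recompute the SHA256 and compare it. It assumes the wheel sits in the current directory; sha256_of and verify are illustrative names. pip can also enforce this at install time in hash-checking mode, e.g. with a requirements line such as `tableswift==0.1.3 --hash=sha256:<digest>` and `--require-hashes`.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest for tableswift-0.1.3-py3-none-any.whl (from the table above).
EXPECTED_SHA256 = "6f5f3be24c4df265d441217c88eca5618531b0f999a4812ed5b48a1c1d44f47d"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest against a published (hex) value."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    wheel = Path("tableswift-0.1.3-py3-none-any.whl")
    if wheel.exists():
        print("OK" if verify(wheel, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{wheel} not found in the current directory")
```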
