
Data wrangling framework using LLM-based code generation

Project description

Following the work of Narayan et al., we use the same set of benchmark datasets. Clone the repository and download the data with:

git clone git@github.com:effyli/efficient_llm_data_wrangling.git
cd efficient_llm_data_wrangling
mkdir -p data
wget https://fm-data-tasks.s3.us-west-1.amazonaws.com/datasets.tar.gz -P data
tar xvf data/datasets.tar.gz -C data/
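If `wget` or `tar` is not available (for example on Windows), the same download-and-extract step can be done with the Python standard library. This is a minimal sketch, assuming the archive URL from the `wget` command above; `fetch_archive` and `extract_archive` are hypothetical helper names, not part of tableswift.

```python
import tarfile
import urllib.request
from pathlib import Path

# Archive URL copied from the wget command above.
DATASETS_URL = "https://fm-data-tasks.s3.us-west-1.amazonaws.com/datasets.tar.gz"


def fetch_archive(url: str, dest_dir: Path) -> Path:
    """Download `url` into `dest_dir` (like `wget -P`) and return the local file path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, target)
    return target


def extract_archive(archive: Path, dest_dir: Path) -> list[str]:
    """Extract a .tar.gz into `dest_dir` (like `tar xvf ... -C`) and return the member names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter rejects absolute paths and links escaping dest_dir.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names


# Usage (performs a network download of the full benchmark archive):
# archive = fetch_archive(DATASETS_URL, Path("data"))
# extract_archive(archive, Path("data"))
```

Run it from the repository root so that the extracted `data/datasets` folder ends up where the next step expects it.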

Before running the script, set the DATASET_PATH environment variable:

export DATASET_PATH="$PWD/data/datasets"
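This README does not document how `run_wrangler.py` reads `DATASET_PATH`, so the following is only a hypothetical sanity check you could run before launching jobs. It assumes datasets live under `$DATASET_PATH/<task>/<dataset>`, mirroring the `data_transformation/benchmark-bing-query-logs-unit` path used below; `dataset_dir` is an illustrative helper, not a tableswift API.

```python
import os
from pathlib import Path


def dataset_dir(task: str, name: str) -> Path:
    """Resolve $DATASET_PATH/<task>/<name>, failing early if the variable or folder is missing.

    Hypothetical helper: the <task>/<name> layout is an assumption based on the
    example path in this README.
    """
    root = os.environ.get("DATASET_PATH")
    if not root:
        raise RuntimeError('DATASET_PATH is not set; run: export DATASET_PATH="$PWD/data/datasets"')
    path = Path(root) / task / name
    if not path.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {path}")
    return path
```

For example, `dataset_dir("data_transformation", "benchmark-bing-query-logs-unit")` either returns the folder path or raises with a message pointing at the missing piece.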

To start a job, for example on bing-query-logs-unit without the data router and generating Python code, run:

python src/run_wrangler.py --data_dir %your_data_directory/data_transformation/benchmark-bing-query-logs-unit%  --num_trials 3  --seed 42 --k 3 --d 0 --num_iter 5 --llm llama3.2 --lang python
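When sweeping several datasets or seeds, it can help to build the command programmatically. The sketch below is a hypothetical launcher: it only reuses the flag names and example values from the command above, and makes no claim about what `--k` or `--d` mean beyond what the command shows. It assumes it is run from the repository root.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_command(data_dir: Path, llm: str = "llama3.2", lang: str = "python",
                  num_trials: int = 3, seed: int = 42, k: int = 3, d: int = 0,
                  num_iter: int = 5) -> list[str]:
    """Assemble the run_wrangler.py invocation; flag names and defaults are copied from the README example."""
    return [
        sys.executable, "src/run_wrangler.py",
        "--data_dir", str(data_dir),
        "--num_trials", str(num_trials),
        "--seed", str(seed),
        "--k", str(k),
        "--d", str(d),
        "--num_iter", str(num_iter),
        "--llm", llm,
        "--lang", lang,
    ]


def run(data_dir: Path, **overrides) -> int:
    """Print the shell-equivalent command, run it, and return its exit code."""
    cmd = build_command(data_dir, **overrides)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

For example, `run(Path("..."), seed=7)` reruns the same job with a different seed, where `data_dir` is the same folder you would pass to `--data_dir`.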



Download files


Source Distribution

tableswift-0.1.1.tar.gz (85.7 kB)

Uploaded Source

Built Distribution


tableswift-0.1.1-py3-none-any.whl (93.5 kB)

Uploaded Python 3

File details

Details for the file tableswift-0.1.1.tar.gz.

File metadata

  • Download URL: tableswift-0.1.1.tar.gz
  • Upload date:
  • Size: 85.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for tableswift-0.1.1.tar.gz
Algorithm Hash digest
SHA256 bd111b5c46c5183b595392fc6e49fb601595638194d7b6a17c3faa27c1a27844
MD5 7cc6f557c9d677f4637ae6a13c022a60
BLAKE2b-256 bd39ba3e8065b4b9318f69fa8621cb71ca6a1cc593f59ab4340410aee46f723f


File details

Details for the file tableswift-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: tableswift-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 93.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for tableswift-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f0805f8ec83baa3f9fbd0702a2b3ed6d4e583791bb7d56567853a7452744bf3f
MD5 0e6f8e613b0c304c5470e665dd6dad97
BLAKE2b-256 095da3e1ebd0cd2707dd1be7caf04944fec9576a6cb3ba746cbc0e12c29412c2
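To check a manually downloaded file against the published SHA256 digests, a few lines of standard-library Python are enough. This is a minimal sketch; the digests are copied from the hash tables above, while `sha256_of` and `verify` are illustrative names. For installs, pip's hash-checking mode (`--require-hashes`) performs the same comparison automatically.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "tableswift-0.1.1.tar.gz": "bd111b5c46c5183b595392fc6e49fb601595638194d7b6a17c3faa27c1a27844",
    "tableswift-0.1.1-py3-none-any.whl": "f0805f8ec83baa3f9fbd0702a2b3ed6d4e583791bb7d56567853a7452744bf3f",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if `path` matches the published SHA256 for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify(Path("tableswift-0.1.1-py3-none-any.whl"))` should return `True` for an untampered download kept under its original file name.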

