Data wrangling framework with LLM using code generation

Project description

TableSwift is a Python package that performs several kinds of data wrangling by using LLMs to generate code.

This package currently supports three functions. To start using the package, first configure it with your API key. The examples on this page assume the package has been imported as ts.

ts.configure(api_key="your API key here.")

Alternatively, define your API key in the system environment using the variable name TABLESWIFT_API_KEY.
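
As a minimal sketch of the environment-variable route, the snippet below sets TABLESWIFT_API_KEY for the current process and reads it back. The helper resolve_api_key is hypothetical and not part of TableSwift; it only illustrates one common precedence (explicit argument first, then the environment), which the package itself may or may not follow.

```python
import os

ENV_VAR = "TABLESWIFT_API_KEY"  # variable name given in the description above


def resolve_api_key(explicit_key=None):
    """Hypothetical helper: prefer an explicit key, else fall back to the environment."""
    key = explicit_key or os.environ.get(ENV_VAR)
    if not key:
        raise RuntimeError(f"No API key found; pass one or set {ENV_VAR}.")
    return key


# Set the variable for the current process only (a shell export works too).
os.environ[ENV_VAR] = "your API key here."
api_key = resolve_api_key()
```

Setting the variable in the shell or a deployment configuration keeps the key out of source files, which is usually the reason to prefer this route over passing the key in code.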

To generate labels, use:

labeled_data = ts.generate_labels(
    instruction="label the input samples",
    task="data_transformation",
    column_name="name",
    demonstrations=[{"Input": "sample1", "Output": "label1"},
                    {"Input": "sample2", "Output": "label2"}],
    samples_to_label=[{"Input": "sample1", "Output": ""},
                      {"Input": "sample2", "Output": ""},
                      {"Input": "sample3", "Output": ""}])
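
The demonstrations and samples_to_label arguments above are lists of dictionaries with "Input" and "Output" keys. The sketch below shows one way to build them from plain Python values; both helper names are hypothetical and not part of TableSwift.

```python
def build_demonstrations(pairs):
    """Turn (input, output) pairs into the demonstrations format shown above."""
    return [{"Input": inp, "Output": out} for inp, out in pairs]


def build_samples_to_label(values):
    """Wrap raw values as samples with an empty "Output" left for labeling."""
    return [{"Input": value, "Output": ""} for value in values]


demonstrations = build_demonstrations([("sample1", "label1"), ("sample2", "label2")])
samples_to_label = build_samples_to_label(["sample1", "sample2", "sample3"])
```

The resulting lists can then be passed as the demonstrations and samples_to_label arguments of the call shown above.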

To generate code, use:

code, router_code = ts.generate_code(
    instruction="Transform input into output",
    task="data_transformation",
    samples=[{"Input": "sample1", "Output": "label1"},
             {"Input": "sample2", "Output": "label2"}],
    lang="python")

Hyperparameters can also be overridden by passing them as keyword arguments:

code, router_code = ts.generate_code(
    instruction="Transform input into output",
    task="data_transformation",
    samples=[{"Input": "sample1", "Output": "label1"},
             {"Input": "sample2", "Output": "label2"}],
    lang="python",
    num_trials=1,
    num_retry=3,
    num_iterations=1)

The hyperparameters and their default values are:

DEFAULT_PARAMS = {
    "use_data_router": True,
    "num_trials": 2,
    "num_retry": 3,
    "seed": 42,
    "num_iterations": 2,
    "max_num_solutions": 3,
    "limit_fallback": 20, # number of invalid data samples before fallback, should be a percentage in the future
    "llm": "gpt-4o-mini" 
}
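
To illustrate how overrides relate to these defaults, here is a minimal sketch that layers keyword overrides on top of a copy of DEFAULT_PARAMS and rejects unknown names. The function merge_params is hypothetical; how TableSwift combines the values internally is not documented here, so treat this as an assumption rather than the package's actual logic.

```python
DEFAULT_PARAMS = {
    "use_data_router": True,
    "num_trials": 2,
    "num_retry": 3,
    "seed": 42,
    "num_iterations": 2,
    "max_num_solutions": 3,
    "limit_fallback": 20,
    "llm": "gpt-4o-mini",
}


def merge_params(**overrides):
    """Hypothetical helper: return the defaults with the given overrides applied."""
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise KeyError(f"Unknown hyperparameters: {sorted(unknown)}")
    # Later keys win, so overrides replace the matching defaults.
    return {**DEFAULT_PARAMS, **overrides}


params = merge_params(num_trials=1, num_iterations=1)
```

Checking names before a call catches a misspelled hyperparameter early, and building a new dictionary leaves DEFAULT_PARAMS itself unchanged.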

The package currently supports two languages: Python and DuckDB SQL. To generate Python code, use lang="python"; to generate a DuckDB SQL query, use lang="sql". The package currently supports the following tasks; the task parameter must match one of these strings exactly.

"data_transformation"
"entity_matching"
"error_detection_spelling"
"value_imputation"

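Because the task and lang values are plain strings, a typo is easy to make. The sketch below checks both against the values listed above before a request is sent; check_request is a hypothetical helper, not a TableSwift function, and the package may perform its own validation.

```python
SUPPORTED_TASKS = {
    "data_transformation",
    "entity_matching",
    "error_detection_spelling",
    "value_imputation",
}
SUPPORTED_LANGS = {"python", "sql"}


def check_request(task, lang):
    """Hypothetical helper: raise ValueError for an unsupported task or language."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"Unsupported task {task!r}; expected one of {sorted(SUPPORTED_TASKS)}")
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"Unsupported lang {lang!r}; expected one of {sorted(SUPPORTED_LANGS)}")
    return task, lang


task, lang = check_request("entity_matching", "sql")
```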

Download files

Source Distribution

tableswift-0.1.4.tar.gz (86.3 kB)

Uploaded Source

Built Distribution

tableswift-0.1.4-py3-none-any.whl (94.1 kB)

Uploaded Python 3

File details

Details for the file tableswift-0.1.4.tar.gz.

File metadata

  • Download URL: tableswift-0.1.4.tar.gz
  • Size: 86.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for tableswift-0.1.4.tar.gz
Algorithm Hash digest
SHA256 e8334a6b8315ea9548e008b64763d48e6236209a2892a4e9ea21d0c1b23710fe
MD5 39d56cc75f304963f4bb5337177da10a
BLAKE2b-256 7792ed865e6922df8e567643def3244f2bb3b00bef42d3f4700f8acc28cb3642

File details

Details for the file tableswift-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: tableswift-0.1.4-py3-none-any.whl
  • Size: 94.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for tableswift-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 2ba80c898525c81f20657927039714104f12edabe08a11cc5a08c7a668ca2a07
MD5 e4831791b8507b88de4de43721e15824
BLAKE2b-256 f54f2ee24d497a0437a0e966eade817a9ae01ae11b871a6de84118fee0be6c91
