Data wrangling framework using LLMs and code generation
TableSwift is a Python package for data wrangling that uses LLMs to generate code for each task.
The package currently supports three functions. To get started, configure it with your API key:
```python
import tableswift as ts

ts.configure(api_key="your API key here")
```
Alternatively, set your API key in the environment variable TABLESWIFT_API_KEY.
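If you rely on the environment variable, a missing key may only surface when the first LLM call fails. The sketch below is a hypothetical pre-flight check using only the standard library; `require_api_key` is not part of TableSwift, and it reads only `TABLESWIFT_API_KEY`, the variable name given above.

```python
import os

# Variable name taken from this README; the check itself is a hypothetical helper.
ENV_VAR = "TABLESWIFT_API_KEY"


def require_api_key(env_var: str = ENV_VAR) -> str:
    """Return the API key from the environment, or raise a descriptive error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell "
            "or call ts.configure(api_key=...) instead."
        )
    return key
```

Calling `require_api_key()` once at startup turns a missing or blank key into an immediate, readable error.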
To generate labels, use:
```python
labeled_data = ts.generate_labels(
    instruction="label the input samples",
    task="data_transformation",
    column_name="name",
    demonstrations=[{"Input": "sample1", "Output": "label1"},
                    {"Input": "sample2", "Output": "label2"}],
    samples_to_label=[{"Input": "sample1", "Output": ""},
                      {"Input": "sample2", "Output": ""},
                      {"Input": "sample3", "Output": ""}],
)
```
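Both `demonstrations` and `samples_to_label` are lists of dicts with `"Input"` and `"Output"` keys, where unlabeled samples carry an empty `"Output"`. When the data starts out as plain Python values, two small hypothetical helpers (`make_demonstrations` and `make_unlabeled`, not TableSwift functions) can build those lists; the dict shape is copied from the example above.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def make_demonstrations(pairs: Iterable[Tuple[str, str]]) -> List[Record]:
    """Turn (input, label) pairs into {"Input": ..., "Output": ...} dicts."""
    return [{"Input": inp, "Output": out} for inp, out in pairs]


def make_unlabeled(inputs: Iterable[str]) -> List[Record]:
    """Wrap raw inputs with an empty "Output", matching samples_to_label."""
    return [{"Input": inp, "Output": ""} for inp in inputs]


# Rebuilds the two lists used in the generate_labels example.
demos = make_demonstrations([("sample1", "label1"), ("sample2", "label2")])
to_label = make_unlabeled(["sample1", "sample2", "sample3"])
```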
To generate code, use:
```python
code, router_code = ts.generate_code(
    instruction="Transform input into output",
    task="data_transformation",
    samples=[{"Input": "sample1", "Output": "label1"},
             {"Input": "sample2", "Output": "label2"}],
    lang="python",
)
```
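Unlike `samples_to_label`, every entry in the `samples` passed to `generate_code` in the example above has a filled-in `"Output"`. Assuming code generation learns from complete input/output pairs, a hypothetical check like the following (`check_code_samples` is not a TableSwift function) flags incomplete samples before any API call is made.

```python
from typing import Dict, List


def check_code_samples(samples: List[Dict[str, str]]) -> List[int]:
    """Return indices of samples missing "Input" or with an empty "Output".

    Assumption: generate_code expects fully labeled examples, as in the README.
    """
    bad = []
    for i, sample in enumerate(samples):
        if "Input" not in sample or not str(sample.get("Output", "")).strip():
            bad.append(i)
    return bad


samples = [
    {"Input": "sample1", "Output": "label1"},
    {"Input": "sample2", "Output": "label2"},
    {"Input": "sample3", "Output": ""},  # unlabeled, so it gets flagged
]
print(check_code_samples(samples))  # [2]
```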
Some hyperparameters can also be overridden by passing them as keyword arguments:
```python
code, router_code = ts.generate_code(
    instruction="Transform input into output",
    task="data_transformation",
    samples=[{"Input": "sample1", "Output": "label1"},
             {"Input": "sample2", "Output": "label2"}],
    lang="python",
    num_trials=1,
    num_retry=3,
    num_iterations=1,
)
```
The hyperparameters and their default values are:
```python
DEFAULT_PARAMS = {
    "use_data_router": True,
    "num_trials": 2,
    "num_retry": 3,
    "seed": 42,
    "num_iterations": 2,
    "max_num_solutions": 3,
    "limit_fallback": 20,  # number of invalid data samples before fallback, should be a percentage in the future
    "llm": "gpt-4o-mini",
}
```
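This README does not document how TableSwift merges overrides into `DEFAULT_PARAMS` internally. As an illustration only, the common Python pattern is a dict merge in which keyword overrides win and unknown names are rejected, which catches misspellings such as `num_trails`. `resolve_params` is a hypothetical helper; only the default values are copied from above.

```python
from typing import Any, Dict

# Defaults copied from the README; the merge logic below is an assumption.
DEFAULT_PARAMS: Dict[str, Any] = {
    "use_data_router": True,
    "num_trials": 2,
    "num_retry": 3,
    "seed": 42,
    "num_iterations": 2,
    "max_num_solutions": 3,
    "limit_fallback": 20,
    "llm": "gpt-4o-mini",
}


def resolve_params(**overrides: Any) -> Dict[str, Any]:
    """Return a new dict of defaults with overrides applied; reject unknown keys."""
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise ValueError(f"unknown hyperparameter(s): {sorted(unknown)}")
    # Later entries win, so overrides replace defaults; DEFAULT_PARAMS is not mutated.
    return {**DEFAULT_PARAMS, **overrides}


params = resolve_params(num_trials=1, num_retry=3, num_iterations=1)
print(params["num_trials"], params["num_iterations"], params["seed"])  # 1 1 42
```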
The package currently supports two target languages: Python and DuckDB SQL. Use lang="python" to generate Python code and lang="sql" to generate a DuckDB SQL query.
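Both `lang` and `task` take plain string values, so a typo only shows up when the call fails. The sketch below is a hypothetical validator (not part of TableSwift) that checks a value against the allowed strings given in this README, namely the two languages above and the four tasks listed next, and suggests the closest match using the standard library's `difflib.get_close_matches`.

```python
import difflib
from typing import Sequence

# Allowed values as documented in this README; the validator itself is hypothetical.
SUPPORTED_LANGS = ("python", "sql")
SUPPORTED_TASKS = (
    "data_transformation",
    "entity_matching",
    "error_detection_spelling",
    "value_imputation",
)


def validate_choice(value: str, allowed: Sequence[str], name: str) -> str:
    """Return value unchanged if allowed; otherwise raise ValueError with a hint."""
    if value in allowed:
        return value
    # get_close_matches returns up to n candidates scoring above cutoff (0.6 by default).
    hint = difflib.get_close_matches(value, allowed, n=1)
    suggestion = f"; did you mean {hint[0]!r}?" if hint else ""
    raise ValueError(f"unsupported {name} {value!r}{suggestion}")


lang = validate_choice("sql", SUPPORTED_LANGS, "lang")
task = validate_choice("entity_matching", SUPPORTED_TASKS, "task")
```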
The package currently supports the following tasks; the task parameter must match one of these strings exactly:
- "data_transformation"
- "entity_matching"
- "error_detection_spelling"
- "value_imputation"
File details
Details for the file tableswift-0.1.4.tar.gz.
File metadata
- Download URL: tableswift-0.1.4.tar.gz
- Upload date:
- Size: 86.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e8334a6b8315ea9548e008b64763d48e6236209a2892a4e9ea21d0c1b23710fe |
| MD5 | 39d56cc75f304963f4bb5337177da10a |
| BLAKE2b-256 | 7792ed865e6922df8e567643def3244f2bb3b00bef42d3f4700f8acc28cb3642 |
File details
Details for the file tableswift-0.1.4-py3-none-any.whl.
File metadata
- Download URL: tableswift-0.1.4-py3-none-any.whl
- Upload date:
- Size: 94.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2ba80c898525c81f20657927039714104f12edabe08a11cc5a08c7a668ca2a07 |
| MD5 | e4831791b8507b88de4de43721e15824 |
| BLAKE2b-256 | f54f2ee24d497a0437a0e966eade817a9ae01ae11b871a6de84118fee0be6c91 |