Project description

A minimalistic LLM agent for Exploratory Data Analysis (EDA) using the pandas library.

Task: given a CSV or an XLSX file, respond to the user's query about the table by generating Python code and executing it.

Query example: 'I want to know the average gdp of countries that have happiness index greater than 5.5. I also need these countries.'

Flow parts used by the agent:

  • Tagging the query - an LM classifies the query as "plot" or "general". Two methods are implemented: OpenAI function calling that tags the query into a Pydantic object, and a classifier LM (DeBERTa). The tag decides what the coder is told: for a "plot" query, the LLM is instructed to save the plot to a directory so that it can later be sent to the user as a Response via FastAPI; for a "general" (text answer) query, the LLM is instructed not to do any plotting.

  • Generating the plan - an LM generates a step-by-step plan for the coder. Inspired by Plan-and-Solve prompting (Wang et al., 2023). This helps smaller LLMs significantly.

  • Generating the code - an LM generates Python code that performs the data analysis. The code is then executed and the result is returned to the user.
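
The three flow parts above can be sketched end to end as follows. This is a minimal illustration, not the package's implementation: the names `tag_query`, `answer`, `fake_planner` and `fake_coder` are hypothetical, the tagger is a plain keyword check standing in for the OpenAI or DeBERTa classifier, the "LLMs" return canned text, and a list of dicts stands in for a pandas DataFrame so the sketch has no dependencies.

```python
from typing import Callable

# Hypothetical stand-in for an LLM call: prompt in, text out.
LLM = Callable[[str], str]


def tag_query(query: str) -> str:
    """Toy tagger: a keyword check standing in for the LLM or DeBERTa classifier."""
    plot_words = ("plot", "chart", "graph", "histogram", "draw")
    return "plot" if any(w in query.lower() for w in plot_words) else "general"


def answer(query: str, rows: list[dict], planner: LLM, coder: LLM) -> dict:
    """Tag the query, ask for a plan, ask for code, execute it, collect every step."""
    tag = tag_query(query)
    plan = planner(f"Write a step-by-step plan for: {query}")
    instruction = "Save the plot to a file." if tag == "plot" else "Do not plot anything."
    code = coder(f"Plan:\n{plan}\n{instruction}\nWrite Python code that sets `result`.")
    namespace = {"rows": rows}
    exec(code, namespace)  # runs model-written code; isolate this in real use
    return {"tag": tag, "plan": plan, "code": code, "result": namespace.get("result")}


# Canned responses so the sketch runs without any model.
def fake_planner(prompt: str) -> str:
    return "1. Keep rows with happiness > 5.5. 2. Average their gdp."


def fake_coder(prompt: str) -> str:
    return (
        "selected = [r for r in rows if r['happiness'] > 5.5]\n"
        "result = sum(r['gdp'] for r in selected) / len(selected)\n"
    )


rows = [
    {"country": "A", "happiness": 6.0, "gdp": 10.0},
    {"country": "B", "happiness": 5.0, "gdp": 40.0},
    {"country": "C", "happiness": 7.0, "gdp": 30.0},
]
out = answer(
    "What is the average gdp of countries with happiness above 5.5?",
    rows,
    fake_planner,
    fake_coder,
)
print(out["tag"], out["result"])
```

Returning the intermediate plan and code alongside the result makes each step inspectable when an answer looks wrong.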

Four flow strategies are implemented:

  • "simple" - Tagging the query, generating the plan, generating the code, executing the code, returning the result to the user.
  • "simple_functions" - Same, but the CoderLLM is asked to generate a Python function instead of a main script. This makes the output easier to test, because the agent mostly returns a single value (string, float, DataFrame, etc.).

  • "coder_only_simple" - No planning step, just generating the script and executing it.
  • "coder_only_functions" - Same, but the CoderLLM is asked to generate a Python function instead of a main script.
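
The difference between the script and function strategies can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two code strings are hypothetical coder outputs, and `run_script`, `run_function` and the function name `solve` are invented for this example, not taken from the package.

```python
import contextlib
import io

# Hypothetical coder outputs answering the same question.
script_code = "values = [3, 5, 10]\nprint(sum(values) / len(values))\n"
function_code = (
    "def solve(values):\n"
    "    return sum(values) / len(values)\n"
)


def run_script(code: str) -> str:
    """Execute a main-style script and capture whatever it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


def run_function(code: str, *args):
    """Define the generated function, then call it and return its value."""
    namespace = {}
    exec(code, namespace)
    return namespace["solve"](*args)


script_answer = run_script(script_code)                    # printed text
function_answer = run_function(function_code, [3, 5, 10])  # a typed Python value
```

The script variant yields text that has to be parsed before it can be compared with a reference, while the function variant yields a value that can be checked directly.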

Agent arguments:

  • max_debug_times - maximum number of debugging attempts the agent may make. If the attempts are exhausted, the agent returns an error message to the user.
  • head_number - number of first rows of a DataFrame to show to the LLM.
  • prompt_strategy - flow strategy to use (described above).
  • coder_model - model to use for the CoderLLM. Supported models are "codellama/CodeLlama-7b-Instruct-hf", "WizardLM/WizardCoder-1B-V1.0" (also the 3B and 15B variants) and GPT models.
  • gpt_model - GPT model used for planning.
  • add_column_description - if True, the agent adds a JSON-formatted description of the columns to the prompt, so that the LLM knows the precise column names and accompanying values.
  • tagging_strategy - strategy to use for tagging the query. Supported strategies are "openai" and "deberta".
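
The debugging limit among the arguments above suggests a retry loop around code execution. The following is a minimal sketch of such a loop under assumptions: `run_with_debugging`, `fix_code` and `fake_fixer` are hypothetical names, and the way the real agent feeds errors back to the LLM may differ.

```python
from typing import Callable


def run_with_debugging(
    code: str,
    fix_code: Callable[[str, str], str],
    max_debug_times: int = 2,
) -> dict:
    """Run code; on failure ask `fix_code` for a repaired version, up to the limit."""
    attempts = 0
    while True:
        namespace = {}
        try:
            exec(code, namespace)  # runs model-written code; isolate this in real use
            return {"ok": True, "result": namespace.get("result"), "debug_attempts": attempts}
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
            if attempts >= max_debug_times:
                # Out of attempts: report the last error instead of a result.
                return {"ok": False, "error": error, "debug_attempts": attempts}
            code = fix_code(code, error)  # an LLM call in a real agent
            attempts += 1


# Canned "fixer" that repairs one known typo, standing in for an LLM.
def fake_fixer(code: str, error: str) -> str:
    return code.replace("totl", "total")


broken = "total = 2 + 3\nresult = totl * 2\n"
fixed = run_with_debugging(broken, fake_fixer)
gave_up = run_with_debugging("result = 1 / 0\n", lambda code, error: code, max_debug_times=1)
```

Passing the error text to the fixer is the key design choice: it gives the model the same information a human would use to repair the code.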

answer_query arguments:

  • show_plot - if True, the agent shows the plot to the user interactively. If False, the agent saves the plot to a directory.
  • save_plot_path - path to save the plot to. If None, a plot name is generated automatically.

The answer_query method returns a text answer and a dictionary with details and outputs from every step of the flow.

How to run the agent:

See the main.py file for an example of how to run the agent.

Datasets and evaluation

Public datasets are in the "datasets" folder. Evaluation of the agent is in the "evaluation" folder.

  • evaluation/collect_answers.py runs a given agent on a given dataset and collects the answers into an XLSX file. Configs are stored in the conf/ folder.
  • evaluation/evaluation.py compares the answers in one XLSX file with the reference answers in another XLSX file, either by string equality or by asking a GPT model to compare the answers.
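
The string-equality comparison can be sketched as below. This is an illustrative assumption rather than the contents of evaluation/evaluation.py: the function names are hypothetical, the answers are plain lists instead of XLSX columns, and the whitespace and case normalization is an addition that a strict equality check would not perform.

```python
def normalize(answer) -> str:
    """Lower-case, trim and collapse whitespace so trivial differences are ignored."""
    return " ".join(str(answer).lower().split())


def exact_match_accuracy(predicted: list, reference: list) -> float:
    """Fraction of predicted answers equal to the reference after normalization."""
    if len(predicted) != len(reference):
        raise ValueError("answer lists must have the same length")
    if not reference:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predicted, reference))
    return hits / len(reference)


predicted = ["Norway", " 42.5", "yes"]
reference = ["norway", "42.5", "no"]
score = exact_match_accuracy(predicted, reference)
```

Exact matching is cheap and reproducible, but it rejects answers that are correct and phrased differently, which is where the GPT-based comparison becomes useful.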

