Skip to main content

LLM insights to data.

Project description

llads

'Large Language Data and Statistics'. A library to generate LLM insights to data.

Installation

Install the package from PyPI, as well as the libraries in requirements.txt.

Usage

LLM

You can use any LLM that works with the OpenAI API syntax, including a local LlamaCPP server. Note that the LLM needs to be powerful enough to properly parse and produce the expected outputs for the various steps of the chain. The following information is necessary for creating the LLM:

  • api key (if using a cloud LLM provider)
  • base url
  • model name

Example

import pandas as pd

from llads.customLLM import customLLM
from llads.tools import get_world_bank_gdp_data # this is a custom tool included as an example. You can define and pass your own tools
from llads.visualizations import gen_plot # this is a line/bar plot visualization tool included as an example. You can define and pass your own visualization tools

system_prompts = pd.read_csv("https://raw.githubusercontent.com/dhopp1/llads/refs/heads/main/system_prompts.csv") # a good default is included in the repo, but you can edit to your own needs

# creating the LLM (gemini 2.0 flash as an example)
llm = customLLM(
        api_key="API_KEY",
        base_url="https://generativelanguage.googleapis.com/v1beta/openai",
        model_name="gemini-2.0-flash",
        temperature=0.0,
        max_tokens=2048,
        system_prompts=system_prompts,
)

# defining which tools the LLM has available to it
tools = [get_world_bank_gdp_data]
plot_tools = [gen_plot]

# generating a response
prompt = "What is the GDP of Italy and the UK as a % of Germany over the last 5 years?" # the user's initial question
results = llm.chat(
	prompt=prompt, 
	tools=tools, 
	plot_tools=plot_tools, 
	validate=True, # if True, the LLM will perform an additional validation step on its commentary
	use_free_plot=False, # if False, the LLM will have to use one of the plot_tools, if True, it will be free to make its own matplotlib plot
	prior_query_id=None, # None, because this is the first query in the chat history
)

# follow-up question
new_query = "Add France to the analysis"
followup_result = llm.chat(
	prompt=new_query, 
	tools=tools, 
	plot_tools=plot_tools, 
	validate=True,
	use_free_plot=False,
	prior_query_id=results["tool_result"]["query_id"], # pass our prior query id to make message history available
)

# you can access prior query results via the query id
llm._query_results[query_id]

Interpreting output

The chat() function will produce a dictionary with the following values:

  • initial_prompt: The question passed by the user
  • tool_result: A dictionary with the following information:
    • query_id: The unique ID number of this query
    • tool_call: The name and arguments of the tools the LLM called
    • invoked_result: The actual DataFrame resulting from the tool calls
  • pd_code: A dictionary with the following information:
    • data_desc: A text description of the data made available to the LLM via the tool call
    • pd_code: The Python code the LLM executed to edit the raw data available to it
  • dataset: The actual DataFrame that is the result of the pd_code call
  • explanation: The LLM's explanation of the data manipulation process undergone to answer the user's question
  • commentary: The LLM's commentary on the final dataset answering the user's question
  • plots: A dictionary with the following values:
    • visualization_call: A list of either the matplotlib code written or the plotting function call run to create the visualization to answer the user's question
    • invoked_result: A list of the actual plot figures produced to answer the user's question
  • context_rich_prompt: The prompt passed to the LLM containing the prior context. Empty string if it's the first question in the chat.

Explanation of steps/chain

  1. The LLM determines which raw data functions it wants to call with which arguments via the llm.gen_tool_call() function, the calls and generates the raw datasets.
  2. Given the raw data available from the previous step, the llm.gen_pandas_df() produces Python code to create a final result dataset.
  3. The LLM explains the data transformation steps via the llm.explain_pandas_df() function.
  4. The LLM is given the final full result dataset and writes commentary answering the user's question via the llm.gen_final_commentary() funcrtion.
  5. If validate=True in the llm.gen_final_commentary() call, the LLM performs a validation step on its commentary to look for and correct errors.
  6. The LLM produces a visualization to help answer the user's question, via either the llm.gen_free_plot() function (if use_free_plot=True) or the llm.gen_plot_call() function. The former allows the LLM to create any Matplotlib plot, the latter restricts it to calling one of the predefined visualization tools. Useful if you want to customize style, etc.

If complete_responses is not None, i.e., a message history is passed, at the very beginning of the pipeline the user's prompt will be augmented with the full context history of previous messages, includding tool calls, data manipulation steps, commentary provided, and visualizations created.

Defining your own datasets/tools

The library contains the get_world_bank_gdp_data function as an example. To make additional data available to the LLM, you can define your own tools. For example, say we wanted to add a simple addition tool:

from langchain_core.tools import tool

@tool
def add(first_int: int, second_int: int) -> int:
    "Add two integers."
    return first_int + second_int
    
tools = [add, get_world_bank_gdp_data] # the LLM will now be able to choose either the addition tool, or the World Bank GDP tool.

As long as the input and outputs of the function are well defined, the LLM should be able to use it if helpful to answer a user's question.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llads-0.0.2.tar.gz (11.9 kB view details)

Uploaded Source

File details

Details for the file llads-0.0.2.tar.gz.

File metadata

  • Download URL: llads-0.0.2.tar.gz
  • Upload date:
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for llads-0.0.2.tar.gz
Algorithm Hash digest
SHA256 1d756993a8a9605344c36187054610b8a40d5f382486df8a0e407dc793140232
MD5 2507ec09e30bc343a8a65e83169cdd92
BLAKE2b-256 b93dd483a90599cfccdb463da785a512dcbe05116bdcae86319ec5898b62a986

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page