LLM insights to data.

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

llads

'Large Language Data and Statistics'. A library to generate LLM insights to data.

Installation

Install the package from PyPI (pip install llads), as well as the libraries in requirements.txt.

Usage

LLM

You can use any LLM that works with the OpenAI API syntax, including a local LlamaCPP server. Note that the LLM needs to be powerful enough to properly parse and produce the expected outputs for the various steps of the chain. The following information is necessary for creating the LLM:

api key (if using a cloud LLM provider)
base url
model name
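
As a small illustration, the three settings above can be gathered in one place before creating the LLM. This is only a sketch: the load_llm_settings helper, the LLADS_* environment variable names, and the localhost URL are hypothetical and not part of llads. The keys of the returned dictionary mirror the customLLM arguments used in the example below.

```python
import os


def load_llm_settings(env=None):
    """Collect the connection settings for an OpenAI-compatible endpoint.

    The variable names are hypothetical; use whatever your setup provides.
    """
    env = os.environ if env is None else env
    settings = {
        # the key may stay empty for a local server that does not need one
        "api_key": env.get("LLADS_API_KEY", ""),
        "base_url": env.get("LLADS_BASE_URL"),
        "model_name": env.get("LLADS_MODEL_NAME"),
    }
    # base url and model name are always required
    missing = [k for k in ("base_url", "model_name") if not settings[k]]
    if missing:
        raise ValueError(f"missing LLM settings: {', '.join(missing)}")
    return settings


# passing a plain dict instead of os.environ keeps the example reproducible
settings = load_llm_settings({
    "LLADS_BASE_URL": "http://localhost:8080/v1",  # hypothetical local server
    "LLADS_MODEL_NAME": "local-model",  # hypothetical model name
})
print(settings)
```

Failing early on a missing base url or model name avoids a less obvious error later in the chain.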

Example

import pandas as pd

from llads.customLLM import customLLM
from llads.tools import get_world_bank_gdp_data # this is a custom tool included as an example. You can define and pass your own tools
from llads.visualizations import gen_plot # this is a line/bar plot visualization tool included as an example. You can define and pass your own visualization tools

system_prompts = pd.read_csv("https://raw.githubusercontent.com/dhopp1/llads/refs/heads/main/system_prompts.csv") # a good default is included in the repo, but you can edit to your own needs

# creating the LLM (gemini 2.0 flash as an example)
llm = customLLM(
        api_key="API_KEY",
        base_url="https://generativelanguage.googleapis.com/v1beta/openai",
        model_name="gemini-2.0-flash",
        temperature=0.0,
        max_tokens=2048,
        system_prompts=system_prompts,
)

# defining which tools the LLM has available to it
tools = [get_world_bank_gdp_data]
plot_tools = [gen_plot]

# generating a response
prompt = "What is the GDP of Italy and the UK as a % of Germany over the last 5 years?" # the user's initial question
results = llm.chat(
	prompt=prompt, 
	tools=tools, 
	plot_tools=plot_tools, 
	validate=True, # if True, the LLM will perform an additional validation step on its commentary
	use_free_plot=False, # if False, the LLM will have to use one of the plot_tools, if True, it will be free to make its own matplotlib plot
	prior_query_id=None, # None, because this is the first query in the chat history
)

# follow-up question
new_query = "Add France to the analysis"
followup_result = llm.chat(
	prompt=new_query, 
	tools=tools, 
	plot_tools=plot_tools, 
	validate=True,
	use_free_plot=False,
	prior_query_id=results["tool_result"]["query_id"], # pass our prior query id to make message history available
)

# you can access prior query results via the query id
query_id = results["tool_result"]["query_id"]
llm._query_results[query_id]

Interpreting output

The chat() function returns a dictionary with the following keys:

initial_prompt: The question passed by the user
tool_result: A dictionary with the following information:
- query_id: The unique ID number of this query
- tool_call: The name and arguments of the tools the LLM called
- invoked_result: The actual DataFrame resulting from the tool calls
pd_code: A dictionary with the following information:
- data_desc: A text description of the data made available to the LLM via the tool call
- pd_code: The Python code the LLM executed to edit the raw data available to it
dataset: The actual DataFrame that is the result of the pd_code call
explanation: The LLM's explanation of the data manipulation process undergone to answer the user's question
commentary: The LLM's commentary on the final dataset answering the user's question
plots: A dictionary with the following information:
- visualization_call: A list of either the matplotlib code written or the plotting function call run to create the visualization to answer the user's question
- invoked_result: A list of the actual plot figures produced to answer the user's question
context_rich_prompt: The prompt passed to the LLM containing the prior context. Empty string if it's the first question in the chat.
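
To make the structure above concrete, here is a minimal sketch of reading such a dictionary. The summarize_result helper is hypothetical, and example_result is a hand-written placeholder that only mimics the documented keys; its values are made up and do not come from a real chat() call.

```python
def summarize_result(result):
    """Return a short text summary of a chat()-style result dictionary."""
    tool_result = result["tool_result"]
    lines = [
        f"question: {result['initial_prompt']}",
        f"query id: {tool_result['query_id']}",
        f"commentary: {result['commentary']}",
        f"plots produced: {len(result['plots']['invoked_result'])}",
    ]
    # context_rich_prompt is an empty string for the first question in a chat
    if result["context_rich_prompt"]:
        lines.append("follow-up: yes")
    else:
        lines.append("follow-up: no")
    return "\n".join(lines)


# placeholder dictionary with the documented keys; all values are made up
example_result = {
    "initial_prompt": "example question",
    "tool_result": {"query_id": 1, "tool_call": [], "invoked_result": None},
    "pd_code": {"data_desc": "", "pd_code": ""},
    "dataset": None,
    "explanation": "",
    "commentary": "example commentary",
    "plots": {"visualization_call": [], "invoked_result": []},
    "context_rich_prompt": "",
}
print(summarize_result(example_result))
```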

Explanation of steps/chain

The LLM determines which raw data functions it wants to call, and with which arguments, via the llm.gen_tool_call() function. The calls are then made to generate the raw datasets.
Given the raw data available from the previous step, the llm.gen_pandas_df() function produces Python code to create a final result dataset.
The LLM explains the data transformation steps via the llm.explain_pandas_df() function.
The LLM is given the final full result dataset and writes commentary answering the user's question via the llm.gen_final_commentary() function.
If validate=True in the llm.gen_final_commentary() call, the LLM performs a validation step on its commentary to look for and correct errors.
The LLM produces a visualization to help answer the user's question, via either the llm.gen_free_plot() function (if use_free_plot=True) or the llm.gen_plot_call() function. The former allows the LLM to create any Matplotlib plot; the latter restricts it to calling one of the predefined visualization tools, which is useful if you want to control the plot style, etc.
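
The order of the steps above, and the effect of the validate and use_free_plot flags, can be summarized in a small sketch. It does not call llads: chain_steps is a hypothetical helper that only lists the steps as described in this section, and the "validation" label is a description rather than a method name.

```python
def chain_steps(validate=True, use_free_plot=False):
    """List the chain's steps in order for the given flags (illustration only)."""
    steps = [
        "gen_tool_call: choose tools and arguments, fetch raw data",
        "gen_pandas_df: write Python code producing the final dataset",
        "explain_pandas_df: explain the data transformation",
        "gen_final_commentary: write commentary on the final dataset",
    ]
    if validate:
        # extra pass over the commentary to look for and correct errors
        steps.append("validation: check and correct the commentary")
    if use_free_plot:
        steps.append("gen_free_plot: write any Matplotlib plot")
    else:
        steps.append("gen_plot_call: call a predefined plot tool")
    return steps


for number, step in enumerate(chain_steps(validate=True, use_free_plot=False), start=1):
    print(number, step)
```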

If complete_responses is not None, i.e., a message history is passed, at the very beginning of the pipeline the user's prompt will be augmented with the full context history of previous messages, including tool calls, data manipulation steps, commentary provided, and visualizations created.

Defining your own datasets/tools

The library contains the get_world_bank_gdp_data function as an example. To make additional data available to the LLM, you can define your own tools. For example, say we wanted to add a simple addition tool:

from langchain_core.tools import tool

@tool
def add(first_int: int, second_int: int) -> int:
    "Add two integers."
    return first_int + second_int
    
tools = [add, get_world_bank_gdp_data] # the LLM will now be able to choose either the addition tool, or the World Bank GDP tool.

As long as the inputs and outputs of the function are well defined, the LLM should be able to use it if helpful to answer a user's question.
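
A data tool would follow the same pattern. The sketch below assumes that data tools are expected to return a DataFrame, as the invoked_result description above suggests. The function name get_example_series, its arguments, and the 0.0 placeholder values are hypothetical; a real tool would query an actual data source.

```python
import pandas as pd


def get_example_series(countries: list[str], start_year: int, end_year: int) -> pd.DataFrame:
    """Return one placeholder value per country and year, in long format.

    Args:
        countries: country names to include.
        start_year: first year, inclusive.
        end_year: last year, inclusive.
    """
    rows = [
        # 0.0 is a placeholder; a real tool would fetch the value here
        {"country": country, "year": year, "value": 0.0}
        for country in countries
        for year in range(start_year, end_year + 1)
    ]
    return pd.DataFrame(rows, columns=["country", "year", "value"])


# To expose it to the LLM, wrap it the same way as the add example above:
# from langchain_core.tools import tool
# example_tool = tool(get_example_series)

df = get_example_series(["Italy", "Germany"], 2020, 2022)
print(df.shape)
```

Typed arguments and a docstring that describes each one give the LLM the information it needs to decide when and how to call the tool.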

Release history

0.0.21

May 26, 2025

0.0.20

May 19, 2025

0.0.19

May 19, 2025

0.0.18

May 14, 2025

0.0.17

May 14, 2025

0.0.16

May 14, 2025

0.0.15

May 12, 2025

0.0.14

May 12, 2025

0.0.13

May 12, 2025

0.0.12

May 5, 2025

0.0.11

Apr 30, 2025

0.0.10

Apr 30, 2025

0.0.9

Apr 30, 2025

0.0.8

Apr 30, 2025

0.0.7

Apr 30, 2025

0.0.6

Apr 30, 2025

0.0.5

Apr 30, 2025

0.0.4

Apr 24, 2025

0.0.3

Apr 23, 2025

This version

0.0.2

Apr 22, 2025

0.0.1

Apr 22, 2025

0.0.0

Apr 11, 2025

Download files

Source Distribution

llads-0.0.2.tar.gz (11.9 kB)

Uploaded Apr 22, 2025 Source

File details

Details for the file llads-0.0.2.tar.gz.

File metadata

Download URL: llads-0.0.2.tar.gz
Upload date: Apr 22, 2025
Size: 11.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for llads-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`1d756993a8a9605344c36187054610b8a40d5f382486df8a0e407dc793140232`
MD5	`2507ec09e30bc343a8a65e83169cdd92`
BLAKE2b-256	`b93dd483a90599cfccdb463da785a512dcbe05116bdcae86319ec5898b62a986`
