LLM assistant for the development of Spark applications
Spark-LLM
Spark-LLM is a Python library that assists in the development of Spark applications, including Spark DataFrames, Spark SQL, testing, and so on.
Installation
pip install spark-llm
Usage
To create an instance of SparkLLMAssistant:
from langchain.chat_models import ChatOpenAI
from spark_llm import SparkLLMAssistant
llm = ChatOpenAI(model_name='gpt-4')  # using gpt-4 can achieve better results
assistant = SparkLLMAssistant(llm=llm)
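Creating the assistant calls an OpenAI-backed chat model, which needs an API key. The following is a minimal sketch, assuming the key is supplied through the OPENAI_API_KEY environment variable (the variable that langchain's ChatOpenAI reads by default). The helper names require_openai_key and build_assistant are hypothetical and are not part of Spark-LLM.

```python
import os


def require_openai_key(env=os.environ):
    """Return the OpenAI API key from the given mapping, or raise a clear error."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before creating the LLM"
        )
    return key


def build_assistant(model_name="gpt-4"):
    """Hypothetical wrapper: check the key, then build the assistant as shown above."""
    require_openai_key()
    # Imported lazily so the key check can fail fast before the heavier imports.
    from langchain.chat_models import ChatOpenAI
    from spark_llm import SparkLLMAssistant

    return SparkLLMAssistant(llm=ChatOpenAI(model_name=model_name))
```

Calling build_assistant() in place of the two constructor lines would fail early with a readable message when the key is missing, rather than at the first request to the model.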
To create a DataFrame from a web search and the LLM:
auto_df = assistant.create_df("2022 USA national auto sales by brand")
auto_df.show(n=5)
rank | brand | sales | Percentage_Change |
---|---|---|---|
1 | Toyota | 1849751 | -9 |
2 | Ford | 1767439 | -2 |
3 | Chevrolet | 1502389 | 6 |
4 | Honda | 881201 | -33 |
5 | Hyundai | 724265 | -2 |
To explain a Spark DataFrame in simple words:
auto_top_growth_df = auto_df.orderBy(auto_df.percentage_change.desc()).limit(1)
assistant.explain_df(auto_top_growth_df)
In summary, this dataframe is retrieving the single record with the highest percentage change in sales from the auto_sales_2022 view, which contains information about the rank, brand, sales, and percentage change in sales for various car brands in the year 2022.
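To make the explained query concrete, here is a plain-Python sketch of what ordering by percentage change (descending) and keeping one row selects. It is not Spark code and not part of Spark-LLM. It assumes only the five rows printed by auto_df.show(n=5); the full DataFrame may contain more brands, so the real query could return a different row. The function name top_growth is hypothetical.

```python
# The five rows displayed earlier; the full DataFrame may hold more.
rows = [
    {"rank": 1, "brand": "Toyota", "sales": 1849751, "percentage_change": -9},
    {"rank": 2, "brand": "Ford", "sales": 1767439, "percentage_change": -2},
    {"rank": 3, "brand": "Chevrolet", "sales": 1502389, "percentage_change": 6},
    {"rank": 4, "brand": "Honda", "sales": 881201, "percentage_change": -33},
    {"rank": 5, "brand": "Hyundai", "sales": 724265, "percentage_change": -2},
]


def top_growth(records, n=1):
    """Sort by percentage_change, largest first, and keep the first n records."""
    ordered = sorted(records, key=lambda r: r["percentage_change"], reverse=True)
    return ordered[:n]


if __name__ == "__main__":
    for row in top_growth(rows):
        print(row["brand"], row["percentage_change"])
```

Among these five rows, Chevrolet is the only brand with a positive change, so it is the record selected.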
Refer to example.ipynb for more detailed usage examples.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
License
Licensed under the Apache License 2.0.
Hashes for spark_llm-0.1.1-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4100c9a6331ab06d2f21c733efd8bb24923589e02f3e1f286c2225b92f974d1f |
MD5 | bd079ab254161f4763f08f453cbe1136 |
BLAKE2b-256 | eae36f40e0f0d487777d6d6c423098dd344f5808a4d242be16566dc280e15eea |