
LLM assistant for the development of Spark applications


LLM Assistant for Apache Spark

Installation

pip install spark-llm

Usage

Initialization

from langchain.chat_models import ChatOpenAI
from spark_llm import SparkLLMAssistant

llm = ChatOpenAI(model_name='gpt-4')  # gpt-4 tends to give better results than smaller models
assistant = SparkLLMAssistant(llm=llm)
assistant.activate()  # activate the LLM helper methods (llm_transform, llm_explain) on Spark DataFrames
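The `activate()` call is what makes `llm_transform` and `llm_explain` callable directly on DataFrames. Below is a minimal sketch of how pre-binding an assistant into DataFrame methods with partial functions could work, assuming the library does something along these lines. It is not the spark-llm implementation: `ToyDataFrame`, `ToyAssistant`, `_llm_transform` and `_llm_explain` are hypothetical names, and the LLM calls are replaced by plain string formatting so the sketch runs without Spark or an API key.

```python
from functools import partialmethod


class ToyDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame, so the sketch runs without Spark."""

    def __init__(self, rows):
        self.rows = rows


def _llm_transform(df, assistant, instruction):
    # `df` is the DataFrame the method is called on; `assistant` was pre-bound by activate().
    return assistant.transform(df, instruction)


def _llm_explain(df, assistant):
    return assistant.explain(df)


class ToyAssistant:
    """Hypothetical assistant; the LLM calls are replaced by string formatting."""

    def transform(self, df, instruction):
        return f"[{len(df.rows)} rows] {instruction}"

    def explain(self, df):
        return f"DataFrame with {len(df.rows)} rows"

    def activate(self, cls=ToyDataFrame):
        # partialmethod pre-binds this assistant as the second argument; Python
        # still passes the DataFrame instance as the first argument on each call.
        cls.llm_transform = partialmethod(_llm_transform, self)
        cls.llm_explain = partialmethod(_llm_explain, self)


assistant = ToyAssistant()
assistant.activate()
df = ToyDataFrame([("Toyota", 1849751), ("Ford", 1767439)])
print(df.llm_transform("top brand"))  # [2 rows] top brand
print(df.llm_explain())  # DataFrame with 2 rows
```

`partialmethod` is used rather than `functools.partial` because plain `partial` objects are not descriptors, so they would not receive the DataFrame instance when looked up on it.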

Data Ingestion

auto_df = assistant.create_df("2022 USA national auto sales by brand")
auto_df.show(n=5)
rank  brand      us_sales_2022  sales_change_vs_2021
1     Toyota     1849751        -9
2     Ford       1767439        -2
3     Chevrolet  1502389        6
4     Honda      881201         -33
5     Hyundai    724265         -2
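Since `create_df` relies on the LLM to produce the data rather than on a verified source, it may be worth sanity-checking the result before building on it. The following is a minimal pure-Python sketch, assuming the rows have already been collected (for example with `auto_df.collect()`) into tuples ordered as in the output above. `check_ranking` is a hypothetical helper, not part of spark-llm; it checks that ranks run 1..n and that sales never increase as rank goes down.

```python
# Rows copied from the create_df() output above:
# (rank, brand, us_sales_2022, sales_change_vs_2021)
rows = [
    (1, "Toyota", 1849751, -9),
    (2, "Ford", 1767439, -2),
    (3, "Chevrolet", 1502389, 6),
    (4, "Honda", 881201, -33),
    (5, "Hyundai", 724265, -2),
]


def check_ranking(rows):
    """Hypothetical helper: return a list of problems; an empty list means the ranking looks consistent."""
    problems = []
    ranks = [r[0] for r in rows]
    if ranks != list(range(1, len(rows) + 1)):
        problems.append(f"ranks are not 1..{len(rows)}: {ranks}")
    # Each brand should not outsell the brand ranked directly above it.
    for prev, cur in zip(rows, rows[1:]):
        if cur[2] > prev[2]:
            problems.append(f"{cur[1]} outsells {prev[1]} but is ranked lower")
    return problems


print(check_ranking(rows))  # []
```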

DataFrame Transformation

auto_top_growth_df = auto_df.llm_transform("top brand with the highest growth")
auto_top_growth_df.show()
brand     us_sales_2022  sales_change_vs_2021
Cadillac  134726         14
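`llm_transform` turns a natural-language request into a query over the DataFrame, presumably by having the LLM generate SQL. The `llm_explain()` summary for this result describes it as sorting by sales change in descending order and keeping only the top row. The sketch below reproduces that logic with Python's built-in `sqlite3` as a stand-in for Spark SQL, assuming the generated query has this shape. The table name `auto_sales` is hypothetical, and only the six rows printed in this README are loaded (the real `auto_df` has more rows than the five shown).

```python
import sqlite3

# In-memory SQLite table standing in for auto_df; only rows shown in this README are used.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auto_sales (brand TEXT, us_sales_2022 INTEGER, sales_change_vs_2021 INTEGER)"
)
conn.executemany(
    "INSERT INTO auto_sales VALUES (?, ?, ?)",
    [
        ("Toyota", 1849751, -9),
        ("Ford", 1767439, -2),
        ("Chevrolet", 1502389, 6),
        ("Honda", 881201, -33),
        ("Hyundai", 724265, -2),
        ("Cadillac", 134726, 14),
    ],
)

# Assumed shape of the generated query, following the llm_explain() summary:
# sort by sales change in descending order and keep only the top row.
query = """
SELECT brand, us_sales_2022, sales_change_vs_2021
FROM auto_sales
ORDER BY sales_change_vs_2021 DESC
LIMIT 1
"""
print(conn.execute(query).fetchall())  # [('Cadillac', 134726, 14)]
```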

DataFrame Explanation

auto_top_growth_df.llm_explain()

In summary, this dataframe is retrieving the brand with the highest sales change in 2022 compared to 2021. It presents the results sorted by sales change in descending order and only returns the top result.

Refer to example.ipynb for more detailed usage examples.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

Licensed under the Apache License 2.0.

Download files


Source Distribution: spark_llm-0.1.3.tar.gz (10.6 kB)

Built Distribution: spark_llm-0.1.3-py3-none-any.whl (11.6 kB, Python 3)
