LLM assistant for the development of Spark applications
Project description
Spark-LLM
Spark-LLM is a Python library that can assist in the development of Spark applications, including Spark Dataframe, Spark SQL, testings, and so on.
Installation
pip install spark-llm
Usage
Initialization
from langchain.chat_models import ChatOpenAI
from spark_llm import SparkLLMAssistant
llm = ChatOpenAI(model_name='gpt-4') # using gpt-4 can achieve better results
assistant=SparkLLMAssistant(llm=llm)
Data Ingestion
auto_df=assistant.create_df("2022 USA national auto sales by brand")
auto_df.show(n=5)
rank | brand | us_sales_2022 | sales_change_vs_2021 |
---|---|---|---|
1 | Toyota | 1849751 | -9 |
2 | Ford | 1767439 | -2 |
3 | Chevrolet | 1502389 | 6 |
4 | Honda | 881201 | -33 |
5 | Hyundai | 724265 | -2 |
DataFrame Transformation
auto_top_growth_df=assistant.transform_df(auto_df, "top brand with the highest growth")
auto_top_growth_df.show()
brand | us_sales_2022 | sales_change_vs_2021 |
---|---|---|
Cadillac | 134726 | 14 |
DataFrame Explaination
assistant.explain_df(auto_top_growth_df)
In summary, this dataframe is retrieving the brand with the highest sales change in 2022 compared to 2021. It presents the results sorted by sales change in descending order and only returns the top result.
Refer to example.ipynb for more detailed usage examples.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
License
Licensed under the Apache License 2.0.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
spark_llm-0.1.2.tar.gz
(10.5 kB
view hashes)
Built Distribution
spark_llm-0.1.2-py3-none-any.whl
(11.6 kB
view hashes)
Close
Hashes for spark_llm-0.1.2-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 82d51d07c1135d6464231dd02278b7d293d8be941d224179a886e8d9c3a07afd |
|
MD5 | 4e1691f36b186d08a37d2337ab950148 |
|
BLAKE2b-256 | 07b89f5594c6d4dbb886bde2e54a63fa3b001239ab033f602b0b8d362c302f6f |