LLM assistant for the development of Spark applications

These details have not been verified by PyPI

Project links

Homepage

Project description

LLM Assistant for Apache Spark

Installation

pip install spark-llm

Usage

Initialization

from spark_llm import SparkLLMAssistant

assistant = SparkLLMAssistant()
assistant.activate()  # activate partial functions for Spark DataFrame

Data Ingestion

auto_df = assistant.create_df("2022 USA national auto sales by brand")
auto_df.show(n=5)

rank	brand	us_sales_2022	sales_change_vs_2021
1	Toyota	1849751	-9
2	Ford	1767439	-2
3	Chevrolet	1502389	6
4	Honda	881201	-33
5	Hyundai	724265	-2

Plot

auto_df.llm.plot()

[Figure: 2022 USA national auto sales by brand]

To plot with an instruction:

auto_df.llm.plot("pie chart for top 5 brands and the others' market shares")

[Figure: 2022 USA national auto sales market share by brand]

DataFrame Transformation

auto_top_growth_df = auto_df.llm.transform("top brand with the highest growth")
auto_top_growth_df.show()

brand	us_sales_2022	sales_change_vs_2021
Cadillac	134726	14

DataFrame Explanation

auto_top_growth_df.llm.explain()

In summary, this dataframe is retrieving the brand with the highest sales change in 2022 compared to 2021. It presents the results sorted by sales change in descending order and only returns the top result.
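The explanation above describes a sort-and-limit query. As a rough illustration only, the same logic can be reproduced with plain SQL. This sketch uses Python's built-in sqlite3 module as a stand-in for Spark SQL; the table name `auto` and the helper `top_growth_brand` are hypothetical, and the query the assistant actually generates is not shown in this README. The rows are copied from the sample outputs above.

```python
import sqlite3

# Rows copied from the sample outputs in this README (rank column omitted).
ROWS = [
    ("Toyota", 1849751, -9),
    ("Ford", 1767439, -2),
    ("Chevrolet", 1502389, 6),
    ("Honda", 881201, -33),
    ("Hyundai", 724265, -2),
    ("Cadillac", 134726, 14),
]


def top_growth_brand(rows):
    """Return the single row with the largest sales change (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE auto ("
            "brand TEXT, us_sales_2022 INTEGER, sales_change_vs_2021 INTEGER)"
        )
        conn.executemany("INSERT INTO auto VALUES (?, ?, ?)", rows)
        # Sort by sales change in descending order and keep only the top result.
        return conn.execute(
            "SELECT brand, us_sales_2022, sales_change_vs_2021 "
            "FROM auto ORDER BY sales_change_vs_2021 DESC LIMIT 1"
        ).fetchone()
    finally:
        conn.close()


print(top_growth_brand(ROWS))  # ('Cadillac', 134726, 14)
```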

DataFrame Attribute Verification

auto_top_growth_df.llm.verify("expect sales change percentage to be between -100 and 100")

result: True
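To make the expectation concrete, here is a minimal hand-written sketch of the kind of check the instruction describes. The function `sales_change_in_range` and the list-of-dicts row format are assumptions made for illustration; the code the assistant generates for verify() is not shown in this README.

```python
def sales_change_in_range(rows, low=-100, high=100):
    """Return True when every row's sales change lies within [low, high]."""
    return all(low <= row["sales_change_vs_2021"] <= high for row in rows)


# The single row of auto_top_growth_df, copied from the output above.
rows = [{"brand": "Cadillac", "us_sales_2022": 134726, "sales_change_vs_2021": 14}]

print(f"result: {sales_change_in_range(rows)}")  # result: True
```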

UDF Generation

@assistant.udf
def previous_years_sales(brand: str, current_year_sale: int, sales_change_percentage: float) -> int:
    """Calculate the previous year's sales from the sales change percentage."""
    ...
    
spark.udf.register("previous_years_sales", previous_years_sales)
auto_df.createOrReplaceTempView("autoDF")

spark.sql("select brand as brand, previous_years_sales(brand, us_sales_2022, sales_change_vs_2021) as 2021_sales from autoDF").show()

brand	2021_sales
Toyota	2032693
Ford	1803509
Chevrolet	1417348
Honda	1315225
Hyundai	739045
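The body of the UDF is left as `...` because the assistant fills it in. For reference, a hand-written version consistent with the output above might look like the following. The name `previous_years_sales_manual` is hypothetical, and the truncation toward zero is an assumption inferred from the sample figures (for example, Hyundai's 739045), not a documented behavior.

```python
def previous_years_sales_manual(
    brand: str, current_year_sale: int, sales_change_percentage: float
) -> int:
    """Recover last year's sales from this year's sales and the percent change.

    The brand argument is unused here; it is kept only to mirror the
    signature of the decorated function above.
    """
    growth_factor = 1 + sales_change_percentage / 100
    if growth_factor <= 0:
        # A change of -100% or lower leaves no way to recover the prior value.
        raise ValueError("sales_change_percentage must be greater than -100")
    # Truncate to an integer (assumption based on the sample output).
    return int(current_year_sale / growth_factor)


print(previous_years_sales_manual("Toyota", 1849751, -9))  # 2032693
```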

Cache

SparkLLMAssistant supports a simple two-level cache: an in-memory staging cache and a persistent cache. The staging cache is written to the persistent cache only when commit() is called. Cache lookups always read from the persistent cache, so entries that have not been committed are not returned.

assistant.commit()
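The staging-then-commit behavior can be illustrated with a small standalone sketch. The class `StagedCache`, its method names other than commit(), and the JSON file format are all assumptions made for illustration; they are not the library's actual implementation.

```python
import json
import os
import tempfile


class StagedCache:
    """Hypothetical two-level cache: in-memory staging plus a JSON file."""

    def __init__(self, path):
        self._path = path
        self._staging = {}
        self._persistent = {}
        # Load previously committed entries, if any.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self._persistent = json.load(f)

    def update(self, key, value):
        # New entries only reach the staging cache.
        self._staging[key] = value

    def lookup(self, key):
        # Lookups read the persistent cache only.
        return self._persistent.get(key)

    def commit(self):
        # Move staged entries into the persistent cache and write them to disk.
        self._persistent.update(self._staging)
        self._staging.clear()
        with open(self._path, "w", encoding="utf-8") as f:
            json.dump(self._persistent, f)


path = os.path.join(tempfile.mkdtemp(), "cache.json")
cache = StagedCache(path)
cache.update("prompt", "response")
before = cache.lookup("prompt")  # None: the entry is staged, not committed
cache.commit()
after = cache.lookup("prompt")  # "response"
reloaded = StagedCache(path).lookup("prompt")  # "response", read back from disk
```

In this sketch, skipping commit() discards the staged entries when the process exits, which is one plausible reason to separate staging from persistence.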

Refer to example.ipynb for more detailed usage examples.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

Licensed under the Apache License 2.0.

Project details

Release history

This version

0.1.9

Jun 22, 2023

0.1.8

Jun 22, 2023

0.1.7

Jun 22, 2023

0.1.6

Jun 21, 2023

0.1.5

Jun 20, 2023

0.1.4

Jun 19, 2023

0.1.3

Jun 10, 2023

0.1.2

Jun 9, 2023

0.1.1

Jun 8, 2023

0.1.0

Jun 8, 2023

Download files

Source Distribution

spark_llm-0.1.9.tar.gz (18.7 kB)

Uploaded Jun 22, 2023 Source

Built Distribution

spark_llm-0.1.9-py3-none-any.whl (20.6 kB)

Uploaded Jun 22, 2023 Python 3

File details

Details for the file spark_llm-0.1.9.tar.gz.

File metadata

Download URL: spark_llm-0.1.9.tar.gz
Upload date: Jun 22, 2023
Size: 18.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.11.4 Darwin/22.5.0

File hashes

Hashes for spark_llm-0.1.9.tar.gz
Algorithm	Hash digest
SHA256	`bec310cf8f33953bddf8938317a6b7a0300f7758f494cd59059bb6c670ebe28c`
MD5	`f21f9703032f38c3911d65b2d4829d59`
BLAKE2b-256	`9c7bd4ade4f894ac2b6a5037b235b8f1b3b34367b02de22a29db0d1ccdfd56dd`

File details

Details for the file spark_llm-0.1.9-py3-none-any.whl.

File metadata

Download URL: spark_llm-0.1.9-py3-none-any.whl
Upload date: Jun 22, 2023
Size: 20.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.11.4 Darwin/22.5.0

File hashes

Hashes for spark_llm-0.1.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e70959977c1bbc39c835cab0bfa897e38cebab591bde543cb372d58300d7ae53`
MD5	`5bcf28651b84fdf312198388e56eb6f2`
BLAKE2b-256	`79a0c0019e3b9723234085f68a00cecacd0266f11a3c5fc39be1dc55cecdabf6`