
LLM assistant for the development of Spark applications


LLM Assistant for Apache Spark

Installation

pip install spark-llm

Usage

Initialization

from spark_llm import SparkLLMAssistant

assistant = SparkLLMAssistant()
assistant.activate()  # activate partial functions for Spark DataFrame

Data Ingestion

auto_df = assistant.create_df("2022 USA national auto sales by brand")
auto_df.show(n=5)
rank  brand      us_sales_2022  sales_change_vs_2021
1     Toyota     1849751        -9
2     Ford       1767439        -2
3     Chevrolet  1502389        6
4     Honda      881201         -33
5     Hyundai    724265         -2

Plot

auto_df.llm.plot()

Figure: 2022 USA national auto sales by brand

To plot with an instruction:

auto_df.llm.plot("pie chart for top 5 brands and the others' market shares")

Figure: 2022 USA national auto sales market share by brand

DataFrame Transformation

auto_top_growth_df = auto_df.llm.transform("top brand with the highest growth")
auto_top_growth_df.show()
brand     us_sales_2022  sales_change_vs_2021
Cadillac  134726         14

DataFrame Explanation

auto_top_growth_df.llm.explain()

In summary, this dataframe is retrieving the brand with the highest sales change in 2022 compared to 2021. It presents the results sorted by sales change in descending order and only returns the top result.
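The explanation above describes a sort-and-take-first query. A minimal plain-Python sketch of that logic follows. It is not the code the assistant generates, and the helper name `top_growth` is hypothetical. It runs on only the five rows displayed under Data Ingestion, so the winner here differs from Cadillac, which comes from the full DataFrame.

```python
# The five rows displayed by auto_df.show(n=5) earlier on this page.
rows = [
    {"brand": "Toyota", "us_sales_2022": 1849751, "sales_change_vs_2021": -9},
    {"brand": "Ford", "us_sales_2022": 1767439, "sales_change_vs_2021": -2},
    {"brand": "Chevrolet", "us_sales_2022": 1502389, "sales_change_vs_2021": 6},
    {"brand": "Honda", "us_sales_2022": 881201, "sales_change_vs_2021": -33},
    {"brand": "Hyundai", "us_sales_2022": 724265, "sales_change_vs_2021": -2},
]


def top_growth(rows):
    """Sort by sales change in descending order and keep only the top row."""
    ranked = sorted(rows, key=lambda r: r["sales_change_vs_2021"], reverse=True)
    return ranked[:1]


top = top_growth(rows)
print(top[0]["brand"])  # Chevrolet, the highest growth among these five rows
```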

DataFrame Attribute Verification

auto_top_growth_df.llm.verify("expect sales change percentage to be between -100 and 100")

result: True
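For comparison, the same expectation can be checked by hand. This is a sketch in plain Python over the single row shown above, not the check that `verify` generates; the function name `sales_change_in_range` is an assumption made for illustration.

```python
def sales_change_in_range(rows, low=-100, high=100):
    """Return True if every row's sales change lies within [low, high]."""
    return all(low <= r["sales_change_vs_2021"] <= high for r in rows)


# The single row of auto_top_growth_df shown earlier.
top_growth_rows = [
    {"brand": "Cadillac", "us_sales_2022": 134726, "sales_change_vs_2021": 14},
]

result = sales_change_in_range(top_growth_rows)
print(f"result: {result}")
```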

UDF Generation

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuse the active SparkSession

@assistant.udf
def previous_years_sales(brand: str, current_year_sale: int, sales_change_percentage: float) -> int:
    """Calculate the previous year's sales from the sales change percentage"""
    ...

spark.udf.register("previous_years_sales", previous_years_sales)
auto_df.createOrReplaceTempView("autoDF")

# Column names match the auto_df schema shown under Data Ingestion
spark.sql("select brand as brand, previous_years_sales(brand, us_sales_2022, sales_change_vs_2021) as 2021_sales from autoDF").show()
brand      2021_sales
Toyota     2032693
Ford       1803509
Chevrolet  1417348
Honda      1315225
Hyundai    739045
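The body of the UDF is left to the assistant, so the generated code is not shown here. As a rough sketch, one hand-written body that reproduces the figures above inverts the percentage change and truncates to an integer. This is an assumption about what the generated function computes, not its actual source.

```python
def previous_years_sales(brand: str, current_year_sale: int, sales_change_percentage: float) -> int:
    """Hypothetical body: recover last year's sales from this year's sales and the change."""
    # current = previous * (1 + pct / 100)  =>  previous = current / (1 + pct / 100)
    # A change of exactly -100 would divide by zero and is not handled in this sketch.
    return int(current_year_sale / (1 + sales_change_percentage / 100))


print(previous_years_sales("Toyota", 1849751, -9))  # 2032693
print(previous_years_sales("Hyundai", 724265, -2))  # 739045
```

The `brand` argument is unused in this sketch; it is kept only to match the signature above.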

Cache

The SparkLLMAssistant supports a simple cache with an in-memory layer and a persistent layer. New entries go into an in-memory staging cache, which is written to the persistent cache by the commit() method. Cache lookups are performed on the persistent cache only.

assistant.commit()
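The staging-then-commit behavior described above can be illustrated with a small stand-alone class. This is a sketch of the pattern only: the class `StagedCache`, its method names, and the JSON file format are hypothetical and do not reflect how spark-llm stores its cache.

```python
import json
import os
import tempfile


class StagedCache:
    """Hypothetical two-layer cache: in-memory staging plus a JSON file."""

    def __init__(self, path):
        self.path = path
        self.staging = {}      # new entries wait here until commit()
        self.persistent = {}   # entries already written to disk
        if os.path.exists(path):
            with open(path) as f:
                self.persistent = json.load(f)

    def update(self, key, value):
        self.staging[key] = value

    def lookup(self, key):
        # Lookups consult the persistent layer only, as described above.
        return self.persistent.get(key)

    def commit(self):
        # Merge staged entries into the persistent layer and write to disk.
        self.persistent.update(self.staging)
        with open(self.path, "w") as f:
            json.dump(self.persistent, f)
        self.staging.clear()


cache_path = os.path.join(tempfile.mkdtemp(), "cache.json")
cache = StagedCache(cache_path)
cache.update("prompt", "response")
before_commit = cache.lookup("prompt")  # None: the entry is only staged
cache.commit()
after_commit = cache.lookup("prompt")   # "response": the entry is now persistent
```

Under this pattern, an entry that was staged but never committed is lost when the process exits, which is why commit() is called explicitly.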

Refer to example.ipynb for more detailed usage examples.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

Licensed under the Apache License 2.0.
