SparkAI on CLI

PySparkAI CLI

Installation

Prerequisites

Java JDK 8 is required, because Spark/PySpark itself depends on it. Make sure the JAVA_HOME environment variable is set as well.

If your environment is already configured to run pyspark applications, you are good to go.

Set up your environment

Set your OpenAI API key as an environment variable:

export OPENAI_API_KEY='sk-...'

To use Google's search mechanism to find data on the web, you must also set a Google API key as an environment variable:

export GOOGLE_API_KEY='...'
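Before running the CLI, it can help to confirm that the variables mentioned above are actually visible to Python. The following is a minimal sketch, not part of this package: the helper name missing_env_vars and the split into required and optional variables are assumptions made for illustration, based only on the variables named in this README.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variables named in this README. Treating GOOGLE_API_KEY as optional is an
# assumption: the README only requires it for Google's search mechanism.
REQUIRED_VARS = ("JAVA_HOME", "OPENAI_API_KEY")
OPTIONAL_VARS = ("GOOGLE_API_KEY",)


def missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or blank in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in missing_env_vars(REQUIRED_VARS):
        print(f"missing required variable: {name}")
    for name in missing_env_vars(OPTIONAL_VARS):
        print(f"missing optional variable: {name}")
```

Passing a plain dictionary as env makes the check easy to try without touching the real environment.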

Install pyspark-ai-cli

pip install git+https://github.com/lucas-lm/spark-ai-cli

Usage

Call the CLI from your shell:

python -m pyspark-ai "https://github.com/topics/google" --limit=20

To apply a transformation to the source data:

pyspark-ai https://github.com/topics/google --transform "top 3 python repos with more stars"

By default, the LLM used behind the scenes is gpt-3.5-turbo, but you can change it with the --gpt-model-name flag:

pyspark-ai "https://github.com/topics/google" --transform "show me programming languages by stars from the most starred to the least starred" --gpt-model-name "gpt-4" --limit 20

Only OpenAI's LLMs are supported in the current version.
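If you want to call the CLI from a Python script rather than typing the command by hand, the flags shown above can be assembled into an argument list. This is only a sketch: build_command is a hypothetical helper, not something this package provides, and it assumes the pyspark-ai executable is on your PATH and accepts exactly the flags used in the examples above (--transform, --gpt-model-name, --limit).

```python
import shlex
from typing import List, Optional


def build_command(
    url: str,
    transform: Optional[str] = None,
    gpt_model_name: Optional[str] = None,
    limit: Optional[int] = None,
) -> List[str]:
    """Build an argument list for the pyspark-ai CLI (hypothetical helper)."""
    cmd = ["pyspark-ai", url]
    if transform is not None:
        cmd += ["--transform", transform]
    if gpt_model_name is not None:
        cmd += ["--gpt-model-name", gpt_model_name]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "https://github.com/topics/google",
        transform="top 3 python repos with more stars",
        limit=20,
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell quoting mistakes with prompts that contain spaces. To actually run the command, the list can be passed to subprocess.run(cmd, check=True).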

Warning

GPT-4 may not be generally available, so you may run into issues when using it.
