SparkAI on CLI
PySparkAI CLI
Installation
Prerequisites
Java JDK 8 is required as a dependency of spark/pyspark itself. Make sure the JAVA_HOME environment variable is set as well.
If your environment is already configured to run pyspark applications, you are good to go.
Set up your environment
Set your OpenAI API key as an environment variable:
export OPENAI_API_KEY='sk-...'
To use Google's search mechanism to find data on the web, you must also set your Google API key as an environment variable:
export GOOGLE_API_KEY='...'
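Before running the CLI, it can help to confirm that the variables mentioned above are actually visible to Python. The following is a minimal sketch, not part of the project: the helper name `missing_vars` and the split into required and optional names are assumptions made for illustration, based only on the variables this README mentions.

```python
import os

# Variable names taken from this README. Treating GOOGLE_API_KEY as optional
# is an assumption: the README says it is only needed for Google search.
REQUIRED = ("JAVA_HOME", "OPENAI_API_KEY")
OPTIONAL = ("GOOGLE_API_KEY",)


def missing_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for name in missing_vars(REQUIRED):
        print(f"missing required variable: {name}")
    for name in missing_vars(OPTIONAL):
        print(f"missing optional variable: {name}")
```

If the script prints nothing, all three variables are set in the current shell session.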
Install pyspark-ai-cli
pip install git+https://github.com/lucas-lm/spark-ai-cli
Usage
Call the CLI from your shell:
python -m pyspark-ai "https://github.com/topics/google" --limit=20
Apply transformations to the source data:
pyspark-ai https://github.com/topics/google --transform "top 3 python repos with more stars"
By default, the LLM used behind the scenes is gpt-3.5-turbo, but you can change it with the --gpt-model-name flag:
pyspark-ai "https://github.com/topics/google" --transform "show me programming languages by stars from the most starred to the least starred" --gpt-model-name "gpt-4" --limit 20
Only OpenAI's LLMs are supported in the current version.
Warning
GPT-4 may not be generally available, so you may run into issues when using it.
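The commands above can also be assembled from a Python script, which avoids shell quoting mistakes when the transform text contains spaces. This is a rough sketch under stated assumptions: `build_command` is a hypothetical helper, not something the package provides, and it uses only the flags shown in this README (--transform, --gpt-model-name, --limit).

```python
import shlex


def build_command(source, transform=None, gpt_model_name=None, limit=None):
    """Assemble the argument list for the pyspark-ai command (hypothetical helper)."""
    args = ["pyspark-ai", source]
    if transform is not None:
        args += ["--transform", transform]
    if gpt_model_name is not None:
        args += ["--gpt-model-name", gpt_model_name]
    if limit is not None:
        args += ["--limit", str(limit)]
    return args


cmd = build_command(
    "https://github.com/topics/google",
    transform="top 3 python repos with more stars",
    limit=20,
)

# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))

# To actually run it (assumes the CLI is installed and the API keys are set):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list keeps each value as a single argument, so no manual quoting of the transform text is needed.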