PySpark Project Building Tool
This tool generates PySpark project boilerplate code based on user input.
Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
PySpark is the Python API for Spark.
Installation
git clone https://github.com/qburst/PySparkCLI.git
cd PySparkCLI
pip3 install -e . --user
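If you prefer to keep the install isolated from your user site-packages, one option is to install inside a virtual environment instead (the venv directory name below is just an example):
python3 -m venv venv
source venv/bin/activate
pip install -e .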
Create a PySpark Project
pysparkcli create [PROJECT_NAME] --master [MASTER_URL] --cores [NUMBER]
master - The URL of the cluster the project connects to. You can also use -m instead of --master.
cores - The number of cores to use. You can also use -c instead of --cores.
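For example, to scaffold a hypothetical project named sample_etl that runs against a local Spark master with two cores:
pysparkcli create sample_etl --master local[2] --cores 2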
Run a PySpark Project
pysparkcli run [PROJECT_NAME]
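For example, continuing with the hypothetical sample_etl project created above:
pysparkcli run sample_etl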
PySpark Project Test cases
- Running by Project name
pysparkcli test [PROJECT_NAME]
- Running an individual test case by filename, e.g. test_etl_job.py
pysparkcli test [PROJECT_NAME] -t [etl_job]
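For example, with the hypothetical sample_etl project, the full test suite and the single test_etl_job.py case can be run as:
pysparkcli test sample_etl
pysparkcli test sample_etl -t etl_job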
Check out our contribution guidelines here.
Download files
pysparkcli-0.0.7.tar.gz (9.1 kB)
pysparkcli-0.0.7-py3-none-any.whl (13.3 kB)