
Helpers & syntax sugar for PySpark.

Project description


Helpers & syntax sugar for PySpark. There are several features to make your life easier:

  • Definition of Spark packages, external jars, UDFs and Spark options within your code (see the sketch after this list);

  • Simplified reader/writer API for Cassandra, Elasticsearch, MySQL and Kafka;

  • Testing framework for Spark applications.
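
Packages, jars, UDFs and Spark options are all declared as class attributes on your session. Here is a minimal sketch of what that looks like; the jar path, option values and UDFs below are illustrative, not part of this release:

from pyspark.sql.types import IntegerType

from sparkly import SparklySession


class MyConfiguredSession(SparklySession):
    # Maven packages resolved when the session starts.
    packages = [
        'datastax:spark-cassandra-connector:2.0.0-M2-s_2.11',
    ]

    # Local jars shipped with the session (path is illustrative).
    jars = [
        '/path/to/brickhouse-0.7.1.jar',
    ]

    # Spark options applied to the session.
    options = {
        'spark.sql.shuffle.partitions': '10',
    }

    # UDFs registered on session start: either a Java UDF/UDAF class name
    # or a (python_callable, return_type) pair.
    udfs = {
        'collect_max': 'brickhouse.udf.collect.CollectMaxUDAF',
        'string_length': (lambda s: len(s) if s else 0, IntegerType()),
    }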

More details can be found in the official documentation.

Installation

Sparkly itself is easy to install:

pip install sparkly

The tricky part is pyspark: there is no official distribution on PyPI. As a workaround, we suggest one of the following:

  1. Set the PYTHONPATH environment variable to point to your Spark installation, for example:

    export PYTHONPATH="/usr/local/spark/python/lib/pyspark.zip:/usr/local/spark/python/lib/py4j-0.10.4-src.zip"
  2. Use our setup.py file for pyspark by adding this to your requirements.txt:

    -e git+https://github.com/Tubular/spark@branch-2.1.0#egg=pyspark&subdirectory=python

Here at Tubular, we publish pyspark to our internal PyPI repository.

Getting Started

Here is a small code snippet that shows how to read a Cassandra table and write its content to an Elasticsearch index:

from sparkly import SparklySession


class MySession(SparklySession):
    packages = [
        'datastax:spark-cassandra-connector:2.0.0-M2-s_2.11',
        'org.elasticsearch:elasticsearch-spark-20_2.11:5.1.1',
    ]


if __name__ == '__main__':
    spark = MySession()
    df = spark.read_ext.cassandra('localhost', 'my_keyspace', 'my_table')
    df.write_ext.elastic('localhost', 'my_index', 'my_type')
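
The same reader/writer pattern covers the other supported stores. As a hedged sketch (not verbatim from the docs), reading a MySQL table could look like this; the driver version and connection options are assumptions to adapt to your setup:

from sparkly import SparklySession


class MyMySQLSession(SparklySession):
    # MySQL JDBC driver needed by the reader (version is illustrative).
    packages = ['mysql:mysql-connector-java:5.1.39']


if __name__ == '__main__':
    spark = MyMySQLSession()
    df = spark.read_ext.mysql(
        'localhost',
        'my_database',
        'my_table',
        options={'user': 'root', 'password': 'root'},
    )
    df.show()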

See the online documentation for more details.

Testing

To run the tests, you need docker and docker-compose installed on your system. If you are working on macOS, we highly recommend using docker-machine. Once these tools are installed, all you need to do is run:

make test
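
The testing framework mentioned above also works for your own Spark applications. A minimal sketch, assuming you reuse the MySession class from the example above (the module name and the assertion are illustrative):

from sparkly.testing import SparklyGlobalSessionTest

from my_app import MySession  # hypothetical module that defines your session


class TestWordCount(SparklyGlobalSessionTest):
    # One session is shared across the whole test run.
    session = MySession

    def test_count(self):
        df = self.spark.createDataFrame([('hello',), ('world',)], ['word'])
        self.assertEqual(df.count(), 2)

SparklyGlobalSessionTest shares a single session across test cases, which keeps suites fast; if your tests need isolated sessions, the testing module also provides SparklyTest.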

Supported Spark Versions

At the moment we support:

Sparkly version | Supported Spark versions
sparkly 2.x     | Spark 2.0.x and Spark 2.1.x
sparkly 1.x     | Spark 1.6.x

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sparkly-2.4.1.tar.gz (28.5 kB)

Uploaded Source

Built Distribution

sparkly-2.4.1-py2.py3-none-any.whl (38.6 kB)

Uploaded Python 2, Python 3

File details

Details for the file sparkly-2.4.1.tar.gz.

File metadata

  • Download URL: sparkly-2.4.1.tar.gz
  • Upload date:
  • Size: 28.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/40.5.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.5.3

File hashes

Hashes for sparkly-2.4.1.tar.gz
Algorithm Hash digest
SHA256 db55368753ed97501528adde2722bcb0aaa228c6885ceac13aa890271f88e533
MD5 e6f7b072bd3ca5eb5e1216e1b3f608d7
BLAKE2b-256 10e9ca445d80e930edf1658b71cb0f375cab17dd8d72dd8ee19e8e6c9175d91b

See more details on using hashes here.

File details

Details for the file sparkly-2.4.1-py2.py3-none-any.whl.

File metadata

  • Download URL: sparkly-2.4.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 38.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/40.5.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.5.3

File hashes

Hashes for sparkly-2.4.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 8663f563a9ecbe25d0ef305625579aa082140da01e399d029a4851a08a3e6a1c
MD5 ef6540692aef81d90ae2574994dff8b9
BLAKE2b-256 ed320b3aeea86ae9d11c3e47bedda144f5dfdbf2435e3fd7db670180d767280d

See more details on using hashes here.
