sparkly·PyPI

Helpers & syntax sugar for PySpark.

Project description

Helpers & syntax sugar for PySpark. There are several features to make your life easier:

Definition of Spark packages, external jars, UDFs, and Spark options within your code;
Simplified reader/writer API for Cassandra, Elastic, MySQL, and Kafka;
A testing framework for Spark applications.

More details can be found in the official documentation.

Installation

Sparkly itself is easy to install:

pip install sparkly

The tricky part is pyspark. There is no official distribution on PyPI. As a workaround, we suggest one of the following:

Use the PYTHONPATH environment variable to point to your Spark installation, for example:

export PYTHONPATH="/usr/local/spark/python/lib/pyspark.zip:/usr/local/spark/python/lib/py4j-0.10.4-src.zip"

Use our setup.py file for pyspark. Just add this to your requirements.txt:

-e git+https://github.com/Tubular/spark@branch-2.1.0#egg=pyspark&subdirectory=python

Publish pyspark to an internal PyPI repository, as we do here at Tubular.
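The PYTHONPATH workaround above can also be applied from inside Python. The following is a minimal sketch, assuming a standard Spark layout with `python/lib/pyspark.zip` and a `py4j-*-src.zip` archive next to it. The helper names `pyspark_paths` and `add_pyspark_to_path` are hypothetical and are not part of sparkly or Spark.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the pyspark and py4j archives found under spark_home/python/lib."""
    lib_dir = os.path.join(spark_home, "python", "lib")
    pyspark_zip = os.path.join(lib_dir, "pyspark.zip")
    # The py4j version differs between Spark releases, so match it with a glob.
    py4j_zips = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    if not os.path.exists(pyspark_zip) or not py4j_zips:
        raise FileNotFoundError("pyspark.zip or py4j-*-src.zip not found in " + lib_dir)
    return [pyspark_zip, py4j_zips[-1]]


def add_pyspark_to_path(spark_home):
    """Prepend the Spark archives to sys.path so that `import pyspark` can resolve."""
    for path in reversed(pyspark_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)
```

Calling `add_pyspark_to_path("/usr/local/spark")` before the first `import pyspark` should have roughly the same effect as the `export` line shown earlier, without hard-coding the py4j version.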

Getting Started

Here is a small code snippet that shows how to read a Cassandra table and write its content to an Elasticsearch index:

from sparkly import SparklySession


class MySession(SparklySession):
    # Packages to fetch for the session, given as group:artifact:version coordinates.
    packages = [
        'datastax:spark-cassandra-connector:2.0.0-M2-s_2.11',
        'org.elasticsearch:elasticsearch-spark-20_2.11:6.5.4',
    ]


if __name__ == '__main__':
    spark = MySession()
    # Read a Cassandra table into a DataFrame.
    df = spark.read_ext.cassandra('localhost', 'my_keyspace', 'my_table')
    # Write the DataFrame to an Elasticsearch index.
    df.write_ext.elastic('localhost', 'my_index', 'my_type')

See the online documentation for more details.
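To give a feel for what declaring `packages` in code saves you, here is a rough sketch of the equivalent manual step: assembling a `spark-submit` style argument string (the kind of value usually placed in the `PYSPARK_SUBMIT_ARGS` environment variable). This illustrates the idea only and is not sparkly's actual implementation; the function `build_submit_args` is a hypothetical name.

```python
def build_submit_args(packages=(), jars=(), options=None):
    """Build a spark-submit style argument string from session-like settings."""
    args = []
    if packages:
        # --packages takes a comma-separated list of coordinates.
        args.append("--packages " + ",".join(packages))
    if jars:
        # --jars takes a comma-separated list of jar paths.
        args.append("--jars " + ",".join(jars))
    for key, value in sorted((options or {}).items()):
        args.append('--conf "{}={}"'.format(key, value))
    # By convention, PYSPARK_SUBMIT_ARGS ends with "pyspark-shell".
    args.append("pyspark-shell")
    return " ".join(args)


if __name__ == "__main__":
    print(build_submit_args(
        packages=[
            "datastax:spark-cassandra-connector:2.0.0-M2-s_2.11",
            "org.elasticsearch:elasticsearch-spark-20_2.11:6.5.4",
        ],
    ))
```

Keeping these settings as class attributes on a session, as in the snippet above, avoids maintaining such strings by hand.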

Testing

To run the tests, you need docker and docker-compose installed on your system. If you are working on macOS, we highly recommend using docker-machine. Once these tools are installed, all you need to run is:

make test
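Before running the test suite, it can help to confirm that the required tools are on your PATH. The sketch below is an optional preflight check and is not part of sparkly; the `missing_tools` helper and the `REQUIRED_TOOLS` tuple are hypothetical names, and including `make` in the list is an assumption based on the command above.

```python
import shutil

# Tools the test instructions rely on; "make" is assumed from the command above.
REQUIRED_TOOLS = ("docker", "docker-compose", "make")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found; you can run `make test`.")
```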

Supported Spark Versions

At the moment we support:

sparkly version | Spark version
sparkly >= 2.7 | Spark 2.4.x
sparkly 2.x | Spark 2.0.x, 2.1.x, and 2.2.x
sparkly 1.x | Spark 1.6.x
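The compatibility list above can be expressed as a small lookup. This is a sketch only: it assumes that the "sparkly 2.x" row means versions 2.0 through 2.6, since versions from 2.7 on have their own row, and the function name `supported_spark` is hypothetical.

```python
def supported_spark(sparkly_version):
    """Map a sparkly version string such as "2.8.2" to supported Spark series."""
    major, minor = (int(part) for part in sparkly_version.split(".")[:2])
    if (major, minor) >= (2, 7):
        return ["2.4.x"]
    if major == 2:
        # Assumption: "sparkly 2.x" covers 2.0 through 2.6.
        return ["2.0.x", "2.1.x", "2.2.x"]
    if major == 1:
        return ["1.6.x"]
    raise ValueError("No compatibility information for sparkly " + sparkly_version)


if __name__ == "__main__":
    for version in ("2.8.2", "2.3.0", "1.1.0"):
        print(version, "->", ", ".join(supported_spark(version)))
```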

Release history

2.8.2 (this version) | Jun 12, 2020
2.8.1 | Sep 11, 2019
2.8.0 | Sep 3, 2019
2.7.1 | Jun 26, 2019
2.7.0 | Jun 26, 2019
2.6.0 | Jun 26, 2019
2.5.1 | Jun 26, 2019
2.5.0 | Jun 26, 2019
2.4.1 | Nov 9, 2018
2.4.0 | Jul 4, 2018
2.3.0 | Aug 25, 2017
2.2.1 | Aug 3, 2017
2.2.0 | Aug 3, 2017
2.1.1 | Jul 27, 2017
2.1.0 | Jul 10, 2017
2.0.4 | Jun 2, 2017
2.0.2 | Mar 23, 2017
2.0.1 | Feb 13, 2017
1.1.1 | Jan 23, 2017
1.1.0 | Jan 19, 2017
1.0.0 | Dec 2, 2016

Download files

Source Distribution

sparkly-2.8.2.tar.gz (33.7 kB)

Uploaded Jun 12, 2020 Source

File details

Details for the file sparkly-2.8.2.tar.gz.

File metadata

Download URL: sparkly-2.8.2.tar.gz
Upload date: Jun 12, 2020
Size: 33.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: Python-urllib/3.7

File hashes

Hashes for sparkly-2.8.2.tar.gz
Algorithm	Hash digest
SHA256	`6b20381e01718dc2a783e67ec1d392ad6caa2437f028f12bbf97823c653eb948`
MD5	`c63197a5aad17ea4d45d1cbafcadda47`
BLAKE2b-256	`fd8b3a8e8390ce158be3f2a7b0e6fa7fcb6f6db837e6fe687e1181102196b7db`
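A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `verify` are hypothetical, and the file path in the usage example assumes the archive sits in the current directory.

```python
import hashlib

# SHA256 digest of sparkly-2.8.2.tar.gz, copied from the table above.
EXPECTED_SHA256 = "6b20381e01718dc2a783e67ec1d392ad6caa2437f028f12bbf97823c653eb948"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("sparkly-2.8.2.tar.gz")` should return `True` for an intact download. pip's hash-checking mode (a `--hash=sha256:...` entry in a requirements file) performs a similar check at install time.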
