A wrapper package for pyspark that processes JSON files. It gives the pyspark JSON object a Pythonic interface.

Project description


This package is meant to bring Python's simplicity and feel to pyspark when handling JSON files.

It is simple to use and requires no extra knowledge if you already use Python.


pip install jsonSpark

Sample Usage:

  • Import the package
    import jsonSpark

  • Read the JSON file into a pyspark object
    df = spark.read.json("filename", multiLine=True) # spark is an existing SparkSession; the file can also come from an S3 bucket

  • Create a JsonSpark object.
    df = jsonSpark(df)

  • See the schema if you wish.

  • Display the Data

  • Use it as python dictionary

  • To use pyspark functions, convert the object back to a pyspark object.
    pysparkObject = df._toDF()
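
The steps above read a multi-line JSON file and then treat the result like a Python dictionary. Since the package's own API is not documented here, the following is only a minimal sketch of those two ideas using Python's standard json module, not jsonSpark or pyspark. It assumes the usual meaning of multiLine=True in Spark's JSON reader (one JSON document spanning several lines, as opposed to the default of one object per line). The sample data and all variable names are hypothetical.

```python
import json

# A single JSON document spread over several lines. Spark's default JSON
# reader expects one object per line, so a file shaped like this is the
# case that multiLine=True is meant for. (Hypothetical sample data.)
multi_line_doc = """
{
  "name": "sensor-1",
  "location": {"city": "Berlin", "floor": 3},
  "readings": [21.5, 22.0, 21.8]
}
"""

# The same kind of data in JSON Lines form: one complete object per line.
json_lines_doc = (
    '{"name": "sensor-1", "readings": [21.5, 22.0]}\n'
    '{"name": "sensor-2", "readings": [19.0]}\n'
)

# Whole-document parse, analogous to reading with multiLine=True.
record = json.loads(multi_line_doc)

# Line-by-line parse, analogous to the default one-object-per-line reader.
records = [json.loads(line) for line in json_lines_doc.splitlines() if line.strip()]

# Dictionary-style access to nested fields and list elements.
city = record["location"]["city"]
first_reading = record["readings"][0]
names = [r["name"] for r in records]

print(city, first_reading, names)
```

Whether a jsonSpark object supports exactly this kind of indexing is an assumption; the sketch only shows what "use it as a Python dictionary" typically looks like for nested JSON.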

I will update the documentation and include a working example soon.

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

jsonSpark-0.0.2-py2.py3-none-any.whl (3.4 kB)

Uploaded py2 py3
