
jsonSpark is a wrapper package around PySpark for processing JSON files. It gives the PySpark DataFrame read from a JSON file a Pythonic, dictionary-like interface.


JsonSpark

This package aims to give PySpark a simple, Pythonic feel when working with JSON files.

It is simple to use and should feel familiar to anyone who already knows Python; no additional knowledge is required.

Installation

pip install jsonSpark

Sample Usage:

  • Import the package
    import jsonSpark

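  • Load the JSON file into a PySpark DataFrame (here sql is your SparkSession or SQLContext; multiLine=True lets a single JSON record span several lines)
    df = sql.read.json("filename", multiLine=True) # or read from an S3 bucket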

  • Wrap the DataFrame in a jsonSpark object
    df = jsonSpark(df)

  • Print the schema, if needed
    df.printSchema()

  • Display the data
    df.show()

  • Access nested fields as you would in a Python dictionary
    df["key1"]["key2"]["key3"]["key4"].show()

  • To use regular PySpark functions, convert the object back to a PySpark DataFrame
    pysparkObject = df._toDF()
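The dictionary-style access in the steps above can be pictured with the following pure-Python sketch. It is not jsonSpark's implementation: the NestedPath class and the sample data are hypothetical, and the sketch runs over an ordinary parsed JSON document rather than a Spark DataFrame. It shows how chained [...] lookups can both descend into nested JSON and record a dotted field path such as key1.key2.key3.key4, which is the form PySpark uses to address nested struct fields.

```python
import json

# Hypothetical multi-line JSON document with the nested shape used in the
# usage example (key1 -> key2 -> key3 -> key4).
SAMPLE = """
{
  "key1": {
    "key2": {
      "key3": {
        "key4": "value"
      }
    }
  }
}
"""


class NestedPath:
    """Hypothetical sketch of dictionary-style access over nested JSON.

    Each [...] lookup returns a new NestedPath that remembers the keys seen
    so far, so the chain can be turned into a dotted field path.
    """

    def __init__(self, data, keys=()):
        self._data = data
        self._keys = tuple(keys)

    def __getitem__(self, key):
        # Descend one level and record the key; raises KeyError if absent.
        return NestedPath(self._data[key], self._keys + (key,))

    @property
    def path(self):
        # Dotted path of the keys visited, e.g. "key1.key2.key3.key4".
        return ".".join(self._keys)

    @property
    def value(self):
        # The JSON value reached at the end of the chain.
        return self._data


record = NestedPath(json.loads(SAMPLE))
leaf = record["key1"]["key2"]["key3"]["key4"]
print(leaf.path)   # key1.key2.key3.key4
print(leaf.value)  # value
```

In plain PySpark, a comparable way to reach the same nested field would be something like pysparkObject.select("key1.key2.key3.key4").show(), since DataFrame.select accepts dotted paths into struct columns.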

I will update the documentation and include a working example soon.



Download files


Files for jsonSpark, version 0.0.2:
  • jsonSpark-0.0.2-py2.py3-none-any.whl (3.4 kB), file type: Wheel, Python version: py2.py3
