This is a wrapper package for PySpark for processing JSON files. It gives the PySpark JSON object a Pythonic interface.

JsonSpark

This package aims to give PySpark the simplicity and feel of plain Python when handling JSON files.

It is simple to use and requires nothing beyond ordinary Python knowledge.

Steps:

  • Import the package.
    import jsonSpark
  • Read the JSON file into a PySpark DataFrame (sql is assumed to be your existing Spark SQL entry point, such as a SparkSession).
    df = sql.read.json("filename", multiLine=True) # or read from an S3 bucket
  • Create a JsonSpark object from the DataFrame.
    df = jsonSpark(df)
  • Print the schema or show the data if you wish.
    df.printSchema()
    df.show()
  • Use it like a Python dictionary.
    df["key1"]["key2"]["key3"]["key4"].show()
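The dictionary-style access in the last step can be illustrated with a small pure-Python sketch. This is not the package's actual implementation, which is not documented here. The class name `PathWrapper`, the `column_path` property, and the `resolve` method are hypothetical, and a plain nested dict stands in for the PySpark DataFrame so that the example runs without Spark. The idea is that each subscript returns a new wrapper carrying a longer key path, and the path can be joined into the dotted form that PySpark accepts for nested struct fields.

```python
class PathWrapper:
    """Hypothetical stand-in for a dictionary-style wrapper (not jsonSpark's real code)."""

    def __init__(self, data, path=()):
        self._data = data          # a nested dict here; a DataFrame in real use
        self._path = tuple(path)   # keys collected so far

    def __getitem__(self, key):
        # Each subscript returns a new wrapper with the key appended to the path.
        return PathWrapper(self._data, self._path + (key,))

    @property
    def column_path(self):
        # Dotted form, usable with PySpark as df.select("key1.key2") for struct fields.
        return ".".join(self._path)

    def resolve(self):
        # Walk the nested dict along the collected path and return the value found.
        value = self._data
        for key in self._path:
            value = value[key]
        return value


record = {"key1": {"key2": {"key3": {"key4": 42}}}}
leaf = PathWrapper(record)["key1"]["key2"]["key3"]["key4"]
print(leaf.column_path)
print(leaf.resolve())
```

In a Spark-backed version, a method such as `show()` would presumably select the column named by `column_path` from the wrapped DataFrame instead of walking a dict.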

I will update the documentation and include a working example soon.

Files for jsonSpark, version 0.0.1

  • jsonSpark-0.0.1-py3-none-any.whl (3.3 kB), file type: Wheel, Python version: py3
  • jsonSpark-0.0.1.tar.gz (2.2 kB), file type: Source, Python version: None