
This is a wrapper package for PySpark that processes JSON files. It gives the PySpark JSON DataFrame a Pythonic interface.

Project description

JsonSpark

This package is meant to give PySpark a Python-like simplicity and feel when handling JSON files.

It is very simple to use and needs no extra knowledge beyond ordinary Python.

Steps:

  • Import the package.
    import jsonSpark
  • Read the JSON file into a PySpark DataFrame (here `sql` stands for your SparkSession or SQLContext).
    df = sql.read.json("filename", multiLine=True)  # or read from an S3 bucket
  • Wrap the DataFrame in a JsonSpark object.
    df = jsonSpark(df)
  • Print the schema if you wish.
    df.printSchema()
  • Show the data.
    df.show()
  • Access nested fields as you would in a Python dictionary.
    df["key1"]["key2"]["key3"]["key4"].show()
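The package's own implementation is not shown here. As a rough illustration of how dictionary-style access to nested JSON fields can work on top of PySpark, the sketch below defines a hypothetical `NestedView` class (not part of jsonSpark). It records each key you look up and, once a DataFrame method such as `show()` is called, selects the dotted column path (for example `key1.key2.key3.key4`), which Spark resolves to a nested struct field.

```python
class NestedView:
    """Hypothetical sketch, NOT jsonSpark's actual implementation.

    Gives dict-style access to nested JSON fields of a DataFrame-like object
    `df` that has a PySpark-style `select(col)` method (for example a
    pyspark.sql.DataFrame whose columns are nested structs).
    """

    def __init__(self, df, path=()):
        self._df = df
        self._path = tuple(path)

    def __getitem__(self, key):
        # Each lookup only extends the field path; nothing runs on Spark yet.
        return NestedView(self._df, self._path + (key,))

    @property
    def column_path(self):
        # Spark resolves dotted names such as "key1.key2" to nested struct fields.
        return ".".join(self._path)

    def __getattr__(self, name):
        # Only called for attributes not found on NestedView itself.
        # Private/dunder names are not forwarded, which avoids infinite
        # recursion if _df or _path have not been set yet.
        if name.startswith("_"):
            raise AttributeError(name)
        # Forward methods such as show() or printSchema() to the selected field,
        # or to the whole DataFrame when no key has been looked up.
        target = self._df.select(self.column_path) if self._path else self._df
        return getattr(target, name)
```

With a real `pyspark.sql.DataFrame`, `NestedView(df)["key1"]["key2"].show()` would run `df.select("key1.key2").show()`. Building the path lazily keeps each lookup cheap, since Spark only does work when a method like `show()` is finally called. Field names that themselves contain dots or spaces would need backtick quoting, which this sketch does not handle.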

I will update the documentation and include a working example soon.



Download files


Source Distribution

jsonSpark-0.0.1.tar.gz (2.2 kB)

Built Distributions

jsonSpark-0.0.1-py3-none-any.whl (3.3 kB), Python 3

jsonSpark-0.0.1-py2.py3-none-any.whl (3.4 kB), Python 2 and Python 3
