A wrapper package for PySpark for processing JSON files. It gives the PySpark object read from JSON a Pythonic, dictionary-like interface.
JsonSpark
This package aims to bring Python's simplicity and feel to PySpark when handling JSON files.
It is simple to use and requires no additional knowledge if you already use Python.
Installation
pip install jsonSpark
Sample Usage:
- Import the package
  import jsonSpark
- Read the JSON file with PySpark
  # `sql` is assumed to be an existing SparkSession or SQLContext
  df = sql.read.json("filename", multiLine=True)  # or read from an S3 bucket
- Create a JsonSpark object
  df = jsonSpark(df)
- Print the schema if you wish
  df.printSchema()
- Display the data
  df.show()
- Use it like a Python dictionary
  df["key1"]["key2"]["key3"]["key4"].show()
- To use PySpark functions, convert the object back to a PySpark object
  pysparkObject = df._toDF()
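The dictionary-style access above can be pictured with a small sketch that runs without Spark. This is not jsonSpark's actual implementation: `NestedPath` and `FakeFrame` are hypothetical names invented for illustration. The sketch assumes that chained key lookups are collected into a dotted column path, which is the notation PySpark's `DataFrame.select` accepts for nested struct fields.

```python
class NestedPath:
    """Hypothetical helper: records dictionary-style lookups as a dotted path."""

    def __init__(self, frame, keys=()):
        self._frame = frame        # wrapped object, e.g. a PySpark DataFrame
        self._keys = tuple(keys)   # keys collected so far

    def __getitem__(self, key):
        # Each lookup returns a new wrapper with one more key appended.
        return NestedPath(self._frame, self._keys + (key,))

    @property
    def path(self):
        # Join the collected keys into the dotted form used for nested fields.
        return ".".join(self._keys)

    def to_frame(self):
        # Without keys, hand back the wrapped frame unchanged;
        # otherwise select the nested column by its dotted path.
        if not self._keys:
            return self._frame
        return self._frame.select(self.path)


class FakeFrame:
    """Stand-in for a DataFrame so the sketch runs without Spark (assumption)."""

    def select(self, column):
        return f"select({column})"


wrapped = NestedPath(FakeFrame())
nested = wrapped["key1"]["key2"]["key3"]["key4"]
print(nested.path)
print(nested.to_frame())
```

With a real PySpark DataFrame in place of `FakeFrame`, `to_frame()` would end up calling `select("key1.key2.key3.key4")` on it.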
I will update the documentation and include a working example soon.
Built Distribution
Hashes for jsonSpark-0.0.2-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 37e596b9537a633cb89f66d19fdfa47f7808e71814ffbb26923bb3ac32308484
MD5 | 38baa34bccba8bde25086905536e8fd3
BLAKE2b-256 | eda0c20b3912217083dfdf6977a5e3df2e95094a2a9c73f517f75fb250a98c46