A wrapper package for PySpark for processing JSON files. It gives the PySpark DataFrame loaded from JSON a more Pythonic interface.
JsonSpark
This package is meant to bring Python's simplicity and feel to PySpark when handling JSON files.
It is simple to use and requires nothing beyond familiarity with Python.
Steps:
- Import the package
import jsonSpark
- Read the JSON file into a PySpark DataFrame (sql is assumed to be your existing SparkSession or SQLContext)
df = sql.read.json("filename", multiLine=True)  # or read from an S3 bucket
- Create a JsonSpark object.
df = jsonSpark(df)
- Print the schema if you wish.
df.printSchema()
df.show()
- Use it like a Python dictionary.
df["key1"]["key2"]["key3"]["key4"].show()
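The dictionary-style access in the last step can be pictured as chained key lookups that build up a dotted column path, which is how PySpark's own select addresses nested struct fields (for example "key1.key2"). The following is only a sketch of that idea under stated assumptions, not jsonSpark's actual implementation: NestedFrame and RecordingFrame are hypothetical names, and RecordingFrame is a stand-in for a DataFrame so the example runs without a Spark session.

```python
class NestedFrame:
    """Hypothetical wrapper: records keys and forwards them as a dotted column path."""

    def __init__(self, frame, path=()):
        # frame is any object with PySpark-style select(column_name) and show() methods.
        self._frame = frame
        self._path = tuple(path)

    def __getitem__(self, key):
        # Each lookup returns a new wrapper one level deeper; nothing is evaluated yet.
        return NestedFrame(self._frame, self._path + (key,))

    @property
    def column(self):
        # Nested struct fields are addressed with dots, e.g. "key1.key2".
        return ".".join(self._path)

    def show(self):
        # With no keys recorded, show the whole frame; otherwise select the nested field first.
        target = self._frame.select(self.column) if self._path else self._frame
        return target.show()


class RecordingFrame:
    """Stand-in for a DataFrame (an assumption for this sketch, not a Spark class)."""

    def __init__(self):
        self.selected = []

    def select(self, column):
        # Remember which column path was requested and allow chaining.
        self.selected.append(column)
        return self

    def show(self):
        print("showing", self.selected[-1] if self.selected else "<all columns>")


frame = RecordingFrame()
nested = NestedFrame(frame)
nested["key1"]["key2"]["key3"]["key4"].show()
```

With a real DataFrame in place of RecordingFrame, the same chain would end up calling select with "key1.key2.key3.key4". Field names that themselves contain dots would need extra quoting, which this sketch does not handle.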
I will update the documentation and include a working example soon.