A useful thing that will take nested JSON and output something a touch more SQL-sensible
Python Nest Shredder
Turn nested data into relational files!
Nest Shredder is a pandas-wrapper utility for converting nested JSON or Parquet data into relational "flat" Parquet files, typically for onward consumption into a relational database (where nested data may become less immediately useful).
Features
- Give the tool a nested JSON or Parquet and a target output path
- ...And you'll get a bunch of "flat" Parquet in the output path!
- Names the Parquet files based on the path of the nested data
- Adds some id columns for relational integrity from source objects
- Shred functions accept a batch identifier for output metadata.
- Supports the standard path or file-like inputs that pandas accepts for its read_json / read_parquet methods.
- Defaults to Parquet output but can also write JSON or CSV (no compression is supported for these latter two at the moment).
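To make the idea of "shredding" concrete, here is a minimal pure-Python sketch of the technique the feature list describes: nested objects are split into flat tables, each table is named after the path to the nested data, and id columns link children back to their source object. This is not Nest Shredder's implementation (which wraps pandas and writes files). The function name shred, the column names id, parent_id and batch_ref, and the underscore-joined table names are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import count


def shred(records, object_name, batch_ref=None):
    """Split nested records into flat tables keyed by path (hypothetical helper)."""
    tables = defaultdict(list)  # table name -> list of flat rows
    ids = defaultdict(count)    # table name -> id counter, valid for this run only

    def walk(obj, path, parent_id=None):
        row_id = next(ids[path])
        row = {"id": row_id, "batch_ref": batch_ref}
        if parent_id is not None:
            row["parent_id"] = parent_id  # link back to the source object
        tables[path].append(row)
        for key, value in obj.items():
            child_path = f"{path}_{key}"  # assumed naming: full path, underscore-joined
            if isinstance(value, dict):
                walk(value, child_path, row_id)
            elif isinstance(value, list) and all(isinstance(v, dict) for v in value):
                # A list of objects becomes one child row per object
                # (an empty list simply produces no child rows).
                for item in value:
                    walk(item, child_path, row_id)
            else:
                row[key] = value  # scalars and lists of scalars stay on the row

    for record in records:
        walk(record, object_name)
    return dict(tables)


customers = [
    {"name": "Ada", "address": {"city": "London"},
     "orders": [{"sku": "A1"}, {"sku": "B2"}]},
    {"name": "Grace", "address": {"city": "New York"}, "orders": []},
]
tables = shred(customers, "customer", batch_ref="ABC123")
```

Each entry in tables is a list of flat rows that could be written out as one file. The sketch does not guard against source fields that happen to be called id, parent_id or batch_ref.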
Doesn't Feature(s)
- If you shred the same JSON object twice and it contains a nested array of objects, the order of those objects isn't guaranteed to match between shreds (but the ids will be valid within each run). Get yourself a key on that object! :)
- No compression on the output Parquet as standard. Will add later.
- Support for other Parquet libraries. Maybe later.
- Representing the full path in the Parquet output to account for people naming child objects the same thing repeatedly. Will add later and burst into tears. Model your data properly.
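On the "get yourself a key" advice above: if the source objects have no natural key, one possible workaround is to derive a repeatable key from each object's content before shredding, so rows can be matched across runs regardless of order. The sketch below is an assumption about how you might do that yourself, not something Nest Shredder provides. The helper name content_key and the order_key field are hypothetical.

```python
import hashlib
import json


def content_key(obj):
    """Derive a repeatable key from an object's content (hypothetical helper)."""
    # sort_keys makes the serialisation independent of dict key order.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


orders = [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]

# Attach the key to each object; the result is the same whatever order they arrive in.
keyed = [{**order, "order_key": content_key(order)} for order in orders]
reordered = [{**order, "order_key": content_key(order)} for order in reversed(orders)]
```

Two objects with identical content will get the same key, so this only helps when the nested objects are distinguishable by their values. A real business key in the data is still the better option.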
Tech
Nest Shredder relies on a couple of open source projects to work properly, including pandas.
Installation
Install from PyPI with Pipenv or pip itself (the package name is nestshredder). Up to you!
Usage
The module exposes two functions at the moment:
- shred_json
- shred_parquet
Both accept:
- path_or_buff - the source file path (e.g. './examples/test.json') or a BytesIO-like file object
- target_folder_path - the path where you would like your flattened / unnested outputs. New folders will be created in here, using:
- object_name - a simple string that you can use to identify the overall object represented by your data (e.g. Customers or Addresses). One word only please.
e.g.
import nestshredder as ns
ns.shred_json('./examples/vsimple_example.json','./target','example')
- batch_ref (optional) - an identifier to further describe the object you're shredding.
e.g.
import nestshredder as ns
ns.shred_json('./examples/vsimple_example.json','./target','example','ABC123')
- shred_json also exposes most of the pandas read_json options, in case you need them.
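Since path_or_buff also accepts a BytesIO-like file object, data that is already in memory does not need to be written to disk first. A minimal sketch, assuming a list of records is an acceptable input shape for shred_json. The helpers make_json_buffer and shred_from_memory are hypothetical names, and the shred_json calls use only the positional arguments shown in the examples above.

```python
import io
import json


def make_json_buffer(records):
    """Serialise records into an in-memory, file-like BytesIO object."""
    return io.BytesIO(json.dumps(records).encode("utf-8"))


def shred_from_memory(records, target_folder_path, object_name, batch_ref=None):
    """Pass an in-memory buffer to shred_json instead of a file path."""
    # Imported lazily so make_json_buffer can be used without nestshredder installed.
    import nestshredder as ns

    buffer = make_json_buffer(records)
    if batch_ref is None:
        return ns.shred_json(buffer, target_folder_path, object_name)
    return ns.shred_json(buffer, target_folder_path, object_name, batch_ref)
```

With nestshredder installed, a call such as shred_from_memory(records, './target', 'example', 'ABC123') would mirror the file-based examples above.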
License
MIT
Source distribution: nestshredder-0.5.3.tar.gz (6.2 kB)