Normalize deeply nested JSON files into flat JSONL files.
Project description
jnorm
Jnorm is a Python library that normalizes deeply nested JSON files into a bunch of flat files in the JSON Lines format, generating primary keys based on the position of items. These normalized files can be easily loaded into a relational database and queried with SQL. While the queries are complex, they are at least possible.
Installation
- via pip:
pip install jnorm
- via Docker:
docker pull forrestbajbek2/jnorm:1.0.0
How it works
Consider the following JSON:
[
{
"name": "Bob",
"gender": "male",
"favorite_foods": [
"tacos",
"burritos"
],
"address": {
"address_1": "1234 Main Road",
"address_2": "Apt A",
"city": "Fort Lauderdale",
"state": "FL",
"zipcode": 12345
}
},
{
"name": "Alice",
"gender": "female",
"favorite_foods": [
"sushi",
"takoyami"
],
"address": {
"address_1": "5678 Secondary Road",
"address_2": "Apt B",
"city": "San Francisco",
"state": "CA",
"zipcode": 67890
}
}
]
Running jnorm example/people.json
yields the following jsonl
files:
example/output/people.jsonl
:{"people_id": 1, "name": "Bob", "gender": "male"} {"people_id": 2, "name": "Alice", "gender": "female"}
example/output/people_address.jsonl
:{"people_address_id": 1, "people_id": 1, "address_1": "1234 Main Road", "address_2": "Apt A", "city": "Fort Lauderdale", "state": "FL", "zipcode": 12345} {"people_address_id": 2, "people_id": 2, "address_1": "5678 Secondary Road", "address_2": "Apt B", "city": "San Francisco", "state": "CA", "zipcode": 67890}
example/output/people_favorite_foods.jsonl
:{"people_favorite_foods_id": 1, "people_id": 1, "value": "tacos"} {"people_favorite_foods_id": 2, "people_id": 1, "value": "burritos"} {"people_favorite_foods_id": 3, "people_id": 2, "value": "sushi"} {"people_favorite_foods_id": 4, "people_id": 2, "value": "takoyami"}
Once loaded into a database, it can be queried with SQL:
-- Show all favorite foods
SELECT
p.id AS people_id
, p.name
, p.gender
, pff.value as favorite_food
FROM people_favorite_foods pff
JOIN people p
ON p.people_id = pff.people_id
;
-- Show all addresses
SELECT
p.id as people_id
, p.name
, p.gender
, pa.address_1
, pa.address_2
, pa.city
, pa.state
, pa.zipcode
FROM people p
JOIN people_address pa
ON p.people_id = pa.people_id
;
Running from Docker
Example:
docker run -it --rm -v /path/to/folder:/data forrestbajbek2/jnorm:1.0.0 myfile.json
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
jnorm-1.0.0.tar.gz
(5.4 kB
view hashes)
Built Distribution
jnorm-1.0.0-py3-none-any.whl
(5.6 kB
view hashes)