Normalize deeply nested JSON files into flat JSONL files.
Project description
jnorm
Jnorm is a Python library that normalizes deeply nested JSON files into a bunch of flat files in the JSON Lines format, generating primary keys based on the position of items. These normalized files can be easily loaded into a relational database and queried with SQL. While the queries are complex, they are at least possible.
Installation
- via pip:
pip install jnorm
- via Docker:
docker pull forrestbajbek2/jnorm:1.0.0
How it works
Consider the following JSON:
[
{
"name": "Bob",
"gender": "male",
"favorite_foods": [
"tacos",
"burritos"
],
"address": {
"address_1": "1234 Main Road",
"address_2": "Apt A",
"city": "Fort Lauderdale",
"state": "FL",
"zipcode": 12345
}
},
{
"name": "Alice",
"gender": "female",
"favorite_foods": [
"sushi",
"takoyami"
],
"address": {
"address_1": "5678 Secondary Road",
"address_2": "Apt B",
"city": "San Francisco",
"state": "CA",
"zipcode": 67890
}
}
]
Running jnorm example/people.json
yields the following jsonl
files:
example/output/people.jsonl
:{"people_id": 1, "name": "Bob", "gender": "male"} {"people_id": 2, "name": "Alice", "gender": "female"}
example/output/people_address.jsonl
:{"people_address_id": 1, "people_id": 1, "address_1": "1234 Main Road", "address_2": "Apt A", "city": "Fort Lauderdale", "state": "FL", "zipcode": 12345} {"people_address_id": 2, "people_id": 2, "address_1": "5678 Secondary Road", "address_2": "Apt B", "city": "San Francisco", "state": "CA", "zipcode": 67890}
example/output/people_favorite_foods.jsonl
:{"people_favorite_foods_id": 1, "people_id": 1, "value": "tacos"} {"people_favorite_foods_id": 2, "people_id": 1, "value": "burritos"} {"people_favorite_foods_id": 3, "people_id": 2, "value": "sushi"} {"people_favorite_foods_id": 4, "people_id": 2, "value": "takoyami"}
Once loaded into a database, it can be queried with SQL:
-- Show all favorite foods
SELECT
p.id AS people_id
, p.name
, p.gender
, pff.value as favorite_food
FROM people_favorite_foods pff
JOIN people p
ON p.people_id = pff.people_id
;
-- Show all addresses
SELECT
p.id as people_id
, p.name
, p.gender
, pa.address_1
, pa.address_2
, pa.city
, pa.state
, pa.zipcode
FROM people p
JOIN people_address pa
ON p.people_id = pa.people_id
;
Running from Docker
Example:
docker run -it --rm -v /path/to/folder:/data forrestbajbek2/jnorm:1.0.0 myfile.json
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file jnorm-1.0.0.tar.gz
.
File metadata
- Download URL: jnorm-1.0.0.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 |
5eef35a2e859293810f3850d345235b2f6a8981b001413ed4bf100d6d4ad0ccd
|
|
MD5 |
016aae939bb0aeb6fff04f4db489e733
|
|
BLAKE2b-256 |
0655f7818adf4ef8cd77cffc0ff1889f66903a03ec6df3076ea409f9c02885d0
|
File details
Details for the file jnorm-1.0.0-py3-none-any.whl
.
File metadata
- Download URL: jnorm-1.0.0-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 |
aba8bc82afaadd39daf03d1a85473ae4ee62ad82febacfb46203255d9aee6054
|
|
MD5 |
a1638d564fef5a5d314a8a2826bfad5b
|
|
BLAKE2b-256 |
50cf7fdbdb96093f1a2e0d612e2d103d36412b54bccb0af6f7d3d7d1e7ee61e3
|