Normalize deeply nested JSON files into flat JSONL files.
Project description
jnorm
Jnorm is a Python library that normalizes deeply nested JSON files into a bunch of flat files in the JSON Lines format, generating primary keys based on the position of items. These normalized files can be easily loaded into a relational database and queried with SQL. While the queries are complex, they are at least possible.
Installation
- via pip:
pip install jnorm - via Docker:
docker pull forrestbajbek2/jnorm:1.0.0
How it works
Consider the following JSON:
[
{
"name": "Bob",
"gender": "male",
"favorite_foods": [
"tacos",
"burritos"
],
"address": {
"address_1": "1234 Main Road",
"address_2": "Apt A",
"city": "Fort Lauderdale",
"state": "FL",
"zipcode": 12345
}
},
{
"name": "Alice",
"gender": "female",
"favorite_foods": [
"sushi",
"takoyami"
],
"address": {
"address_1": "5678 Secondary Road",
"address_2": "Apt B",
"city": "San Francisco",
"state": "CA",
"zipcode": 67890
}
}
]
Running jnorm example/people.json yields the following jsonl files:
example/output/people.jsonl:{"people_id": 1, "name": "Bob", "gender": "male"} {"people_id": 2, "name": "Alice", "gender": "female"}
example/output/people_address.jsonl:{"people_address_id": 1, "people_id": 1, "address_1": "1234 Main Road", "address_2": "Apt A", "city": "Fort Lauderdale", "state": "FL", "zipcode": 12345} {"people_address_id": 2, "people_id": 2, "address_1": "5678 Secondary Road", "address_2": "Apt B", "city": "San Francisco", "state": "CA", "zipcode": 67890}
example/output/people_favorite_foods.jsonl:{"people_favorite_foods_id": 1, "people_id": 1, "value": "tacos"} {"people_favorite_foods_id": 2, "people_id": 1, "value": "burritos"} {"people_favorite_foods_id": 3, "people_id": 2, "value": "sushi"} {"people_favorite_foods_id": 4, "people_id": 2, "value": "takoyami"}
Once loaded into a database, it can be queried with SQL:
-- Show all favorite foods
SELECT
p.id AS people_id
, p.name
, p.gender
, pff.value as favorite_food
FROM people_favorite_foods pff
JOIN people p
ON p.people_id = pff.people_id
;
-- Show all addresses
SELECT
p.id as people_id
, p.name
, p.gender
, pa.address_1
, pa.address_2
, pa.city
, pa.state
, pa.zipcode
FROM people p
JOIN people_address pa
ON p.people_id = pa.people_id
;
Running from Docker
Example:
docker run -it --rm -v /path/to/folder:/data forrestbajbek2/jnorm:1.0.0 myfile.json
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file jnorm-1.0.0.tar.gz.
File metadata
- Download URL: jnorm-1.0.0.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5eef35a2e859293810f3850d345235b2f6a8981b001413ed4bf100d6d4ad0ccd
|
|
| MD5 |
016aae939bb0aeb6fff04f4db489e733
|
|
| BLAKE2b-256 |
0655f7818adf4ef8cd77cffc0ff1889f66903a03ec6df3076ea409f9c02885d0
|
File details
Details for the file jnorm-1.0.0-py3-none-any.whl.
File metadata
- Download URL: jnorm-1.0.0-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
aba8bc82afaadd39daf03d1a85473ae4ee62ad82febacfb46203255d9aee6054
|
|
| MD5 |
a1638d564fef5a5d314a8a2826bfad5b
|
|
| BLAKE2b-256 |
50cf7fdbdb96093f1a2e0d612e2d103d36412b54bccb0af6f7d3d7d1e7ee61e3
|