Skip to main content

Normalize deeply nested JSON files into flat JSONL files.

Project description

jnorm

Jnorm is a Python library that normalizes deeply nested JSON files into a bunch of flat files in the JSON Lines format, generating primary keys based on the position of items. These normalized files can be easily loaded into a relational database and queried with SQL. While the queries are complex, they are at least possible.

Installation

  • via pip: pip install jnorm
  • via Docker: docker pull forrestbajbek2/jnorm:1.0.0

How it works

Consider the following JSON:

[
    {
        "name": "Bob",
        "gender": "male",
        "favorite_foods": [
            "tacos",
            "burritos"
        ],
        "address": {
            "address_1": "1234 Main Road",
            "address_2": "Apt A",
            "city": "Fort Lauderdale",
            "state": "FL",
            "zipcode": 12345
        }
    },
    {
        "name": "Alice",
        "gender": "female",
        "favorite_foods": [
            "sushi",
            "takoyami"
        ],
        "address": {
            "address_1": "5678 Secondary Road",
            "address_2": "Apt B",
            "city": "San Francisco",
            "state": "CA",
            "zipcode": 67890
        }
    }
]

Running jnorm example/people.json yields the following jsonl files:

  • example/output/people.jsonl:
    {"people_id": 1, "name": "Bob", "gender": "male"}
    {"people_id": 2, "name": "Alice", "gender": "female"}
    
  • example/output/people_address.jsonl:
    {"people_address_id": 1, "people_id": 1, "address_1": "1234 Main Road", "address_2": "Apt A", "city": "Fort Lauderdale", "state": "FL", "zipcode": 12345}
    {"people_address_id": 2, "people_id": 2, "address_1": "5678 Secondary Road", "address_2": "Apt B", "city": "San Francisco", "state": "CA", "zipcode": 67890}
    
  • example/output/people_favorite_foods.jsonl:
    {"people_favorite_foods_id": 1, "people_id": 1, "value": "tacos"}
    {"people_favorite_foods_id": 2, "people_id": 1, "value": "burritos"}
    {"people_favorite_foods_id": 3, "people_id": 2, "value": "sushi"}
    {"people_favorite_foods_id": 4, "people_id": 2, "value": "takoyami"}
    

Once loaded into a database, it can be queried with SQL:

-- Show all favorite foods
SELECT
    p.id AS people_id
    , p.name
    , p.gender
    , pff.value as favorite_food
FROM people_favorite_foods pff
JOIN people p
    ON p.people_id = pff.people_id
;

-- Show all addresses
SELECT
    p.id as people_id
    , p.name
    , p.gender
    , pa.address_1
    , pa.address_2
    , pa.city
    , pa.state
    , pa.zipcode
FROM people p
JOIN people_address pa
    ON p.people_id = pa.people_id
;

Running from Docker

Example:

docker run -it --rm -v /path/to/folder:/data forrestbajbek2/jnorm:1.0.0 myfile.json

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jnorm-1.0.0.tar.gz (5.4 kB view hashes)

Uploaded Source

Built Distribution

jnorm-1.0.0-py3-none-any.whl (5.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page