Skip to main content

Normalize deeply nested JSON files into flat JSONL files.

Project description

jnorm

Jnorm is a Python library that normalizes deeply nested JSON files into a bunch of flat files in the JSON Lines format, generating primary keys based on the position of items. These normalized files can be easily loaded into a relational database and queried with SQL. While the queries are complex, they are at least possible.

Installation

  • via pip: pip install jnorm
  • via Docker: docker pull forrestbajbek2/jnorm:1.0.0

How it works

Consider the following JSON:

[
    {
        "name": "Bob",
        "gender": "male",
        "favorite_foods": [
            "tacos",
            "burritos"
        ],
        "address": {
            "address_1": "1234 Main Road",
            "address_2": "Apt A",
            "city": "Fort Lauderdale",
            "state": "FL",
            "zipcode": 12345
        }
    },
    {
        "name": "Alice",
        "gender": "female",
        "favorite_foods": [
            "sushi",
            "takoyami"
        ],
        "address": {
            "address_1": "5678 Secondary Road",
            "address_2": "Apt B",
            "city": "San Francisco",
            "state": "CA",
            "zipcode": 67890
        }
    }
]

Running jnorm example/people.json yields the following jsonl files:

  • example/output/people.jsonl:
    {"people_id": 1, "name": "Bob", "gender": "male"}
    {"people_id": 2, "name": "Alice", "gender": "female"}
    
  • example/output/people_address.jsonl:
    {"people_address_id": 1, "people_id": 1, "address_1": "1234 Main Road", "address_2": "Apt A", "city": "Fort Lauderdale", "state": "FL", "zipcode": 12345}
    {"people_address_id": 2, "people_id": 2, "address_1": "5678 Secondary Road", "address_2": "Apt B", "city": "San Francisco", "state": "CA", "zipcode": 67890}
    
  • example/output/people_favorite_foods.jsonl:
    {"people_favorite_foods_id": 1, "people_id": 1, "value": "tacos"}
    {"people_favorite_foods_id": 2, "people_id": 1, "value": "burritos"}
    {"people_favorite_foods_id": 3, "people_id": 2, "value": "sushi"}
    {"people_favorite_foods_id": 4, "people_id": 2, "value": "takoyami"}
    

Once loaded into a database, it can be queried with SQL:

-- Show all favorite foods
SELECT
    p.id AS people_id
    , p.name
    , p.gender
    , pff.value as favorite_food
FROM people_favorite_foods pff
JOIN people p
    ON p.people_id = pff.people_id
;

-- Show all addresses
SELECT
    p.id as people_id
    , p.name
    , p.gender
    , pa.address_1
    , pa.address_2
    , pa.city
    , pa.state
    , pa.zipcode
FROM people p
JOIN people_address pa
    ON p.people_id = pa.people_id
;

Running from Docker

Example:

docker run -it --rm -v /path/to/folder:/data forrestbajbek2/jnorm:1.0.0 myfile.json

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jnorm-1.0.0.tar.gz (5.4 kB view details)

Uploaded Source

Built Distribution

jnorm-1.0.0-py3-none-any.whl (5.6 kB view details)

Uploaded Python 3

File details

Details for the file jnorm-1.0.0.tar.gz.

File metadata

  • Download URL: jnorm-1.0.0.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for jnorm-1.0.0.tar.gz
Algorithm Hash digest
SHA256 5eef35a2e859293810f3850d345235b2f6a8981b001413ed4bf100d6d4ad0ccd
MD5 016aae939bb0aeb6fff04f4db489e733
BLAKE2b-256 0655f7818adf4ef8cd77cffc0ff1889f66903a03ec6df3076ea409f9c02885d0

See more details on using hashes here.

File details

Details for the file jnorm-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: jnorm-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 5.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for jnorm-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 aba8bc82afaadd39daf03d1a85473ae4ee62ad82febacfb46203255d9aee6054
MD5 a1638d564fef5a5d314a8a2826bfad5b
BLAKE2b-256 50cf7fdbdb96093f1a2e0d612e2d103d36412b54bccb0af6f7d3d7d1e7ee61e3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page